Denoising Autoencoder

Date: 2019-11-16 Tags: denoising autoencoder

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/f7b...

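The denoising autoencoder (DAE) is another variant of the autoencoder. Pascal Vincent's paper, which describes the model in detail, is strongly recommended. The idea behind the DAE is that an autoencoder designed merely to reproduce its original input is not necessarily the best one. A good representation is one obtained by encoding and decoding "corrupted" input and still recovering the true original data.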

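Mathematically, suppose the original data x is deliberately corrupted, for example by adding Gaussian noise or by erasing some of its dimensions, giving x'. We then encode and decode x' to obtain the reconstructed signal x̂ = g(f(x')), which should be as close as possible to the uncorrupted original x. The training loss therefore changes from L(x, g(f(x))) to L(x, g(f(x'))).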
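As a rough illustration of this objective, the following NumPy sketch corrupts a batch x with Gaussian noise, passes the corrupted x' through a one-layer sigmoid encoder f and decoder g, and measures the squared error against the clean x. The layer sizes, the noise scale `sigma`, the random untrained weights, and the helper names `f`, `g`, and `loss` are assumptions made for illustration; they are not taken from the post's script or from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_hidden = 8, 3   # illustrative sizes (assumed)
sigma = 0.1                # Gaussian noise scale (assumed)

# Randomly initialised, untrained parameters: enough to show the data flow.
W_enc = rng.normal(scale=0.1, size=(n_input, n_hidden))
b_enc = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_input))
b_dec = np.zeros(n_input)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x):
    """Encoder: maps an input batch to its hidden representation."""
    return sigmoid(x @ W_enc + b_enc)

def g(h):
    """Decoder: maps a hidden representation back to input space."""
    return sigmoid(h @ W_dec + b_dec)

def loss(target, reconstruction):
    """Sum of squared errors between target and reconstruction."""
    return np.sum((reconstruction - target) ** 2)

x = rng.uniform(size=(4, n_input))                     # clean batch
x_corrupt = x + rng.normal(scale=sigma, size=x.shape)  # x' = x + noise

plain_loss = loss(x, g(f(x)))              # L(x, g(f(x)))
denoising_loss = loss(x, g(f(x_corrupt)))  # L(x, g(f(x')))
print(plain_loss, denoising_loss)
```

The only difference between the two losses is the encoder's input: the target stays the clean x in both cases.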

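Intuitively, a denoising autoencoder aims to learn features that are as robust as possible, so that they can withstand, to some degree, corrupted or missing values in the original data. Vincent's paper also gives an interpretation of the DAE in terms of manifold learning and reports experiments on image data in which the DAE learns feature transformations resembling Gabor edge detectors.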

The architecture of a DAE is shown in the figure below:

[Figure: DAE architecture]

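The most commonly used noise today is mask noise, in which part of the original data is missing. This has a clear practical motivation: some pixels of an image may be occluded, or a text may be missing words because of how it was recorded.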
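Mask noise can be sketched in a few lines of NumPy. The helper name `mask_noise`, the array shape, and the seed below are assumptions for illustration; the idea is simply to draw a 0/1 keep-mask from a Bernoulli distribution and multiply it into the data.

```python
import numpy as np

def mask_noise(x, corruption_level, rng):
    """Zero out each entry of x independently with probability corruption_level."""
    keep = rng.binomial(1, 1.0 - corruption_level, size=x.shape)
    return x * keep, keep

rng = np.random.default_rng(0)
x = np.ones((1000, 20))
x_corrupt, keep = mask_noise(x, corruption_level=0.3, rng=rng)

# Fraction of entries that were zeroed; it should be close to 0.3.
dropped = 1.0 - keep.mean()
print(round(dropped, 3))
```

Because every entry is masked independently, the realized fraction of dropped entries fluctuates around `corruption_level` rather than matching it exactly.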

The implementation is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Written against the TensorFlow 1.x API.
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

N_INPUT = 784
N_HIDDEN = 100
N_OUTPUT = N_INPUT
corruption_level = 0.3
epoches = 1000

def main(_):

    w_init = np.sqrt(6. / (N_INPUT + N_HIDDEN))
    weights = {
        "hidden": tf.Variable(tf.random_uniform([N_INPUT, N_HIDDEN], minval = -w_init, maxval = w_init)),
        "out": tf.Variable(tf.random_uniform([N_HIDDEN, N_OUTPUT], minval = -w_init, maxval = w_init))
    }

    bias = {
        "hidden": tf.Variable(tf.random_uniform([N_HIDDEN], minval = -w_init, maxval = w_init)),
        "out": tf.Variable(tf.random_uniform([N_OUTPUT], minval = -w_init, maxval = w_init))
    }

    with tf.name_scope("input"):
        # input data
        x = tf.placeholder("float", [None, N_INPUT])
        mask = tf.placeholder("float", [None, N_INPUT])

    with tf.name_scope("input_layer"):
        # apply mask noise: zero out the entries of x where mask is 0
        input_layer = tf.multiply(x, mask)

    with tf.name_scope("hidden_layer"):
        # from input layer to hidden layer
        hidden_layer = tf.sigmoid(tf.add(tf.matmul(input_layer, weights["hidden"]), bias["hidden"]))

    with tf.name_scope("output_layer"):
        # from hidden layer to output layer
        output_layer = tf.sigmoid(tf.add(tf.matmul(hidden_layer, weights["out"]), bias["out"]))

    with tf.name_scope("cost"):
        # cost function: squared reconstruction error against the clean input x
        cost = tf.reduce_sum(tf.pow(tf.subtract(output_layer, x), 2))

    optimizer = tf.train.AdamOptimizer().minimize(cost)

    # load MNIST data
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

    with tf.Session() as sess:

        init = tf.global_variables_initializer()
        sess.run(init)

        for i in range(epoches):
            # iterate over mini-batches of 100; the "+ 1" keeps the last full batch
            for start, end in zip(range(0, len(trX), 100), range(100, len(trX) + 1, 100)):
                input_ = trX[start:end]
                mask_np = np.random.binomial(1, 1 - corruption_level, input_.shape)
                sess.run(optimizer, feed_dict={x: input_, mask: mask_np})

            mask_np = np.random.binomial(1, 1 - corruption_level, teX.shape)
            print(i, sess.run(cost, feed_dict={x: teX, mask: mask_np}))

if __name__ == "__main__":
    tf.app.run()
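The mini-batch loop in the script pairs start and end indices with `zip`. A minimal sketch of an alternative, assuming a hypothetical helper `batch_slices` that is not part of the original script, generates the index pairs directly and also yields a final partial batch when the dataset size is not a multiple of the batch size:

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, end) index pairs covering range(n_samples) in order."""
    for start in range(0, n_samples, batch_size):
        # Clamp the end index so the last batch may be shorter than batch_size.
        yield start, min(start + batch_size, n_samples)

slices = list(batch_slices(n_samples=250, batch_size=100))
print(slices)  # [(0, 100), (100, 200), (200, 250)]
```

In the training loop, each pair would be used as `trX[start:end]`, with a fresh mask drawn for every batch.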

References:

[Extracting and Composing Robust Features with Denoising Autoencoders](http://machinelearning.org/ar...)

Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
