Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs

Date 2020-12-25 Tags RNN

What is an RNN?

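The core idea of an RNN is to make use of sequential information. In a traditional neural network we usually assume that all inputs (and outputs) are independent of each other, but for many practical applications that is a very poor assumption. If we want to predict the next word in a sentence, for example, we had better know which word came before it.
The R in RNN stands for Recurrent: the network performs the same operation on every element of a sequence in order, and each output depends on the results of the previous computations. Another way to think about RNNs is that they have a "memory" that captures information about what has been computed so far. In theory, RNNs can make use of information from arbitrarily long sequences, but in practice they are limited to looking back only a few steps.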

$x_{t}$ is the input at time step t. For example, $x_{1}$ could be a one-hot vector corresponding to the second word of a sentence.
$s_{t}$ is the hidden state at time step t. It's the "memory" of the network. $s_{t}$ is calculated based on the previous hidden state and the input at the current step: $s_{t} = f (U x_{t} + W s_{t - 1})$ . The function f is usually a nonlinearity such as tanh or ReLU. $s_{- 1}$ , which is required to calculate the first hidden state, is typically initialized to all zeroes.
$o_{t}$ is the output at step t. For example, if we wanted to predict the next word in a sentence it would be a vector of probabilities across our vocabulary. $o_{t} = \mathrm{softmax}(V s_{t})$ .
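We can think of the hidden state $s_{t}$ as the memory of the network. $s_{t}$ captures information from the computations at previous time steps. The output $o_{t}$ at each time step depends only on the memory at that time step.
Unlike a traditional neural network, an RNN shares the same parameters (U, V, W above) across all steps. This means we are performing the same step repeatedly, just with a different input at each time step. This greatly reduces the number of weights we need to store.
The process above produces an output at every time step, but depending on the application this output may not be needed.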
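
To make the recurrence above concrete, here is a minimal sketch of the forward pass in plain Python (standard library only). It assumes f is tanh and that each input is a one-hot vector given by a word index. The toy sizes, the random initialization, and the helper names (`matvec`, `softmax`, `rnn_forward`, `random_matrix`) are illustrative assumptions, not part of the original tutorial.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(z_i - m) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(word_ids, U, V, W):
    """Compute s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t).

    word_ids holds one vocabulary index per time step.
    Returns (states, outputs), each with one entry per time step.
    """
    hidden_size = len(W)
    vocab_size = len(U[0])
    s_prev = [0.0] * hidden_size          # s_{-1}, initialized to all zeroes
    states, outputs = [], []
    for idx in word_ids:
        x_t = [0.0] * vocab_size
        x_t[idx] = 1.0                    # one-hot input for this step
        Ux = matvec(U, x_t)
        Ws = matvec(W, s_prev)
        s_t = [math.tanh(a + b) for a, b in zip(Ux, Ws)]
        o_t = softmax(matvec(V, s_t))     # probabilities over the vocabulary
        states.append(s_t)
        outputs.append(o_t)
        s_prev = s_t                      # the "memory" carried to the next step
    return states, outputs

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the scale is an arbitrary illustrative choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
vocab_size, hidden_size = 8, 4            # toy sizes, for illustration only
U = random_matrix(hidden_size, vocab_size, rng)
V = random_matrix(vocab_size, hidden_size, rng)
W = random_matrix(hidden_size, hidden_size, rng)

# The same U, V, W are reused at every one of the four time steps.
states, outputs = rnn_forward([0, 3, 5, 1], U, V, W)
```

Note that the number of weights depends only on the vocabulary and hidden sizes, not on the sequence length: a longer input simply runs the same loop body more times. If an application only needs a single prediction for the whole sequence, one option is to use just the last entry of `outputs`.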

What can RNNs do?