CS224d-Lecture8

Language Model

A language model computes the probability of a sequence of words:
  • $P(w_1, w_2, \dots, w_T)$
Useful for machine translation:
word ordering
  • p(the cat is small) > p(small the is cat)
word choice
  • p(walking home after school) > p(walking house after school)

Traditional Language Model

Conditional probability, with a window of size n.

assumption

$$P(w_1, w_2, \dots, w_T) = \prod_{i=1}^{T} P(w_i \mid w_1, \dots, w_{i-1}) \approx \prod_{i=1}^{T} P(w_i \mid w_{i-(n-1)}, \dots, w_{i-1})$$
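For example, with $n = 3$ the sentence from above factorizes into trigram terms:

$$P(\text{the cat is small}) \approx P(\text{the})\,P(\text{cat}\mid\text{the})\,P(\text{is}\mid\text{the},\text{cat})\,P(\text{small}\mid\text{cat},\text{is})$$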

n-gram
  • bigram: $P(w_2 \mid w_1) = \dfrac{\text{count}(w_1, w_2)}{\text{count}(w_1)}$
  • trigram: $P(w_3 \mid w_1, w_2) = \dfrac{\text{count}(w_1, w_2, w_3)}{\text{count}(w_1, w_2)}$
    n-gram models consume a large amount of memory
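A minimal sketch of the count-based bigram estimate above; the toy corpus is a made-up example:

```python
from collections import Counter

# toy corpus; any tokenized text would do
corpus = "the cat is small . the cat is cute .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """P(w2 | w1) = count(w1, w2) / count(w1), the MLE estimate."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("the", "cat"))   # 1.0: "the" is always followed by "cat" here
print(bigram_prob("is", "small"))  # 0.5
```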

RNN

  • weights are tied (shared) across all time steps
  • the prediction is conditioned on all previous words
  • RAM requirement scales only with the number of words

$$h_t = \sigma\!\left(W^{(hh)} h_{t-1} + W^{(hx)} x_t\right)$$
$$\hat{y}_t = \mathrm{softmax}\!\left(W^{(S)} h_t\right)$$
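A minimal NumPy sketch of the two equations above; the dimensions and the random initialization are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # shift for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, D, V = 50, 100, 10000         # hidden size, word-vector size, vocab size (assumed)
rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.01, size=(H, H))  # recurrent weights, shared across time steps
W_hx = rng.normal(scale=0.01, size=(H, D))  # input-to-hidden weights
W_s  = rng.normal(scale=0.01, size=(V, H))  # hidden-to-output (softmax) weights

h_prev = np.zeros(H)
x_t = rng.normal(size=D)                    # word vector for the current word

h_t = sigmoid(W_hh @ h_prev + W_hx @ x_t)   # h_t = sigma(W^(hh) h_{t-1} + W^(hx) x_t)
y_hat = softmax(W_s @ h_t)                  # y_hat_t = softmax(W^(S) h_t)
print(y_hat.shape, y_hat.sum())             # (10000,) ~1.0
```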

Training an RNN is hard
vanishing / exploding gradient problem

total error

$$\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \frac{\partial E_t}{\partial W}$$

$$\frac{\partial E_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial E_t}{\partial y_t}\,\frac{\partial y_t}{\partial h_t}\,\frac{\partial h_t}{\partial h_k}\,\frac{\partial h_k}{\partial W}$$

where
$$\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}$$



Since the hidden state is computed as
$$h_t = W f(h_{t-1}) + W^{(hx)} x_t$$


$$\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} = \prod_{j=k+1}^{t} W^{T}\,\mathrm{diag}\!\left(f'(h_{j-1})\right)$$

$$\left\|\frac{\partial h_j}{\partial h_{j-1}}\right\| \le \left\|W^{T}\right\|\,\left\|\mathrm{diag}\!\left(f'(h_{j-1})\right)\right\| \le \beta_W \beta_h$$

$$\left\|\frac{\partial h_t}{\partial h_k}\right\| = \left\|\prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}\right\| \le \left(\beta_W \beta_h\right)^{t-k}$$

This product can become very large or very small very quickly.
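A tiny numerical illustration: the factors 0.9 and 1.1 below stand in for $\beta_W \beta_h$ and are made-up values.

```python
# if beta_W * beta_h < 1 the bound shrinks geometrically; if > 1 it blows up
for steps in (10, 50, 100):
    print(steps, 0.9 ** steps, 1.1 ** steps)
# 10   0.349       2.59
# 50   0.00515     117.4
# 100  0.0000266   13780.6
```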

The vanishing gradient problem means that words many time steps in the past have almost no influence on the current prediction.
exploding gradient -> clip the gradient
vanishing gradient -> initialization + ReLUs
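A minimal sketch of the gradient-clipping fix for exploding gradients mentioned above; the threshold value is an arbitrary example:

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    """Rescale grad so that its L2 norm is at most `threshold`."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([30.0, 40.0])   # norm 50, well above the threshold
print(clip_gradient(g))      # [3. 4.] -> norm 5
```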
softmax over the full vocabulary is huge and slow
  • class-based prediction trick
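A rough sketch of the class-based factorization $P(w \mid h) = P(\mathrm{class}(w) \mid h)\,P(w \mid \mathrm{class}(w), h)$; the class assignment and the sizes below are made-up assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

H, V, C = 50, 10000, 100                      # hidden size, vocab size, number of classes (assumed)
words_per_class = V // C                      # assume equal-sized classes for simplicity
rng = np.random.default_rng(0)
W_class = rng.normal(scale=0.01, size=(C, H))                  # scores over classes
W_word = rng.normal(scale=0.01, size=(C, words_per_class, H))  # scores over words within each class

def word_prob(h, word_id):
    c = word_id // words_per_class            # class of the word
    j = word_id % words_per_class             # index of the word inside its class
    p_class = softmax(W_class @ h)[c]         # P(class | h): softmax over C classes
    p_word = softmax(W_word[c] @ h)[j]        # P(word | class, h): softmax over ~V/C words
    return p_class * p_word                   # two small softmaxes instead of one over V

h = rng.normal(size=H)
print(word_prob(h, 4242))
```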
Bidirectional RNN
  • words both before and after the current position influence the prediction at that position
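A sketch of the standard bidirectional formulation: the forward and backward passes keep separate weights, and the output combines both hidden states ($[\,\cdot\,;\,\cdot\,]$ is concatenation):

$$\overrightarrow{h}_t = f\!\left(\overrightarrow{W} x_t + \overrightarrow{V}\,\overrightarrow{h}_{t-1} + \overrightarrow{b}\right)$$
$$\overleftarrow{h}_t = f\!\left(\overleftarrow{W} x_t + \overleftarrow{V}\,\overleftarrow{h}_{t+1} + \overleftarrow{b}\right)$$
$$\hat{y}_t = g\!\left(U\,[\overrightarrow{h}_t; \overleftarrow{h}_t] + c\right)$$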
Deep bidirectional RNN
F1 metric

precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1 = 2 · precision · recall / (precision + recall)
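A minimal check of the formulas above; the TP/FP/FN counts are made-up numbers:

```python
def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=8, fp=2, fn=4))   # precision 0.8, recall ~0.667, F1 ~0.727
```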