1. NNLM
This section follows Bengio's paper.
Bengio et al. \cite{nnlm} first proposed a Neural Network Language Model (NNLM) that learns a word embedding and a language model simultaneously. The language model uses several previous words to predict the distribution of the next word. For each sample in the corpus, we maximize the log-likelihood of the last word given the previous words. The model takes the concatenation of the previous words' embeddings as its input, and its structure is a feed-forward neural network with one hidden layer.
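To make the architecture concrete, here is a minimal NumPy sketch of the NNLM forward pass. The sizes and variable names are my own illustrative assumptions (not taken from the paper), and the paper's optional direct input-to-output connections are omitted.

```python
import numpy as np

# Illustrative sizes: vocabulary, embedding dim, context length, hidden units (not from the paper).
V, d, n, h = 10000, 50, 3, 100

rng = np.random.default_rng(0)
C = rng.normal(scale=0.1, size=(V, d))       # word embeddings, learned jointly with the model
H = rng.normal(scale=0.1, size=(h, n * d))   # input-to-hidden weights
U = rng.normal(scale=0.1, size=(V, h))       # hidden-to-output weights
b_h, b_o = np.zeros(h), np.zeros(V)

def nnlm_log_probs(context_ids):
    """Log-distribution over the next word given the n previous word ids."""
    x = C[context_ids].reshape(-1)           # concatenate the previous words' embeddings
    hidden = np.tanh(H @ x + b_h)            # single hidden layer
    scores = U @ hidden + b_o
    scores -= scores.max()                   # numerically stable log-softmax
    return scores - np.log(np.exp(scores).sum())

log_probs = nnlm_log_probs([17, 42, 7])      # training maximizes log_probs[next_word_id] over the corpus
```

Because the gradient of the log-likelihood flows into both the network weights and the embedding rows of `C`, the embeddings and the language model are indeed learned jointly.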
2. LBL
The Log-Bilinear Language Model (LBL) proposed by Mnih and Hinton combines Bengio's hierarchical NNLM with a log-bilinear model. It uses a log-bilinear energy function that is almost the same as the NNLM's, but removes the non-linear activation function tanh.
A previous study \cite{lbl} proposed a widely used model architecture for estimating neural network language models.
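The following is a rough NumPy sketch of the log-bilinear prediction step under the usual formulation: the next word's representation is predicted as a purely linear function of the context word vectors and then scored against every word's vector. Sizes and variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

V, d, n = 10000, 100, 3                        # illustrative vocabulary size, embedding dim, context length

rng = np.random.default_rng(0)
R = rng.normal(scale=0.1, size=(V, d))         # word representation vectors
Cmats = rng.normal(scale=0.1, size=(n, d, d))  # one position-dependent combination matrix per context slot
b = np.zeros(V)                                # per-word bias

def lbl_log_probs(context_ids):
    """Score every word by the similarity between its vector and the predicted representation."""
    # The predicted representation is a purely linear function of the context vectors (no tanh).
    r_hat = sum(Cmats[i] @ R[w] for i, w in enumerate(context_ids))
    scores = R @ r_hat + b
    scores -= scores.max()
    return scores - np.log(np.exp(scores).sum())
```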
NER
Multi-Layer Neural Network
A. 3-layer network: input layer, hidden layer, output layer. Except for the input units, each unit has a bias.
Forward computation
\[ net_{j} = \sum_{i=1}^{d}x_{i}w_{ji}+w_{j0}=\sum_{i=0}^{d}x_{i}w_{ji}=w_{j}^{t}x \]
Specifically, a signal \(x_{i}\) at the input of synapse \(i\) connected to neuron \(j\) is multiplied by the synaptic weight \(w_{ji}\). The index \(i\) ranges over the input layer and \(j\) over the hidden layer; \(w_{j0}\) is the bias, with \(x_{0}=+1\).
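A tiny sketch of the bias convention above: prepending \(x_{0}=+1\) to the input lets \(w_{j0}\) act as the bias, so \(net_{j}\) becomes a single dot product (the numbers below are arbitrary).

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])               # d = 3 input features (arbitrary values)
w_j = np.array([0.1, 0.4, -0.3, 0.25])       # [w_j0, w_j1, ..., w_jd]; w_j0 is the bias

x_aug = np.concatenate(([1.0], x))           # prepend x_0 = +1
net_j = w_j @ x_aug                          # net_j = sum_{i=0}^{d} x_i * w_ji = w_j^T x
```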
- Each neuron is represented by a set of linear synaptic links, an externally applied bias, and a possibly nonlinear activation link. The bias is represented by a synaptic link connected to an input fixed at \(+1\).
- The synaptic links of a neuron weight their respective input signals.
- The weighted sum of the input signals defines the induced local field of the neuron in question.
- The activation link squashes the induced local field of the neuron to produce an output.

Output layer:
\(f()\) is the \emph{activation function}. It defines the output of a neuron in terms of the induced local field \(net\).
[Diagram: a single neuron, where the inputs \(x_{0}=+1, x_{1}, x_{2}, \dots\) are weighted by \(w_{j0}, w_{j1}, w_{j2}, \dots\), summed into \(net_{j}\), and passed through \(f()\) to produce \(y_{j}\).]

For example:
\[ net_{k}=\sum_{j=1}^{n_{H}}y_{j}w_{kj}+w_{k0}=\sum_{j=0}^{n_{H}}y_{j}w_{kj}=w_{k}^{t}y \]
where \(n_{H}\) is the number of hidden units.
So:
\[ g_{k}(x)=f\Big(\sum_{j=1}^{n_{H}}w_{kj}\,f\Big(\sum_{i=1}^{d}x_{i}w_{ji}+w_{j0}\Big)+w_{k0}\Big) \]
The activation function of the output layer can differ from that of the hidden layer, and each unit can in principle have its own activation function.
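Putting the two layers together, here is a small sketch of the forward pass \(g_{k}(x)\), with the biases folded in via the \(+1\) trick. The sizes and the choice of tanh are placeholder assumptions, and a separate `f_out` argument reflects the remark that the output layer may use a different activation.

```python
import numpy as np

def forward(x, W_hidden, W_output, f=np.tanh, f_out=np.tanh):
    """W_hidden: (n_H, d+1), W_output: (c, n_H+1); column 0 of each holds the biases."""
    x_aug = np.concatenate(([1.0], x))       # x_0 = +1
    y = f(W_hidden @ x_aug)                  # y_j = f(net_j), hidden activations
    y_aug = np.concatenate(([1.0], y))       # y_0 = +1, so w_k0 acts as the output bias
    return f_out(W_output @ y_aug)           # g_k(x) = f(sum_j w_kj * y_j + w_k0)

rng = np.random.default_rng(0)
d, n_H, c = 4, 5, 3                          # illustrative input, hidden, and output sizes
W_hidden = rng.normal(scale=0.5, size=(n_H, d + 1))
W_output = rng.normal(scale=0.5, size=(c, n_H + 1))
g = forward(rng.normal(size=d), W_hidden, W_output)
```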
BP Algorithm
The popularity of on-line learning for the supervised training of multilayer perceptrons has been further enhanced by the development of the back-propagation algorithm. Backpropagation, an abbreviation for “backward propagation of errors”, is a standard method of supervised training; it requires the output activations of every hidden layer, which are produced by a forward pass. The partial derivative \(\partial J /\partial w_{ji}\) represents a sensitivity factor, determining the direction of search in weight space for the synaptic weight \(w_{ji}\).

Learning: given a training set \begin{gather} \mathcal T =\{ x(n),d(n)\}_{n=1}^{N},\\ e_{j}(n)=d_{j}(n)-y_{j}(n), \end{gather} the instantaneous error energy, summed over the \(c\) output neurons, is defined by \[ J(w)=\frac 12 \sum_{k=1}^{c}e_{k}^{2}=\frac 12||t-z||^{2}, \] where \(t\) is the target vector and \(z\) the vector of network outputs. In the batch method of supervised learning, adjustments to the synaptic weights of the multilayer perceptron are performed \emph{after} the presentation of all \(N\) examples in the training sample \(\mathcal T\) that constitute one \emph{epoch} of training. In other words, the cost function for batch learning is the average error energy over the epoch.
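As a small illustration of the cost just defined, the sketch below computes the instantaneous error energy for one example and the average error energy over an epoch (the batch cost); the function names are my own.

```python
import numpy as np

def error_energy(t, z):
    """Instantaneous error energy J = 1/2 * ||t - z||^2 for a single example."""
    e = t - z
    return 0.5 * float(e @ e)

def batch_cost(targets, outputs):
    """Average error energy over one epoch, the cost minimized in batch learning."""
    return float(np.mean([error_energy(t, z) for t, z in zip(targets, outputs)]))
```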
- First derive the weight update for the output layer. Gradient descent moves each weight against the gradient of the error: \begin{gather} \Delta w=-\eta\frac {\partial J(w)}{\partial w}, \\ w(m+1)=w(m)+\Delta w(m). \end{gather} By the chain rule, \begin{gather} \frac {\partial J}{\partial w_{kj}}=\frac {\partial J}{\partial net_{k}}\frac {\partial net_{k}}{\partial w_{kj}}, \\ \frac {\partial J}{\partial net_{k}}= \frac {\partial J}{\partial z_{k}}\frac {\partial z_{k}}{\partial net_{k}}=-(t_{k}-z_{k})f'(net_{k}), \\ \Delta w_{kj}=-\eta\frac {\partial J}{\partial w_{kj}}=\eta (t_{k}-z_{k})f'(net_{k})\,y_{j}. \end{gather}
- Then derive the update for the input-to-hidden weights: back-propagating the output sensitivities through the hidden units gives \[ \Delta w_{ji}=\eta\Big[\sum_{k=1}^{c}(t_{k}-z_{k})f'(net_{k})\,w_{kj}\Big]f'(net_{j})\,x_{i}. \] A code sketch of both update rules follows this list.
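Here is a sketch of one on-line back-propagation step implementing both update rules above, assuming a sigmoid activation for concreteness (so \(f'(net)=f(net)(1-f(net))\)); the learning rate and function names are placeholders.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W_hidden, W_output, eta=0.1):
    """One on-line update; W_hidden is (n_H, d+1), W_output is (c, n_H+1), column 0 = bias."""
    # Forward pass (same layout as the forward() sketch above).
    x_aug = np.concatenate(([1.0], x))
    y = sigmoid(W_hidden @ x_aug)            # hidden activations y_j
    y_aug = np.concatenate(([1.0], y))
    z = sigmoid(W_output @ y_aug)            # network outputs z_k

    # Output layer: delta_k = (t_k - z_k) f'(net_k); for the sigmoid, f'(net_k) = z_k (1 - z_k).
    delta_o = (t - z) * z * (1 - z)
    # Hidden layer: delta_j = f'(net_j) * sum_k w_kj delta_k, using the pre-update output weights.
    delta_h = y * (1 - y) * (W_output[:, 1:].T @ delta_o)

    W_output = W_output + eta * np.outer(delta_o, y_aug)   # Delta w_kj = eta * delta_k * y_j (y_0 = 1 covers w_k0)
    W_hidden = W_hidden + eta * np.outer(delta_h, x_aug)   # Delta w_ji = eta * delta_j * x_i
    return W_hidden, W_output
```

Repeating this step example by example gives on-line learning; accumulating the gradients over all \(N\) examples before updating gives the batch method described earlier.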