How to learn word2vec


1. NNLM

These notes follow Bengio's paper.

Bengio et al. \cite{nnlm} first proposed a Neural Network Language Model (NNLM) that simultaneously learns a word embedding and a language model. The language model uses several previous words to predict the distribution of the next word. For each sample in the corpus, we maximize the log-likelihood of the last word given the previous words. The model takes the concatenation of the previous words' embeddings as input, and its structure is a feed-forward neural network with one hidden layer.
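To make the structure concrete, here is a minimal numpy sketch of the NNLM forward pass. The vocabulary size, embedding dimension, context length, hidden size, and variable names below are illustrative assumptions, not values from the paper.

import numpy as np

# Minimal NNLM forward pass (illustrative sketch, not Bengio's original code).
# All hyperparameters here are example assumptions.
V, d, n_context, h = 10000, 50, 3, 100   # vocab size, embedding dim, previous words, hidden units

rng = np.random.default_rng(0)
C = rng.normal(scale=0.1, size=(V, d))               # shared word embedding matrix
H = rng.normal(scale=0.1, size=(h, n_context * d))   # input-to-hidden weights
b_h = np.zeros(h)
U = rng.normal(scale=0.1, size=(V, h))               # hidden-to-output weights
b_o = np.zeros(V)

def nnlm_next_word_distribution(context_ids):
    """P(w_t | previous n_context words) for one sample."""
    x = np.concatenate([C[i] for i in context_ids])  # concatenate previous words' embeddings
    a = np.tanh(H @ x + b_h)                         # single hidden layer with tanh
    logits = U @ a + b_o
    logits -= logits.max()                           # numerical stability for softmax
    p = np.exp(logits)
    return p / p.sum()

p = nnlm_next_word_distribution([12, 7, 431])
# Training would maximize log p[target_id] over the corpus (the log-likelihood above).
print(p.shape, np.log(p[55]))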

2. LBL

The Log-Bilinear Language Model (LBL) proposed by Mnih and Hinton combines Bengio's hierarchical NNLM with a log-bilinear model. It uses a log-bilinear energy function that is almost the same as that of the NNLM, but removes the non-linear activation function tanh.

A previous study \cite{lbl} proposed a widely used model architecture for estimating neural network language models.
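A rough numpy sketch of the log-bilinear scoring step is given below, assuming one position matrix per context word and a bilinear score against every word embedding; there is no tanh non-linearity. All sizes and names are illustrative assumptions.

import numpy as np

# Sketch of a log-bilinear (LBL) scorer: the predicted next-word representation is a
# linear function of the context embeddings; the score is a bilinear product.
V, d, n_context = 10000, 50, 3   # example sizes, assumptions only

rng = np.random.default_rng(1)
R = rng.normal(scale=0.1, size=(V, d))                 # word embeddings
Cmats = rng.normal(scale=0.1, size=(n_context, d, d))  # one position matrix per context slot
b = np.zeros(V)

def lbl_next_word_distribution(context_ids):
    # Predicted embedding of the next word: sum of linearly transformed context embeddings.
    r_hat = sum(Cmats[k] @ R[w] for k, w in enumerate(context_ids))
    scores = R @ r_hat + b          # bilinear score between the prediction and every word
    scores -= scores.max()
    p = np.exp(scores)
    return p / p.sum()

print(lbl_next_word_distribution([12, 7, 431]).shape)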

NER

Multi-Layer Neural Network

A. 3-layer network: Input Layer, Hidden Layer, Output Layer. Except for the input units, each unit has a bias.

B. Forward pass calculation

Specifically, a signal $x_i$ at the input of synapse $i$ connected to neuron $j$ is multiplied by the synaptic weight $w_{ji}$; here $i$ refers to the input layer, $j$ refers to the hidden layer, and $b_j$ is the bias. The induced local field is $v_j = \sum_i w_{ji} x_i + b_j$.

$\varphi(\cdot)$ is the \emph{activation function}. It defines the output of a neuron in terms of the induced local field: $y_j = \varphi(v_j)$.

For example, $L$ denotes the number of hidden layers.

So the activation function of the output layer can differ from that of the hidden layer, and each unit can have its own activation function, as in the sketch below.
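A small numpy sketch of this forward pass with example layer sizes (assumptions, not prescribed values); it also shows the output layer using a different activation function from the hidden layer.

import numpy as np

# Forward pass of the 3-layer network described above: induced local field
# v_j = sum_i w_ji * x_i + b_j, output y_j = phi(v_j). Layer sizes are example assumptions.
rng = np.random.default_rng(2)
n_in, n_hidden, n_out = 4, 5, 3

W1 = rng.normal(size=(n_hidden, n_in)); b1 = np.zeros(n_hidden)   # input -> hidden
W2 = rng.normal(size=(n_out, n_hidden)); b2 = np.zeros(n_out)     # hidden -> output

def forward(x):
    v1 = W1 @ x + b1                    # induced local fields of the hidden layer
    y1 = np.tanh(v1)                    # hidden activation function
    v2 = W2 @ y1 + b2                   # induced local fields of the output layer
    y2 = 1.0 / (1.0 + np.exp(-v2))      # output layer uses a different activation (sigmoid here)
    return y1, y2

hidden, output = forward(rng.normal(size=n_in))
print(hidden.shape, output.shape)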


BP Algorithm

The popularity of on-line learning for the supervised training of multilayer perceptrons has been further enhanced by the development of the back-propagation algorithm. Backpropagation, an abbreviation for "backward propagation of errors", is one of the most widely used methods of supervised training. We need to compute the output activations of each hidden layer. The partial derivative $\partial J / \partial w_{ji}$ represents a sensitivity factor, determining the direction of search in weight space for the synaptic weight $w_{ji}$.

Learning: the instantaneous error energy of neuron $j$ is defined by $\mathcal{E}_j(n) = \frac{1}{2} e_j^2(n)$, where $e_j(n)$ is the error signal of neuron $j$ for training example $n$.

In the batch method of supervised learning, adjustments to the synaptic weights of the multilayer perceptron are performed \emph{after} the presentation of all the examples in the training sample that constitute one \emph{epoch} of training. In other words, the cost function for batch learning is defined by the average error energy $\mathcal{E}_{\mathrm{av}}(N) = \frac{1}{N} \sum_{n=1}^{N} \sum_{j} \frac{1}{2} e_j^2(n)$, where $N$ is the number of examples in the epoch and the inner sum runs over the output neurons.
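The following sketch contrasts on-line updates (after every example) with batch updates (after a full epoch) for a small two-weight-layer network under a squared-error cost; the network, data, and learning rate are toy assumptions for illustration.

import numpy as np

# Back-propagation sketch: on-line vs. batch weight adjustment,
# assuming squared-error loss E(n) = 1/2 * sum_j e_j(n)^2, tanh hidden units, linear outputs.
rng = np.random.default_rng(3)
n_in, n_hidden, n_out, eta = 4, 5, 3, 0.05

W1 = 0.1 * rng.normal(size=(n_hidden, n_in)); b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.normal(size=(n_out, n_hidden)); b2 = np.zeros(n_out)

X = rng.normal(size=(20, n_in))     # toy training sample (one epoch = 20 examples)
D = rng.normal(size=(20, n_out))    # desired responses

def grads(x, d):
    v1 = W1 @ x + b1; y1 = np.tanh(v1)      # forward pass
    y2 = W2 @ y1 + b2                       # linear output units
    e = d - y2                              # error signal e_j(n)
    delta2 = -e                             # dE/dv for the output layer
    delta1 = (W2.T @ delta2) * (1 - y1**2)  # back-propagate through tanh
    return np.outer(delta2, y1), delta2, np.outer(delta1, x), delta1

# On-line learning: adjust the weights after every single example.
for x, d in zip(X, D):
    gW2, gb2, gW1, gb1 = grads(x, d)
    W2 -= eta * gW2; b2 -= eta * gb2
    W1 -= eta * gW1; b1 -= eta * gb1

# Batch learning: average the gradients over the whole epoch, then adjust once.
gW2 = gb2 = gW1 = gb1 = 0.0
for x, d in zip(X, D):
    g = grads(x, d)
    gW2 = gW2 + g[0]; gb2 = gb2 + g[1]; gW1 = gW1 + g[2]; gb1 = gb1 + g[3]
W2 -= eta * gW2 / len(X); b2 -= eta * gb2 / len(X)
W1 -= eta * gW1 / len(X); b1 -= eta * gb1 / len(X)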