# 1. Attention Mechanism

## 1.1. Neural Turing Machines

Instead of specifying a single location, the RNN outputs an “attention distribution” that describes how much we care about each memory position. The result of the read operation is then a weighted sum over all positions.
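The weighted read described above can be sketched in a few lines. This is a minimal illustration in plain NumPy; the memory contents and attention weights are made-up values, not output of any trained model:

```python
import numpy as np

# Hypothetical memory: 4 slots, each holding a 3-dimensional vector.
memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 1.0, 1.0]])

# Attention distribution over the slots (non-negative, sums to 1):
# we "focus everywhere, just to different extents".
attention = np.array([0.7, 0.1, 0.1, 0.1])

# Soft read: weighted sum over all memory positions.
read = attention @ memory  # shape (3,)
```

Because every slot contributes, the read is differentiable with respect to the attention weights, which is the whole point of the trick.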

## 1.2. Attentional Interfaces

We’d like attention to be differentiable, so that we can learn where to focus. To do this, we use the same trick Neural Turing Machines use: we focus everywhere, just to different extents.

Attention can also be used at the interface between a convolutional neural network and an RNN. This allows the RNN to look at different positions of an image at every step. One popular use of this kind of attention is image captioning. First, a conv net processes the image, extracting high-level features. Then an RNN runs, generating a description of the image. As it generates each word in the description, the RNN focuses on the conv net's interpretation of the relevant parts of the image. We can visualize this explicitly.
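A rough sketch of this interface, in NumPy with made-up sizes (a 7×7 grid of conv features flattened to 49 vectors, and a simple dot-product relevance score standing in for the learned attention used in practice):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)

# Hypothetical conv-net output: a 7x7 spatial grid of 16-dim feature
# vectors, flattened to 49 "annotation" vectors.
features = rng.normal(size=(49, 16))

# Current RNN hidden state, assumed already projected to the same dim.
h = rng.normal(size=16)

# Relevance of each spatial location to the RNN state (dot product),
# normalized into an attention distribution over image regions.
alpha = softmax(features @ h)   # 49 weights, one per region

# What the RNN "sees" this step: a weighted sum of region features.
glimpse = alpha @ features      # shape (16,)
```

Visualizing `alpha` reshaped back to 7×7 over the input image is exactly the kind of attention map shown in captioning papers.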

Attention is also the process by which the human visual system selects salient regions; models in this area focus on fixation prediction, a classic example being the visual attention model proposed by Itti and Koch in 1998. This line of work derives algorithmic models from how the human eye produces attention and from the characteristics of the visual system.

## 1.3. Applications

### 1.3.1. Machine Translation

The decoder decides which parts of the source sentence to pay attention to. By giving the decoder an attention mechanism, we relieve the encoder of the burden of having to encode all the information in the source sentence into a fixed-length vector.
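This can be sketched with an additive (Bahdanau-style) attention score. Everything below is illustrative — the dimensions, the random weight matrices, and the softmax helper are assumptions for the sketch, not the paper's exact implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical encoder annotations for a 5-word source sentence, dim 8.
annotations = rng.normal(size=(5, 8))

# Previous decoder hidden state, dim 8.
s_prev = rng.normal(size=8)

# Additive scoring: score_i = v . tanh(W s_prev + U h_i)
W = rng.normal(size=(8, 8))
U = rng.normal(size=(8, 8))
v = rng.normal(size=8)

scores = np.array([v @ np.tanh(W @ s_prev + U @ h) for h in annotations])
alpha = softmax(scores)          # attention over source positions
context = alpha @ annotations    # context vector fed to the decoder
```

The decoder consumes `context` at each step, so no single fixed-length vector ever has to carry the whole sentence.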

### 1.3.2. Motivation

Soft alignment: when translating, word order may change, but the major semantic units still correspond across the two sentences. If that correspondence can be found, there is no need to encode the whole sentence into a single representation.

### 1.3.3. Approach

• Alignment and translation are performed jointly.

### 1.3.4. Image Captioning

*Show, Attend and Tell: Neural Image Caption Generation with Visual Attention*


• Attention is mostly applied in sequence-to-sequence settings. Here, however, the task is still classification: the output is not a sequence but a vector.

## 1.4. Encoder: Bidirectional RNN (BiRNN) for Annotation

• Concatenate the forward and backward outputs:

• Output shape: [time][batch][cell_fw.output_size + cell_bw.output_size]

```python
# Forward direction cell
lstm_fw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, forget_bias=0.0)
# Backward direction cell
lstm_bw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, forget_bias=0.0)

pre_encoder_inputs, output_state_fw, output_state_bw = tf.nn.bidirectional_rnn(
    lstm_fw_cell, lstm_bw_cell, lstm_inputs,
    initial_state_fw=None, initial_state_bw=None,
    dtype=tf.float32, sequence_length=None, scope=None)

# Mask out padded positions in the encoder outputs.
encoder_inputs = [e * f for e, f in zip(pre_encoder_inputs,
                                        encoder_masks[:seq_length])]
```

• initial_states: 2-D, batch_size × cell.state_size

```python
# Concatenate the final forward and backward states -> 2 * num_hidden
initial_state = tf.concat(1, [output_state_fw, output_state_bw])
```


## 1.5. Decoder

• memory state

• hidden state

• The hidden-layer state from the previous time step.

• The hidden layer is like a chaotic pool: both memory and forgetting originate there, and it changes at almost every moment.

• The notion of time $t$ here likewise follows the small patches being fed in one after another.

• The outputs produced so far.

• The context vector for the current input.

• This should correspond to the states of the patches relevant to the patch being input at the current time step.

• memory state

• hidden state
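A loose sketch of how these inputs might combine in one decoder step. A single tanh projection stands in here for the real LSTM update, and all names and shapes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # hypothetical hidden size

h_prev = rng.normal(size=d)    # hidden state from the previous step
y_prev = rng.normal(size=d)    # embedding of the previously emitted token
context = rng.normal(size=d)   # context vector for the current input

# Illustrative recurrence: the new hidden state mixes all three
# inputs through one learned projection (small init for stability).
W = rng.normal(size=(d, 3 * d)) * 0.1
h_new = np.tanh(W @ np.concatenate([h_prev, y_prev, context]))
```

In a real decoder the memory state would be updated alongside `h_new` by the LSTM's gates; this sketch only shows how the three inputs feed one step.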

### 1.5.1. seq2seq

• attn_num_hidden

• attn_num_layers


```python
single_cell = tf.nn.rnn_cell.BasicLSTMCell(attn_num_hidden, forget_bias=0.0)
cell = tf.nn.rnn_cell.MultiRNNCell([single_cell] * attn_num_layers)
num_hidden = attn_num_layers * attn_num_hidden
```


• forward_only: training requires the backward pass as well; at test time it is not needed.

In this context “attention” means that, during decoding, the RNN can look up information in the additional tensor attention_states, and it does this by focusing on a few entries from the tensor. This model has proven to yield especially good results in a number of sequence-to-sequence tasks. This implementation is based on http://arxiv.org/abs/1412.7449 (see below for details). It is recommended for complex sequence-to-sequence tasks.