1. Optimization Procedure

A good choice for the criterion is maximum likelihood regularized with dropout, possibly also with weight decay.

A good choice for the optimization algorithm for a feed-forward network is usually stochastic gradient descent with momentum.
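
As a minimal, framework-free sketch of the update rule, the following applies gradient descent with classical momentum (plus optional L2 weight decay) to a one-dimensional quadratic. The function name `sgd_momentum`, the toy objective, and all hyperparameter values are illustrative assumptions rather than recommendations from the text.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, weight_decay=0.0, steps=300):
    """Gradient descent with classical momentum and optional L2 weight decay.

    grad_fn returns the gradient of the loss at w. In real training it would be
    a noisy minibatch estimate; here it is exact to keep the sketch small.
    """
    w, velocity = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w) + weight_decay * w        # weight decay adds lambda * w to the gradient
        velocity = momentum * velocity - lr * g  # decaying accumulation of past gradients
        w = w + velocity
    return w


# Toy objective (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_star = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_star)
```

With a non-zero `weight_decay` the iterate settles slightly closer to zero than the unregularized minimum, which is the intended shrinkage effect.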

2. Loss Function and Conditional Log-Likelihood

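In the 1980s and 1990s the most commonly used loss function was the squared error $L(f_\theta(x), y) = \|f_\theta(x) - y\|^2$.

If f is unrestricted (non-parametric), minimizing the expected squared error yields the conditional expectation, $f(x) = E[y \mid x]$.

Replacing the squared error by an absolute value makes the neural network try to estimate not the conditional expectation but the conditional median.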
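
The difference between the two losses can be checked numerically. The sketch below, using made-up target values for a single fixed input, searches a grid of constant predictions and compares the minimizers with the sample mean and the sample median. The data and the grid resolution are assumptions chosen for illustration.

```python
import statistics

ys = [1.0, 2.0, 2.5, 3.0, 20.0]  # made-up targets for one input x; 20.0 acts as an outlier


def mean_loss(c, loss):
    """Average loss of the constant prediction c over all targets."""
    return sum(loss(c - y) for y in ys) / len(ys)


# Candidate constant predictions on a grid from 0.00 to 25.00 in steps of 0.01.
candidates = [i / 100 for i in range(2501)]

best_squared = min(candidates, key=lambda c: mean_loss(c, lambda r: r * r))
best_absolute = min(candidates, key=lambda c: mean_loss(c, abs))

print(best_squared, statistics.mean(ys))     # squared error is minimized at the mean
print(best_absolute, statistics.median(ys))  # absolute error is minimized at the median
```

The outlier pulls the squared-error minimizer upward while the absolute-error minimizer stays with the bulk of the data, which is why the absolute value is regarded as the more robust choice.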

Categorical cross-entropy loss.

Cross-entropy objective function: when y is a discrete label, i.e., for classification problems, other loss functions such as the Bernoulli negative log-likelihood have been found to be more appropriate than the squared error.

To constrain the output to lie strictly between 0 and 1, use the sigmoid as the non-linearity for the output layer (it matches well with the binomial negative log-likelihood cost function).

In the squared-error cost, the mean is halved (multiplied by $\frac{1}{2}$) as a convenience for the computation of gradient descent, as the derivative of the square function will cancel out the $\frac{1}{2}$ term.
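
Written out, assuming a mean-squared-error cost over $m$ training examples (the symbols $J$, $m$, $f_\theta$, and $\theta_j$ are assumed here for illustration), the factor of 2 produced by differentiating the square cancels the $\frac{1}{2}$:

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_\theta(x^{(i)}) - y^{(i)} \right)^2
\qquad\Longrightarrow\qquad
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( f_\theta(x^{(i)}) - y^{(i)} \right)
    \frac{\partial f_\theta(x^{(i)})}{\partial \theta_j}
```

The halving rescales the cost by a constant, so it does not move the minimizer.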

3. Learning a Conditional Probability Model

Negative log-likelihood (NLL)

More generally, one can define a loss function as corresponding to a conditional log-likelihood, i.e., the negative log-likelihood (NLL) cost function $L_{\mathrm{NLL}}(f_\theta(x), y) = -\log P(y \mid x; \theta)$.

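Example: if y is a continuous random variable and we assume that, given $x$, it has a Gaussian distribution with mean $f_\theta(x)$ and variance $\sigma^2$, then $-\log P(y \mid x; \theta) = \frac{1}{2\sigma^2}(f_\theta(x) - y)^2 + \frac{1}{2}\log(2\pi\sigma^2)$.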

Minimizing this negative log-likelihood is therefore equivalent to minimizing the squared error loss.
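
A quick numerical check of this equivalence, assuming a Gaussian model with a fixed variance: the gap between the NLL and the scaled squared error is a constant that does not depend on the prediction, so both are minimized by the same prediction. The function names and sample values are illustrative.

```python
import math


def gaussian_nll(y, mean, sigma=1.0):
    """Negative log-density of y under a Gaussian N(mean, sigma^2)."""
    return 0.5 * ((y - mean) / sigma) ** 2 + 0.5 * math.log(2 * math.pi * sigma ** 2)


def squared_error(y, prediction):
    return (y - prediction) ** 2


y, sigma = 1.3, 2.0
for prediction in (-1.0, 0.0, 1.3, 4.0):
    # Subtract the scaled squared error; what remains is the same constant every time.
    gap = gaussian_nll(y, prediction, sigma) - squared_error(y, prediction) / (2 * sigma ** 2)
    print(prediction, gap)
```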

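For discrete variables, the binomial negative log-likelihood cost function corresponds to the conditional log-likelihood associated with the Bernoulli distribution (also known as cross entropy) with probability $p = f_\theta(x)$ of generating $y = 1$ given $x$ (and probability $1 - p$ of generating $y = 0$): $L_{\mathrm{NLL}} = -y \log f_\theta(x) - (1 - y) \log(1 - f_\theta(x))$.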
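
A plain-Python sketch of this cost for a single example, pairing a sigmoid output with the Bernoulli negative log-likelihood. The clamping constant `eps` and the pre-activation value are assumptions added for numerical safety and illustration.

```python
import math


def sigmoid(a):
    """Squash a real-valued pre-activation into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def bernoulli_nll(y, p, eps=1e-12):
    """-y*log(p) - (1-y)*log(1-p) for a binary label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)


p = sigmoid(2.0)  # the pre-activation 2.0 is an arbitrary illustrative value
loss_if_positive = bernoulli_nll(1, p)  # small: the model favors y = 1
loss_if_negative = bernoulli_nll(0, p)  # large: the model is confidently wrong
print(p, loss_if_positive, loss_if_negative)
```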

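3.1. Categorical Cross-Entropy Loss

Categorical cross-entropy loss is also known as the negative log-likelihood. It is a popular loss function for classification problems and measures the similarity between two probability distributions, typically the true labels and the predicted labels. It is given by $L = -\sum_i y_i \log(\hat{y}_i)$, where $y$ is the probability distribution of the true labels (usually a one-hot vector) and $\hat{y}$ is the probability distribution of the predicted labels, usually the output of a softmax.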
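
The formula can be sketched in plain Python as follows. The logits, the one-hot target, and the clamping constant `eps` are illustrative assumptions; a deep learning framework would normally fuse the softmax and the logarithm for stability.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_true[i] * log(y_pred[i])."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


logits = [2.0, 1.0, 0.1]   # made-up network outputs for three classes
y_true = [1.0, 0.0, 0.0]   # one-hot vector: the true class is class 0
y_pred = softmax(logits)
loss = categorical_cross_entropy(y_true, y_pred)
print(y_pred, loss)
```

With a one-hot target, only the term for the true class survives, so the loss reduces to the negative log-probability assigned to that class.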

3.2. Tukey's Loss

See the paper "Robust Optimization for Deep Regression".
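
For orientation, a sketch of the standard textbook form of Tukey's biweight function applied to a single residual. The tuning constant `c=4.685` is the conventional value for residuals scaled to unit variance and is an assumption here; any residual scaling step used in the paper is not reproduced.

```python
def tukey_biweight_loss(residual, c=4.685):
    """Tukey's biweight: roughly quadratic near zero, constant beyond |residual| > c."""
    if abs(residual) <= c:
        return (c ** 2 / 6.0) * (1.0 - (1.0 - (residual / c) ** 2) ** 3)
    return c ** 2 / 6.0  # outliers all contribute the same bounded loss


small = tukey_biweight_loss(0.01)    # behaves like 0.5 * residual^2
outlier = tukey_biweight_loss(100.0)  # capped at c^2 / 6
print(small, outlier)
```

Because the loss is flat beyond `c`, large residuals contribute zero gradient, which is what makes the function robust to outliers.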


3.3. Dice Loss

Commonly used in image segmentation tasks. PyTorch implementation.
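
A framework-free sketch of the soft Dice loss on flattened masks, to show the computation. The `smooth` term and its default value are assumptions (a common way to avoid division by zero on empty masks), and a PyTorch version would apply the same arithmetic to tensors.

```python
def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient for flattened masks.

    pred: predicted foreground probabilities in [0, 1].
    target: ground-truth labels, 0 or 1.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    dice = (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)
    return 1.0 - dice


target = [0, 1, 1, 0]                               # made-up 4-pixel mask
perfect = dice_loss([0.0, 1.0, 1.0, 0.0], target)   # prediction matches the mask
wrong = dice_loss([1.0, 0.0, 0.0, 1.0], target)     # prediction is the inverse of the mask
print(perfect, wrong)
```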

4. Perceptual Loss


5. Focal Loss

See the ICCV focal loss paper, "Focal Loss for Dense Object Detection".


import torch
import torch.nn.functional as F


def focal_loss(inputs, targets, gamma=2.0):
    """Multi-class focal loss without the alpha weighting term.

    inputs: raw logits of shape (N, C).
    targets: int64 class indices of shape (N,).
    """
    # log_softmax over the class dimension is more stable than softmax followed by log.
    log_P = F.log_softmax(inputs, dim=1)

    # Select the log-probability of the true class for each sample, shape (N, 1).
    ids = targets.view(-1, 1)
    log_p = log_P.gather(1, ids)
    probs = log_p.exp()  # softmax(x)_class

    # Down-weight well-classified examples by (1 - p)^gamma.
    batch_loss = -torch.pow(1 - probs, gamma) * log_p

    # Average over the minibatch.
    return batch_loss.mean()

Parameters of the full criterion, $-\alpha (1 - p)^{\gamma} \log(p)$, where $p$ is the softmax probability of the true class:
  • alpha (1D Tensor, Variable): the scalar factor for this criterion
  • gamma (float, double): gamma > 0; reduces the relative loss for well-classified examples (p > .5), putting more focus on hard, misclassified examples
  • size_average (bool): By default, the losses are averaged over observations for each minibatch. However, if the field size_average is set to False, the losses are instead summed for each minibatch.
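
To make the effect of the focusing parameter concrete, the sketch below compares the focal term with plain cross entropy for one easy and one hard example. The helper names and the probabilities 0.9 and 0.1 are illustrative assumptions.

```python
import math


def focal_term(p_t, gamma=2.0, alpha=1.0):
    """Focal loss for one example, given the predicted probability p_t of its true class."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy_term(p_t):
    return -math.log(p_t)


easy, hard = 0.9, 0.1  # well-classified vs. misclassified example
easy_ratio = focal_term(easy) / cross_entropy_term(easy)  # scaling factor (1 - 0.9)^gamma
hard_ratio = focal_term(hard) / cross_entropy_term(hard)  # scaling factor (1 - 0.1)^gamma
print(easy_ratio, hard_ratio)
```

The easy example's loss is scaled down far more strongly than the hard example's, and setting `gamma=0` recovers ordinary cross entropy.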

5.1. Huber Loss

Machine/Deep Learning: Loss Functions - Huber Loss and Focal Loss
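
For reference, a sketch of the standard Huber loss on a single residual: quadratic for small residuals and linear for large ones. The threshold name `delta` and its default of 1.0 are assumptions chosen for illustration.

```python
def huber_loss(residual, delta=1.0):
    """0.5 * r^2 if |r| <= delta, else delta * (|r| - 0.5 * delta)."""
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


inlier = huber_loss(0.5)   # quadratic regime
outlier = huber_loss(3.0)  # linear regime, grows more slowly than 0.5 * r^2
print(inlier, outlier)
```

The two branches agree at `|residual| == delta`, so the loss is continuous there while large errors are penalized only linearly.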
