A good choice for the criterion is maximum likelihood regularized with dropout, possibly also with weight decay.

1. Softmax Loss

2. Perceptual Loss


3. Focal loss

See the ICCV focal loss paper, "Focal Loss for Dense Object Detection".


import torch
import torch.nn.functional as F

def focal_loss(inputs, targets, gamma=2):
    # inputs: (N, C) raw logits; targets: (N,) class indices
    N = inputs.size(0)
    C = inputs.size(1)
    P = F.softmax(inputs, dim=1)  # softmax(x)

    # one-hot mask selecting the probability of the true class
    class_mask = torch.zeros(N, C, device=inputs.device)
    ids = targets.view(-1, 1)
    class_mask.scatter_(1, ids, 1.)

    probs = (P * class_mask).sum(1).view(-1, 1)  # softmax(x)_class

    log_p = probs.log()

    # down-weight well-classified examples by (1 - p)^gamma
    batch_loss = -torch.pow(1 - probs, gamma) * log_p

    loss = batch_loss.mean()

    return loss
  • alpha (1D Tensor, Variable): the scalar weighting factor for this criterion
  • gamma (float, double): focusing parameter; reduces the relative loss for well-classified examples (p > .5), putting more focus on hard, misclassified examples
  • size_average (bool): by default, the losses are averaged over observations for each minibatch; if size_average is set to False, the losses are instead summed for each minibatch.
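The same per-class probability can be obtained more directly from `F.cross_entropy` with `reduction='none'`, which returns the per-sample negative log-probability of the true class. A minimal self-contained sketch (function name `focal_loss_ce` is my own, not from the original):

```python
import torch
import torch.nn.functional as F

def focal_loss_ce(inputs, targets, gamma=2.0):
    # per-sample log-probability of the true class: log softmax(x)_class
    log_p = -F.cross_entropy(inputs, targets, reduction='none')
    p = log_p.exp()  # probability of the true class
    # down-weight well-classified examples by (1 - p)^gamma
    return (-(1 - p) ** gamma * log_p).mean()

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 3.0, 0.3]])
targets = torch.tensor([0, 1])
loss = focal_loss_ce(logits, targets)
```

With gamma = 0 this reduces exactly to the ordinary cross-entropy loss, which is a convenient sanity check.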

3.1. Huber Loss

Machine/Deep Learning: Loss Functions - Huber Loss and Focal Loss
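Huber loss is quadratic for small residuals and linear for large ones, making it less sensitive to outliers than squared error. A minimal sketch in PyTorch (this is the standard formula, not code from the reference above):

```python
import torch

def huber_loss(pred, target, delta=1.0):
    # quadratic near zero, linear in the tails
    err = pred - target
    abs_err = err.abs()
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return torch.where(abs_err <= delta, quadratic, linear).mean()
```

delta controls where the loss switches from quadratic to linear behavior.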

3.2. Tukey's Loss

Robust Optimization for Deep Regression
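Tukey's biweight loss goes further than Huber: beyond a cutoff c the loss saturates to a constant, so extreme outliers contribute zero gradient. A sketch of the standard biweight formula (the paper above applies it to deep regression; this implementation is my own illustration):

```python
import torch

def tukey_biweight_loss(pred, target, c=4.685):
    # bounded loss: constant (zero gradient) for residuals beyond c
    r = pred - target
    inlier = (c ** 2 / 6.0) * (1 - (1 - (r / c) ** 2) ** 3)
    outlier = torch.full_like(r, c ** 2 / 6.0)
    return torch.where(r.abs() <= c, inlier, outlier).mean()
```

c = 4.685 is the conventional tuning constant for 95% asymptotic efficiency on Gaussian noise.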


3.3. Categorical Cross-Entropy Loss

Categorical cross-entropy loss is also known as negative log likelihood. It is a popular loss function for classification problems, measuring the similarity between two probability distributions (typically the true labels and the predicted labels). It can be written as $-\sum_{i} y_i \log \hat{y}_i$, where $y$ is the probability distribution of the true labels (typically a one-hot vector) and $\hat{y}$ is the probability distribution of the predicted labels, usually coming from a softmax.
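The one-hot formulation can be checked directly against PyTorch's built-in cross entropy, which takes raw logits and a class index (a small sketch, values are arbitrary):

```python
import torch
import torch.nn.functional as F

y_true = torch.tensor([0.0, 1.0, 0.0])   # one-hot true distribution y
logits = torch.tensor([0.5, 2.0, 0.1])
y_hat = torch.softmax(logits, dim=0)     # predicted distribution from a softmax
ce = -(y_true * torch.log(y_hat)).sum()  # -sum_i y_i log y_hat_i
```

Because y is one-hot, the sum picks out a single term: the negative log-probability assigned to the true class.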

3.4. Negative Log-Likelihood (NLL)

See Categorical Cross-Entropy Loss above.

3.5. Dice Loss

Commonly used in image segmentation tasks (PyTorch implementation).
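Dice loss is one minus the Dice coefficient, $1 - \frac{2|A \cap B|}{|A| + |B|}$, which directly optimizes region overlap and is robust to class imbalance in segmentation masks. A minimal soft (differentiable) sketch, not any particular library's implementation:

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    # pred: probabilities in [0, 1]; target: binary mask of the same shape
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    intersection = (pred * target).sum()
    dice = (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return 1 - dice
```

eps guards against division by zero when both the prediction and the target are empty.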

4. Optimization Procedure

  • A good choice for the optimization algorithm for a feed-forward network is usually stochastic gradient descent with momentum.
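One SGD-with-momentum step, written out by hand on a toy quadratic objective (a sketch of the PyTorch-style update rule v ← μv + g, w ← w − lr·v; the toy values are my own):

```python
import torch

# one hand-written SGD-with-momentum step on a toy objective
w = torch.tensor([1.0, -2.0], requires_grad=True)
velocity = torch.zeros_like(w)
lr, momentum = 0.1, 0.9

loss = (w ** 2).sum()  # toy objective; gradient is 2w
loss.backward()
with torch.no_grad():
    velocity = momentum * velocity + w.grad   # accumulate the momentum buffer
    w -= lr * velocity                        # parameter update
```

In practice one would use `torch.optim.SGD(params, lr=0.1, momentum=0.9)`, which implements this same update.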

5. Loss Function and Conditional Log-Likelihood

  • In the 1980s and 1990s, the most commonly used loss function was the squared error

  • if f is unrestricted (non-parametric), minimizing the squared error yields an estimator of the conditional expectation $E[y|x]$

  • Replacing the squared error by an absolute value makes the neural network try to estimate not the conditional expectation but the conditional median

  • Cross-entropy objective function: when y is a discrete label, i.e., for classification problems, other loss functions such as the Bernoulli negative log-likelihood have been found to be more appropriate than the squared error.

  • to keep the output strictly between 0 and 1: use the sigmoid as the non-linearity for the output layer (this matches well with the binomial negative log-likelihood cost function)

The mean is halved ($\frac{1}{2}$) as a convenience for the computation of the gradient descent, as the derivative of the square function will cancel out the $\frac{1}{2}$ term.
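Written out, the halved mean squared error and its gradient look like this (a standard derivation, reconstructed here since the note's formulas were lost):

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_\theta(x^{(i)}) - y^{(i)}\right)^2,
\qquad
\frac{\partial J}{\partial \theta} = \frac{1}{m}\sum_{i=1}^{m}\left(f_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial f_\theta(x^{(i)})}{\partial \theta}
```

The factor of 2 from differentiating the square cancels the $\frac{1}{2}$, leaving a clean $\frac{1}{m}$ in front of the gradient.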

Learning a Conditional Probability Model
  • loss function as corresponding to a conditional log-likelihood, i.e., the negative log-likelihood (NLL) cost function

  • example: if y is a continuous random variable and we assume that, given x, it has a Gaussian distribution with mean $f_{\theta}(x)$ and variance $\sigma^{2}$

  • minimizing this negative log-likelihood is therefore equivalent to minimizing the squared error loss.
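Writing the Gaussian NLL out makes the equivalence explicit (a standard derivation, added here as a sketch):

```latex
-\log p(y \mid x) = \frac{\left(y - f_\theta(x)\right)^2}{2\sigma^2} + \log\sigma + \frac{1}{2}\log(2\pi)
```

Since $\sigma$ is fixed, the last two terms are constants in $\theta$, so minimizing the NLL is the same as minimizing the squared error $\left(y - f_\theta(x)\right)^2$.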

  • for discrete variables, the binomial negative log-likelihood cost function corresponds to the conditional log-likelihood associated with the Bernoulli distribution (also known as cross entropy), with probability of generating y = 1 given x equal to $f_{\theta}(x)$
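For a sigmoid output $f_\theta(x) = P(y = 1 \mid x)$, this Bernoulli NLL is the familiar binary cross-entropy (standard form, reconstructed as a sketch):

```latex
-\log p(y \mid x) = -\,y \log f_\theta(x) \;-\; (1 - y)\log\left(1 - f_\theta(x)\right)
```

For each example exactly one of the two terms is active, depending on whether the label y is 1 or 0.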
