Commonly confused loss functions and activation functions in PyTorch [softmax, log_softmax, NLLLoss, CrossEntropy]

Article directory

  • Definition
    • Activation function
      • softmax
      • T-softmax
      • log_softmax
    • Loss function
      • NLLLoss
      • CrossEntropy

Definition

  • softmax: maps a sequence of values to a probability distribution (each element lies in [0, 1] and all elements sum to 1)
  • log_softmax: takes the logarithm of the softmax output
  • NLLLoss: given log-probabilities (e.g. the output of log_softmax) and a target class index, returns the negative log-probability of the target class
  • CrossEntropy: measures the difference between two probability distributions (cross entropy)

In classification problems, CrossEntropy is equivalent to log_softmax combined with nll_loss.
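A minimal sketch of these relationships on a toy 1x3 logit tensor (the specific numbers are only an illustration; everything here is spelled out step by step in the sections below):

import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([2])

p = F.softmax(logits, dim=1)          # probabilities, each row sums to 1
log_p = F.log_softmax(logits, dim=1)  # logarithm of those probabilities

print(torch.allclose(log_p, torch.log(p)))                              # True
print(torch.allclose(F.nll_loss(log_p, target), -log_p[0, target[0]]))  # True
print(torch.allclose(F.cross_entropy(logits, target),
                     F.nll_loss(log_p, target)))                        # True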

Activation function

softmax

import torch
import torch.nn.functional as F
'''
torch.nn.functional contains the functional counterparts of the building blocks in torch.nn; the modules built with torch.nn are usually implemented by calling functions in torch.nn.functional.
'''
torch.manual_seed(0)
output = torch.randn(2, 3)
print(output)
#tensor([[ 1.5410, -0.2934, -2.1788],
# [ 0.5684, -1.0845, -1.3986]])
print(F.softmax(output, dim=1))
# dim is the dimension along which softmax is computed; with dim=1 each row sums to 1 (dim=0 would normalize each column instead)
#tensor([[0.8446, 0.1349, 0.0205],
# [0.7511, 0.1438, 0.1051]])
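As a small illustration of the dim argument (reusing the output tensor from the snippet above), dim=0 normalizes each column while dim=1 normalizes each row:

col_sm = F.softmax(output, dim=0)  # normalize down each column
row_sm = F.softmax(output, dim=1)  # normalize across each row
print(col_sm.sum(dim=0))  # every column sums to 1
print(row_sm.sum(dim=1))  # every row sums to 1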

T-softmax

The purpose of T-softmax (softmax with a temperature coefficient) is to smooth the distribution so that it does not become too extreme, as the following example shows.

import numpy as np

def softmax(x):
    x_exp = np.exp(x)
    return x_exp / np.sum(x_exp)

output = np.array([0.1, 1.6, 3.6])
print(softmax(output))
#[0.02590865 0.11611453 0.85797681]

Use a softmax function with a temperature coefficient:

def softmax_t(x, t):
    x_exp = np.exp(x / t)
    return x_exp / np.sum(x_exp)

output = np.array([0.1, 1.6, 3.6])
print(softmax_t(output, 5))
#[0.22916797 0.3093444 0.46148762]

With the temperature set to 5, the resulting values in [0, 1] are much closer to each other, i.e. the distribution is smoother.
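As a rough illustration (reusing the softmax_t and output defined just above; the chosen temperatures are arbitrary), a lower temperature sharpens the distribution and a higher one flattens it:

for t in (0.5, 1.0, 5.0):
    # t < 1 sharpens the distribution, t > 1 flattens it, t = 1 is the plain softmax
    print(t, softmax_t(output, t))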

log_softmax

This one is easy to understand: it simply takes the logarithm of the softmax result, i.e. it can be understood as log(softmax(output)).

# output here is the 2x3 tensor from the softmax example above
print(F.log_softmax(output, dim=1))
print(torch.log(F.softmax(output, dim=1)))
# The two print the same values

tensor([[-0.1689, -2.0033, -3.8886],
        [-0.2862, -1.9392, -2.2532]])
tensor([[-0.1689, -2.0033, -3.8886],
        [-0.2862, -1.9392, -2.2532]])
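One practical note: F.log_softmax is also the numerically safer choice. Computing torch.log(F.softmax(...)) explicitly can underflow to -inf when the logits are extreme, while log_softmax stays finite. A small sketch (the logit values are deliberately extreme):

extreme = torch.tensor([[1000.0, 0.0]])
print(torch.log(F.softmax(extreme, dim=1)))  # tensor([[0., -inf]]): softmax underflowed to 0
print(F.log_softmax(extreme, dim=1))         # tensor([[0., -1000.]]): still finite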

Loss function

NLLLoss

The full name of this function is negative log likelihood loss. If $x_i = [q_1, q_2, \ldots, q_N]$ is the output of the neural network for the i-th sample and $y_i$ is its true label, then:

$$f(x_i, y_i) = -q_{y_i}$$

Inputs: log_softmax(output) and target (the class indices).

print(F.nll_loss(torch.tensor([[-1.2, -2, -3]]), torch.tensor([0])))
# The result is tensor(1.2000): the negative of the value at index 0 of the input

Usually we use log_softmax and nll_loss together.
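As a sketch of what nll_loss computes: pick out the negative log-probability at each target index and, with the default reduction='mean', average over the batch. The 2x3 logits below are just an example:

log_probs = F.log_softmax(torch.tensor([[1.2, 2.0, 3.0],
                                        [0.5, 0.1, 0.4]]), dim=1)
targets = torch.tensor([0, 2])

# take -log_prob at each target index, then average over the batch
manual = -log_probs[torch.arange(len(targets)), targets].mean()
print(torch.allclose(manual, F.nll_loss(log_probs, targets)))  # True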

CrossEntropy

In classification problems, CrossEntropy is equivalent to log_softmax combined with nll_loss

For an N-class classification problem, given a specific sample whose true label is known, CrossEntropy is calculated as:

$$cross\_entropy = -\sum_{k=1}^{N}\left(p_{k} \cdot \log q_{k}\right)$$

Here p is the ground-truth distribution, which in this formula is in one-hot form; q is the result of the softmax, and $q_k$ is the probability the network assigns to the sample belonging to class k.

Looking closely: every element of p is either 0 or 1 and it only enters through a multiplication, so once we know the index of the 1, all the other terms vanish and there is no need to compute them. That is why in PyTorch code the target is not given in one-hot form but directly as a scalar class index. If the true label of the sample is y, the cross-entropy formula simplifies to (a numerical check follows the formula):

$$cross\_entropy = -\sum_{k=1}^{N}\left(p_{k} \cdot \log q_{k}\right) = -\log q_{y}$$
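A quick numerical check of this simplification (the logits are just an example; F.one_hot is used only to build the explicit p vector):

logits = torch.tensor([[1.2, 2.0, 3.0]])
y = 0                                                     # true class index
q = F.softmax(logits, dim=1)
p = F.one_hot(torch.tensor([y]), num_classes=3).float()   # explicit one-hot target

full_form = -(p * torch.log(q)).sum()   # -sum_k p_k * log q_k
index_form = -torch.log(q[0, y])        # -log q_y
print(torch.allclose(full_form, index_form))  # True

PyTorch's own functions confirm the same equivalence: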

output = torch.tensor([[1.2, 2, 3]])
target = torch.tensor([0])

# log_softmax followed by nll_loss ...
log_sm_output = F.log_softmax(output, dim=1)
nll_loss_of_log_sm_output = F.nll_loss(log_sm_output, target)
print(nll_loss_of_log_sm_output)

# ... gives the same value as cross_entropy applied directly to the logits
ce_loss = F.cross_entropy(output, target)
print(ce_loss)

F.cross_entropy(output, target) 《==》 F.nll_loss(F.log_softmax(output, dim=1), target)

The two are equivalent.
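For completeness, the module (class) forms behave the same way as the functional calls used above: nn.CrossEntropyLoss combines LogSoftmax and NLLLoss internally, and nn.NLLLoss matches F.nll_loss.

import torch.nn as nn

ce = nn.CrossEntropyLoss()  # expects raw logits
nll = nn.NLLLoss()          # expects log-probabilities

output = torch.tensor([[1.2, 2.0, 3.0]])
target = torch.tensor([0])
print(torch.allclose(ce(output, target),
                     nll(F.log_softmax(output, dim=1), target)))  # True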