the next section we will introduce the sigmoid and softmax tools for classifi- ... training examples. We will introduce the cross-entropy loss function.

4. An algorithm for optimizing the objective function. We introduce the stochastic gradient descent algorithm.

Logistic regression has two phases: training: we train the system (specifically the weights w and b) using ...

Softmax Loss, Negative Log-Likelihood (NLL). Cross-entropy loss is the same as log-softmax followed by NLL. With $s_c$ the predicted probability of class $c$:

$$f(s, \hat{y}) = -\sum_{c=1}^{M} \hat{y}_c \log(s_c)$$

$\hat{y}$ is a $1 \times M$ vector in which the value of the true class is 1 and every other value is 0, hence

$$f(s, \hat{y}) = -\sum_{c=1}^{M} \hat{y}_c \log(s_c) = -\log(s_c),$$

where $c$ on the right-hand side denotes the true class.

The softmax function transforms a vector of K real values into a vector of K values that each lie between 0 and 1 and sum to 1. This function is also called softargmax or multi-class logistic regression. The advantage of applying this function is that the transformed vector values can be interpreted as probabilities and, if an input is negative ...
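
The softmax and cross-entropy definitions above can be illustrated with a small sketch. It is not taken from any of the quoted sources: the function names (`softmax`, `cross_entropy`, `nll_from_scores`) and the example scores are hypothetical, and it uses only the Python standard library. It shows that negative inputs still map to valid probabilities and that, for a one-hot target, the cross-entropy sum reduces to the negative log-probability of the true class.

```python
import math

def softmax(scores):
    """Map K real-valued scores to K probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """Compute -sum_c y_c * log(s_c); terms with y_c == 0 contribute nothing."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

def nll_from_scores(scores, true_class):
    """Log-softmax followed by the negative log-likelihood of the true class."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(scores[true_class] - log_norm)

# Hypothetical raw scores for M = 3 classes, including a negative value.
scores = [2.0, -1.0, 0.5]
probs = softmax(scores)

# One-hot target: class 0 is the true class.
target = [1, 0, 0]
loss = cross_entropy(probs, target)
loss_via_nll = nll_from_scores(scores, 0)
```

In this sketch `loss` and `loss_via_nll` agree up to floating-point error, which is the sense in which cross-entropy loss equals log-softmax followed by NLL.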