The softmax loss function is

L_i = -\log\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right)

where the log is the natural log (base e), so this is equivalent to

L_i = -f_{y_i} + \log \sum_j e^{f_j}

In other words, the final scores are exponentiated and normalized into probabilities, and the loss is the negative log-probability of the correct class.
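As a quick sanity check of what this loss computes, here is a toy example (the three class scores and the choice of class 0 as the correct class are my own made-up numbers, not from the assignment):

import numpy as np

# Toy scores for 3 classes; assume class 0 is the correct class.
scores = np.array([3.2, 5.1, -1.7])
correct_class = 0

# Normalize the exponentiated scores into probabilities, then take -log of the
# correct class's probability: L_i = -log( e^{f_{y_i}} / sum_j e^{f_j} ).
probs = np.exp(scores) / np.sum(np.exp(scores))
loss = -np.log(probs[correct_class])
print(probs)   # sums to 1; the correct class gets roughly 0.13
print(loss)    # roughly 2.04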
As one commenter put it nicely:

The SVM only picks out the one candidate it likes best; Softmax drags every candidate out, gives each one a score, and normalizes the scores at the end.
For the derivative of the softmax loss, UFLDL's Softmax Regression page works it out in detail. Working it through myself: first, f(X_i)_j = W_j * X_i, i.e. f_j = W_j X_i. Taking the partial derivative with respect to W, and writing p_j = e^{f_j} / \sum_k e^{f_k} for the softmax output, the gradient column for the correct class is

dW_{y_i} = (p_{y_i} - 1) X_i

and for every other class j != y_i

dW_j = p_j X_i
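Spelled out (my own restatement of the UFLDL derivation, writing p_j for the softmax output of class j), the chain rule gives:

L_i = -f_{y_i} + \log\sum_k e^{f_k}, \qquad
p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}, \qquad
f_j = W_j \cdot X_i

\frac{\partial L_i}{\partial f_j} = p_j - \mathbb{1}[j = y_i], \qquad
\frac{\partial f_j}{\partial W_j} = X_i
\quad\Longrightarrow\quad
\frac{\partial L_i}{\partial W_j} = \bigl(p_j - \mathbb{1}[j = y_i]\bigr)\, X_i

which is exactly the (-1 + softmax_output) * X[i] and softmax_output * X[i] accumulations in the code below.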
The translated notes on Zhihu point out that in the implementation the exponentiated scores can become very large, so the division is numerically unstable. The trick is to multiply numerator and denominator by a constant C (as verified in UFLDL's discussion of the parameterization of the softmax regression model); a common choice is log C = -max_j f_j, i.e. shift the scores so that the largest one becomes zero.
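A minimal sketch of this trick (the scores below are made-up numbers, chosen only so that the naive exponentials overflow):

import numpy as np

scores = np.array([123.0, 456.0, 789.0])  # np.exp(789) overflows to inf

# Naive softmax would produce nan here:
# np.exp(scores) / np.sum(np.exp(scores))

# Shift trick: multiply numerator and denominator by C with log C = -max_j f_j,
# i.e. subtract the maximum score before exponentiating.
shift_scores = scores - np.max(scores)
probs = np.exp(shift_scores) / np.sum(np.exp(shift_scores))
print(probs)  # well-defined: the largest score dominates, no overflow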
The code looks like this:
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops)

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    #############################################################################
    # TODO: Compute the softmax loss and its gradient using explicit loops.     #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    # Get shapes
    num_classes = W.shape[1]
    num_train = X.shape[0]

    for i in range(num_train):
        scores = X[i].dot(W)
        # Shift the scores so the maximum is 0 for numeric stability.
        shift_scores = scores - np.max(scores)
        loss_i = -shift_scores[y[i]] + np.log(np.sum(np.exp(shift_scores)))
        loss += loss_i
        for j in range(num_classes):
            softmax_output = np.exp(shift_scores[j]) / np.sum(np.exp(shift_scores))
            if j == y[i]:
                dW[:, j] += (-1 + softmax_output) * X[i]
            else:
                dW[:, j] += softmax_output * X[i]

    # Average over the batch and add regularization.
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)
    dW = dW / num_train + reg * W

    return loss, dW

Note how num_classes = W.shape[1] and num_train = X.shape[0] are read off the shapes; these two are easy to mix up.
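For comparison, here is a sketch of how a vectorized version could look. This is not the reference solution; it just assumes the same (W, X, y, reg) interface and the same shift trick as the loop version above:

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """Softmax loss and gradient with no explicit loops (a sketch)."""
    num_train = X.shape[0]

    scores = X.dot(W)                                   # shape (N, C)
    shift_scores = scores - np.max(scores, axis=1, keepdims=True)
    softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores),
                                                   axis=1, keepdims=True)

    # Average cross-entropy of the correct classes plus regularization.
    loss = -np.sum(np.log(softmax_output[np.arange(num_train), y]))
    loss = loss / num_train + 0.5 * reg * np.sum(W * W)

    # d(loss)/d(scores) is the softmax output with 1 subtracted at the correct
    # class; the chain rule through scores = X.dot(W) then gives X^T * dscores.
    dscores = softmax_output
    dscores[np.arange(num_train), y] -= 1
    dW = X.T.dot(dscores) / num_train + reg * W

    return loss, dW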
On the question "Why do we expect our loss to be close to -log(0.1)? Explain briefly.", one answer explains it well: since the weight matrix W is initialized with small uniform random values, the predicted probability of each class is roughly uniform and equal to 1/10, where 10 is the number of classes, so the cross-entropy for each example is -log(0.1), which should match the loss. In other words, before any training W is close to 0, all class scores are nearly equal, every softmax probability is about 1/10, and the loss reported by this first check is close to -log(0.1) ≈ 2.3.
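A quick way to see this numerically (the shapes below are my own toy setup; 3073 = 32*32*3 + 1 matches the CIFAR-10 data with a bias column):

import numpy as np

np.random.seed(0)
num_classes = 10
W = np.random.randn(3073, num_classes) * 0.0001   # tiny random weights, as at initialization
x = np.random.randn(3073)                         # one made-up image vector

scores = x.dot(W)
probs = np.exp(scores - np.max(scores))
probs /= np.sum(probs)
print(probs)              # every class is close to 0.1
print(-np.log(probs[0]))  # close to -log(0.1), i.e. about 2.3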
The validation that follows is much the same as for the SVM; the final accuracy is
softmax on raw pixels final test set accuracy: 0.334000
References:
http://ufldl.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92
https://zhuanlan.zhihu.com/p/21102293?refer=intelligentunit
http://www.cnblogs.com/wangxiu/p/5669348.html
https://github.com/lightaime/cs231n/tree/master/assignment1
http://cs231n.github.io/linear-classify/#softmax