Double Softmax

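Intuitively, applying softmax twice might seem to produce a sharper distribution:

$$ \operatorname{softmax}(\operatorname{softmax}(X)) $$

But that is not the case if a logarithm is placed between the two calls. Because the outputs of a softmax sum to 1, the second softmax simply returns the first one unchanged:

$$
\begin{align}
P = [p_1, p_2, \dots, p_n] &= \operatorname{softmax}(X) \\
P_{\log} &= \log \operatorname{softmax}(X) = \log P \\
\operatorname{softmax}(P_{\log})_i &= \frac{e^{\log p_i}}{\sum_{k=1}^n e^{\log p_k}} \\
&= \frac{p_i}{\sum_{k=1}^n p_k} \\
&= p_i \\
\operatorname{softmax}(\log \operatorname{softmax}(X)) &= \operatorname{softmax}(P_{\log}) = P \\
&= \operatorname{softmax}(X)
\end{align}
$$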
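The identity above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the `softmax` helper and the example logits `x` are illustrative choices, not part of the original post.

```python
import numpy as np


def softmax(x):
    """Softmax over a 1-D array, shifted by the max for numerical stability."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()


# Hypothetical logits, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0])

p = softmax(x)                  # single softmax
p_round_trip = softmax(np.log(p))  # softmax(log softmax(x))
p_double = softmax(p)           # softmax(softmax(x)), no log in between

print("softmax(x)              :", p)
print("softmax(log softmax(x)) :", p_round_trip)
print("softmax(softmax(x))     :", p_double)
```

For these inputs the first two printed vectors should agree up to floating-point error, matching the derivation. The plain double softmax is a different vector, and in this example it turns out flatter than the single softmax rather than sharper, since the second call only sees inputs between 0 and 1.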




Copyright © 2018.