Double Softmax
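Intuitively, applying softmax twice might seem to produce a sharper distribution:

$$ \operatorname{softmax}(\operatorname{softmax}(X)) $$

But that is not the case if a logarithm is placed in between. Then the second softmax changes nothing, because the probabilities $p_k$ already sum to 1:

$$
\begin{align}
P = [p_1, p_2, \dots, p_n] &= \operatorname{softmax}(X) \\
P_{log} &= \log \operatorname{softmax}(X) = \log P \\
\operatorname{softmax}(P_{log})_i &= \frac{e^{\log p_i}}{\sum_{k=1}^n e^{\log p_k}} \\
&= \frac{p_i}{\sum_{k=1}^n p_k} \\
&= p_i \\
\operatorname{softmax}(\log \operatorname{softmax}(X)) &= \operatorname{softmax}(P_{log}) = P \\
&= \operatorname{softmax}(X)
\end{align}
$$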
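The identity can be checked numerically. The snippet below is a minimal sketch using NumPy; the `softmax` helper and the example vector `x` are assumptions made for illustration and are not from the original post.

```python
import numpy as np

def softmax(x):
    # Hypothetical helper: subtracting the max is a standard stability trick
    # and does not change the result.
    z = np.exp(x - np.max(x))
    return z / z.sum()

x = np.array([1.0, 2.0, 3.0])    # arbitrary example input (assumption)

p = softmax(x)                   # P = softmax(X)
p_via_log = softmax(np.log(p))   # softmax(log softmax(X))
p_double = softmax(p)            # softmax(softmax(X)), no logarithm in between

print(np.allclose(p, p_via_log))  # the log variant reproduces P
print(p)
print(p_double)
```

With the logarithm in between, the result matches `p` up to floating-point error. Without it, `p_double` is a different distribution; for this particular input it comes out flatter than `p`, not sharper.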