Why use softmax instead of plain normalization?
“max” because it amplifies the probability of the largest score
“soft” because it still assigns some nonzero probability to the smaller scores
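
A minimal sketch of the contrast, assuming "normalization" here means dividing each score by the sum of all scores. The function names `normalize` and `softmax` and the example scores are illustrative choices, not taken from the original notes.

```python
import math


def normalize(scores):
    """Plain normalization: divide each score by the sum.

    Only yields a valid distribution when all scores are non-negative
    and the sum is nonzero (an assumption of this sketch).
    """
    total = sum(scores)
    return [s / total for s in scores]


def softmax(scores):
    """Softmax: exponentiate each score, then normalize.

    Subtracting the maximum first leaves the result unchanged
    but keeps math.exp from overflowing on large scores.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [1.0, 2.0, 5.0]  # hypothetical raw scores

print(normalize(scores))  # largest score gets 5/8 = 0.625 of the mass
print(softmax(scores))    # largest score gets roughly 0.94 of the mass
```

With these scores, softmax pushes the largest entry from 0.625 up to roughly 0.94 (the "max" part), while the smallest entry still keeps a small positive probability (the "soft" part). Softmax also handles negative scores, where dividing by the sum would produce negative or undefined "probabilities".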