For binary classification, a sigmoid and a two-class softmax learn equivalent classifiers.

Let $\sigma(wx + b) = 0.5$ be the decision boundary for a sigmoid classifier. Since $\sigma(z) = 0.5$ exactly when $z = 0$, this boundary is $wx + b = 0$.

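Then, for a two-class softmax, $\exp(w_1 x + b_1) / [\exp(w_1 x + b_1) + \exp(w_2 x + b_2)] = 0.5$ implies $\exp(w_1 x + b_1) = \exp(w_2 x + b_2)$, which implies $w_1 x + b_1 = w_2 x + b_2$, which implies $(w_1 - w_2)x + (b_1 - b_2) = 0$. This is a linear boundary of the same form as the sigmoid boundary $wx + b = 0$, with $w = w_1 - w_2$ and $b = b_1 - b_2$.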
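
The equivalence goes beyond the boundary: dividing the numerator and denominator of the softmax by $\exp(w_1 x + b_1)$ gives $\sigma((w_1 - w_2)x + (b_1 - b_2))$, so the two models assign the same probability at every $x$. Below is a minimal numerical check of this in Python. The parameter values and the helper names `sigmoid` and `softmax_first` are arbitrary choices for illustration, not anything from a particular library.

```python
import math

def sigmoid(z):
    # Logistic function sigma(z) = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))

def softmax_first(z1, z2):
    # Probability of class 1 under a two-class softmax.
    # Subtracting the max logit keeps exp() from overflowing.
    m = max(z1, z2)
    e1 = math.exp(z1 - m)
    e2 = math.exp(z2 - m)
    return e1 / (e1 + e2)

# Arbitrary (hypothetical) softmax parameters, chosen only for the demo.
w1, b1 = 1.5, -0.3
w2, b2 = -0.7, 0.9

# Compare softmax class-1 probability with sigmoid((w1 - w2) x + (b1 - b2))
# on a grid of inputs from -5 to 5.
xs = [i / 10.0 for i in range(-50, 51)]
max_gap = max(
    abs(softmax_first(w1 * x + b1, w2 * x + b2)
        - sigmoid((w1 - w2) * x + (b1 - b2)))
    for x in xs
)

# The shared decision boundary solves (w1 - w2) x + (b1 - b2) = 0.
x_boundary = -(b1 - b2) / (w1 - w2)
p_at_boundary = softmax_first(w1 * x_boundary + b1, w2 * x_boundary + b2)

print(max_gap)        # should be at floating-point rounding level
print(p_at_boundary)  # should be very close to 0.5
```

The check assumes $w_1 \neq w_2$, since otherwise the boundary equation has no unique solution for $x$.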

