mathstodon.xyz is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon instance for maths people. We have LaTeX rendering in the web interface!

Watch a 2-layer neural network learn to separate two classes to the left and right

Matt Henderson
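A minimal sketch of the kind of setup being animated here: a 2-layer network trained to separate a "left" class from a "right" class. This is not the author's code; the data layout, architecture (16 tanh hidden units), and hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed layout): two Gaussian blobs, class 0 left, class 1 right.
n = 200
X = np.vstack([rng.normal([-2.0, 0.0], 1.0, (n, 2)),
               rng.normal([+2.0, 0.0], 1.0, (n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Two-layer network: 2 inputs -> 16 hidden tanh units -> 1 sigmoid output.
W1 = rng.normal(0.0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # P(class 1)
    return h, p.ravel()

# Plain gradient descent with momentum (full-batch here for simplicity).
lr, mom = 0.1, 0.9
params = [W1, b1, W2, b2]
vels = [np.zeros_like(w) for w in params]
for step in range(2000):
    h, p = forward(X)
    d_out = (p - y)[:, None] / len(X)      # BCE-through-sigmoid gradient
    gW2 = h.T @ d_out
    gb2 = d_out.sum(0)
    d_h = (d_out @ W2.T) * (1.0 - h**2)    # backprop through tanh
    gW1 = X.T @ d_h
    gb1 = d_h.sum(0)
    for w, g, v in zip(params, [gW1, gb1, gW2, gb2], vels):
        v *= mom
        v += g
        w -= lr * v                        # in-place momentum update

_, p = forward(X)
acc = float(((p > 0.5) == y).mean())
print(f"train accuracy: {acc:.2f}")
```

Animating the hidden activations `h` (or the decision boundary) at each step is what produces the kind of visualization shown in the thread.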

That was with SGD+momentum. Here's what it looks like with the Adam optimizer.
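For reference, the difference between the two optimizers being compared: momentum smooths the raw gradient, while Adam additionally rescales each parameter by a running estimate of the squared gradient. A one-parameter sketch of the standard Adam update (the toy objective f(x) = x² is my own choice for illustration):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum on the gradient (m) plus a
    per-parameter scale from the squared gradient (v)."""
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad**2        # second-moment estimate
    m_hat = m / (1 - b1**t)                # bias-corrected moments
    v_hat = v / (1 - b2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x^2, gradient 2x, starting from x = 3.
x, m, v = 3.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.05)
print(round(x, 4))
```

The per-parameter rescaling is why Adam often looks much less jittery than SGD+momentum in animations like this one.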

@matthen2 That’s supercool! Makes you appreciate Adam haha!

@matthen2 What does the y dimension represent? Great viz!

@matthen2 Is the training loss as shaky as the point movement, or is it smooth as silk? How does the batch size compare to the dataset size? Are these points train or held out?
Too many questions, but very cool visualisation idea!

@lb loss is also jittery, but I do use smaller, procedurally sampled random batches. In picking hyperparams I optimized a bit for producing a nice animation rather than for learning efficiently
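"Procedurally sampled" batches might look something like the following: each step draws fresh points from the class distributions rather than iterating over a fixed dataset. This is a hypothetical sketch of that idea, with an assumed left/right data layout, not the author's actual sampler.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_batch(batch_size=64):
    """Draw a fresh random batch each training step:
    class 0 centered on the left, class 1 on the right."""
    y = rng.integers(0, 2, batch_size)
    centers = np.where(y[:, None] == 1, [2.0, 0.0], [-2.0, 0.0])
    X = centers + rng.normal(0.0, 1.0, (batch_size, 2))
    return X, y

X, y = sample_batch()
print(X.shape, y.shape)  # every call yields new points, so the loss stays jittery
```

With this scheme there is no fixed train set to hold out from; every batch is effectively unseen data.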

@matthen2 thanks.

The real reason for my questions is that I was wondering whether this or something similar could be used to gain intuition about the effects of both architecture components and optimizer components.

@lb definitely! The type of non-linearity also shows interesting differences in the animation. This one is ReLU
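The visual difference between nonlinearities comes down to their shapes and gradients: ReLU hidden units switch sharply between "off" and linear, whereas tanh saturates smoothly. A small sketch of ReLU and its gradient (my own illustration, not from the thread):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Piecewise-constant gradient: each hidden unit is either fully "on"
    # or "off", which gives the animation a crisp, folded look compared
    # with the smooth saturation of tanh.
    return (z > 0).astype(float)

z = np.linspace(-2, 2, 5)   # [-2, -1, 0, 1, 2]
print(relu(z))       # [0. 0. 0. 1. 2.]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]
```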

@matthen2 Comparing different norm layers would be very interesting, though it's unclear how to factor out the effect of the learning rate.

Do you plan on publishing the code at some point? I want to know whether I need to start working on a repro or can just wait :-)

@matthen2 that's fun, it's like it's being stretched under extreme tension

@matthen2 Hi Matt! You probably get asked this a lot, but can you share some of the code for this animation?