mathstodon.xyz is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon instance for maths people. We have LaTeX rendering in the web interface!

Matt Henderson

watch a 2-layer neural network learn to separate two classes to the left and right
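
A minimal sketch of such a network, assuming 2 inputs, one hidden layer of tanh units and a single sigmoid output trained with binary cross-entropy. This is not the code behind the animation: the activation, the initialization scale and the names `init_params`, `forward`, `loss_and_grads` and `sgd_step` are all hypothetical, the gradients are derived by hand, and plain SGD is used here for brevity.

```python
import math
import random

# Hypothetical sketch (not the original code): 2 inputs -> H tanh units -> 1 sigmoid output,
# trained with mean binary cross-entropy. Gradients are derived by hand.

def init_params(hidden, rng):
    """Small random weights so the initial output probability starts near 0.5."""
    return {
        "W1": [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(params, x):
    """Return (hidden activations, P(class 1)) for a single 2-D point x."""
    h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b)
         for w, b in zip(params["W1"], params["b1"])]
    z = sum(w * hj for w, hj in zip(params["W2"], h)) + params["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(params, batch):
    """Mean binary cross-entropy over [(x, y), ...] and its gradient w.r.t. every parameter."""
    hidden = len(params["b1"])
    grads = {"W1": [[0.0, 0.0] for _ in range(hidden)], "b1": [0.0] * hidden,
             "W2": [0.0] * hidden, "b2": 0.0}
    n = len(batch)
    loss = 0.0
    for x, y in batch:
        h, p = forward(params, x)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        dz = (p - y) / n  # derivative of the mean loss w.r.t. the output logit
        grads["b2"] += dz
        for j in range(hidden):
            grads["W2"][j] += dz * h[j]
            da = dz * params["W2"][j] * (1.0 - h[j] ** 2)  # back through tanh
            grads["b1"][j] += da
            grads["W1"][j][0] += da * x[0]
            grads["W1"][j][1] += da * x[1]
    return loss / n, grads

def sgd_step(params, grads, lr):
    """Plain SGD: move every parameter against its gradient, in place."""
    for j in range(len(params["b1"])):
        params["W2"][j] -= lr * grads["W2"][j]
        params["b1"][j] -= lr * grads["b1"][j]
        for k in range(2):
            params["W1"][j][k] -= lr * grads["W1"][j][k]
    params["b2"] -= lr * grads["b2"]
```

Training would repeat `loss_and_grads` followed by `sgd_step` on fresh batches. Comparing the hand-written gradients against finite differences is a cheap way to check the backward pass.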

that was with SGD+momentum. Here's what it looks like with the Adam optimizer
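
For comparison, the two update rules can be sketched on a flat list of parameters. This is a hedged illustration, not the code behind the animation: the class names, the toy quadratic objective in `minimize` and the hyperparameter values are assumptions (β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the usual Adam defaults). The momentum form shown (v ← μv + g, p ← p - lr·v) is one common convention; some implementations scale g by (1 - μ) instead.

```python
import math

class SGDMomentum:
    """Heavy-ball momentum: v <- mu * v + g; p <- p - lr * v."""

    def __init__(self, n, lr=0.1, momentum=0.9):
        self.lr, self.mu = lr, momentum
        self.v = [0.0] * n  # one velocity per parameter

    def step(self, params, grads):
        for i, g in enumerate(grads):
            self.v[i] = self.mu * self.v[i] + g
            params[i] -= self.lr * self.v[i]

class Adam:
    """Adam: each parameter's step is scaled by a running RMS of its own gradients."""

    def __init__(self, n, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n  # EMA of gradients
        self.v = [0.0] * n  # EMA of squared gradients
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

def minimize(opt, x0, steps):
    """Toy objective f(x) = sum((x_i - 3)^2), whose gradient is 2 * (x_i - 3)."""
    x = list(x0)
    for _ in range(steps):
        opt.step(x, [2.0 * (xi - 3.0) for xi in x])
    return x
```

The key difference sits in `Adam.step`: dividing by the square root of `v_hat` makes the early steps roughly `lr` in size whatever the gradient scale, whereas momentum SGD steps grow directly with the gradient.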

and feature engineering can be important, even for neural networks! Here we feed the points as polar coordinates, rather than as (x, y), and it learns much faster
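
A sketch of the polar feature transform, assuming the two inputs become r = √(x² + y²) and θ = atan2(y, x). If a point's class depends only on its distance from the centre (as with the disk and rings mentioned later in the thread), the label becomes a function of r alone, which a small network can pick up with a few thresholds instead of carving circles out of (x, y). `to_polar` and `polar_batch` are hypothetical helper names.

```python
import math

def to_polar(x, y):
    """Map a Cartesian point (x, y) to (r, theta), with theta in [-pi, pi]."""
    return math.hypot(x, y), math.atan2(y, x)

def polar_batch(points):
    """Convert a list of (x, y) points into (r, theta) network inputs."""
    return [to_polar(x, y) for x, y in points]
```

One caveat with this encoding: θ jumps from π to -π across the negative x-axis, so neighbouring points can get very different angles. That matters less when the label does not depend on θ.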

@AlanZucconi so jittery! And this is after an exponential moving average.
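
A sketch of the smoothing step, assuming a standard exponential moving average with a hypothetical `decay` parameter. The thread doesn't say exactly what is averaged (the network weights or the plotted point positions), so this helper works on a plain sequence of numbers and would be applied per coordinate or per weight.

```python
def ema(values, decay=0.9):
    """Exponential moving average: s_0 = x_0, then s_t = decay * s_{t-1} + (1 - decay) * x_t."""
    smoothed = []
    s = None
    for x in values:
        s = x if s is None else decay * s + (1.0 - decay) * x
        smoothed.append(s)
    return smoothed
```

A larger `decay` gives a smoother but more lagged trace, which is why some jitter can survive even after averaging.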

@matthen2 That’s super cool! Makes you appreciate Adam, haha!

@matthen2 What does the y dimension represent? Great viz!

@matthen2 Is the training loss as shaky as the point movement, or is it smooth as silk? How does the batch size compare to the dataset size? Are these points from the training set or held out?
Too many questions, but very cool visualisation idea!

@lb The loss is also jittery, but I do use small, random, procedurally sampled batches. In picking hyperparameters I optimized a bit for producing a nice animation rather than for learning efficiently.
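
A sketch of what procedurally sampled batches could look like. Instead of a fixed training set, every step draws fresh points from the generating distribution, so each point the network sees is effectively new data. The concentric-ring layout, the ring count and width, and the name `sample_ring_batch` are assumptions inferred from the disk/annulus discussion further down the thread, not taken from the original code.

```python
import math
import random

def sample_ring_batch(rng, batch_size, n_rings=3, ring_width=1.0):
    """Draw a fresh batch of labelled 2-D points from concentric rings.

    Radius is uniform in [0, n_rings * ring_width); the label alternates by ring index,
    so the central disk is class 0, the first ring class 1, and so on.
    """
    batch = []
    for _ in range(batch_size):
        r = rng.uniform(0.0, n_rings * ring_width)
        theta = rng.uniform(-math.pi, math.pi)
        label = int(r // ring_width) % 2
        batch.append(((r * math.cos(theta), r * math.sin(theta)), label))
    return batch
```

Sampling r uniformly puts more points per unit area near the centre; drawing r as `R * math.sqrt(u)` for uniform u in [0, 1) would instead spread points uniformly over a disk of radius R.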

@matthen2 thanks.

The real reason for my questions is that I was wondering whether this, or something similar, could be used to build intuition about the effects of both architecture components and optimizer components.

@lb definitely! The choice of nonlinearity also makes an interesting difference in the animation. This is ReLU
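
A sketch of how the nonlinearity could be made swappable, assuming each activation is paired with its derivative for backpropagation. The names `ACTIVATIONS` and `hidden_layer` are hypothetical. One visible difference: a ReLU network computes a continuous piecewise-linear function, so its decision boundary is made of straight segments, whereas tanh or sigmoid units give smoothly curved boundaries.

```python
import math

def relu(a):
    return max(0.0, a)

def relu_grad(a):
    # Subgradient convention: 0 at a == 0.
    return 1.0 if a > 0.0 else 0.0

def tanh_grad(a):
    return 1.0 - math.tanh(a) ** 2

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)

# name -> (activation, derivative w.r.t. the pre-activation)
ACTIVATIONS = {
    "relu": (relu, relu_grad),
    "tanh": (math.tanh, tanh_grad),
    "sigmoid": (sigmoid, sigmoid_grad),
}

def hidden_layer(x, W, b, name="relu"):
    """Apply one hidden layer (rows of W are per-unit weights) with a swappable nonlinearity."""
    act, _ = ACTIVATIONS[name]
    return [act(sum(wi * xi for wi, xi in zip(w, x)) + bj) for w, bj in zip(W, b)]
```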

@matthen2 Comparing different norm layers would be very interesting, though it's unclear how to factor out the effect of the learning rate (lr).

Do you plan on publishing the code at some point? I want to know whether I need to start working on a repro or can just wait :-)
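
A sketch of two normalization layers one might compare, written without the learnable scale and shift parameters for brevity. Layer norm normalizes each sample across its hidden features; batch norm normalizes each feature across the batch (training-mode batch statistics only, no running averages). The function names and the `eps` default are assumptions.

```python
import math

def layer_norm(h, eps=1e-5):
    """Normalize one sample's hidden vector to zero mean and unit variance across features."""
    mu = sum(h) / len(h)
    var = sum((v - mu) ** 2 for v in h) / len(h)
    return [(v - mu) / math.sqrt(var + eps) for v in h]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) across the samples (rows) of the batch."""
    n, d = len(batch), len(batch[0])
    out = [list(row) for row in batch]
    for j in range(d):
        col = [row[j] for row in batch]
        mu = sum(col) / n
        var = sum((v - mu) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mu) / math.sqrt(var + eps)
    return out
```

Because batch norm's output for one point depends on the other points in the batch, small random batches add extra noise of their own, which is one more factor to keep in mind when comparing the animations.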

@matthen2 that's fun, it's like it's being stretched under extreme tension

@matthen2 Hi Matt! You probably get asked this a lot, but can you share some of the code for this animation?

@matthen2 Way to shake out that nonlinearity!

@matthen2 do you have the code that generated this?

@matthen2 very funky. What are the x and y axes here?

@matthen2 If you want a challenge, use ML to separate the blood cell types in flow cytometry data. I worked on the electronics years ago, and they had problems doing that with hand-crafted algorithms. One of the companies might be able to supply sample data; the medical/animal labs handle a lot of that.

@matthen2 What if you feed it both? Is it able to leverage that even better? Or maybe the model gets too big and it outweighs the benefit?
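
A sketch of the input encodings being compared, using a hypothetical `features` helper. Feeding both simply concatenates the two vectors; with H hidden units, the first weight matrix grows from 2·H to 4·H entries while the rest of a 2-layer network stays the same size.

```python
import math

def features(x, y, mode="both"):
    """Hypothetical input encodings: Cartesian (x, y), polar (r, theta), or both concatenated."""
    cart = [x, y]
    polar = [math.hypot(x, y), math.atan2(y, x)]
    if mode == "cartesian":
        return cart
    if mode == "polar":
        return polar
    if mode == "both":
        return cart + polar
    raise ValueError(f"unknown mode: {mode}")
```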

@matthen2 A support vector machine with an exponential kernel, say, should learn this pattern very quickly, no?

@mweiss it could learn the middle disk with one support vector, but would need more for the annuli

@matthen2 I see, so exponential plus polynomial (quadratic would suffice) to get the nonlinearity. It's been a while, but many general kernels should be that flexible.
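
A sketch of the kernels in this exchange, assuming "exponential kernel" means the Gaussian RBF kernel exp(-γ‖a - b‖²) (the name is sometimes also used for the Laplacian kernel exp(-γ‖a - b‖)). `one_support_decision` is a hypothetical illustration of the middle-disk point: with a single support vector at the origin, the decision value depends only on the distance from the centre, so its zero level set is a circle. The quadratic kernel (a·b + 1)² has x² and y² among its implicit features, so r² = x² + y² is linear in that feature space, and the sum of two valid kernels is again a valid kernel.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian/RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def poly_kernel(a, b, degree=2, c=1.0):
    """Polynomial kernel (a . b + c)^degree."""
    return (sum(ai * bi for ai, bi in zip(a, b)) + c) ** degree

def sum_kernel(a, b, gamma=1.0, degree=2, c=1.0):
    """RBF plus polynomial: a sum of positive semi-definite kernels is itself a kernel."""
    return rbf_kernel(a, b, gamma) + poly_kernel(a, b, degree, c)

def one_support_decision(p, gamma=1.0, threshold=0.5):
    """Decision value from one RBF support vector at the origin.

    Positive inside the disk of radius sqrt(ln(1 / threshold) / gamma), negative outside.
    """
    return rbf_kernel((0.0, 0.0), p, gamma) - threshold
```

A single radially symmetric bump can only cut out one disk, which is why each additional annulus needs further support vectors (or a kernel with an explicit radial feature).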

@matthen2 can the network separate a spiral?
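
For trying this out, a sketch of the classic two-spirals dataset: two Archimedean spiral arms, the second rotated by 180°, each labelled as its own class. `two_spirals`, `turns` and `noise` are hypothetical names and defaults, not from the thread.

```python
import math
import random

def two_spirals(n_per_class, turns=2.0, noise=0.0, rng=None):
    """Two interleaved Archimedean spirals of radius <= 1, labelled 0 and 1."""
    if rng is None:
        rng = random.Random(0)
    points = []
    for label in (0, 1):
        for i in range(n_per_class):
            t = turns * 2.0 * math.pi * (i + 1) / n_per_class  # angle travelled along the arm
            r = t / (turns * 2.0 * math.pi)  # radius grows linearly with angle, up to 1
            angle = t + label * math.pi  # second arm is the first rotated by 180 degrees
            x = r * math.cos(angle) + rng.gauss(0.0, noise)
            y = r * math.sin(angle) + rng.gauss(0.0, noise)
            points.append(((x, y), label))
    return points
```

Raising `turns` makes the arms wind more tightly, which is a simple knob for making the task harder.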