mathstodon.xyz is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon instance for maths people. We have LaTeX rendering in the web interface!

Server stats:

3.1K
active users

Terence Tao

I have been travelling the past few weeks, and have not yet had the time to fully digest the recent announcement by Deepmind of how their two new problem solver engines, AlphaProof and AlphaGeometry2, were able to between them solve 4 of the 6 problems at the most recent International Mathematical Olympiad: deepmind.google/discover/blog/ . But I can record some preliminary impressions.

1. This is great work, shifting once again our expectations of which benchmark challenges are within reach of either -assisted or fully autonomous methods. For instance, IMO level geometry problems are now effectively a solved problem for specialized AI tools; and it seems now that IMO problems that can be readily formalized and with formal proofs that can be located through a reinforcement learning process are now at least somewhat amenable to AI attacks (though currently requiring genuinely significant amounts of compute per problem, and human assistance on the formalization side).

2. There may be side benefits of this approach into making formal mathematics easier to automate, which could in turn facilitate mathematical research methods that contain formal components. In particular, the database of formal proofs generated by this effort could be a useful resource if shared more openly.

3. The approach (based more on reinforcement learning than large language models, somewhat in the spirit of AlphaGo, and heavily emphasizing formal methods) is clever, and makes sense in retrospect. As per the "AI effect" en.wikipedia.org/wiki/AI_effec , once explained, it does not "feel" like an exhibition of human-like intelligence; but it is still an expansion of the capability of our suite of AI-assisted problem solving tools.

Google DeepMindAI achieves silver-medal standard solving International Mathematical Olympiad problemsBreakthrough models AlphaProof and AlphaGeometry 2 solve advanced reasoning problems in mathematics

4. These new tools are not directly comparable with the NuminaMath model that recently won the AIMO progress prize huggingface.co/blog/winning-ai ; that model was fully automated and orders of magnitude more resource efficient, and had a quite different approach (using an LLM to generate Python code to brute force regional competition-level problems with numerical answers). This model is also fully open sourced (and one can play with it for instance at huggingface.co/spaces/AI-MO/ma ). This is also very nice work, and demonstrates the multidimensional nature of the general challenge of trying to use AI to assist or automate different parts of the mathematical problem solving process.

huggingface.coHow NuminaMath Won the 1st AIMO Progress PrizeWe’re on a journey to advance and democratize artificial intelligence through open source and open science.

@tao my understanding is that AlphaGeometry relies heavily on computer algebra systems, so somewhat comparable?