I have played a little bit with OpenAI's new iteration of ChatGPT, GPT-o1, which performs an initial reasoning step before running the LLM. It is certainly a more capable tool than previous iterations, though it still struggles with the most advanced research-level mathematical tasks.

Here are some concrete experiments (with a prototype version of the model that I was granted access to). In chatgpt.com/share/2ecd7b73-360 I repeated an experiment from mathstodon.xyz/@tao/1099482491 in which I asked GPT to answer a vaguely worded mathematical query which could be solved by identifying a suitable theorem (Cramér's theorem) from the literature. Previously, GPT was able to mention some relevant concepts but the details were hallucinated nonsense. This time around, Cramér's theorem was identified and a perfectly satisfactory answer was given. (1/3)
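(For context, and purely as my own paraphrase rather than anything quoted from the linked transcript: assuming the query concerned the large-deviations form of Cramér's theorem, that theorem states that for i.i.d. real random variables $X_1, X_2, \dots$ with $\mathbb{E} e^{\lambda X_1} < \infty$ for all $\lambda$, and for any $a > \mathbb{E} X_1$, one has
$$\lim_{n\to\infty} \frac{1}{n}\log \mathbb{P}\Bigl(\frac{X_1+\dots+X_n}{n} \ge a\Bigr) = -\sup_{\lambda \in \mathbb{R}}\Bigl(\lambda a - \log \mathbb{E} e^{\lambda X_1}\Bigr),$$
i.e., the exponential decay rate of the upper tail of the sample mean is given by the Legendre transform of the cumulant generating function.)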

In chatgpt.com/share/94152e76-751 I gave the new model a challenging complex analysis problem (which I had previously asked GPT4 to assist in writing up a proof of in chatgpt.com/share/63c5774a-d58 ). Here the results were better than previous models, but still slightly disappointing: the new model could work its way to a correct (and well-written) solution *if* provided a lot of hints and prodding, but did not generate the key conceptual ideas on its own, and did make some non-trivial mistakes. The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student. However, this was an improvement over previous models, whose capability was closer to an actually incompetent (static simulation of a) graduate student. It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "(static simulation of a) competent graduate student" is reached, at which point I could see this tool being of significant use in research level tasks. (2/3)

[Parenthetical clarifications added - 9/19/2024]

As a third experiment, I asked (in chatgpt.com/share/bb0b1cfa-63f) the new model to begin the task of formalizing a result in Lean (specifically, to establish one form of the prime number theorem as a consequence of another) by breaking it up into sublemmas whose statements, but not proofs, it would formalize. Here, the results were promising in that the model understood the task well and performed a sensible initial breakdown of the problem, but was inhibited by the lack of up-to-date information on Lean and its math library in its training, with its code containing several mistakes. However, I could imagine a model of this capability that was specifically finetuned on Lean and Mathlib, and integrated into an IDE, being extremely useful in formalization projects. (3/3)
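(To illustrate the shape of the task, here is a minimal sketch of my own, not taken from the transcript, of what "formalize the statement but not the proof" of a sublemma looks like in Lean 4 with Mathlib. The particular formulations of the prime number theorem, the local definition of the Chebyshev function psi, and the lemma name are all hypothetical stand-ins; the point is only that each sublemma is stated precisely and its proof is deferred with sorry.)

import Mathlib

open Filter

-- Chebyshev's ψ function, defined locally for this sketch as ∑_{n ≤ x} Λ(n).
noncomputable def psi (x : ℝ) : ℝ :=
  ∑ n ∈ Finset.Icc 1 ⌊x⌋₊, ArithmeticFunction.vonMangoldt n

-- Hypothetical sublemma (statement only): the ψ-form of the prime number
-- theorem, ψ(x)/x → 1, implies the π-form, π(x) log x / x → 1.
theorem pi_form_of_psi_form
    (h : Tendsto (fun x : ℝ => psi x / x) atTop (nhds 1)) :
    Tendsto (fun x : ℝ => (Nat.primeCounting ⌊x⌋₊ : ℝ) * Real.log x / x)
      atTop (nhds 1) := by
  sorry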

Given the interest in these posts, I thought I would share some other minor experiments I had also made with my preview of the model. In 2010 I was looking for the correct terminology for a “multiplicative integral”, but was unable to find it with the search engines of that time. So I asked the question on MathOverflow instead and obtained satisfactory answers from human experts: mathoverflow.net/questions/327

I posed the identical question to my version of GPT-o1 and it returned a perfect answer: chatgpt.com/share/66e7153c-b7b . Admittedly, the above MathOverflow post could conceivably have been included in the training data of the model, so this may not necessarily be an accurate evaluation of its semantic search capabilities (in contrast with the first example I shared, which I had mentioned once previously on Mastodon but without fully revealing the answer). Nevertheless, it demonstrates that this tool is on par with question-and-answer sites with respect to high-quality answers for at least some semantic search queries. (1/2)

MathOverflow: “What is the standard notation for a multiplicative integral?” (link preview)
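(For readers unfamiliar with the term, here is a rough sketch of the notion in question, in the spirit of the Riemann-sum definition of the ordinary integral used in the MathOverflow post; this is my own summary rather than an excerpt from the answers there. The multiplicative (or product) integral of $f$ over $[a,b]$ is the limit of Riemann-type products rather than sums,
$$\prod_a^b \bigl(1 + f(t)\,dt\bigr) := \lim_{\max_i \Delta t_i \to 0} \prod_{i=1}^n \bigl(1 + f(t_i^*)\,\Delta t_i\bigr),$$
which for scalar-valued $f$ reduces to $\exp\bigl(\int_a^b f(t)\,dt\bigr)$, while for matrix- or operator-valued $f$ the factors need not commute and one obtains a time-ordered exponential instead.)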

As another minor experiment, I gave o1 the first half of my recent blog post terrytao.wordpress.com/2024/09 , where I summarized the state of progress on a problem of Erdős that I was able to solve, and asked the model to find the missing ingredient needed to convert the previous partial progress on the problem into a full solution. Here the result was mildly disappointing: chatgpt.com/share/66e718ca-471 . Essentially, the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy. Overall, I feel that while there is some ability to randomly generate creative strategies, this aspect of LLM tools is still rather weak. (2/2)

What's new: “Planar point sets with forbidden 4-point patterns and few distinct distances” (link preview)

@tao What do you think the prospects are for the AI coming up with the partial solution and model problems needed to solve the general case?

@tao I realise this is going to sound slightly snarky, but given the terminology for why I suspect it's happening: is it any more helpful for finding creative strategies than, e.g., taking LSD would be?