I tried https://notebooklm.google on two papers of mine. It's advertised as "Your Personalized AI Research Assistant". The short summary is that the tool is exactly as good as an insolent, incompetent science journalist. When confronted with its own factual mistakes, it tries to blame the paper instead. (1/3)
The first example was the paper https://arxiv.org/abs/2409.17664 on comodule representations of second-order functionals, co-authored with Danel Ahman. The AI told me that the paper restricted representations to only finitely-branching trees. When asked to cite the place in the paper where such a restriction is enforced, it said that finite branching is "strongly implied" by the requirement that the trees be well-founded. I then confronted it with the fact that the introduction gives an example of countably branching trees, so clearly the authors did not intend finite branching. The response was that the authors misrepresented their own work by giving such an example. Only when forcibly told that it was wrong did the AI eventually admit that its initial summary of the paper was incorrect. (2/3)
The second example was the paper https://arxiv.org/abs/2404.01256 on countable reals, co-authored with James Hanson. Here the AI told me that both excluded middle and the axiom of choice are needed to carry out Cantor's diagonal argument. When I asked whether it meant "and" or "or", it doubled down and claimed that the authors of the paper state both are needed. I asked for the specific quote from the paper, and received one that used the word "or". I pointed out to the AI that this is clearly an "or", and it responded by blaming the authors for making the mistake of interpreting "or" as "and". Again it took a couple more iterations to get things straight.
LLMs may be good for some things, but extracting factually correct summaries from scientific papers isn't one of them. (3/3)
@andrejbauer seems like you accidentally copied the same url twice. whats the correct url? i wanna read the paper
@unnick Oops, sorry, I fixed it, and it's https://arxiv.org/abs/2404.01256
@andrejbauer thanks :)
@andrejbauer About a year ago, I asked ChatGPT to summarize a White House Executive Order. I forget what it was. But I gave it the entire name of the Executive Order. Well, approximately 1/4 of ChatGPT's summary was made up. That is, the topics therein did not appear in the Executive Order.
@andrejbauer : I was surprised by your initial statement that it wouldn't take correction, because what I've seen has been, if anything, too eager to be corrected. But that's based on cases where the user flat-out tells the LLM (truthfully or otherwise) that it's wrong. Here you're just trying to lead it to *realize* that it's wrong, and that doesn't work (until you tell it so).
@TobyBartels Nope, I actually told it it was wrong and it didn't buckle. I have the session in the browser in my office; I'll post it tomorrow.
@andrejbauer No ‘I'm sorry for the confusion. The paper is in fact not restricted to finitely-branching trees’? This actually gives me a little more respect for it; in the transcripts I've seen, they'll deferentially believe any nonsense you tell them.
@andrejbauer But even if it were good, why would I speak to it instead of my coworkers or colleagues? Maybe I am just growing old and jaded against AI tools, but I don't see the use cases for most of them.
@antopatriarca Because if it were good you could tell it to read 10000 papers and tell you which ones are relevant for the problem you're trying to solve.
@andrejbauer @antopatriarca I am very concerned about what happens when time-poor reviewers use this as a tool to help them. My students have already had to deal with a number of AI-generated reviews, and the responses from editors/area chairs so far can best be described as crickets...
@andrejbauer Is that a real or hypothetical scenario? Do you really need to read so many papers to see which ones are related to your problem? In my experience the people working on the same problem domain are usually quite limited in number, and they all know each other. But maybe I'm biased and it depends on the field. The AI field is surely overcrowded right now. Let's assume we really have that many papers. How many of them are really worth publishing? How many of them are actually saying the same thing in different words? Is this tool actually helping with the real problem (IMHO too many papers) or making it worse?
@antopatriarca It's a realistic scenario, except I won't be the one doing the reading. It's essentially a better Google search engine. The AI should read all papers, and then just tell me which ones I should read.
@andrejbauer Yes, as a search engine it could be useful.
@andrejbauer
> [...] an insolent incompetent science journalist. When confronted with its own factual mistakes, it tries to blame the paper instead
I needed a laugh today, thank you
I fed it my blog, and got pretty much the same conclusion. All facts either incorrect or only approximately correct, lots of explicitly stated stuff ignored.
A friend described the "podcast" as "the verbal equivalent of Muzak".
https://www.someweekendreading.blog/someweekendreading-ai-podcast/
@weekend_editor Except that Muzak is not out of tune, but the podcast is factually incorrect.
I think the most dangerous thing about LLMs is that while they are just high-throughput bluffing machines, they are nonetheless artfully and seductively persuasive.
People *believe* them, because they are masters of rhetoric, not of fact.
Metaphorically, the high-pressure BS firehose is painted pretty colors.