𝗤𝗨𝗜𝗖𝗞! 𝗙𝗘𝗘𝗗 𝗧𝗛𝗘 𝗕𝗘𝗔𝗦𝗧!
Academic publisher Taylor & Francis recently sold many of its authors’ works to Microsoft for $10 million, without asking or paying the authors — to train Microsoft’s large language models!
Taylor & Francis asked their journal "Learning, Media and Technology" to cut peer review time to 15 days — absurdly little time — to crank out more content.
And Taylor & Francis's subsidiary Routledge told staff that it was “extra important” to meet publishing targets for 2024. It moved some book deadlines from 2025 to 2024. Why? To meet its deadline with Microsoft.
Another academic publisher, Wiley, made a $44 million deal to feed academic books to LLMs — with no way for authors to opt out. Wiley says “it is in the public interest for these emerging technologies to be trained on high-quality, reliable information.”
When you publish with one of the big academic publishers, they try to make you sign a contract saying they can do whatever they want with your work. That means anything.
Hat-tip to @bstacey for pointing this out.
These articles have links to original sources:
https://pivot-to-ai.com/2024/08/04/more-academic-publishers-are-doing-ai-deals/
In a different universe, academics around the world had enough brain power to invent an alternative system to the broken, parasitic one we have now.
In a different universe, academics could get out of their moderately comfy office armchairs and organise collectively.
The academic publishers would crumble. Without content they are nothing.
@rzeta0 @johncarlosbaez @bstacey I feel ya. I've become incredibly cynical about academia in my seven years of higher ed - even outside of publishing, the way unis are basically hedge funds and landlords that sometimes give degrees, the sheer sinisterness of the grant system and research funding, the funneling of students through whatever terrible circumstances just to keep graduation rates up, the postdoc system generally, the bureaucratic bloat enriching administrators at the expense of professors and students, the insular departmental cultures and elitist attitude against "defecting" from the grind as though we're somehow above the fray of human affairs rather than mired in them... There *has* to be a better way, but frankly I'm tired and have gone through enough and would rather find ways to stay mathematically engaged outside of the ivory towers.
@djspacewhale - The way I see it, the key problem is that academics have preferred to focus on their research and let administrators and publishers do the boring work. It's not a lack of brain power, it's laziness. But this means that gradually academics have become trapped in a system that doesn't work for them.
I see this clearly in the University of California system. This has a lot of self-governance baked into its rules - that is, academics are supposed to be making decisions about how to run things. But we've let administrators take over, not to mention publishers.
My personal solution was to retire early. This is not a solution to the overall problem!
My grad student Brendan Fong has set up his own institute, the Topos Institute. That's a better solution. Take charge!
@johncarlosbaez - I think saying it is laziness is a little harsh. Division of work with appropriately skilled people is not being lazy. The system has become parasitic to the point that it is killing its host; and if removing parasites were just a matter of not being lazy, there’d be very few parasites anywhere.
I think our real failing (as a society) was allowing these companies to financially profit to the point where they could impose and protect their parasitism on us (those that need to publish) with impunity, rather than directly taking over.
I continue to hope that modern tech can help by reducing the cost and inconvenience of the "boring work," as you call it. Only a few years ago, copyediting for anything I wanted to publish was a costly and time-consuming business as I’m profoundly dyslexic. Now I use an LLM to reduce this cost to effectively zero, allowing me to start self-publishing articles. These companies know they’re doomed in their current form, and this seems like a last misguided grasp to gouge more money from the system before they evaporate.
I (and some others) also started our own institute, though alas, it did not end too well for us!
@rzeta0 @johncarlosbaez @bstacey
As much as I share your hope, in this universe you get sidelined in a *diversity* panel (sic!) meeting for suggesting a motion for the institution to leave the shitty birdplace and to encourage alternatives for professional communications of faculty and staff.
Harshest reactions come from younger students.
I don't see the light at the end of this tunnel yet.
@rzeta0 @johncarlosbaez @bstacey Caramba! That's the universe you're actually living in!
Check out the demo we presented at Fediforum a few weeks ago: https://spectra.video/w/q3FV8rG7st5XdvahboGxzp
Learn more about it here: https://openscience.network/about and https://bonfirenetworks.org/posts/openscience_network/
@FourOh-LLC @johncarlosbaez what does this have to do with socialism
@FourOh-LLC @djspacewhale @johncarlosbaez
Sure, but nobody is going to go live in Socialism (nor should they have to just to understand what you're hinting at), so either share your experiences, or your words are completely pointless.
@FourOh-LLC @djspacewhale @johncarlosbaez
Thanks for sharing that.
Anger at Socialism and Socialists is understandable, but there's no need to be angry at anyone on Mastodon, at least in this thread; no one here is proposing Socialism, although that's what publishers and others are doing.
I'm aware that the USSR sucked; aside from news, I've met and have been friends with many Russian emigrants from the ex-USSR and one in particular from Hungary (which of course continues to have problems, alas).
Congratulations on escaping.
@johncarlosbaez @bstacey hey @jonny ! I was wondering what your thoughts are on this. Are we doomed to all published work being sent into the machine?
@science_is_hard @johncarlosbaez @bstacey @jonny as of today, yes. We have to push change ourselves if we want to see a different outcome.
@johncarlosbaez @bstacey on the other side, “researchers” are cranking out content with the aid of LLMs. How long until all of this falls apart?
@johncarlosbaez @bstacey Crap. I have stuff I've authored through Routledge. How am I just now hearing about this?
I don't see the problem on the source side, since it's just a more automated version of what we already had. #Science is not a closed endeavor.
Now, since they are selling this for commercial use, we have a problem of another kind. How will this info be used? Who will have (free) access to it?
The simple fact of the matter is that we need something like this, and it needs to be done collectively, as we see with the Olympics, space stations, or global monetary infrastructure. Or even... wait for it... the Interfriggenet.
The costs & benefits need to be adequately distributed to all of humanity, through a downward cascading set of densely connected nodes. (Review, critique, and citation by reputable sources.)
We do not need, planet survival notwithstanding, to have a circle-jerk of multinational corporate pirates competing to serve up supercomputer power to every conceivable trivial search made by the masses, made possible by the textbook 'freemium' scam that treats energy as freemium too.
@MalthusJohn wrote: "I don't see the problem on the source side, since it's just a more automated version of what we already had. #Science is not a closed endeavor."
By the way, I'm talking about scholarship, which includes many subjects besides just "science". But that's not the main point.
In scholarship researchers put work into figuring things out. They write papers and books. When people use the results in those papers and books, they cite that work so we can follow the citation, go back, and find out what's the evidence for those claims. And when authors are cited, they get rewarded in various ways.
When you take people's papers, put them into a blender, and produce an LLM that spits out claims based on an undifferentiated mix of stuff, it breaks the system in at least two ways. Users of the LLM can't easily find where these claims are coming from. And the scholars who did the research aren't getting rewarded by citations.
To fix this, we need LLMs that cite what they're basing their claims on.
And we need to fix the publishing model so that universities, whose scholars do the research, don't have to pay monopoly prices to access the results. It would be even better if ordinary citizens could freely access the research their taxes are paying for!
The main problem is that the publishers' goals are not aligned with those of the people who read and write the publications. This leads to outrages like Routledge telling people to hurry up and finish writing a book so they can feed it to a large language model.
@bstacey @johncarlosbaez @MalthusJohn
I have good hopes that so-called "diamond open access" journals will take over in less than a decade or so. See existing initiatives like https://tektonika.online and https://seismica.library.mcgill.ca but there are many others in many disciplines. By the researchers, for everyone.
This of course does not prevent LLMs from digesting these papers. But at least no "editor" makes *more* money from this behind our backs...
We agree on the need for a better system, where goals, risks, and benefits are aligned.
It also needs to produce a couple of orders of magnitude more output.
I've also said in other chats that to use LLMs well, you need to know which citations the info comes from. No original work is possible without human input.
The reason there's no problem on the publishing side is that whatever is being 'scraped' is already copyrighted, peer reviewed, etc. Using these aids does not eliminate the scientific methodology, norms and such. When anyone, using AI or not, submits a paper, it has to pass the same standards.
Leaving bad actors aside: if an idea has precedent elsewhere in the literature, the author's ignorance is no excuse. Most of the trash output of AI should be caught by early filters, and the tests must include checking for citations of previous legitimate work. A paper that claims groundbreaking new work and has no citations is not going to get traction, for example. Certainly not without the accompanying logic, the necessary input of a human who lays out the trail for others to follow and confirm.
@MalthusJohn @johncarlosbaez ai is unnecessary. It's the tool of the greedy. Not the tool of knowledge.
Yes, I would much rather see an army of people employed to work on updating the body of science literature than these LLM centers.
We were talking about something that has already happened and will continue to happen, for the usual financier-cult-driven reasons.
My point was simply that the AI output is never going to be published as produced, because it has either already been published (an exact copy) or is gibberish (a statistical copy). The models are pretty much programmed to avoid the former for copyright reasons, leaving only the crap output to worry about filtering out.
An even simpler example, I once wrote a bit of open-source code to compute a 'modular inverse', with comments referencing a paper and a website to read up on limitations of my implementation and faster options.
Out of curiosity, I had GitHub's Co-Pilot write the same function for me. It wrote the exact same code, minus all the comments to help users go learn more.
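For anyone curious, here's a minimal sketch of such a function in Python (my reconstruction for illustration, not the original code from either of us), using the extended Euclidean algorithm:

    def mod_inverse(a, m):
        # Extended Euclidean algorithm: find x such that (a * x) % m == 1.
        # One of the limitations my comments flagged: this runs in variable
        # time, so it's unsuitable for secret values in cryptographic code.
        old_r, r = a % m, m
        old_s, s = 1, 0
        while r != 0:
            q = old_r // r
            old_r, r = r, old_r - q * r
            old_s, s = s, old_s - q * s
        if old_r != 1:
            raise ValueError("a has no inverse modulo m")  # gcd(a, m) != 1
        return old_s % m

(Since Python 3.8 the built-in pow(a, -1, m) does the same job, and for prime m, pow(a, m - 2, m) also works via Fermat's little theorem; those are the kinds of faster alternatives the stripped-out comments pointed readers to.)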
Citations matter. But that's sometimes counter to the financial incentives of the people who rent out time on LLMs, as they may have to flow some of that money (and attention) back to the people who created the training data.
@johncarlosbaez @MalthusJohn @bstacey there is no need to fix anything. The system worked well without AI crap. To fix this we have to remove LLMs, which are always unnecessary and always motivated by the financial urge to milk authors and artists.
@johncarlosbaez This kind of shit doesn't just enrage me, it also makes me deeply sad. I want to publish textbooks one day, and I want them to be actual Books rather than just lecture notes available free on my website, but who will I be able to entrust my work to?
@johncarlosbaez @bstacey This story enrages me. But the fact that the artwork that accompanies it was made from also pilfering artists' work to feed an AI is beyond ironic.
@LingLass - that was intentional; read the image information.
@johncarlosbaez truths for the lie machine! money for the money throne!
@johncarlosbaez That's just modern development. In a couple of years Microsoft won't need human customers at all.
@Santtu_61 @johncarlosbaez That would appear to be an AI illustration accompanying this post.
If I’m wrong please credit the artist. If I’m right, please consider that irony may be dead…
@megmuttonhead @Santtu_61 @johncarlosbaez Definitely hit me too. Let's complain about stealing from authors while stealing from artists.
@ocdtrekkie @Santtu_61 @johncarlosbaez apparently, I was mistaken. (My allergy to AI images is possibly being generalized to all images… Gotta watch out for that.)
@megmuttonhead @Santtu_61 @johncarlosbaez What Cat C-B said : I’m getting mixed messages from the AI garbage pic.
@Wlm @megmuttonhead @Santtu_61 - irony is not dead, you just didn't read the image description.
@johncarlosbaez @Wlm @Santtu_61 Thank you for the correction! I appreciate it.
@johncarlosbaez using an AI-generated image on a post critical of LLMs is definitely A Look.
a) you did sign away your copyright
b) is it not better to train LLM on proper academic papers rather than random rants on Twitter and Reddit?
@richardtol @johncarlosbaez @bstacey I'd argue it is even better to not train LLMs at all
@richardtol wrote:
"you did sign away your copyright"
No, I didn't - I avoid publishing with crap publishers who force bad contracts on their authors. But a bunch of suckers did. So I'm warning others.
"is it not better to train LLM on proper academic papers rather than random rants on Twitter and Reddit?"
Better for whom? You're not addressing the key problem here, which is that crap publishers are now trying to make people finish their books fast, and skip proper refereeing. That's not better for anyone except the publishers.
As I said above: when you take people's papers, put them into a blender, and produce an LLM that spits out claims based on an undifferentiated mix of stuff, users can't easily find where those claims come from, and the scholars who did the research don't get rewarded by citations. We need LLMs that cite their sources, and a publishing model that aligns the publishers' goals with those of the people who read and write the publications.
@johncarlosbaez @bstacey it's a bit rich to post this with an "ai"-generated image attached
@johncarlosbaez @bstacey Seems like a mixed message to illustrate a post deploring the harvesting of people’s original work without their consent for AI with an image created by the exact same thing.
@bstacey @johncarlosbaez @Adzebill That thought crossed my mind too. One of the problems with this wild frontier of AI products is having no way of knowing which were built on user-consented content and which were not.
@joncounts @Adzebill - read the image description.
@johncarlosbaez @Adzebill Ah, brilliant. Thanks. That's a clever touch.
@johncarlosbaez @joncounts @Adzebill that does not help.
@johncarlosbaez @bstacey I’m not a fan of AI but there’s some dissonance going on with the articles then using an AI image to advertise them
@PictoPirate - good, you noticed. Did you read the image description?
@johncarlosbaez I didn’t. It now makes sense.
@johncarlosbaez @bstacey As a scientist who aims to do research for the betterment of society, I do not have a problem with any of my works being used to create AI databases. I guess what does rub me the wrong way a little is that publishers are once again making money off of the free labor of scientists. Will they reduce publication costs because of this windfall? Doubtful; they will probably get a great bonus this year.