For-Profit Academic Publishers Love LLM Garbage

by Kevin Munger on November 8, 2024

One of my favorite metascience lines is: “Does anyone look around and say…things are going great, I just think we need MORE PAPERS?”

Obviously, we all want more scientific progress, better evidence, broader scope — but I don’t think that this is best accomplished by churning out more of these fancy peer-reviewed pdfs. Indeed, our systems of peer review and knowledge evaluation are breaking down under the strain. Everyone is under pressure to produce more and more papers earlier and earlier in their careers.

The situation is accelerating with LLMs. The cost of producing these pdfs continues to decline, and as long as the demand for the pdfs stays strong, we should expect the supply to increase. Everyone agrees that this is a problem.

Well, almost everyone.

If this were a well-functioning economy, demand would eventually be sated. But the for-profit corporations running the academic journals at the heart of our enterprise are more than happy to “buy” every single one of these pdfs, because they’ve figured out how to “buy” them from us using our own money.

This “AI revolution” is coming on the heels of another revolution in academic publishing: the move away from subscription-based academic journals to individual articles published “Open Access” in exchange for Article Processing Charges (APCs).

Under the old model, academic libraries paid for printed academic journals to be delivered so that professors and grad students could read them. This was a high-margin business, but there was still a verifiable service being rendered by the publishers: someone had to format, archive and deliver the physical pieces of paper. With the internet, the dead trees became vestigial; the subscriptions were to the online versions of these journals, but they were sold as a package: the libraries had to subscribe to a publisher’s entire catalogue in order for their academic institution to remain competitive.

But this model wasn’t incentive-compatible. Once an academic has published an article, they want it to be read by as many people as possible. And the internet makes it very easy for these pdfs to slip through paywalls. Just like streaming music allowed the big labels to maintain market dominance, the publishing corporations figured out a new model: Open Access. Who could possibly be against Open Access!

Some journals became entirely Open Access, moving 100% to the Article Processing Charge model where the authors pay up front and then the pdf is put on the internet for free. The model comes out of the hard sciences — like all metascience reforms, it makes the most sense in the context in which it was developed. Natural sciences have to get massive grants to actually conduct the research, so the cost of the APC is a rounding error. For social scientists and especially humanities disciplines where the work costs very little or nothing, the APCs represent a massive cost. Even worse, some of the journals are “double dipping” by offering individual authors the right to publish their work open access in exchange for an APC but still charging subscription fees to access the rest of the articles published in those journals — the so-called “hybrid model.”

It’s worth noting that none of this money is necessary. Academics could decide to cut out for-profit journals entirely, to put our own pdfs online ourselves. The Journal of Machine Learning Research did it:

The journal was established as an open-access alternative to the journal Machine Learning. In 2001, forty editorial board members of Machine Learning resigned, saying that in the era of the Internet, it was detrimental for researchers to continue publishing their papers in expensive journals with pay-access archives. The open access model employed by the Journal of Machine Learning Research allows authors to publish articles for free and retain copyright, while archives are freely available online.

The journal I co-founded does it:

JQD:DM is a diamond access, double-blind peer-reviewed scholarly journal hosted on the University of Zurich’s HOPE platform. We do not require article processing charges (APCs), and articles are available to access for free (CC BY-NC-ND 4.0).

The journal publishes quantitative descriptive social science. It does not publish research that makes causal claims.

The Journal of Trial and Error does it — and they even published a handy guide for setting up your own diamond open access journal online.

But the absurdity of for-profit academic publishing has been obvious for decades. Academics are complacent, and there are enough academics in pivotal positions getting a few crumbs from these academic publishers, so the system perpetuates itself. Fine. These are messy, complex systems and it’s naive to think that changing them will be easy.

But the APC model means that academic journals are aiming for a future in which we’re swimming in LLM garbage. Which actors want more pdfs in circulation? The ones getting paid $1,500 or $3,450 or $12,290 a pop. What an astonishing business model, as the FT reports. Nature is the journal commanding that eye-watering sum.

[Figure: column chart of revenues and operating profits (€bn), showing Springer Nature is increasingly profitable]

The for-profit journals don’t care about academic progress. They care about….profit. They literally cannot care about anything else, thanks to the doctrine of maximizing shareholder value in a context of private equity and hostile corporate takeovers. If they try to care about anything but profit, someone can come along and take them over and make them more profit-focused, as Dan Davies documents in his excellent book published earlier this year.

Peeters and Dambeck say: hooray for Open Access!!

A recent publication titled “The oligopoly’s shift to open access” provides some hard numbers:

We aim to estimate the total amount of article processing charges (APCs) paid to publish open access (OA) in journals controlled by the five large commercial publishers (Elsevier, Sage, Springer Nature, Taylor & Francis, and Wiley) between 2015 and 2018…we estimate that globally authors paid $1.06 billion in publication fees to these publishers from 2015–2018. Revenue from gold OA amounted to $612.5 million, and $448.3 million was obtained for publishing OA in hybrid journals. Among the five publishers, Springer Nature made the most revenue from OA ($589.7 million), followed by Elsevier ($221.4 million). (emphasis mine)

The annual revenue from Open Access has surely skyrocketed in the six years since the data from this study concluded.

Academia is far smaller and more federal than “the market.” People literally built this thing. In order to make it better, we should do metascience: we should use our tools to figure out what we want to do, and then use our institutions to actually do that. The mystical complexity of “the market” as some emergent phenomenon just doesn’t apply here.

Academic publishing is yet another example of the artificial. We’ve let the epiphenomenal outputs of doing science, of generating knowledge, of thinking, stand in for the phenomenon itself. The idea that using “AI” as a shortcut for human work will produce a de-skilling of new generations is now well-understood. But this is just one example of the more general cybernetic maxim that “You can never do one thing.”

When we write and publish a peer-reviewed paper, there is work going in at every stage. The fact that there’s a human coordinating all of the action means that there’s a human with a map of the overall project in their head. When we divide and sub-divide all the processes that go into the project — whether through human delegation or with “AI” — we lose the guarantee of an integral, individual human brain with the overall map of the project in it. The future of AI-powered social science is one where every scientist is a middle manager.

Academia isn’t just producing pdfs. The more we throw ourselves into this artificial little world where Springer Nature gets $12,290 a pdf, the less effective academics become at all of the other functions we serve in society.

Now, the looming disruption promised by LLMs can be productive if we collectively try to address the crisis. The status quo really is incoherent and sclerotic. The most likely response to the flood of pdfs is a further retreat into elite networks: being “in the room” will become even more important if that’s something AI can’t do. This is far from the optimal outcome. We should think bigger, about how LLMs (or even just the internet) allow us to re-organize scientific knowledge production.

One thing we definitely need to do, though, is work. We need to read and write and think about things. The “Open Science” response is to create artificial representations of those activities in order to make them auditable, to make them visible to outsiders, to make academics therefore rankable. This means throwing away the very thing that makes academia distinct. Academic freedom doesn’t just mean getting to say whatever we want — it’s not just this freedom to. It’s the freedom from the systems of control that continue to creep into every area of human endeavor.

It was one thing when the for-profit academic journals extracted ridiculous profit margins from the work done by volunteers and financed by taxpayer dollars. As communication technology changed and they scrambled to adapt their business model, the actual practice of science shifted as well. I know that serious scientists don’t wanna hear it, but the scientific knowledge we produce is obviously and strongly structured by our institutions. The tighter the labor market and the more artificial the metrics we use to evaluate each other (the farther from actually reading the work and subjectively evaluating its quality), the more power these institutions have.

And now these for-profit corporations are setting the agenda for how LLMs will be incorporated into scientific practice. They are very clearly aiming towards a world in which text is commodified, with greater and greater volumes of meaningless and unread text circulating for the sole purpose of advancing individual academic careers, and where they get $1,500 a pop. Hence the only guidelines for the use of LLMs in academic publishing put out by Springer amount to: “Let ’er rip.”

The big worry in the research community is that students and scientists could deceitfully pass off LLM-written text as their own, or use LLMs in a simplistic fashion (such as to conduct an incomplete literature review) and produce work that is unreliable….

That’s why it is high time researchers and publishers laid down ground rules about using LLMs ethically. Nature, along with all Springer Nature journals, has formulated the following two principles:

The fear is that LLMs might threaten “Transparent Science,” per the title, or that LLMs might produce work that is “unreliable.” So they say that in order to use LLMs “ethically,” they require that you….just say that you used LLMs. That’s it. Otherwise go nuts. Don’t worry about the institutions of knowledge production and verification — let’s just get as many pdfs circulating as we possibly can.

Anyone talking about “The Ethics of LLMs” and scientific publishing is trying to sell you something—or, in the case of Springer Nature, trying to buy something from you with your own money.

If the ethical questions are individual rather than systemic or collective, the questions are irrelevant.
