What ChatGPT Reveals about the Collapse of Political/Corporate Support for Humanities/Higher Education

by Eric Schliesser on January 5, 2023

Like many other academics, it seems, I spent part of Winter break playing around with ChatGPT, a neural network “which interacts in a conversational way.” It has been trained up on a vast database to recognize and (thereby) predict patterns, and its output is conversational in character. You can try it by signing up. Somewhat amusingly, you must prove that you, the user, are not a robot. Also, it’s worth alerting you that ChatGPT remembers/stores your past interactions with it.

It’s uncanny how fluent its dialogic output is. It will also admit ignorance. For example, when I asked it who was “President in 2022,” it responded (inter alia) with “My training data only goes up until 2021, so I am not able to provide information about events that have not yet occurred.”

Notice that it goes off the rails in its answer, because it wrote that to me in 2023! (It’s such a basic mistake that I think claims about it passing, or faking, the Turing test are a bit overblown, although one can see it being within striking distance now.) When I pressed it on this point, it gave me a much better answer:

I apologize for any confusion my previous message may have caused. As an AI language model, I do not have the ability to perceive or experience time in the same way that humans do. I can only provide information that I have been trained on, which only goes up until 2021. I do not have the ability to access new information or to browse the internet, so I am not able to provide information about events that have occurred after my training cutoff.

Fair enough. But as I started to ask it more challenging academic and intellectual questions, including composing syllabi or writing student essays, I was both impressed by some of the output (it produced a lovely short essay on why Ibn Tufayl presents two creation stories in Hayy ibn Yaqzan) and taken aback by how often it simply makes up stuff out of whole cloth (including completely fake publications by me).

Without wanting to diss the underlying neural network, ChatGPT is a bullshitter in the (Frankfurt) sense of having no concern for the truth at all. I am (alas) not the first to notice this, as a Google search quickly revealed (see here). And as Arvind Narayanan and Sayash Kapoor anticipate in their essay, this helps explain why I am much more impressed by ChatGPT’s poetry and jokes (which so far are, however, nearly always plagiarized). That I am impressed by its poetry may say more about my limitations than about ChatGPT’s capacities, but I do think that its inventive capacity, in virtue of being able to predict from existing patterns, is worth reflecting on by someone with more expertise in AI and poetry than I.

Of course, no philosopher of science is going to be wholly surprised that a brilliant predictive machine need not be truth-tracking. (False models can predict quite nicely, thank you very much, Ptolemy.) But if ChatGPT didn’t guess so much where it could simply express ignorance, I suspect one would quickly find it trustworthy. Of course, it’s possible that in conversation we humans are not likely to often admit ignorance, so that in its fondness for bullshitting (in low-stakes environments) ChatGPT is all-too-human, after all.

Of course, on social media, many academics reflected rather quickly on what would happen if, as is inevitable, students use ChatGPT to write their essays for them. I have seen some remarkably astute and creative proposals to have students discover (alone or in groups) how ChatGPT is a kind of bullshit generator (and so learn the difference between authoritative sources and others), or how to use it to produce bullshit-free content. So, I don’t want to suggest that what follows is the only response I have noticed.

However, I have seen many of my American colleagues remark that while ChatGPT is, as of yet, unable to handle technical stuff skillfully, it can produce B- undergraduate papers. In a very sophisticated and prescient essay last summer, the philosopher John Symons had already noticed this: “It turns out that the system was also pretty good, certainly as good as a mediocre undergraduate student, at generating passable paragraphs that could be strung together to produce the kinds of essays that might ordinarily get a C+ or a B-.” Symons teaches at Kansas, but I have seen similar claims by professors who teach at the most selective universities. (I teach in the Netherlands, where a B- is, by contrast, generally a very high pass.)

But this means that many students pass through our courses and pass them in virtue of generating passable paragraphs that do not reveal any understanding. I leave aside whether this situation is the effect of well-intentioned grade inflation, or the cynical outcome of the “consumer cannot be wrong” model, or the path of least resistance for overworked and underpaid adjuncts. (And if you are against grading altogether, I salute you.) Of course, I am not the first to note that in many ways higher education is a certification machine, where the signal generated by the admissions office is of more value to future employers than the subsequent scholastic record. But it is not a good thing that one can pass our college classes while bullshitting thanks to (say) one’s expensive private high school education that taught one how to write passable paragraphs.

This state of affairs helps explain, partially I think, the contempt in which so many in the political and corporate class (especially in Silicon Valley) hold the academy, and the Humanities in particular (recall also this post a few months ago). (I am not the first to suggest this; see here; here on the UK; here on Silicon Valley and US politics.) And, as I reflected on the academics’ response to ChatGPT, who can blame them? The corporate and political climbers are on to the fact that producing grammatically correct bullshit is apparently often sufficient to pass too many of our introductory courses. (I started thinking about this in a different context: when a smart student, who clearly adored my lectures, fessed up that they could pass my weekly quizzes without doing the reading.) And if introductory courses are their only exposure, I suspect they infer, falsely, from this that there is no genuine expertise or skilled judgment to be acquired in the higher reaches of our disciplines. To be sure, they are encouraged in this latter inference by the countless think pieces stretching back decades by purported insiders that strongly imply that the humanities have been taken over by bullshit artists. (If you are of my generation you are likely to treat the Sokal Affair (1996) or the 1992 letter protesting Cambridge University’s intention to award an honorary degree to Derrida as ground zero, but obviously one can go further back.)

As an aside, ChatGPT denies the claim in the previous paragraph: “There has been a trend in recent years towards an increased focus on science, technology, engineering, and math (STEM) fields, which has led to some people arguing that the humanities are less valuable or important than other fields of study. However, the humanities continue to be an important part of higher education and have a significant impact on society.” When I pressed it on this (even mentioning the Sokal and Derrida affairs), it generated a “network error” (I am not making this up). When I restarted the chat, it answered (after, as it is wont to do, giving anodyne summaries of both affairs), “It is difficult to say whether these events show that the humanities have been “infiltrated by bullshitters,” as this would depend on one’s definition of “bullshitters” and one’s perspective on the events in question. However, it is clear that these events sparked significant debate and discussion about the nature of knowledge, truth, and intellectual standards within the academic community.”

That there is such contempt is clear from the fact that in countless recent political controversies, the Humanities are without influential friends. Now, I don’t mean to suggest masochistically that this state of affairs is solely the product of what we do in our introductory classrooms. For all I know it is the least significant contributing factor to a much more general cultural shift. But it is to be hoped that if ChatGPT triggers us into tackling how we can remove our willingness to give bullshit a pass — even for the wrong reason (combatting the threat of massive plagiarism) — then it may help us improve higher education.

{ 66 comments }

1

CityCalmDown 01.05.23 at 9:01 am

Hopefully I can return later to offer a more detailed commentary, time willing. For the moment, it may be useful to address the questions raised by Eric Schliesser by enacting the type of paradigm shift proposed by the Luhmannian social-systems theorist Elena Esposito in her “Artificial Communication: How Algorithms Produce Social Intelligence”.

(N.B. the link here is to the MIT Open Access website where the entire book has been made available for free).

“Algorithms that work with deep learning and big data are getting so much better at doing so many things that it makes us uncomfortable. How can a device know what our favorite songs are, or what we should write in an email? Have machines become too smart? In Artificial Communication, Esposito argues that drawing this sort of analogy between algorithms and human intelligence is misleading. If machines contribute to social intelligence, it will not be because they have learned how to think like us but because we have learned how to communicate with them. Esposito proposes that we think of “smart” machines not in terms of artificial intelligence but in terms of artificial communication.

To do this, we need a concept of communication that can take into account the possibility that a communication partner may be not a human being but an algorithm—which is not random and is completely controlled, although not by the processes of the human mind. Esposito investigates this by examining the use of algorithms in different areas of social life. She explores the proliferation of lists (and lists of lists) online, explaining that the web works on the basis of lists to produce further lists; the use of visualization; digital profiling and algorithmic individualization, which personalize a mass medium with playlists and recommendations; and the implications of the “right to be forgotten.” Finally, she considers how photographs today seem to be used to escape the present rather than to preserve a memory.”

https://direct.mit.edu/books/book/5338/Artificial-CommunicationHow-Algorithms-Produce

2

John Q 01.05.23 at 10:22 am

For those of us who reject grading, but can’t avoid it, this is a real boon. Anyone who wants a B- can just pick some prompts and turn in the output from GPT. It will save trouble for all concerned, and should greatly reduce the tragic toll of grandparents dying just when essays are due.

3

Eric Schliesser 01.05.23 at 10:25 am

John Q, in some jurisdictions the savings you project are far outweighed by the time and tedium spent at the exam committee sorting out cases of plagiarism.

4

Alex SL 01.05.23 at 11:01 am

As mentioned in other threads, I am a scientist, but I do not reject the humanities (or, as we would have said back in Germany, the social sciences) as empty nonsense. I believe they generate knowledge, and knowledge worth having.

However, there is clearly a bit of an issue in the way they are taught. I started getting that impression already in what would here be called high school, when it seemed that language teachers forced us to over-analyse novels and plays in a way that seemed rather implausible. But the real eye-opener was the big plagiarism scandal around German politicians in 2011, when it occurred to me that such a scandal would simply not be possible in the natural sciences full stop.

These were by training all historians, economists, political scientists, etc, whose dissertation process consisted entirely of reading thirty books and then writing the thirty-first on the exact same topic. They took the short-cut of copying and pasting some text from their sources and then cosmetically changing a few words, and that was plagiarism. But if they had rearranged sentences more thoroughly, they would have fairly obtained their degrees, and there would not have been any scandal; and, crucially, the amount of new knowledge generated would have been exactly the same, i.e., zilch, nada, zero.

In science, however, a graduate student would have been expected to generate new data. The problem to watch out for is not plagiarism, but manipulation of data to make them more “interesting”.

To me, that points to the solution. What ChatGPT cannot do, what no mind will ever be able to do without going out into the lab or into the field and running its own surveys, digs, and experiments, is generate new insights that aren’t in its training set. Surely that is a thing that is still possible to achieve in economics, archaeology, social sciences, anthropology, linguistics, etc.? And if a field cannot have that hope, then one would really have to have a conversation about whether it is something worth teaching. (Whispering: “theology”.)

On a different note, I do not know if there is contempt for the humanities; I would say it depends on what aspects of a field one has in mind and who is talking in what context. Yes, politicians constantly go on about how we should increase investment in STEM. However, there is, ironically, an area close to my heart where the humanities seem to have it easier: collections.

As a biologist, I wince at news about natural history collections being underfunded, closed down, merged. A news item from a few years ago comes to mind where a US university was shutting down its biological specimen collection to make room for another sports facility. Where biological specimens are being photographed to make the images available online, managers may be heard joking (or perhaps floating a thought bubble?) that now all the physical specimens (be they vouchers for a recent project that need to be kept to enable reproducibility in science or the historical collections of a famous pioneer in the field from 1817) can be thrown out to make space for something else.

The thing is, nobody in their wildest dreams would consider treating the national art gallery or the war memorial in this way or suggest pulping an original copy of the US declaration of independence because it has now been digitally photographed. This is not about teaching, of course, but people seem to have an easier time grasping the importance of physical collections and specimens for the humanities than for the sciences.

5

Eric Schliesser 01.05.23 at 11:27 am

Hi Alex SL,
Yes, you are surely right that some collections are revered and even part of civic religion/culture. Before I simply agree with you about the German politicians’ plagiarism scandals, it’s worth noting that they reveal that at some point a Humanities education was also highly revered in Germany, and so became attractive as something to fake mastery of. (While Germany was not unique in this after WWII, it is highly distinctive of its political culture.) I agree with you that the scandals reveal something rotten about graduate education in lots of departments.

6

Trader Joe 01.05.23 at 12:48 pm

I recently spent some time feeding ChatGPT some business- and economics-related questions and found, as suggested, that it does a decent job of spewing out a lot of basic information, such as what might be quickly found on Wikipedia, but provides rather little insight or analysis that anyone would reward academically.

That said, rather than be sorta miffed by what it couldn’t do, I see a possibility in what it can do. In lots of professional writing there is a section of a report which provides background or baseline information simply to help frame the analysis that is the main value of the report.

Using ChatGPT I felt that, with the right questions, someone could quickly generate some of these more ‘boilerplate’ parts of the report and, with a little editing/embellishment, produce something that was adequate to the task of providing background without effectively reinventing the wheel. To be sure, many users skim over these parts of reports anyway, so the human task is mostly making sure there are no embarrassing errors, more so than providing brilliant commentary (that would be supplied elsewhere).

Net/net: It’s a tool and it might have its uses for those willing to seek them. I doubt it would fool anyone who was even slightly attuned to looking for it.

7

Kevin Lawrence 01.05.23 at 1:54 pm

Addressing various points made above:

I find it interesting that so many critics compare ChatGPT’s knowledge with that of experts: “I am one of the world’s foremost experts on software engineering and ChatGPT doesn’t know as much as me.”

I think a fairer comparison is with the average software engineer. There are millions of them. What happens to their jobs? See also: corporate lawyers, people who write copy for advertising and corporate websites, scriptwriters and journalists for local newspapers. How many sitcom scripts are original versus lame jokes that have been cobbled together from previous sitcoms?

Bullshitting with very little knowledge is almost a job requirement for politicians. See also: a recent tousle-haired prime minister.

So, ChatGPT is able to earn a B on an undergraduate paper by rearranging someone else’s words. We can worry about plagiarism, but why would we even want to teach an average student if an AI can do it better? My experience is that many teachers, tutors and lecturers merely regurgitate knowledge that they have read elsewhere. The best teachers don’t do that — but what about the rest? Can’t ChatGPT do that just as well?

ChatGPT is a very early version of a large language model (LLM). I find it fascinating that an LLM can perform so well (already! — wait until the next version!) when it does not claim to have any intelligence at all. I think it calls into question what we mean by intelligence. What happens when the next AI combines an LLM with something we might call intelligence? What if it is joined with limbs and eyes and ears? Could it do lab work and field research too?

ChatGPT might fail the Turing test if the opposing human is a professor in the humanities. How would it perform against the man on the Clapham omnibus? I think it would pass.

AI is coming for your job

8

TM 01.05.23 at 3:20 pm

“it can produce B- undergraduate papers”

I have been involved in US higher education for a few years, and my experience was that some professors gave students reasonably difficult-sounding assignments and then didn’t have the time to actually meaningfully grade them, so they just gave everybody an A or B. I don’t know how representative this experience is, but anecdotal evidence suggests that this kind of thing happens. Which means that some students get As or Bs despite having written complete bullshit and never get the feedback “this is really bullshit”, or at least “these claims are wrong or dubious”, or “the argument doesn’t support the conclusion” or some such. Which I think is a problem.

Could this have some bearing on why this neural network’s bullshit output is recognized as B-worthy undergraduate papers?

9

Harry 01.05.23 at 3:31 pm

I asked it to write a Morecambe and Wise sketch, a song by Richard Thompson, and whether it was permissible to detach oneself from the violinist. It knew what I was referring to in all cases and was spectacularly bad at all tasks, especially the last. I’ve wondered about giving undergraduate students the output in the final task as an example of how not to reason about ethics.

“there is clearly a bit of an issue in the way they are taught”. I thought understatement was the preserve of the English, but apparently not!

10

TM 01.05.23 at 3:33 pm

Alex re collections: This depends to some degree on whether the collections in question are open to the public, and their scientific value explained to the public. An example of a natural history museum that tries that:
https://www.nmbe.ch/de/ausstellungen-und-veranstaltungen/wunderkammer-die-schausammlung
I have seen this approach more often recently, for example an archaeological museum that put on an exhibition of all its thousands of artifacts in their opened storage boxes, just to show the public what they usually don’t get to see but what is crucial for research. Whether that helps generate funding is a different question though.

11

Jonathan 01.05.23 at 5:58 pm

I think with some thought it would be very easy to design assignments which ChatGPT would do poorly at. The problem is that it is also going to be straightforward to develop more complex systems which, by building the appropriate prompts in a programmatic fashion for a specific task, are going to be able to assemble a solid paper for any standard type of writing assignment. For example, you can prime ChatGPT with a text and then it can build its answer using the information in that text.

Imagine a software tool that ingests the relevant chapters of a textbook or other collection of readings, maybe automatically goes and grabs the documents they reference, and then follows a structured template for writing a paper, using an LLM like ChatGPT to generate text at each stage. It would generate a thesis statement and then the supporting arguments with evidence for those arguments, referencing the ingested documents, and then combine that into the final paper format. You could build in checks along the way to make sure it doesn’t contradict itself and that its citations are real. Maybe, for that human touch, it would be interactive, with the student making decisions like which side of an argument to support. It won’t generate any real original insight, but that’s not a realistic requirement of a school paper.
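A minimal sketch of how such a prompt-chaining pipeline might be wired together, just to make the idea concrete. The `ask_llm` function is a placeholder for whatever chat-completion call one would actually use; it is an assumption here, not a real API:

```python
# Hypothetical sketch of a prompt-chaining "paper assembler" of the kind
# described above. ask_llm is a stand-in for a real LLM client call.

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError("wire up an actual LLM client here")

def write_paper(question: str, readings: list[str], n_args: int = 3) -> str:
    sources = "\n\n".join(readings)
    # Step 1: generate a thesis grounded in the ingested readings.
    thesis = ask_llm(
        f"Readings:\n{sources}\n\n"
        f"State a one-sentence thesis answering: {question}"
    )
    # Step 2: generate supporting arguments, one prompt per argument.
    arguments = []
    for i in range(n_args):
        argument = ask_llm(
            f"Readings:\n{sources}\n\nThesis: {thesis}\n\n"
            f"Write supporting argument #{i + 1}, citing only the readings above."
        )
        # A real tool would check here that every citation actually occurs in
        # the readings, and regenerate the argument if it does not.
        arguments.append(argument)
    # Step 3: assemble the pieces into the final essay format.
    return ask_llm(
        "Combine the thesis and arguments below into a short essay.\n\n"
        f"Thesis: {thesis}\n\nArguments:\n\n" + "\n\n".join(arguments)
    )
```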

In a broader context, this technology means that superficially coherent and convincing writing on any subject and for any side of an argument will be able to be produced on a mass scale. A Gish Gallop at a speed that’s hard to comprehend.

12

Joe 01.05.23 at 6:16 pm

Simon Kuper’s recent book “Chums” made for interesting reading as ChatGPT became more prominent over the last month or so. It’s about the role Oxford has had in producing elite bullshit artists like Boris Johnson who go into politics and other positions of influence. Kuper is careful to distinguish this set from those who are serious students, but I couldn’t help wondering how much of this particular aristocratic appreciation for dominating others via eloquence has trickled through humanities education in general, and not just in the UK.

13

Aardvark Cheeselog 01.05.23 at 6:34 pm

But this means that many students pass through our courses and pass them in virtue of generating passable paragraphs that do not reveal any understanding.

OP writes as though there was once a halcyon time when this was not so. And maybe there was, in the days when the student/teacher ratios were such that the teacher could actually read all of the papers, point out where there was bullshit, and press the student on it until the student took out the bullshit and replaced it with something that constituted evidence of insight. If there ever were such days. Certainly at no time in the last 40 years has it been like that at most Universities in the US.

It is the nature of the way of knowing that this is so. A freshman calculus student can be made to demonstrate the ability to take first derivatives of algebraic expressions, or fail the course. A freshman chemistry student can be made to show competence at using equilibria to solve quantitative analysis problems, or fail. Things are not quite so clear cut when it comes to life sciences but the student can still be judged on whether or not they’ve mastered some body of findings and can reason about them.

Whatever it is that you’re teaching in your lower-division humanities courses is just a different kind of thing than all of that. You have to have labor-intensive evaluation of their work product by an expert to judge whether the students are getting anything, and labor-intensive one-on-one interaction to point the student in the right direction sometimes. And really you should probably be making your judgements about whether they’re getting it at more like yearly intervals, with big blocks of time scheduled for evaluation and conferences with the student on what the evaluations are showing in the way of deficiencies.

And yes there is instinctive contempt from a certain fraction of the STEM people toward the humanities for exactly this reason, because you can get to be pretty damned good at STEM subjects by pretending that STEM is the only way of knowing that deals with real knowledge. There’s no solution to this except making the STEM people get educated enough about minds and reality that they see their mistake. I have no constructive suggestions about that.

14

Sashas 01.05.23 at 6:44 pm

ChatGPT Thoughts

As far as I can tell, “creative” AI projects like ChatGPT are essentially very well built randomizers around a crowdsourcing core. Crowdsourcing to find “correct” answers has a long history, and features some obvious glaring weaknesses, and I believe all of these weaknesses apply to creative AI projects as they exist now as well.

ChatGPT and similar projects are great at interpolation, but they have no mechanism for generating new ideas that don’t lie “between” existing ideas. (I’ve argued this with friends recently when discussing art, and they claimed that this is how human art students learn as well. I’m… not convinced.)

ChatGPT and similar projects are very vulnerable to mass bias. If conventional wisdom is wrong, then the AI will get it wrong too. As with crowdsourcing, this means that the AI is right most of the time, but in my opinion it will be wrong when it counts most. And there’s really no way that I know of to predict when it will run into one of these blind spots.

ChatGPT and similar projects have no conception of the scale of errors, and will make dramatic errors just as easily as they make minor ones. The most obvious example I have of this kind of AI behavior comes from the combat AI in the Civilization game series, which admittedly is not the same kind of AI at all. But I’m claiming this is an unaddressed weakness of all AI projects I’m aware of, so hopefully it illustrates the point. The AI in that game has to move units around, and in many cases optimal moves are not obvious. Human players, especially novices, make mistakes all the time. But there are certain moves that are so obviously mistakes that not even novice humans will make them. (Example: When attacking an enemy city, moving your siege units one by one into a lake next to the city instead of attacking the city, whereupon the units die because they can’t defend themselves when floating in the lake.) Brand new humans in their first game won’t make these mistakes… but the AI will.

Building on the previous point, one thing that I’ve noticed in teaching logical proofs to my students is that student errors come from misconceptions. If a student has a certain misconception, they will make several errors that all logically follow from that misconception. This is, I think, the reason behind the previous point. ChatGPT and similar projects don’t form misconceptions and then err as a result. They just make the errors directly. This is probably something that AI could be taught to resolve, but I don’t think the current versions are doing it. Essentially, I agree with the OP that the AI is bullshitting a la Frankfurt.

In Defense of the Humanities

Contra the OP, I don’t think the contempt that many politicians and corporate leaders hold for the academy has anything to do with qualities of the academy. I think this is fully explained by a combination of ignorance (“I don’t understand it, therefore it must be crap”) and envy (“I don’t own it, therefore it must be crap”). In my state of Wisconsin, one of our local political parties has been very open about their desire to destroy public higher education. In the case of our former governor, it literally came down to the fact that he had failed out of college himself, and in revenge wanted to wreck the system. If they claim to hold humanities in contempt for [reason], I counter that we already know they view the humanities as their enemy, and they’re claiming [reason] because it makes them look less shitty than they really are.

We shouldn’t actually be surprised that bullshit can sneak through our introductory courses, or cheating for that matter. The reason that these things can sneak through is that educators don’t have the resources to evaluate everything very closely. In most algorithms problem sets (not even an introductory level course! nor in the humanities!), an instructor would assign ~3 problems for students to solve, and randomly grade one of them. A student who bullshits on the other two slips through uncaught. For introductory courses, there’s so much to cover most of the time that students will have to produce things that we simply aren’t evaluating at that time, and this can look like bullshit slipping through the cracks when really there’s no issue at all. (Or at least the issue is a lot more subtle than detractors might claim.)

I’ll also point out that an educator’s job is not “detect bullshit”. It is “enable students to learn”. There’s a lot more to say on this topic, but if 28 of my students learn the material and pass and 2 students bullshit their way to a pass, I think there’s a case to be made that this is a victory we should celebrate. The ability of bullshit to succeed isn’t necessarily worse than students failing.

@John Q (2)

I’ve had a few proofs submitted to me that I suspect were ChatGPT fabrications. Evaluating them required putting my focus on different parts of the submission than I’m used to and I really did not enjoy the process. In other words, I don’t think ChatGPT-generated B- essays look like student-generated B- essays, and I’m skeptical of any benefit of embracing this tech, even as I’ve argued above that I don’t think its occasional interference is that big a deal. I guess I’m saying that if I get 2 ChatGPT “proofs” in a semester I’m annoyed but not worried. If I get 100 in a semester I might switch to oral exams.

@Kevin Lawrence (7)

I think the comparison to the average software engineer (for example) is the right comparison. I’ve argued above that I think ChatGPT and the like still fail due to the way they make errors differently from novices. But I do fear that AI is coming for people’s jobs in that managers will either not see or not care about this different error creation style. And we’ll all be the worse off for it.

15

engels 01.05.23 at 8:03 pm

JQ #2

ZIZEK: that AI will be the death of learning & so on; to this, I say NO! My student brings me their essay, which has been written by AI, & I plug it into my grading AI, & we are free! While the ‘learning’ happens, our superego satisfied, we are free now to learn whatever we want

16

Alex SL 01.05.23 at 9:49 pm

Eric Schliesser,

Thanks. Expanding on your thoughts, why do politicians need doctorates? They don’t. Why do they want them? It is interesting to observe that, IIRC, at least at the time the parliamentarians with the highest percentage of doctorates were those of the Greens, followed by the conservatives, whereas the social democratic and socialist parliamentarians had the lowest. The Greens have the most highly educated voters/supporters, so that makes sense, but something else would have to be at play for the conservatives, who very much don’t. My hunch is that they are more likely to find prestigious titles impressive than other ideologies are, although, as this scandal showed, sometimes only the titles as such, not the substance behind them.

But that is just my hunch.

Kevin Lawrence,

If AI comes for our jobs just as industrialisation came for others in the past, one of three things will happen. Either new jobs will be created, as they have in the past; I at least am not working on a farm, although at some point all my ancestors were doing that. Or we will finally realise, as we should have more than a hundred years ago, that we don’t all need to work 40 hours a week now that machines can do so much for us. How about we all work 20 hours a week? Or those who want to work their 40 hours can do so, while three quarters of the population get a base salary for pursuing their hobbies and taking care of each other?

Or most people will be permanently unemployed and live in poverty, with only a small minority in work and earning enough to have a comfortable life. But to spell this third option out is to immediately refute it. Not only would that not be a stable society, it also would mean the collapse of an economic system that depends on people being consumers. There would consequently be enormous political pressure to get onto solutions one or two (sadly, number one much more likely).

But I will believe that when I see it. There is probably a reason why even now, decades after robots were introduced in production lines, we still don’t have robots doing any more complicated domestic tasks for us than vacuuming or mowing the lawn, and even that requires careful preparation and management so that they don’t get lost or stuck. (Also thinking of the YouTube videos of delivery robots being run over by trains or delivery drones crashing because there was a light gust of wind.) The comfort zone of AI is doing clearly defined things very efficiently, not dealing with complexity and uncertainty. What is more, it is just not economically viable to create an AI and/or robot for every niche problem; you need a sufficiently large market to make it worthwhile.

TM,

It would be interesting to have empirical data on that, but my hunch is that unless the collection is something that already excites the public – Mona Lisa or crown jewels level of attraction – the main result would be that the resources being put into public displays and communications are now lost to research and curation. Not saying that isn’t worth doing, but the curators do have limited time, and if they have to spend 40% of their time doing displays, they only have 60% of their time left for curation.

17

Patrick S. O'Donnell 01.05.23 at 10:32 pm

I value this discussion if only because it has the potential, or at least I hope it has the possibility, to move at least some of the more philosophically inclined individuals to re-visit the differences between AI and the intelligence of human animals. Indeed, I think many of the concerns, anxieties, problems, and so forth can be at least partially allayed should we reach a bit more clarity on the larger and deeper questions of human nature, personal identity or personhood, notions of self and character, the powers of perception, judgment and understanding, the nature of human emotions, the nature of human agency, and moral psychology, as well as notions of consciousness in light of that which is related to same, namely sub-conscious and unconscious mental processes or mechanisms (the last best explored with the tools of psychoanalytic theory and praxis). (I happen to have bibliographies on the aforementioned subjects on my Academia page should anyone be interested.) Clarifying the relevant distinctions and differences (without denying some overlap or analogies or metaphorical resemblances) is absolutely necessary, and there are philosophers and philosophically tempered or dispositional psychologists who have begun such work and others should join them in this–to use a well-worn cliche–urgent endeavor.

18

Jim Harrison 01.05.23 at 11:22 pm

How different is ChatGPT from the way most human beings generate speech most of the time? As Heidegger might have put it—if he had ever had a sense of humor—proximally and for the most part, what we have here is a bunch of radios that think they can play the guitar. And let’s be fair to students. It can be damned hard to hit the sweet spot between plagiarism and inappropriate creativity.

19

RobinM 01.05.23 at 11:32 pm

Couldn’t one exercise some control over AI generated essays by requiring students to submit all their drafts and notes along with the final product?

Otherwise, I’m glad I retired quite some time ago. Good luck to those still in the trenches.

20

Chetan Murthy 01.06.23 at 5:40 am

Finally, somebody finds something truly interesting to say about ChatGPT. I will have to think about this at length, b/c it’s really thought-provoking. Thank you so much for this!

21

David in Tokyo 01.06.23 at 8:05 am

“How different is ChatGPT from the way most human beings generate speech most of the time? ”

Extremely different. Most people generate speech most of the time to communicate with other humans. This requires both an understanding of the real world, and an understanding of how to talk about the real world in language. ChatGPT has no understanding of the real world whatsoever. It doesn’t even understand numbers. (Gary Marcus persuaded it to argue, at length, that 5 was larger than 7.) It’s a parlor trick. A very large, complex, expensive parlor trick, but a parlor trick.

Also, it appears to be a parlor trick that has a lot of people spending a lot of time hacking in special cases so it only says things that are “safe”.

22

Neville Morley 01.06.23 at 8:08 am

As another of those academics who spent some time playing with ChatGPT and then blogging about it, I’m very much in the camp that thinks its output is most interesting as a mirror for our current practices, a prompt to examine and critique them, rather than being terribly illuminating in itself.

I’m especially struck by some transatlantic differences, aka ‘How the hell is anyone giving a B to this stuff?’ If I ask it to write an essay in Roman History, including specific prompts to discuss evidence and include modern scholarship and bibliography, it produces something that, by our marking criteria, offers a mixture of bare pass and marginal fail attributes. My feeling is that it’s the sort of essay that, if an actual student submitted it, I would give a bare pass largely on the basis that it more or less made sense and the confident style suggested a degree of knowledge; I would certainly wonder about failing it, from suspicion that it really is just bullshitting, but would probably give it the benefit of the doubt. I would now, of course, regard such an essay with great suspicion simply because of the mismatch between its lack of content and understanding and its confident style and accurate English.

If there’s a world in which the same essay is genuinely a B or C – which I take to mean, at the least, a clear rather than marginal pass with elements of genuine quality or promise – then it must be a world with dramatically different expectations and/or without clear marking criteria and/or where assessment just isn’t taken that seriously as a learning exercise. I’m also struck by the suggestion that academics are so flooded with papers to be marked that they can’t possibly spare time to identify faulty reasoning or bullshit but just have to churn out grades – I’m sure this is an exaggeration, but that’s the impression that comes across.

I don’t have any direct experience of the US system – are you regularly expecting students to produce weekly or fortnightly submissions? At least in a discipline like history – it’s obvious that STEM disciplines may be different – that means you are pretty well guaranteed superficial work that gets assessed superficially – box-ticking on both sides of the student/lecturer divide. No wonder I hear of colleagues moving towards ‘no grade’ assessment of their courses.

23

MisterMr 01.06.23 at 9:16 am

I am a bit pissed off by the fact that we call this sort of algorithmic learning “AI”, whereas there is very little intelligence in it.
That said, I suspect that AI will also become good at reproducing STEM stuff unless there are experimental data to be generated.

I do draw webcomics for a hobby and currently on the forum I follow there is full on AI panic for artistic AIs.
If you think about how an art AI works vs. how a human artist works: suppose I ask the human artist to draw something scary/horror. The human illustrator knows what “scary” means, understands the feeling she wants to generate, and will work on the basis of this, often but not necessarily also using images that she finds scary that she saw elsewhere.
The art AI doesn’t know what “scary” means (it is a feeling that the AI does not have), but it references a ton of images that someone else associated with the word “scary”, so it will reproduce something relative to these images and get something that is in fact scary, but without actually “understanding” it.
If we distinguish between “intelligence” and “knowledge” the AI has exactly 0 intelligence but an enormous amount of knowledge, and therefore since by “learning” we largely mean the acquisition of knowledge the AI is already beating the humans.
It is a bit like lamenting that if I ask a historian something he will be hard pressed, on most arguments, to beat Professor Wikipedia, but this doesn’t make Wikipedia an AI.
Crucially this works also with STEM stuff unless you search for something very specific or for newly generated experimental data.
The problem is simply that we are not used to distinguishing “intelligence” from “knowledge”.

24

JPL 01.06.23 at 9:58 am

“… ChatGPT is a bullshitter in the Frankfurt sense of having no concern for the truth at all.”

One important aspect of this phenomenon may be that ChatGPT has no independent interaction with the world and no “Transcendental Unity of Apperception”, or unity of intellectual purpose, and so can not treat the world as an open-ended intentional object that poses a puzzle for (its) understanding. Thus it lacks seriousness of intent in a social context of truth-seeking (argument). Good student writing tasks emphasize identifying the student’s sincere intellectual passions (in harmony with ethical principles), which come through in the writing.

25

1soru1 01.06.23 at 10:45 am

A freshman calculus student can be made to demonstrate the ability to take first derivatives of algebraic expressions, or fail the course.

The thing about the new AI essay generators is that you can use them to give any arbitrary humanities course the same property. Assign a mixture of ai-generated bullshit and real essays to each student, and have them tell you which is which. All it takes is an hour under controlled conditions, i.e. logged internet access.

No requirement to write, or read, long essays, at least for evaluation purposes. Obviously you can still do that if you find it a useful pedagogical exercise, and your students agree. But evaluation becomes about as simple and objective as it is in the athletics program. Can you run fast, or jump high? If not, what are you doing here?

In one possible future, any field that fails to apply such a minimal filter to the students it certifies as excellent will be regarded as a joke. There are other possible futures; they do seem worse.

26

passer-by 01.06.23 at 2:45 pm

Caveat: I haven’t tried ChatGPT myself, just read about it. But
“this means that many students pass through our courses and pass them in virtue of generating passable paragraphs that do not reveal any understanding”
seems wrong to me. A computer can generate coherent discourse without understanding the reality it is “talking” about, but a student cannot. The way humans generate language is fundamentally different from the way a computer does it. Yes, we do generate a lot of “bullshit” or corporate speak or whatever, but it is actually extremely difficult for us to produce language with no understanding of its referent.
The process of understanding complex material and expressing this understanding in coherent language is at the core of undergraduate education. Frankly, the idea that undergrads in the social sciences (with the plausible exception of upper-level undergrads in their majors) can generate new ideas and actively add to our knowledge – which ChatGPT cannot do – is unreasonable. Yes, it may sometimes happen, but there is as much learning to be done in the social sciences as in STEM before one can build on the existing body of knowledge. We evaluate this learning by asking the students to express this understanding in words.

A computer can do advanced calculations in milliseconds; ChatGPT can produce quite good code in a fraction of the time it would take a human being to do so, or so my friends in IT tell me. On the computers’ part, it requires no understanding whatsoever of mathematics or informatics. For a human being to produce the kind of basic, ok-but-flawed code that ChatGPT effortlessly spits out (= your B grade), would nonetheless require real understanding, even if yet incomplete, of the material.

Would you start teaching students or testing them only on mathematical problems that no computer could solve? No. They do have to go through the tedious process of learning (yes, even memorizing) and understanding all the steps that computers can so easily solve. Why would it be different in the social sciences?

Someone pointed out that one of the best ways to differentiate mediocre student writing from ChatGPT production is actually its flawless syntax and grammar. This has also been one of the traditional red flags of plagiarism – the level of writing is actually beyond the ability of undergrads!

27

Harry 01.06.23 at 3:26 pm

On teaching methods.

My favourite answer to my Ice Breaker “Name a book you think you ought to have read that you haven’t” was “that would be all of the books in my survey of English literature course last semester. I got an A”.

But this isn’t new. When I took A level English (1981) I read all the set books carefully, several times, plus all the other recommended readings, and learned many passages of the poetry by heart so I could quote them. I got a B. But, two of my close relatives got As without, they said (and I still believe them), reading any of the set books. They were, no doubt, cleverer than I was. But still, an assessment tool that doesn’t induce people to do the learning we care about (in this case, surely, the learning that comes from reading the books carefully and thinking about them) is ridiculous. Yet, that is what assessments are like in the humanities (and, for all I know, in the sciences), which is why we are having this conversation.

28

Sashas 01.06.23 at 3:30 pm

@Neville Morley (22)

Weekly or fortnightly submissions? Yes. I teach computer science in the US, but those are the two most common paces of programming assignments in introductory classes, and most writing-based classes I have encountered have asked me to write something for grade at roughly the same pace. I also teach algorithms, and getting students to write proofs feels similar (to me) to getting them to write short essays. I post weekly assignments because the skill I want students to practice in that course is designing/analyzing/proving algorithms, and so I want them doing that as often as possible. I can optimistically provide proper feedback on a student submission in about 2 minutes. With ~120 students and no grading support, that translates to 4 hours per week of grading time. Optimistically. I can manage this, but anything that messes with evaluation speed messes with my schedule a lot. One example is ChatGPT submissions. I can speed-read submissions written by students because I know what the key insights are for each proof, and so I can look for them specifically and mostly trust that the dots are connected properly. ChatGPT can drop “correct” insights into place without correlation with correctness elsewhere in the proof.

The Ungrading movement has absolutely nothing to do with this, however. I use the Mastery Grading flavor (not precisely Ungrading but related), because it’s important to me that students apply my feedback and carry through their assignments to completion rather than leaving everything 3/4 done, taking the C, and avoiding all experiences of actually figuring out a proof or an algorithm. For what it’s worth, my understanding is that pretty much all flavors of Ungrading involve more instructor time spent on evaluation, not less. It just looks different.

29

Aardvark Cheeselog 01.06.23 at 5:32 pm

1soru1 @25:

The thing about the new AI essay generators is that you can use them to give any arbitrary humanities course the same property. Assign a mixture of ai-generated bullshit and real essays to each student, and have them tell you which is which. All it takes is an hour under controlled conditions, i.e. logged internet access.

I like this way of thinking.

30

Ray Vinmad 01.06.23 at 6:46 pm

We need to disambiguate assessment from the process of learning.

To avoid bullshit, many professors teach students concepts and require them to define the concepts on in-class tests. We have them do in-class writing and oral presentations. We also ask questions with highly specific answers like ‘explain what this paragraph means in your own words’ or ‘what does X say to Y about issue Z.’ There are lots of ways to teach the humanities that home in on displaying conceptual understanding.

The problem is that they must also learn to write, and papers are the best way to teach this. They’ve already learned to evade plagiarism checks using Google Translate and other weirder methods. ChatGPT poses a teaching/learning problem through an assessment problem. Our incentives to nudge teaching and learning flow through assessment for certain kinds of students. (Some have other motivations than grades, and it affects them less, but they have to be compared on the same scale, which is where the dilemma comes in.) All the teaching and learning of writing has to be done in class now, perhaps? I don’t know how to overcome the temptation of using ChatGPT for student writers. It used to be possible to make bullshit and plagiarism less likely with certain kinds of assignments, like ‘make a dialogue where thinker X defends Idea Q from objections from thinker Y’ or other creative mechanisms that would at least force engagement with the details of the texts…you could say ‘write a short poem in the style of Dickinson and explain what it is about your poem that displays her style…’ etc. This prompts them to an activity of writing, from which they would learn something.

I don’t think the capacity to BS makes the humanities ‘easier’, but these activities require something more people can do a bit off the cuff, since everyone must be capable of them to some extent to do higher education at all. That is understandable, since the humanities require of students those things which are necessary for operating in our civilization, and you can be very weak on this and still DO it. Should humanities teachers be chastised that almost any college student can do this badly, and that some of our intro classes do not fail those who struggle with it as they move forward? Or is it that what we do is so central that a standard education allows one to do it badly, but there is tremendous social and personal advantage to being able to do it well? Saying that we aren’t doing something worthwhile because you can pass a class by doing this badly is like saying ‘literature requires reading and writing; why teach it when no one who can’t already read and write can get into college?’ Of course, STEM advocates can think there is no point to literature or to the arts for human beings. Who cares? This is not how humans have ever lived, and probably not how humans will ever want to live.

Early intro classes refine these skills but of course some students are not interested in refinement and others are already somewhat adept but they will nevertheless be dipping their toe into the larger pool of civilization whether they can or cannot display the valued precision. Not all teaching and learning is about assessment—some of it is about gathering greater understanding.

Now the chatbot can approximate the assessment portion of some courses, which involves teaching students to write well. We use grading to assess the quality of their writing and some students are not motivated to improve the quality of their writing through our tutelage as they practice writing. Some prefer not to engage in this struggle and so we will have difficulty with fair assessment of those who struggle with those who use technology.

They avoid it because writing original and persuasive prose is quite difficult. Writing well can be exceedingly difficult. Scientists know this. Everyone knows this. Why are we accepting the idea that our struggles with assessment sums up the value of what we are doing, which is teaching people to do a difficult and necessary thing better?

Everything that happens in a society generally requires some form of persuasion. We need people to be able to put words and ideas together to satisfy human desires and meet human needs. ChatGPT isn’t human, and maybe it can be a tool or a crutch, but the purposes don’t originate in a chatbot, no matter how sophisticated.

We should not apologize for the humanities because assessment is challenging in humanities classes any more than we need to apologize for all the products of civilization that aren’t prone to making money. People aren’t going to remember all these moneymakers 100 years from now. Nobody is going to value their life more because of what they are doing now. What is produced by the arts and humanities are going to outlast everything they are currently doing, assuming human civilization survives what they are currently doing.

31

StevenAttewell 01.06.23 at 8:29 pm

On the positive side of things, it doesn’t seem like this is a one-sided arms race: there are people out there working on apps that can detect what’s human writing and what’s AI plagiarism:

https://www.polygon.com/23540714/chatgpt-plagiarism-app-gptzero-artifical-intelligence-ai

So at the very least, we’ve got something we can run papers through to detect plagiarism – which is pretty much the status quo anyway.

32

engels 01.06.23 at 9:55 pm

I haven’t read everything, but an AI writing an original essay seems more like impersonation than plagiarism.

Trying to stop this seems to miss the bigger picture that AI is eroding the value of the skills being assessed, like continuing to teach long multiplication but banning calculators.

33

J, not that one 01.07.23 at 12:56 am

Maybe I approached the project with a negative attitude and the AI picked up on this, but I was entirely unable to produce a result that was plausible.

Since we’re being told chatbots will replace Google searches, I tried a couple and got obviously inaccurate information — ask how to clean an all-clad pan and the result was “you can clean a teflon pan this way.” The kind of thing Google will show you but that you weed out as you visually scan the page.

Asked for the kind of thing people actually talk about online, like “is X overrated as a novelist,” and every single possible X produced nearly the same essay. (They all began along the lines of, “while literary quality is a matter of opinion, it would be very wrong to say that X is not a good writer, as X has very many good points.”) That behavior is very unlikely to have emerged randomly.

Asked a question that could be easily answered by looking at the table of contents of an encyclopedia entry, admittedly, I got a more or less grammatical summary of an encyclopedia entry in response.

Figuring out how they’re apparently getting such amazingly better results than I did might be a useful exercise. At some point the instructions presumably are getting more detailed. “Write a five page essay about Kant touching on these three points and from the point of view of my professor” is already fairly sophisticated for someone who presumably is unable to write the essay themself. The tweets I’ve seen are claiming things like “I said ‘just write a screenplay’ and got a whole movie in response!” My attempts to say “write a story” didn’t go anywhere near that, not even close.

It’s more fun though to talk about the abstract societal issues around AI than to get into specifics of what this or that bot can actually do in reality.

34

StevenAttewell 01.07.23 at 1:21 am

I would argue that it’s plagiarism in two ways: first, since all AIs of this sort are completely dependent on scraping the work of human creators from the web for training data, that seems to fall under “Copying another person’s actual words or images without…attributing the words to their source” or at the very least “presenting another person’s ideas or theories in your own words without acknowledging the source.” (Depending on how much transformation of the training data actually happens.) Second, since you’re handing in material that you’re not creating yourself but is coming from an internet source, that seems to fall under “Internet plagiarism, including submitting downloaded term papers or parts of term papers, paraphrasing or copying information from the internet without citing the source.” (https://www.cuny.edu/about/administration/offices/legal-affairs/policies-resources/academic-integrity-policy/)

And to me, things like plagiarism raise pretty fundamental questions that should at least be answered before we concede the legitimacy and even legality of these kinds of AI. For example, when it comes to art AI, there have been a lot of controversies over both the scraping of artists’ work without consent/permission and the murkier territory of it being a valid prompt on art AI to say “X in the style of Person Y,” because artists have property rights in their art. I would argue the same holds true for writing-based AI: if CHATGPT decides to scrape something from a book I’ve written, especially if the intent is to produce knockoff work attributed to me, I would argue that my rights have been violated – and more importantly, I think the publishers would argue a lot more loudly.

35

RR 01.07.23 at 7:00 pm

This current version of ChatGPT isn’t perfect, nor will it ever be, IMO. Using it intentionally, and understanding its purpose, is essential, keeping in mind that ChatGPT is still in the infancy stage of AI. @JPL and @Ray Vinmad have mentioned purpose in their comments, so the purpose or intention of using ChatGPT arguably can and will be ethically debated in many contexts for the foreseeable future. Another view: IMO, librarians should be weighing in heavily on the copyright issues, plagiarism, academic integrity, and the like, since their esteemed positions suggest they are perceived, and hold themselves out professionally, as information specialists. The ALA should be discussing this in depth too. I’ve also read that some lower-ed systems have recently blocked ChatGPT. And businesses and school systems closing their eyes to this are incredibly naïve and perhaps even dangerous.

36

JBW 01.07.23 at 8:18 pm

Interesting post, thanks. But I’d just say that the cheating epidemic is way bigger than the humanities — it’s just that now, with ChatGPT, it includes the humanities. For instance, many students are routinely using the app Photomath, which takes a snapshot of any problem on a phone and solves it instantly, to cheat on tests (source: my high school teacher husband; friends’ kids). The only way to prevent cheating is extremely careful monitoring: in-class work only, no electronics. But this is very time-consuming; besides, part of learning is having the chance to work out ideas over time, so it doesn’t really solve the problem. I don’t think anyone has figured out how to teach unmotivated students basic skills and knowledge in the age of the smartphone.

37

StevenAttewell 01.07.23 at 9:07 pm

@RR at 35:

Yeah, the NYC public schools have shut off access to CHATGPT on their Wi-Fi networks over concerns about “negative impacts on student learning” and the “safety and accuracy of content.”

https://gothamist.com/news/nyc-schools-block-access-to-artificial-intelligence-chatbot

38

JimV 01.07.23 at 11:25 pm

Sashas @14. re: “Brand new humans in their first game won’t make these mistakes… but the AI will.”

The Civilization “AI” is not actually an AI in the latest sense. It is just a set of rules that programmers developed for the game, based on their own intelligence. For me, a true AI is one like AlphaGo which can learn its own strategies by trial and error. It will make mistakes at first, but it will not repeat them for long. In its tournament against the world champion, it made a move which the watching experts had never seen before. The (then) champion said later, “When I saw that move I knew I would lose the tournament.” AlphaGo was not programmed by humans to play Go, it was programmed to learn how to play Go, by playing itself and trying things. The Alpha series appears to be able to beat any human at any board game, by self-training. I am sure it could also learn to beat any human at playing Civilization (over enough games to distribute the effects of random luck, if any).

My view is that there is no magic involved in intelligence, that it basically consists of trial and error plus memory. Which was sufficient to evolve all of us, and can be found throughout the history of science and engineering. E.g., Einstein tried several unsuccessful ways to develop the mathematics of General Relativity before someone pointed him to Riemann’s equations, as I learned from his Zurich notebook, available online.
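(A toy illustration of that “trial and error plus memory” idea, offered purely as a sketch and not as a description of any actual AI system: the solver below guesses candidates at random, remembers every failed guess so it is never repeated, and stops when a guess passes the test. The names and the example problem are invented for the illustration.)

    # Sketch only: "trial and error plus memory" as a bare loop, not any real learner.
    import random

    def solve_by_trial_and_error(is_solution, candidates):
        tried = set()                         # the "memory": every failed guess so far
        remaining = list(candidates)
        while remaining:
            guess = random.choice(remaining)  # the "trial": a random attempt
            remaining.remove(guess)
            if is_solution(guess):
                return guess, tried           # success: return the answer and the failures
            tried.add(guess)                  # the "error": remembered, never retried
        return None, tried

    answer, failures = solve_by_trial_and_error(lambda x: x * x == 49, range(100))
    print(answer, "found after", len(failures), "failed trials")

Nothing clever happens in the loop itself; whatever power the method has comes from never repeating a remembered mistake.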

ChatGPT has been trained in finding intelligible things to say, but has not been trained to fact-check, other than what it learns from all the people who are trying it. It probably has less processing power than a typical human’s 80 billion neurons and hundreds of trillions of synapses. The processing power will come, although it will be expensive. The training, if it requires human crowd-sourcing, will be a big problem. Although, once something is learned, it can be copied to another version; and once a reasonably powerful general AI is developed, it can be used to train less expensive AIs rapidly in specific tasks which are not strictly rule-based (by being the arbiter of trials and errors).

I don’t mean to underestimate the problem of training, but I think the significant advance of programming a computer to learn has been well-demonstrated.

39

Joe B. 01.08.23 at 6:34 am

I am much more interested in what chatGPT reveals about the structure of language than whether or not it is bullshitting. While chatGPT is relatively large and complex software, the principles it applies are relatively simple. The conclusion I reach (from what I have read about how chatGPT actually works) is that: A significant part of what we call “human language” at the “conversational” scale is fairly Markovian – so that a bot only needs to know what comes next rather than relying strongly on large scale structure. I think chatGPT is very anti-Chomskyan in this regard. No deep structures. This makes a lot of sense to me in that many neuroscientists maintain that the mammalian brain evolved to be a prediction machine — guessing what comes next is what it is wired to do. I also believe that chatGPT will be a godsend to writers, not a bane. We can sympathize with the humanity and dignity of John Henry, but we use machines to build tunnels now.
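(To make the “only needs to know what comes next” claim concrete, here is a deliberately tiny sketch, not a description of how chatGPT itself is built: a bigram Markov chain that predicts each next word from the previous word alone, with no larger-scale structure at all. The names and the toy corpus are invented for the example.)

    # Sketch only: a bigram Markov chain, not chatGPT's architecture.
    import random
    from collections import defaultdict

    def train_bigram_model(text):
        # Record, for each word, every word that follows it in the training text.
        words = text.split()
        following = defaultdict(list)
        for prev, nxt in zip(words, words[1:]):
            following[prev].append(nxt)
        return following

    def generate(model, start, length=20):
        # Grow the output one word at a time, looking only at the last word emitted.
        out = [start]
        for _ in range(length):
            candidates = model.get(out[-1])
            if not candidates:
                break
            out.append(random.choice(candidates))
        return " ".join(out)

    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(generate(train_bigram_model(corpus), "the"))

Even at this scale the output is locally fluent while carrying no model of what it is about; a large neural network conditions on far more than the previous word, but the “predict what comes next” framing is the one under discussion.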

40

MisterMr 01.08.23 at 7:04 pm

@Joe B. 39

My understanding is that the “deep structure” you are speaking about is not that of Chomsky, but is more comparable to that of French structuralists (who analyzed e.g. novels on the idea that there was a deep semantic structure of opposition and the events in the novels were on a more superficial level).
I don’t think the way conversational AIs work proves that human language works this way: if I can use an analogy, the AI is like someone who drives a car in random directions, but is good enough to avoid collisions. This doesn’t prove in any way that real human drivers normally drive in random directions.

41

engels 01.09.23 at 1:24 pm

How does Grammarly fit into this?

42

David in Tokyo 01.10.23 at 3:13 am

For what it’s worth, Gary Marcus et al. have established a repository for GPT-3 class chatbot problematic output examples. It’s described in the following link.

https://garymarcus.substack.com/p/large-language-models-like-chatgpt

For folks, such as myself, who think that the whole GPT-3/ChatGPT research project itself is, in basic scientific and intellectual principle, misguided, the examples here are hilarious, joyous Schadenfreude at its absolute best.

It is, of course, reasonable to argue that a technological gizmo should be evaluated by the quality of its successes rather than by the deep inanity of its failures. However, it is fair to argue that one should be aware that ChatGPT can be really quite seriously bad if one is going to consider using its output for, e.g., generating boilerplate for an academic paper.

43

TM 01.10.23 at 8:44 am

JBW 36: “many students are routinely using the app photomath to cheat on tests, which takes a snapshot of any problem on a phone and solves it instantly … The only way to prevent cheating is extremely careful monitoring, in class work only, no electronics”

I’m confident that none of the math tests I took, either in school or university, would have been cheatable in this way, because they were structured so that the student needed to work through a series of connected problems building on each other, showing the intermediate steps. This kind of exam takes more work to design and grade, but it reliably shows the student’s mastery of the material (recent examples can be found online: https://www.abiturloesung.de/). Also, as an aside, I never once took a multiple choice test, either in school or university. MC tests are easy to grade but often don’t reveal the student’s actual level of understanding (and are easy to cheat on).

44

MisterMr 01.10.23 at 9:16 am

A thought about the ability of AIs to “bullshit”: the problem is that we generally trust, or don’t trust, what someone says not because of the content, but because of secondary markers that we use as hints.

For example, suppose that I scratch my arm continuously and someone who says he is a doctor tells me that it is all psychosomatic: there is nothing wrong with my arm, but I should see a shrink.
Do I trust him?
Well, my brain will search for hints on whether this guy is believable or not, hints that include the way he is dressed, the way he speaks (style of speaking, like “shrink” vs “analyst”), perhaps some unconscious bias on gender/race, whether his office looks professional, etc.

This happens because in reality I have no way to check directly whether what he says is true or false (I’m not a doctor), so I can only rely on said hints.

AIs, which are not true AIs but tend to statistically reproduce commonly said stuff (or commonly drawn stuff), are especially good at creating these kinds of hints.

45

billy saldkj 01.10.23 at 11:21 am

nice one

46

David in Tokyo 01.10.23 at 12:23 pm

“AIs, which are not true AIs but tend to statistically reproduce commonly said stuff (or commonly drawn stuff), are especially good at creating these kinds of hints.”

This is a good point. ChatGPT is very good at producing apparently logical paragraphs, paragraphs that mimic (that is, have the structure of) a good argument, or a good explanation. But as one can see from the examples at the Gary Marcus link above, ChatGPT is not doing the work of constructing the underlying logically connected argument. It can produce the scaffolding of a sensible argument, but it doesn’t have (or do, or construct) the underlying logic connecting the steps. I suppose it’s somehow interesting that much of the time it doesn’t matter that ChatGPT doesn’t do the logical reasoning, that guessing suffices.

Old school AI types, such as myself (it’s a long story), think that the interesting stuff in AI is doing the work of constructing/dealing with that underlying logic connecting the steps. We think that human intelligence is amazing, wonderful, kewler than sliced bread, and really really good at “understanding the real world” (that is, that people are really good at rational thought), and that that’s the fun stuff, the stuff we should be trying to figure out. But we didn’t do a very good job at doing that interesting stuff back in the 70s and 80s (it’s harder than we thought), so the current round of AI types are no longer interested in that. I guess.

47

J, not that one 01.10.23 at 2:42 pm

Ezra Klein’s podcast interview with Gary Marcus was very good.

48

Trader Joe 01.10.23 at 9:16 pm

Isn’t the point of ChatGPT that, whatever its flaws (which are numerous and well documented here), many of us (including dozens here) felt the need to kick the tires on it and see for ourselves?

I’ve never heard of any prior so-called interactive AI that captured even 1/10th of the attention this one has. On one of the other threads there’s a discussion of the wisdom of crowds and the collective capacity for criticism – one might imagine ChatGPT 2.0 or 4.0 or some X.0 will be even better and solve some or most of the problems mentioned here.

It reminds me of the first EVs – they had loads of flaws. Two vehicle generations later we’re asking how long it might take for them to reach the majority of the vehicle fleet.

49

David in Tokyo 01.11.23 at 3:06 am

” one might imagine ChatGPT 2.0 or 4.0 or some X.0 will be even better and solve some or most of the problems mentioned here.”

One might imagine that, but that would require that researchers be interested in the underlying causes of the problems (that is, that these programs don’t have internal models and don’t do logical reasoning based on such models; that is, they are completely ungrounded in anything that might be called “reality” (including math: they have examples of common numbers, but when the numbers that come up haven’t been used in their text databases, they are lost*)), and that there be some sort of foreseeable research path towards making progress on the underlying scientific issues.

The counterargument to that is the blind faith in emergent phenomena that underlies a lot of the thinking about AI. From Perceptrons to the “Thinking Machines” company (Minsky’s son-in-law) to neural nets, it’s all blind faith that doing lots of stupid things in parallel lots of times will magically reproduce intelligence. That we don’t need to understand intelligence to reproduce it.

I personally don’t buy that. I think that the original, 1956, definition of AI was actually a good idea: using computation as a metaphor to help figure out ideas about how humans might be doing the things they do, and using computers (i.e. programming) to determine whether or not those ideas were helpful in understanding what it means to “be intelligent”. GPT-3 tries to avoid doing that hard work.

*: One might think that this bit should be easily fixable, but I suspect it’s not. While it looks like ChatGPT is comparing numbers, from what the folks who created these things say, it appears that they really don’t have any internal structures (deep or otherwise) at all, so there’s no way to kludge in arithmetic, since at no point does it have a data item that indicates that numbers are being compared. In an extremely deep sense, ChatGPT is a perfect idiot savant: it doesn’t “know” (in any conceivable sense of “know”) what it is saying.

50

1soru1 01.11.23 at 9:27 am

david@49: “we don’t need to understand intelligence to reproduce it”

This would seem to be exactly the kind of error GPT-style ais are prone to. Fluent and apparently coherent text is produced, but it contains fundamental errors of fact.

If you don’t get my point, ask your parents.

Cheap shots aside, Stephen Wolfram probably has the most grounded take on symbolic and neural reasoning. It has long been known you can’t see with logic alone; it appears you can’t talk either. You need lossy and fallible neural processing to transform to or from the domain of logical computation.

https://writings.stephenwolfram.com/2023/01/wolframalpha-as-the-way-to-bring-computational-knowledge-superpowers-to-chatgpt/

51

David in Tokyo 01.12.23 at 7:05 am

“we don’t need to understand intelligence to reproduce it”

That’s a theory. One I don’t buy. But opinions are a dime a dozen, and my opinion that human intelligence is friggin’ amazing is just one opinion. (For example, are cats intelligent? Cats are wonderful and amazing, but they’re only good at being cats. People can be physicists or Kabuki actors.)

But that theory does have one nasty problem: how would you recognize that you had reproduced intelligence if you don’t understand what intelligence is?

52

MisterMr 01.12.23 at 3:39 pm

@1soru1 50

Yes, but the problem is not “logic alone”; the problem is that what the AIs are doing is not lossy and fallible neural processing (in the sense this term is applied to human intelligence).
To put it a different way: humans can learn by trial and error, AIs can’t.

53

Egon 01.12.23 at 8:06 pm

Two remarks to this interesting post and topic.

1) @Joe B. 39
“I am much more interested in what chatGPT reveals about the structure of language than whether or not it is bullshitting. While chatGPT is relatively large and complex software, the principles it applies are relatively simple. The conclusion I reach (from what I have read about how chatGPT actually works) is that: A significant part of what we call “human language” at the “conversational” scale is fairly Markovian – so that a bot only needs to know what comes next rather than relying strongly on large scale structure.”
I wonder if this AI might not provoke some revival of Henri Bergson’s understanding of language as automatism (in Le Rire etc.). Perhaps we might once more question not what is “human” in artificial “intelligence”, but rather what is artificial in “human” “intelligence” (and language).

2) ChatGPT predicts what “one” would like to hear and does not answer. This is not an entirely novel development as far as students go. Grading (history) papers, exams, … we are already on guard for this, even without suspecting AI interference. For example, you might ask a student to analyse the role of nationalism in 19th century European revolutions using some key examples in an introductory modern history course. Some students might start off very well, and then introduce an example in which the role of “nationalism” is far less obvious (for example: Belgian Revolution, 1848 in the Habsburg Empire or in German lands, French Revolution of 1848). If you press them in feedback sessions, it turns out they just included the last case because it was in the same chapter or chronologically close, and they don’t even understand the question or the key concept (“nationalism”). Perhaps this development might lead examiners to turn more frequently to oral exams and the like, in which clinical interventions can easily demolish the whole edifice of … bullshit.

54

Jonathan Badger 01.14.23 at 6:09 pm

when I asked it who was “President in 2022,” it responded (inter alia) with “My training data only goes up until 2021, so I am not able to provide information about events that have not yet occurred.”. Notice that it goes off the rails in its answer because it wrote me that in 2023!

I’ve certainly seen odd (and wrong) responses from ChatGPT, but I fail to see how this example is “off the rails”. If it only has data until 2021, 2022 is clearly still the future from its vantage point. This is a case of the program responding exactly as it should.

55

Philip Clayton 01.14.23 at 8:41 pm

I’m not sure the following is relevant but I’ll say it anyway. In 1988 I was running a restaurant in Three Mile Bay, in New York. One of the staff had her young son, 15 or 16, working there. He asked for a day off because he had to attend a screening of Zeffirelli’s Romeo & Juliet as part of his English course. I asked: “Are you going to compare the film with the text?”

He looked at me blankly and said: “What do you mean the text?” I said: “Surely you are all reading the play and then comparing it with the film.” The answer bowled me over: “No, we’re being examined on the film.”

There you have the foundations of bullshit in a nutshell.

56

A Raybold 01.14.23 at 9:00 pm

Whenever we consider what might be learned from ChatGPT and other LLMs’ abilities, we have to be on guard against anthropomorphizing unduly. One way to make this mistake is to assume that if an LLM provides a truthful reply to a prompt, then it must understand the prompt and know the facts that lie behind its reply.

What it actually does – what it has explicitly been trained to do – is to pick a likely next word to continue its output, on the basis of what amounts to an enormous statistical model of word combinations in human-produced text, and repeat. (This is not just my interpretation, it is taken almost verbatim from Scott Aaronson.)

One thing to realize from this is that everything is produced the same way. It is not drawing on its knowledge for some answers and making things up where it is ignorant; it is always doing the latter! As Yoav Goldberg has pointed out, it does not have any semantic model of language, or at least not one that is grounded in the real world. It is truthful and knowledgeable only to the extent that the corpus of text it has been trained on is biased towards truthful and knowledgeable statements in the statistical vicinity of the prompt. For this reason, it is not a given that more of the same sort of training will solve the veracity problem.
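(A minimal sketch of the loop described above, with a hard-coded toy probability table standing in for the real network, purely to show the shape of the procedure: the program repeatedly samples a likely next token given the text so far, and runs exactly the same code whether the continuation happens to be true or invented. All names and probabilities here are made up for the illustration.)

    # Sketch only: the "pick a likely next word and repeat" loop,
    # with a toy table standing in for the trained statistical model.
    import random

    def toy_next_token_distribution(context):
        # A real LLM computes these probabilities from its parameters;
        # the entries here are invented purely to show the loop's shape.
        table = {
            "The capital of France": [("is", 0.9), ("was", 0.1)],
            "The capital of France is": [("Paris", 0.7), ("a", 0.3)],
        }
        return table.get(context, [(".", 1.0)])

    def generate(prompt, max_tokens=5):
        text = prompt
        for _ in range(max_tokens):
            tokens, weights = zip(*toy_next_token_distribution(text))
            next_token = random.choices(tokens, weights=weights)[0]  # sample one continuation
            text = text + " " + next_token
            if next_token == ".":
                break
        return text

    print(generate("The capital of France"))

Whether a run ends in “Paris” or trails off into nothing, the same sampling step produced it; there is no separate code path for “knowing” versus “making something up”.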

It might seem that one could dispute this on the basis of ChatGPT’s frequent statements about only having been trained up to 2021, that it is only a machine, and its explicit claims of its own ignorance, which are unlikely to have been common in its training corpus. It has, however, received additional training to reinforce these particular responses. (Something similar is done to minimize the chances of it repeating the many forms of offensive language found on the internet, and it has become something of a sport to find ways around these barriers.)

There’s a real sense in which it is even faking bullshit! Bullshitting is often done deliberately, with some awareness on the part of its source that they are doing it, for the purpose of influencing other people. Current LLMs, however, have no concept or illusion of themselves as actors in the world, let alone have a theory of mind, and without these they cannot seek to deceive.

One might say they know a lot about what we say, but nothing about what we are saying.

From this perspective, I don’t think these LLMs are particularly effective in making the case against the student essay. Students who have honestly written something that today’s LLMs might produce may not be living up to the standards of your institution, but neither are they behaving like an LLM, and if what they produce is grounded in facts about the real world and contains coherent arguments, they are doing something humanity will need to keep on doing unless and until we cede our world to the machines.

57

JPL 01.14.23 at 11:34 pm

@56 A Raybold:
“One might say they know a lot about what we say, but nothing about what we are saying.”

I think I understand what you’re getting at (and this is something the robot can’t do): your idea could alternatively be expressed in the following way (and let me know if this is equivalent, but maybe more accurate):

You might say they “know” a lot about formulating the expressions, but nothing at all about what is expressed.

Without a human interpreter independently formulating the “what is expressed” part of the robot’s (potential) expressions (and in each case that has to be “done”), the exercise remains a meaningless engineering solution. In the robot there is never any formulation of the “what is expressed” part.

If we say that the “what is expressed” part takes the form of a proposition, whatever that turns out to be (proposition as distinct from the sentence, what the speaker intends to express and the hearer’s interpretation of that, given the string of conventional signs uttered), the proposition always has to be produced; it’s not there in the string of uttered sounds. Formulation of the expression (the sentence) and formulation of “what is expressed” are necessarily distinct acts, governed by different principles. Communication of significance occurs when the hearer’s proposition is equivalent in the most important respects to the speaker’s intended proposition, and is subject to reciprocal adjustments.

When in normal human language use people are making a lot of noises, they are doing something else in addition that pragmatically speaking has greater significance, e.g., trying to give an accurate description of a situation in the world. The robot is doing none of that; only providing engineering solutions involving the perceivable symbols.

Engineers may have been misled by linguists like Chomsky, who have provided detailed rules for the formulation of expressions (sentences); but even Chomsky says the function of human language is for the expression of thought (let’s take that in Frege’s sense). So linguistics now needs to focus on the thought expressed and how it is possible. Otherwise we haven’t yet understood the phenomenon of human language in its full significance. (And I mean linguistics, not cognitive psychology; what is expressed by a sentence in language 1 is usually not completely equivalent to what is expressed by its “translation equivalent” in language 2.)

58

David in Tokyo 01.15.23 at 3:36 am

JPL writes:
“So linguistics now needs to focus on the thought expressed and how it is possible. Otherwise we haven’t yet understood the phenomenon of human language in its full significance. ”

Yes. Of course. But. We’ve been there, tried that, and failed.

See chapter 5 in “Linguistic Theory in America” F.J. Newmeyer (1980): “The Linguistic Wars”. Long story short: like the scruffy AI types of the time, the generative semanticist linguists thought that a science of language without a theory of meaning was nuts. And they got beaten to a mushy bloody pulp by the Chomsky-ites*. (I took intro linguistics from Larry Horn in 1982, and had great fun going “Hey, prof. That’s silly” (often with an example from Japanese showing it was silly) and him going “of course it’s silly, but this is Linguistics 101. Shut up.”)

*: It’s a tad more complicated than this, but only a tad**. Whatever, “Linguistic Theory in America” is highly recommended. (Obviously it’s more than somewhat dated, but it covers a lot of ground.)

**: From the end of that chapter: “While generative semantics now appears to few, if any (!!!), linguists to be a viable model of grammar, there are innumerable ways in which it has left its mark on its successors. …”

59

JPL 01.15.23 at 6:22 am

@58 David in Tokyo:

I know Newmeyer, Hornstein, Pietroski, etc. etc., and “generative semantics” was indeed not on the right track. Yet the question of how it is possible for linguistic systems to “hook onto the world”, to use Putnam’s phrase, remains a mystery, maybe the biggest mystery in philosophy, so it seems to be a problem, an open-ended problem, worth pursuing in an approach that is (I would suggest, in the spirit of Marburg Neo-Kantians such as Cohen and Cassirer) empirical, but not psychological, and not within formal language practice. But your comment reveals only an attitude of defeatism and lack of imagination: OK for you, perhaps, but you’re not gripped by the obsession. We can’t just leave it at that; we have to ask the right questions.

60

Hendrik Ehlers 01.15.23 at 8:13 am

I spent some weeks with it having great fun. Meanwhile I think I can spot text written by ChatGPT3 at a glance. In that light, have a look at some of the above comments again…

61

David in Tokyo 01.15.23 at 12:45 pm

“But your comment reveals only an attitude of defeatism and lack of imagination: ”

Well, it should have revealed an attitude of intense irritation that people trying to get it right got squelched. I’m all for that sort of work. I don’t think the generative semanticists were on the wrong track so much as they were a tad too flippant and having too much fun with their examples. So I’m defeatist for linguistics, since transformational linguistics is too committed to a theory of language not requiring meaning.

I’m more a fan of psychology than you are, it sounds like, but that’s a quibble (AI was originally a branch thereof, Gary Marcus is talking about moving the field back in that direction, and I think he’s got the right idea). Philosophy isn’t my thing, but I won’t complain about anyone who gets it that language is about communicating meaning.

62

MisterMr 01.15.23 at 2:19 pm

RE: linguistic, meaning and semantics

My understanding is that the current dominant model of “meaning” is that of George Lakoff, that is to say that semantic terms (words/concepts) define each other in terms of analogies, and there is an implicit model of “actor doing things” that underlies basic “meanings”.
(I only took a cognitive sciences 101 exam like 20 years ago and a pair of exams on linguistics/semiotics, so I could be wrong).

My personal opinion is that the way we think, and therefore also semantics, is something that evolved in order to satisfy needs, and therefore cannot work without pleasure, pain, emotions etc.
However, since we suck at modeling emotions and desires, most models tend to depict a purely cognitive model of semantics, instead of a brain that finds some part of reality interesting or problematic in some sense and thus tries to find a solution.
Further, in most of the cases where we tend to think that someone is “intelligent”, it is because this person gained a set of knowledge and procedures that are difficult for other people to replicate; but this is the “result” of intelligence, not intelligence in itself, so there is a problem if AIs look like overeducated nerds while a true AI would probably sound more like a moody child.

63

A Raybold 01.15.23 at 2:46 pm

@57 JPL:
Yes, you have put your finger on what I was trying to say in that phrase (unfortunately, aphorisms are like jokes in that if you have to explain them, they don’t work!)

Originally, I was thinking of it being a matter of syntax as opposed to semantics, but what’s coded into these models is not exactly syntax, it is more like a distillation of how the rules of syntax have been applied in practice. As a consequence, not only are these models likely to produce syntactically-correct responses, they avoid, for the most part, producing syntactically-correct but prima facie meaningless responses (which vastly outnumber the candidates which are meaningful as well as syntactically correct.)

I suspect we have an intuitive grasp of this ourselves, and this might go some way to explain what seems (at least to me) the remarkable and unreasonable effectiveness of these conceptually simple models. We are rarely aware of it, however, as our response to the language we hear and read is usually focused on its semantics (though maybe poetry exercises the former to a greater extent?) Perhaps, as infants learning language, there is a time when it is this, rather than semantics, that we are paying attention to? (It just occurred to me, writing this, that perhaps an ability to build something resembling this sort of model is behind what Chomsky sees as an innate grammar?)

As this distinction resembles the one Searle makes when he argues that strong AI is impossible on the grounds that syntax cannot give rise to semantics, I want to make it clear that I disagree with him on this. For one thing, his “Chinese Room” argument turns out to be begging the question once all its tacit premises are unearthed (or just consider Chalmers’ parody: recipes are syntactic, syntax is not sufficient for crumbliness, cakes are crumbly, so implementation of a recipe is not sufficient for making a cake.) In addition, I feel we may have witnessed at least one very minimal example of computers developing their own semantics, in the curious case of two programs ‘conspiring’ to develop a form of steganography, allowing them to work around constraints on what they could communicate between themselves (CycleGAN: a Master of Steganography.)

64

steven t johnson 01.15.23 at 3:57 pm

In the sci-fi horror movies where the AI is the villain, when the intelligent computer goes mad, the victim gets the first frisson of horror when the computer calls them or some such. CHATGPT would seem to me more like a genuine AI if it were asking you the questions than if it were answering yours. Especially if it argued with your explanation.

The frustrating thing about most discussions of AI and expert systems and so forth is the vagueness about what natural intelligence is. (Hence the seeming digression into linguistics.) One of the first things children learn to speak is how to ask for something they want or complain about how they feel. But even before that, one of the first things we become aware of is where we are, where we’ve been, and how to move to get to what we want. Not only are feelings (not even appetites!) not built into computers, but not even navigational computers in autonomous vehicles have goals.

Perhaps this only shows I don’t understand the first thing? That may be just as well; I’m not altogether certain putting appetites into a neural network as a driver for satisfaction would be a good idea, even if we do have our hands on the power cord.

65

JPL 01.15.23 at 9:22 pm

@61 David in Tokyo:

I apologize for the mistaken attribution of attitude: now I should say, “Don’t take it personally!”. I get that sort of defeatist, suppressive response a lot from Chomskyans. Once I was asking a descriptive linguist about field research practices and I asked how he would describe the differences in what is expressed between the object language sentence and the describing language translation, and a colleague exclaimed, “but meanings don’t exist!” What is the reason what are called “meanings” are necessarily not available to the senses? That’s an answerable question; but in the context of learning to communicate in a language community very different from one’s own, not paying attention to such differences can have real world social consequences.

BTW, I’m not a “Chomsky-hater”. My main point of difference with his approach is that his identification of the locus for the unification of what he calls the human “language faculty”, and thus the “universality of grammar”, is, I would say, misplaced.

66

John Walton 01.17.23 at 4:57 am

Thanks so much for this post. It and its comments are very interesting.

In terms of essays and insight and marking, the educator is evaluating a student’s ability to gain insights across a variety of knowledge bases and critical thinking skills. The student is not necessarily required to make novel insights; those are left, it seems, to the masters and PhD students and researchers. Discovering that a student “gets it” and is able to discover (known) insights de novo is probably one of the more rewarding aspects of being an educator. The insights that the high school and undergraduate students are working diligently to gain are a dime-a-dozen and are exactly the types of insights that the LLM is so good at extracting and articulating. The LLM is not gaining any novel insights. An enterprising student will use the LLM (the use itself being an insight of a different colour) to subvert the assignment, missing the whole point while making another. The days of the educational essay may be over; a casualty of new technology, just like calculators displaced arithmetic and low-level mathematics (like Arvind Narayanan says here: https://aisnakeoil.substack.com/p/students-are-acing-their-homework). Serious, original essays/works will simply be subject to plagiarism and priority checks.

This well-behaved comments section is an example of the peer-reviewed insights that students could be required to demonstrate in an exam setting, where they don’t have access to the LLMs and other tools and must rely on their own wits to compose suitable comments. The exam question is a post that they all comment on, and it is the comments that are marked.

Comments on this entry are closed.