Large Language Models in the Classroom
In the fall 2023 term, Professor of Philosophy Frederick Eberhardt, a faculty affiliate of the Tianqiao and Chrissy Chen Institute for Neuroscience, took the somewhat controversial step of allowing his students full use of large language models (LLMs) like ChatGPT in their written assignments. His Ethics & AI class not only wrestled with ethical concerns raised by artificial intelligence but also experimented with text-based generative AI to better understand how these tools work and what a future world that includes them might look like.
"You are fully responsible for the correctness and appropriateness of any language or images created by a generative AI tool that you choose to use," Eberhardt's syllabus reads. The only proviso Eberhardt added was this: "I require that you complete a 'Generative AI Memo' for all graded assignments. This memo requires you to tell me which tools you used, how, and why. If you decided not to use these tools, it asks you to briefly explain why you did not."
Eberhardt's experiment with LLMs follows a major upset in college courses caused by the release of ChatGPT, the first LLM-based chatbot to reach a mass audience, in late November 2022. Instructors were suddenly confronted with the possibility that the sixth sense they had cultivated for sniffing out plagiarism would be utterly inadequate for detecting papers written by LLMs. No small degree of panic set in, followed by a cascade of opinions. Some instructors forbade the use of LLMs and relied on their university's honor code for enforcement; others invented new assignments that would be tricky for LLMs to master; still others returned to in-class writing with pen and paper to assess students' understanding of course material.
Eberhardt lands on the side of cautious optimism about incorporating LLMs into the curriculum. "Is learning to write learning to communicate? Is learning to write learning to think? Does writing well help students read carefully? Or, is writing instruction about finding and developing one's voice?" he asks. In his Ethics & AI syllabus, Eberhardt articulates his own instincts about the value of writing. "We teach writing to develop our reasoning skills and to learn how to communicate effectively. In that sense, learning to write is developing a means to achieve broader goals; writing (at least in my teaching and research) has never been an end in itself. I consider it to be an open question whether these new tools can still help us achieve these broader goals, possibly much more easily."
A class in Ethics & AI seemed to Eberhardt a particularly apt place to experiment with LLMs. Prior to the advent of ChatGPT, Eberhardt's Ethics & AI course focused on questions of fairness and privacy with respect to the use of machine learning, and the fall 2023 term was no exception. "Assignments consisted of several 500- to 1,000-word subsections of a term-length project that culminated in a policy proposal for the regulation of generative AI," Eberhardt explains, "but students were now permitted to use LLMs in writing these assignments."
At the beginning of the course, Eberhardt says, "students were certainly open to LLMs, but many of them didn't really know how to think about them, and rightfully so. I didn't know how to think about them." When students turned in the first assignment, Eberhardt got "the expected disaster, where, for the most part, I had to read generic ChatGPT-speak." At this stage of the game, Eberhardt could easily distinguish LLM output from students' own writing. "Text from the LLMs was just a superficial dribble that skimmed over the surface of the problem. The sentences were grammatically correct, but the content was zero."
Over time, though, students adapted. They learned that typing an essay prompt into ChatGPT didn't return helpful results, and they began to experiment with other LLMs and to combine and rewrite their output. "I started to see excellently researched case studies where I could no longer tell which parts were student written and which were machine generated," Eberhardt recalls.
By the middle of the term, Eberhardt says, "we hit a plateau, with several students returning to doing more of their writing on their own, while LLMs were used now principally as sophisticated search engines." At the end of the term, Eberhardt "received several outstanding term projects that, to the best of my judgment, were more than either the students or the machine could have done on their own. Students' papers included a wealth of material that we had not covered in class, and their arguments were not only insightful but well developed and well supported."
Not all students achieved this outcome, however. "Quite a few final submissions felt like the machine had taken over," Eberhardt reports. "It was lukewarm corporate speak."
In reflecting on their experience, Eberhardt says, his students liked having the option to use LLMs and found them particularly helpful in getting past "writer's block": Just having something on the page to start with helped them get their writing engines in gear, so to speak.
One study found that more than 50% of college students already use LLMs and that 75% say they would continue using them even if their professors or universities banned the practice. Eberhardt sees giving students the option to use LLMs, and discussing their value in class, as a natural next step not only in understanding this new technology but also in probing what it can show us about why we write and in which circumstances we actually need to write for ourselves.
Eberhardt suspects that "integrating a large language model is going to be standard practice in the future. Just as Wikipedia was at first regarded as an untrustworthy source and is now widely used, we can expect the same to happen with LLMs. Of course, Wikipedia has problems, but it's not as though the encyclopedia on my shelf is problem free either."
Eberhardt likens writing to other tasks that used to be widely taught, and in which individuals were expected to have expertise, but have since been turned over to either machines or human specialists. "Long division is still taught in primary school, but when did you last do long division? Yes, it's important to learn some basics of numeracy, but we also want to simplify steps that we no longer need to take ourselves. If I say, 'What's 800 divided by 13?' and you say '500,' I should know that's not even in the ballpark. But the precise answer I'll get from a calculator. The question now, for me, is what's the equivalent for writing of having a sense of numeracy? How much traditional training in writing do people need in a world of LLMs?"
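For the record, 800 divided by 13 comes to roughly 61.5, so an answer of 500 overshoots by nearly a factor of eight; that is exactly the sort of gap a rough sense of numeracy should catch, even before a calculator supplies the precise quotient.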
All of this is contingent, Eberhardt admits, on LLMs improving, but the extent to which this happens is anybody's guess. "I've been waiting for a self-driving car for 10 years, and it's still not here," Eberhardt says. "It seems that the last 5% of the technology is really hard to figure out. I can imagine that this will be the case with LLMs too: They will improve very quickly at first, but having them produce what we expect from academic writing—or other forms of writing, for that matter—is still a ways off."
One of Eberhardt's Ethics & AI students said that, while they valued the experimentation with LLMs in the course, they still consider writing well a crucial skill. "I think within academia, people need to accept that generative AI is here to stay," they said. "But even if generative AI is able to effectively write papers or summarize information, it is still unable to make one understand content. Learning how to write effectively has taught me how to read and learn more effectively, and proper writing skills translate directly into better communication overall."