A new study from the Massachusetts Institute of Technology (MIT) Media Lab has raised concerns about how artificial intelligence tools like
ChatGPT may impact students’ cognitive engagement and learning when used to write essays.
The research, led by Nataliya Kosmyna and a team from MIT and Wellesley College, examines how reliance on large language models (LLMs) such as ChatGPT compares to traditional methods like web searches or writing without any digital assistance. Using a combination of electroencephalogram (EEG) recordings, interviews, and text analysis, the study revealed distinct differences in neural activity, essay quality, and perceived ownership depending on the method used.
Note: EEG is a test that measures electrical activity in the brain.
Setup for cognitive engagement study
Fifty-four participants from five Boston-area universities were split into three groups: those using only ChatGPT (the LLM group), those using only search engines (the search group), and those writing without any tools (the brain-only group). Each participant completed three writing sessions. A subset also took part in a fourth session in which roles were reversed: LLM users wrote without assistance, and brain-only participants used ChatGPT.
All participants wore EEG headsets to monitor brain activity while they wrote. Researchers also interviewed participants after each session and assessed the essays using both human markers and an AI judge.
Findings on neural engagement
EEG analysis showed that participants relying solely on their own cognitive abilities exhibited the highest levels of neural connectivity across the alpha, beta, theta, and delta bands, indicating deeper cognitive engagement. LLM users showed the weakest connectivity, while the search group fell in between.
“The brain connectivity systematically scaled down with the amount of external support,” the authors wrote. Notably, LLM-to-Brain participants in the fourth session continued to show under-engagement, suggesting a lingering cognitive effect from prior LLM use.
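For readers who want a concrete sense of what "connectivity across frequency bands" can look like computationally, the sketch below is a simplified, hypothetical illustration rather than the study's actual pipeline: it band-pass filters a multichannel EEG recording into the conventional delta, theta, alpha, and beta ranges and reports mean pairwise channel correlation as a crude stand-in for connectivity. The band boundaries, sampling rate, and correlation-based score are assumptions made here for illustration; the authors' own connectivity analysis is more sophisticated.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Conventional EEG band boundaries in Hz (textbook values, not taken from the study)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def bandpass(eeg, low, high, fs, order=4):
    """Band-pass filter every channel (row) of a channels x samples array."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=1)

def band_connectivity(eeg, fs):
    """Return mean absolute pairwise channel correlation per band.

    This is only a rough proxy for 'connectivity', used here for illustration.
    """
    n_channels = eeg.shape[0]
    off_diagonal = ~np.eye(n_channels, dtype=bool)  # ignore each channel's self-correlation
    scores = {}
    for name, (low, high) in BANDS.items():
        filtered = bandpass(eeg, low, high, fs)
        corr = np.corrcoef(filtered)  # channels x channels correlation matrix
        scores[name] = float(np.abs(corr[off_diagonal]).mean())
    return scores

# Synthetic example: 32 channels, 10 seconds sampled at 256 Hz (made-up parameters)
rng = np.random.default_rng(0)
fake_eeg = rng.standard_normal((32, 10 * 256))
print(band_connectivity(fake_eeg, fs=256))
```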
Essay structure, memory, and ownership
When asked to quote from their essays shortly after writing, 83.3 per cent of LLM users failed to do so. In comparison, only 11.1 per cent of participants in the other two groups struggled with this task. One participant noted that they “did not believe the essay prompt provided required AI assistance at all,” while another described ChatGPT’s output as “robotic.”
Essay ownership also varied. Most brain-only participants reported full ownership, while responses in the LLM group ranged widely, from claiming full ownership to denying it outright, with many taking only partial credit.
Despite this, essay satisfaction remained relatively high across all groups, with the search group being unanimously satisfied. Interestingly, LLM users were often satisfied with the output, even when they acknowledged limited involvement in the content’s creation.
Brain power trumps AI aid
While AI tools may improve efficiency, the study cautions against their unnecessary adoption in learning contexts. “The use of LLM had a measurable impact on participants, and while the benefits were initially apparent, as we demonstrated over the course of four months, the LLM group’s participants performed worse than their counterparts in the Brain-only group at all levels: neural, linguistic, scoring,” the authors wrote.
This pattern was especially evident in session four, where Brain-to-LLM participants showed stronger memory recall and more directed neural connectivity than those who moved in the opposite direction.
Less effort, lower retention
The study warns that although LLMs reduce cognitive load, they may diminish critical thinking and reduce long-term retention. “The reported ownership of LLM group’s essays in the interviews was low,” the authors noted.
“The LLM undeniably reduced the friction involved in answering participants’ questions compared to the search engine. However, this convenience came at a cognitive cost, diminishing users’ inclination to critically evaluate the LLM’s output or ‘opinions’ (probabilistic answers based on the training datasets),” the study concluded.