Intentionally or not, essentially everyone exposed to the digital world has interacted with or used artificial intelligence tools in some way. Whether through an AI summary of meeting notes auto-generated by a dictation app or the search engine overviews that apply generative AI to condense results, the use of AI seems nearly impossible to avoid. In fact, one Australian study [1] reported that 65% of sampled mental health professionals used AI to help with research. The frequent use of artificial intelligence tools for research and writing tasks has once again become an object of attention after the release of a preprint of an MIT study. In it, the authors discuss the implications of cognitive debt accumulation associated with AI use, underscoring potential concerns about “your brain on ChatGPT” [2]. The public response to these findings has been mixed, with some praising the novelty of the methodology and others suggesting the findings have been sensationalized to create fear or drive attention. This article aims to highlight key findings from the study, its limitations, and the broader discussion around AI’s cognitive implications. It also offers potential recommendations for more sustainable ways to engage AI tools while mitigating cognitive repercussions.
The MIT Media Lab Study
Authored by a team from the Massachusetts Institute of Technology, Wellesley College, and the Massachusetts College of Art and Design, this study aims to explore the cognitive costs of using AI in academic contexts, in this case writing an essay in response to an SAT-sourced prompt. Participants were divided into three groups: a large language model (LLM) group, a search engine group, and a brain-only group that used no external tools. Data collection included EEG recordings during the task, a natural language processing (NLP) analysis of the essays, and post-task interviews. For the purposes of this article, the EEG data provides the most pertinent results for cognitive differences associated with AI use. The paper discusses how the cognitive strategies engaged by each of the groups were significantly different, reflected in divergent neural connectivity patterns.
Some key takeaways [2] were as follows:
- The brain-only group produced more diverse essays, with the greatest variability in how participants approached a given essay topic.
- The LLM group wrote statistically homogeneous essays.
- The search engine and LLM groups were found to focus more than the brain-only group on integrating the generated or retrieved content, as opposed to synthesizing it with their own writing or perspectives.
- This could also be due to the 20-minute time limit placed on each writing task.
- In the scoring process, human teachers were able to identify stylistic elements that remained consistent across the different essays they had scored from the same writer.
- After fine-tuning and NLP analysis, the AI judge was not able to attribute styles to authors in the same way, indicating a level of nuance in scoring that only the human judges could ascertain.
- Based on a dynamic directed transfer function (dDTF) analysis of the EEG data, the brain-only group showed the highest activation and connectivity of any group across the alpha, theta, and delta bands (different frequency ranges of neural activity recorded by EEG), with especially strong activation of temporo-parietal regions (associated with self-understanding, social cognition, and attention) and frontal regions (responsible for executive function). A brief sketch of what the dDTF metric quantifies follows this list.
- The LLM group not only showed the least extensive connectivity between regions; the magnitude of dDTF was also reduced by up to 55%.
- The search engine group showed engagement of occipital regions involved in visual processing, implying that actively scanning, filtering, and assessing pieces of information to integrate into essays required these additional cognitive resources.
- Behaviorally, LLM group participants were less able to quote their own essays; none of them provided correct quotes in the first session of the study.
- This is supported by the weak dDTF connectivity in the frontal and temporal nodes responsible for deep memory encoding, an impairment not observed in the search engine and brain-only participants.
- Semantic precision, operationalized as the ability to correctly quote from one’s own essay, followed the same pattern (brain-only group > search engine group > LLM group).
- Interestingly, the sense of essay ownership reported in the post-task interviews was also lowest in the LLM group, indicating low agency and authorship over their written output.
- This is consistent with the reduced convergence in the anterior frontal regions responsible for error monitoring and self-evaluation, critical elements of metacognition. It can be interpreted as participants psychologically dissociating from the written output of LLM tools.
- The direction of information flow measured by EEG in brain-only participants was characterized by bottom-up flows (from temporal/parietal regions to frontal regions), compared to the LLM group, which showed more top-down information flow (from frontal to posterior regions).
- This can be interpreted as the brain-only group displaying brain activity consistent with generating novel content internally, integrating it, and choosing how to express it. In contrast, the LLM group filtered the contributions of an external source and mapped them onto an overall narrative, essentially keeping the brain in a preparation phase.
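For readers wondering what the dDTF metric actually measures, the sketch below gives the standard construction from the EEG connectivity literature rather than anything specific to the preprint: a multivariate autoregressive (MVAR) model is fit to the EEG channels, its transfer matrix H(f) yields the full-frequency directed transfer function, and that quantity is weighted by the partial coherence so that only direct (unmediated) flow between regions contributes. The symbols used here (η for the full-frequency DTF, χ for partial coherence, δ for dDTF) follow that literature's conventions and are not taken from the paper itself.

```latex
% Standard dDTF construction (a sketch; not reproduced from the preprint):
% fit a multivariate autoregressive (MVAR) model to the k EEG channels and
% take its transfer matrix H(f). Flow is read "from channel j to channel i".

% Full-frequency directed transfer function, normalized over all
% frequencies and all source channels:
\[
  \eta_{ij}^{2}(f) =
    \frac{\lvert H_{ij}(f)\rvert^{2}}
         {\sum_{f}\sum_{m=1}^{k}\lvert H_{im}(f)\rvert^{2}}
\]

% dDTF weights this by the partial coherence \chi_{ij}(f), which suppresses
% flow that is merely mediated through other channels:
\[
  \delta_{ij}^{2}(f) = \eta_{ij}^{2}(f)\,\chi_{ij}^{2}(f)
\]
```

In these terms, a reduced dDTF magnitude, as reported for the LLM group, corresponds to weaker estimated direct, frequency-specific information flow between brain regions.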
Limitations, Criticisms, and Discourse
While criticism is an essential aspect of scientific discourse, many of the limitations of this study have already been addressed by the researchers themselves. Firstly, despite the attention the study has received, some have cautioned against reading too much into the results because the paper is not yet peer-reviewed [3]. Because the authors believed the topic was current and urgent, the results were made public in preprint form, especially given how long the peer review process can take at many journals. This in itself is no reason to discredit the findings as inherently weak, but it is something to consider.
As addressed by the researchers in the preprint, the lack of geographic diversity in recruited participants, the use of ChatGPT as the sole LLM, and the lack of subdivision of the writing task into smaller stages for more precise observation all limit how much this study can be generalized. This is a highly context-dependent set of findings and speaks primarily to essay writing in an academic setting, which may not capture the tasks that an average person uses AI for. Clearly, this invites further studies to leverage the momentum generated by this research and develop more longitudinal designs across more diverse tasks to fill gaps in the current literature on AI and cognitive effects. The research team behind the MIT study made it a point to underscore the importance of language: none of the findings in this study support conclusions like “LLMs make you stop thinking” or claims that they cause “brain damage,” and such interpretations point to a lack of scientific literacy [4]. Ironically, many media sources leveraged LLMs to summarize the paper’s findings, which can further misconstrue the actual implications of the results. The authors offer a crucial warning: “As reliance on AI tools increases, careful attention must be paid to how such systems affect neurocognitive development, especially the potential trade-offs between external support and internal synthesis.”
Recommendations
A study like this, along with separate peer-reviewed publications reporting how excessive AI use can negatively impact critical thinking [5], raises the question: how, if at all, can people leverage AI tools in ways that reduce the potential for cognitive offloading? Some educational professionals recommend being intentional about when and how AI is used, for example by avoiding the use of LLMs as default search engines and instead parsing search results for yourself. This allows us to start with something human-driven and then use AI tools to refine it [6]. In practice, this might look like writing an essay from scratch and engaging an AI tool as an editor to check for clarity or grammatical correctness. In an academic context, it could mean asking an AI tool to define a specific complicated concept rather than asking it to summarize a whole textbook chapter that could be read in full for its nuance. Essentially, uses of AI that preserve an author's or student's voice and reserve AI for sparing modifications or editorial input are likely “safer” for long-term cognitive health. More research is needed to produce more detailed, evidence-based recommendations for sustainable AI use.