By Dr Armin Alimardani

Despite growing awareness of artificial intelligence (AI) hallucinations, legal professionals continue to misuse generative AI (GenAI), overrelying on output whose inaccuracies are often too subtle to catch. Responsible GenAI use demands more than guidelines: it requires experiential training and a deep understanding of the technology’s limitations.

An empirical study at the University of Wollongong revealed “verification drift”, where initial caution erodes into misplaced trust in GenAI’s polished responses. Verifying AI-generated content is mentally demanding and time-consuming – sometimes outweighing the efficiency gains GenAI promises over traditional legal research. 

Since the launch of ChatGPT in 2022, media reports have repeatedly highlighted instances of legal practitioners being caught out in court after using GenAI to draft arguments and identify sources, only to find that parts of the AI-generated content were fabricated. Despite the public scrutiny and warnings about the pitfalls of “AI hallucinations,” the term for GenAI’s tendency to produce false or misleading information, similar incidents continue to emerge.

Why does this issue persist? Are legal practitioners still unaware of GenAI’s limitations, or is the allure of its polished, authoritative responses simply too convincing to resist? Various guidelines on the use of GenAI exist, such as those issued by the Supreme Court of NSW and the Law Society of NSW. But can such guidance effectively address these challenges?

To explore this question, I conducted an empirical study at the School of Law, University of Wollongong. Rather than simply providing guidelines, I went a step further and trained students on responsible AI use in their assignments. The results, however, revealed just how complex—and precarious—this issue truly is.

Implementing GenAI in a law assignment 

In the elective subject, ‘Law and Emerging Technologies’ at the University of Wollongong, students were tasked with reporting on the ethical and legal issues surrounding autonomous vehicles. They were explicitly encouraged to use AI tools and provided with guidance on how to use them responsibly. Additionally, in a few sessions, students engaged in practical exercises and received feedback on various GenAI use cases, including crafting effective AI prompts, recognising AI’s limitations, and—most importantly—verifying the accuracy of AI-generated content. Students were encouraged to treat GenAI as one tool in their research toolkit and use it alongside more traditional resources such as digital libraries. As part of their submission, students were required to include a reflective note on how they incorporated AI into their research process.

Out of the 72 students enrolled in the subject, 28 (approximately 39 per cent) agreed to have their assignments included in the study. On the positive side, students demonstrated innovative uses of GenAI. Many used it to brainstorm ideas, refine the structure of their arguments, simplify complex concepts, and improve the clarity of their writing. These students treated GenAI as a valuable assistant, working alongside traditional resources to enhance their final submissions. 

However, some concerning patterns also surfaced. Several students failed to follow the guidelines and training they had received on the responsible use of AI. While instances of completely fabricated sources were rare, a more frequent issue involved the inclusion of irrelevant, inaccurate, or fabricated information produced by GenAI. This was especially problematic when GenAI cited legitimate sources but attributed incorrect or misleading content to them. For instance, one student noted that “[autonomous vehicles] could remove up to 9 out of 10 cars in urban areas, reducing the overall environmental impact of transportation”. However, a careful review indicated that the original source provided a significantly different claim: “TaxiBots combined with high-capacity public transport could remove 9 out of every 10 cars in a mid-sized European city”.

In another case, a student summarised content using GenAI and wrote, “[Caterpillar haul trucks] autonomously moved more than 5 billion tons of material in the 2021–22 calendar year”. While the source did confirm the 5-billion-ton figure, it made no mention of the 2021–22 period, nor did it imply that timeframe.

At first glance, the content in both cases may have seemed accurate, but subtle discrepancies significantly changed the intended meaning. These examples show that hallucinations can involve entirely fabricated material or minor inaccuracies that lead to misinterpretation.

The ‘verification drift’ phenomenon

Why did some students, even after receiving training and feedback on responsible GenAI use, fail to adhere to the provided instructions? The question is complex, but one possible explanation is a phenomenon I have termed “verification drift.” It occurs when users initially approach AI-generated content cautiously, aware of the risks of inaccuracy and hallucination, but gradually, as they engage with the content, become overconfident in its reliability and come to see verification as less necessary. This misplaced trust may stem from GenAI’s authoritative tone and its ability to present incorrect details alongside accurate, well-articulated information. The challenge, then, is not just a lack of awareness but a cognitive bias that lulls users into a false sense of security.

Verifying GenAI output is not a task to approach casually. Users must treat the output with scepticism and verify it independently. Even for simple tasks such as summarising and editing, the generated content must be reviewed carefully to ensure that no new material has been introduced and no key information removed. Simply assessing whether the content “makes sense” is insufficient, and is a likely explanation for drifting away from the verification process. GenAI models are designed to be coherent and stylistically consistent with their context, which makes their output particularly persuasive—especially as the technology continues to improve.

The Sirens’ song and Odysseus’ pact

An analogy for resisting the allure of AI-generated content is the story of Odysseus and the Sirens from The Odyssey. In the myth, the Sirens—creatures who appear as beautiful women—sing songs so enchanting that sailors steer their ships onto the rocks, leading to destruction. To resist them, Odysseus instructed his crew to block their ears with beeswax and had himself tied to the mast so he could hear the song without being able to act on its temptation. Legal professionals, in particular, need a modern-day version of Odysseus’ pact when using GenAI.

Despite knowing that others have faced consequences for submitting fabricated AI-generated material in court, some lawyers have still followed the same path. Like the sailors lured by the Sirens, they were persuaded by the convincing tone of GenAI output and failed to verify its accuracy. The Odysseus mindset—acknowledging the persuasive power of something and preparing in advance not to be swayed—is essential when engaging with AI-generated content, no matter how plausible it may seem.

Verification burden 

During the review of students’ assignments, I found it particularly challenging to verify whether the content was supported by the cited sources. Determining whether AI-generated claims accurately reflect their references involves meticulous cross-checking against extensive material, a process that is both time-consuming and mentally taxing. For busy legal practitioners, this task can become burdensome, leading some to accept AI-generated content at face value without thorough verification—in other words, verification drift.

In my own experience, I initially overlooked several seemingly plausible inaccuracies in students’ submissions. These issues only came to light during the course of this study, when I dedicated several hours per assignment to rigorously cross-check each claim against its cited source. Put simply, without the specific intention and time commitment of this study, I likely would not have identified many of the problematic practices associated with generative AI use.

This experience suggests that, for certain tasks, relying on generative AI followed by exhaustive verification may be less time-efficient than conducting traditional legal research. GenAI should, therefore, be understood as a tool with specific capabilities, and legal practitioners should be encouraged to consider alternative resources when appropriate.

Practical steps toward responsible GenAI use 

The persistence of AI misuse in courts and even among trained students suggests that guidelines alone are not enough. If legal professionals are to use GenAI responsibly, we need a more holistic response: one that recognises the cognitive traps like verification drift, addresses the practical burden of checking AI-generated content, and supports the development of meaningful AI literacy. The following four recommendations aim to tackle these issues: 

  1. Evidence from this study and other reported cases of GenAI misuse in courtrooms shows that interacting with GenAI requires a distinct set of skills, and acquiring those skills goes beyond following a list of instructions. While guidelines such as those issued by the Supreme Court of NSW and the Law Society of NSW provide valuable direction, they are not sufficient on their own for developing the skills needed to use this technology responsibly. Legal professionals need access to training courses that provide a hands-on, interactive experience.
  2. Generative AI is becoming an increasingly important tool and an expected skill in the legal profession. It is difficult to imagine a lawyer who will not use GenAI at some point in their career, and a lack of understanding of even its basic limitations can lead to serious consequences. In this context, mandatory AI literacy training may be warranted.
  3. To mitigate the risk of verification drift and reinforce the importance of a rigorous verification process, legal practitioners should engage with case studies in which lawyers introduced fabricated AI-generated material into their court submissions. Examining the thought processes of their peers may encourage them to adopt a more sceptical and careful approach to AI-generated content.
  4. It is important to approach claims about AI capabilities with caution. While GenAI promises benefits in the legal domain such as enhanced research capabilities and document drafting, these claims are often overstated. Companies like OpenAI highlight their models’ performance on legal tasks, including outperforming 90 per cent of the US Bar Exam test takers. However, studies—including an Australian empirical study—have challenged these assertions, showing that GenAI tools perform below the average law student. Until comprehensive and up-to-date benchmarks are available for legal tasks, such claims should be treated with scepticism.

Dr Armin Alimardani is a Senior Lecturer in Law and Emerging Technologies at the School of Law, University of Wollongong.