Until recently, empirical legal research has largely concluded that generative artificial intelligence tools, while interesting, offer little actual value in real-world lawyering.
However, a new study by Michigan Law Professors J.J. Prescott, Patrick Barry, and their colleagues suggests that certain AI tools can help with efficiency and, in some cases, accuracy and legal reasoning.
“This project suggests that not all AI models are always going to be helpful on all dimensions, but by and large you’re not any worse off using AI as a tool. And, on many dimensions, you are much better off,” Prescott says.
The researchers focused on two newer types of AI tools:
- A reasoning model, OpenAI’s o1-preview. An improvement over earlier “chatbots,” such models are designed to plan their responses before providing them.
- A retrieval-augmented generation (RAG) tool, vLex’s Vincent. RAG tools integrate generative AI technology with legal source materials.
For the study, researchers asked 127 law students from the University of Michigan and the University of Minnesota to complete six tasks that were developed in consultation with working lawyers, such as writing an email to a client, drafting a contract or legal memo, and analyzing a complaint. Each student completed two tasks without the use of AI, two with the aid of o1-preview, and two with the support of Vincent. The tools and control conditions were randomized across the tasks and participants.
The researchers graded the resulting work for accuracy, analysis, organization, clarity, and professionalism. For four of the six tasks, students produced better work with the support of AI tools than without them. (See the results sidebar.)
The study “provides the first empirical evidence, to our knowledge, that AI tools can consistently and significantly enhance the quality of human lawyers’ work across various realistic legal assignments,” the researchers write in their paper.
Prescott says, “The technology is going to keep getting better and easier to use, and it’s going to look more and more professional. So being able to establish that right now, with these earlier models, AI can already provide an advantage—that should be enough to tell us that this is really going to change the way we practice law.”
The results have clearly generated interest: The paper has been downloaded more than 10,000 times from the website of SSRN, an open-access research platform for academics, making it one of the site’s top 50 papers in all disciplines over the past year.
Prescott hopes law firms and companies will further evaluate AI’s real-world effectiveness, especially as the number of AI products continues to grow. It’s now clear, though, that a shift in the practice of law has arrived.
“AI is going to change the kinds of tasks lawyers do,” Prescott says. “Ultimately, they’ll still be facilitating transactions and resolving disputes and giving counseling. But what constitutes a valuable way for them to spend their time will certainly change.”
Research Results
Researchers asked law students to complete six realistic legal tasks with and without the aid of newer AI technology—a reasoning model (o1-preview from OpenAI) and retrieval-augmented generation (Vincent from vLex). Among their findings:
- Both o1 and Vincent yielded “substantial, statistically significant improvements” in the speed of completing the tasks.
- For four of the six tasks, the quality of the work product using AI tools was “considerably better” than that of students not using AI.
- Quality improvements were concentrated in litigation-oriented tasks; they did not extend to the one transactional task that was tested (drafting a contract).
- Both Vincent and o1 demonstrated quality improvements in “clarity, organization, and professionalism.”
- Vincent’s effect on accuracy was mixed, but it produced fewer hallucinations than o1—and about the same as human researchers.
- For three tasks, o1 produced “statistically significant and substantial improvements in the quality of the legal analysis.”