I haven’t quite gotten this whole ChatGPT thing. I’ve attended the webinars and the AALL sessions. I generally understand what it’s doing beneath the hood. But I haven’t been able to find a need in my life for ChatGPT to fill. The most relevant sessions for me were the AALS Technology Law Summer Webinar Series with Tracy Norton of Louisiana State University. She has real-world day-to-day examples of when she has been able to utilize ChatGPT, including creating a writing schedule and getting suggestions on professional development throughout a career. Those still just didn’t tip the balance for me.
A few weeks ago, I presented to one of our legal clinics and demonstrated a form that our Associate Director, Tara Mospan, created for crafting an efficient search query. At its heart, the form is a visual representation of how terms and connectors work with each other: five columns of five boxes, where each column holds variations of a term and connectors sit between the columns. For a drunk driving case, the term in the first box could be car, and below that we would put synonyms like vehicle or automobile. The second column could include drunk, inebriated, and intoxicated. Then we choose the connector between the columns, whether it be AND, w/p, w/s, or w/#. Finally, we write out the whole search query at the bottom: (car OR vehicle OR automobile) w/s (drunk OR inebriated OR intoxicated).
Created years ago by Tara Mospan, this worksheet is loved by ASU Law students who frequently request copies from the law librarians even years after they use it for Legal Research and Writing.
After the presentation, I offered a student some extra copies of the form. She said no; I had presented to her legal writing class the year before, and she had been so taken with the form that she recreated it in Excel. Not only that, she used macros to transform the entered terms into a final query. I was impressed and asked her to send me a copy. It was exactly as she had described, using basic commands to put the terms together, with OR between terms within a column and drop-downs for the connectors. She had taken our static form and transformed it into a dynamic utility.
An ASU Law student recreated the Crafting an Efficient Search PDF using Excel so that it had drop-downs.
Now I was inspired: What if I could combine the features of her Excel document with the clean layout of our PDF form? Finally, I saw a use for ChatGPT in my own life. I had read about how well ChatGPT does with programming and it seemed like the perfect application. It could help me create a fillable PDF, with nuanced JavaScript code to make it easy to use and visually appealing.
I went into ChatGPT and wrote out my initial command:
I am trying to create a fillable PDF. It will consist of five columns of text boxes, and each column will have five boxes. Search terms will be placed in the boxes, although not necessarily in every box. There will be a text box at the bottom where the terms from the boxes above will be combined into a string. When there are entries in multiple boxes in a column, I want the output to put a set of parentheses around the terms and the word OR between each term.
ChatGPT immediately gave me a list of steps, including the JavaScript code for the results box. I excitedly followed the directions to the letter, saved my document, and tested it out. I typed car into the first box and…nothing. It didn’t show up in the results box. I told ChatGPT the problem:
The code does not seem to be working. When I enter terms in the boxes, the text box at the bottom doesn’t display anything.
And this began our back and forth. The whole process took around four hours. I would explain what I wanted, it would provide code, and I would test it. When there were errors, I would note the errors and it would try again. A couple times, the fix to a minor error would start snowballing into a major error, and I would need to go back to the last working version and start over from there. It was a lot like having a programming expert working with you, if they had infinite patience but sometimes lacked basic understanding of what you were asking.
For many things, I had to go step-by-step to work through a problem. Take the connectors, for example. I initially just had AND between the columns as a placeholder. I asked it to replace the AND with a drop-down menu for choosing the connector. The first implementation instead replaced the OR between the synonyms within a column. We went back and forth until the connector option worked between the first two columns of terms. Then we worked through the connector between columns two and three, and so on.
At times, it was slow going, but it was still much faster than learning enough JavaScript to program it myself. ChatGPT was also able to easily program minor changes that made the form much more attractive, like not having parentheses appear unless there are two terms in a column, and not displaying the connector unless there are terms entered on both sides of it. And I was able to add a “clear form” button at the end that cleared all of the boxes and reverted the connectors back to the AND option, with only one exchange with ChatGPT.
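For the curious, the finished form's logic looks something like the sketch below. This is a simplified reconstruction written for illustration, not ChatGPT's actual output, and the field names (Term1_1 through Term5_5, Connector1 through Connector4) are hypothetical stand-ins for whatever a given form uses:

```javascript
// Custom calculation script for the results field (Acrobat JavaScript,
// which is why this uses old-style var). Field names are hypothetical.
var filled = [];
for (var col = 1; col <= 5; col++) {
  var terms = [];
  for (var row = 1; row <= 5; row++) {
    var v = this.getField("Term" + col + "_" + row).valueAsString;
    if (v !== "") terms.push(v);
  }
  // Parentheses only appear when a column holds two or more terms.
  if (terms.length > 1) {
    filled.push({ col: col, text: "(" + terms.join(" OR ") + ")" });
  } else if (terms.length === 1) {
    filled.push({ col: col, text: terms[0] });
  }
}
var query = "";
for (var i = 0; i < filled.length; i++) {
  if (i > 0) {
    // A connector is shown only when there are terms on both sides of it.
    // Simplification: use the connector just before the current column.
    query += " " + this.getField("Connector" + (filled[i].col - 1)).valueAsString + " ";
  }
  query += filled[i].text;
}
event.value = query;
```

The “clear form” button is then essentially a one-liner that calls this.resetForm(), which restores every field, including the connector drop-downs, to its default value (AND, if that is how the drop-downs are defined).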
Overall, it was an excellent introduction to at least one function of AI. I started with a specific idea and ended up with a tangible product that functioned as I initially desired. It was a bit more labor intensive than the articles I’ve read led me to believe, but the end result works better than I ever would have imagined. And more than anything, it has gotten me to start thinking about other projects and possibilities to try with ChatGPT.
The trifecta of big data, advanced analytics, and recent AI innovations is ushering in a new era of judicial analytic mind-reading, enabling software to more accurately predict judges’ court rulings. Last year, in what seems like an interesting leap forward in judicial analytics, Pre/Dicta unveiled its AI-powered litigation prediction software, introducing a novel, perhaps radical, approach to tapping into the judicial mind. According to CEO Dan Rabinowitz, Pre/Dicta is the only litigation analytics platform that makes verifiable predictions about the outcome of lawsuits. He claims that using data science and only a docket number, Pre/Dicta’s software correctly forecasts how judges will decide on motions to dismiss 86% of the time, without factoring in the specific facts of the case. The system covers civil litigation cases at both the state and federal level, but does not attempt to forecast results of jury trials.
Rather than solely depending on a judge’s past rulings and jurisprudence, as is common with other judicial analytics products, Pre/Dicta uses a methodology similar to that used in targeted advertising. This approach forecasts future behavior by examining both past actions, such as purchasing habits, and individual biographical characteristics. Pre/Dicta works by combining historical ruling data with biographical and demographic details to forecast a judge’s decision in a given case. Using around 120 data points, it spots patterns and potential biases in a judge’s past rulings. The system evaluates specifics of past rulings, considering elements such as the nature of the case (e.g., securities fraud, employment discrimination), the attorneys and firms involved (e.g., solo practitioner representing an individual, regional firm representing a corporation, AmLaw 100 firm backing an individual), and the nature of the disputing parties (e.g., individual vs. corporation, small company vs. large corporation). This case-specific information is then combined with the judge’s personal data, like net worth, political affiliations, professional history, and law school alma mater, to generate a prediction.
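Purely to make the idea concrete, here is a toy sketch of feature-weighted outcome scoring in the spirit of what is described above. Pre/Dicta's actual features, weights, and model are proprietary; every name and number below is invented for illustration:

```javascript
// Toy illustration only: a feature-weighted scoring model. This is not
// Pre/Dicta's method; all features and weights here are made up.
const weights = {
  plaintiffIsIndividual: -0.4,
  defendantIsCorporation: 0.3,
  amLaw100Defense: 0.5,
  judgeFormerProsecutor: 0.2,
  caseTypeSecuritiesFraud: 0.6,
};

function probabilityOfDismissal(features) {
  // Weighted sum of binary features, squashed to a 0-1 probability
  // with the logistic function.
  let score = -0.5; // bias term (also invented)
  for (const [name, value] of Object.entries(features)) {
    score += (weights[name] || 0) * value;
  }
  return 1 / (1 + Math.exp(-score));
}

console.log(probabilityOfDismissal({
  plaintiffIsIndividual: 1,
  defendantIsCorporation: 1,
  amLaw100Defense: 1,
}));
// -> roughly 0.48 with these made-up weights
```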
Prediction in the Legal Landscape
86% accuracy is impressive! Hopefully, Pre/Dicta will spark a judicial prediction analytics arms race. According to Daniel L. Chen in his article, “Judicial Analytics and the Great Transformation of American Law,” predictive judicial analytics “holds the promise of increasing the efficiency and fairness of law.” Targeted advertising seems to work pretty well, so hopefully Pre/Dicta’s advancements in this area are a positive step toward making the judicial process more transparent.
If only we knew what would happen in the future, we would know what to do now! For as long as there have been courts and judges, folks have tried to predict whether a judge would rule in their favor. Attorneys have always engaged in mental “judicial analytics” by gathering and pondering information on a judge’s past rulings and reputation to glean some insights into how they might decide a case. Humans are prediction machines, given our innate tendency to draw on experiences and knowledge to anticipate events—an evolutionarily useful skill that allowed us to sometimes escape being saber-toothed tiger lunch or the victim of grumpy neighboring tribal predations.
From my brief stint practicing family law in the 1990s, I discovered that family law clients are hopeful individuals. Despite clear child support guidelines and a prevailing judicial preference for shared custody, people often believed that if a judge merely heard the specifics of their “special snowflake” scenario involving a cheating spouse or a deadbeat dad, the judge would surely deviate from the rules and customary practices to grant them a deserved favorable ruling. They struggled to accept that judges could be indifferent to their parade of marital/parental horribles. And even if judges were initially inclined to empathize, after many years of sifting through outright lies and half-truths, they had seemingly given up on deciphering reality anyway. It was always challenging to persuade clients of the judicial propensity to metaphorically split the baby down the middle, whether financially or custodially.
Attorneys have needed to hone their abilities to predict outcomes so they could counsel their clients on different courses of action. While making no promises, they share predictions regarding claim values, the odds of surviving summary judgment, potential jail sentences, the likelihood of obtaining sole custody of children, and so on. Attorneys can only do so much, though. Hopefully, as predictive judicial analytics tools improve and become widely available, they have the potential to promote fairness, cut down on litigation costs, and create a more transparent and predictable judicial system.
Judicial Behavioral Forecasting Modeling
Certainly, judges do provide clients with information that assists in anticipating how a ruling might unfold. I have observed numerous judges delivering impactful speeches during temporary hearings, highlighting the importance of adhering to child support guidelines and the principle of shared custody. When clients receive information regarding a likely outcome, their acceptance of reality accelerates significantly. It would indeed be beneficial, and save a lot of time, money, and anguish, if a client could engage in a comprehensive discussion with a judge, probing various questions about how different pieces of information might influence their ruling. However, this isn’t the modus operandi of judges, as they cannot communicate with one party in a suit independently, nor do they pre-announce their rulings prior to a hearing or trial. Now, however, companies like Pre/Dicta are leveraging the human trait of predictability inherent in judges. Like everyone, judges have their own set of ideas, habits, preferences, prejudices, and temperaments shaped by a mix of genetics and experiences, all of which contribute to a certain level of predictability in their rulings.
Hopefully, soon we will be able to pick the mind of a judge without the necessity of actually speaking with her. With the advancing tide of artificial intelligence and the ongoing proliferation and refinement of judicial analytics products, it seems plausible that the future might produce a family law judge behavioral forecasting model for specific judges. These models could help attorneys and their clients identify potential biases of judges. They could see how a judge might respond to a person based on certain characteristics like sex, race, age, income, profession, or criminal history, especially when compared to another party with a different background. Also, if these models included information about factors that affected past rulings, they could be used to anticipate how certain situations might be viewed by the court. For example, a parent hoping to keep their soon-to-be ex-spouse away from the kids might want to know if the judge objects to stuff like dating app addiction, not taking the child to piano lessons, or multiple DUI arrests. Armed with information, they can choose the best way to handle their case, including deciding if going to trial is a good idea.
Behavioral forecasting models are of course not new to law and legal practice. They are tools used to predict the likely behaviors of individuals or groups across various domains, aiding in better decision-making. In the legal sector, in addition to predicting the outcome of Supreme Court cases, they aid in litigation strategy, legal analytics, resource allocation, criminal behavior prediction, policy impact analysis, legal document analysis, dispute resolution, and regulatory compliance, by leveraging historical data and legal precedents to inform decision-making and strategy development. They are utilized in other fields too like marketing, finance, HR, healthcare, public policy, urban planning, criminal justice, technology, environmental science, and education to forecast behavioral patterns, helping to optimize strategies and allocate resources more efficiently.
Such an innovation would undeniably be a game changer. Clients in divorce and custody disputes might actually believe solid advice regarding likely outcomes, rather than cling to the hope that their unique case details will sway the judge. Accurate predictions would deter individuals from wasting money on doomed litigation and would be a boon for judges struggling with a backlog of cases. Making these predictive tools available on court websites would no doubt promote case settlements and therefore ease some of the strain on both judges and the judicial system.
Naysayers
As always, naysayers abound. Some argue that judicial analytics could undermine the legitimacy of an impartial judiciary. In fact, in France, judges are so wary of transparency that judicial analytics products are prohibited. Well, at least in the U.S., that boat has sailed, far and fast. Particularly in light of Supreme Court rulings in recent years, many people have realized that judges often base their rulings on ideological leanings and personal preferences. Robots would only further confirm what we already suspect – that judges are just like the rest of us with habits, biases, and opinions. It might be too late to rehabilitate the judiciary, but perhaps the transparency of data-driven prediction could bolster public confidence more than frequent affirmations of judicial objectivity.
Then, there are arguments regarding fairness due to cost. For now, the high cost of Pre/Dicta raises potential fairness issues, as only larger firms and wealthier clients can harness its predictive power. True, they always have an advantage. However, as the technology becomes more common, costs should decrease, making it more and more accessible.
Conclusion
The improvement of AI-driven judicial analytics, exemplified by Pre/Dicta, could mark a revolutionary shift in the legal realm, perhaps promising a new level of predictability and transparency in court outcomes. While concerns about fairness, accessibility, and the perception of judicial impartiality persist, the potential benefits—reduced litigation costs, enhanced transparency, and more informed decision-making—may herald a future where data-driven insights guide legal strategy and expectations. As technology continues to evolve and become more accessible, the future looks promising for both practitioners and those seeking justice.
The language of law has many layers. Legal facts are more than objective truths; they tell the story and ultimately decide who wins or loses. A statute can have multiple interpretations, and those interpretations depend on factors like the judge, context, purpose, and history of the statute. Legal language has distinct features, including rare legal terms of art like “restrictive covenant,” “promissory estoppel,” “tort,” and “novation.” This complex legal terminology poses challenges for normal semantic search queries.
Vector databases represent an exciting new trend, and for good reason. Rather than relying on traditional Boolean logic, semantic search leverages word associations by creating embeddings and storing them in a vector database. In machine learning and natural language processing, embeddings represent words or sentences as dense vectors of real numbers in a continuous vector space. This numerical representation of text is typically generated by a model that tokenizes the text and learns embeddings from the data. Vectors capture the contextual and semantic meaning of each word. When a user makes a semantic query, the search system works to interpret their intent and context. The system then breaks the query into individual words or tokens, converts them into vector representations using embedding models, and returns ranked results based on their relevance. Unlike Boolean search, which requires specific syntax (“AND”, “OR”, etc.), semantic search allows for queries in natural language and opens up a whole new world of potential when searches are not constrained by the rules of exact text matching.
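To make the mechanics concrete, here is a toy JavaScript sketch of embedding-based ranking. Real embeddings have hundreds or thousands of dimensions and come from a trained model; the three-number vectors here are invented purely for demonstration:

```javascript
// Toy illustration of semantic ranking; the vectors are made up.
const embeddings = {
  "contract termination due to breach": [0.91, 0.10, 0.33],
  "agreement dissolution because of violations": [0.88, 0.14, 0.31],
  "dog bite liability": [0.05, 0.95, 0.12],
};

// Cosine similarity: closer to 1.0 means the vectors point the same way.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank the stored texts against a query vector, most similar first.
function rank(queryVec) {
  return Object.entries(embeddings)
    .map(([text, vec]) => ({ text, score: cosine(queryVec, vec) }))
    .sort((a, b) => b.score - a.score);
}

const queryVec = [0.90, 0.12, 0.32]; // pretend embedding of a user's query
console.log(rank(queryVec));
// The two contract-related texts rank highest despite sharing no words.
```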
However, legal language differs from everyday language. The large number of technical terms, the careful precision, and the fluid interpretations inherent in law mean that semantic search systems may fail to grasp the context and nuances of legal queries. The interconnected and evolving nature of legal concepts poses challenges in neatly mapping them into an embedding space representation. One potential way to improve semantic search in the legal domain is by enhancing the underlying embedding models. Embedding models are often trained on generalized corpora like Wikipedia, giving them a broad but shallow understanding of law. This surface-level comprehension proves insufficient for legal queries, which may seem simple but have layers of nuance. For example, when asked to retrieve the key facts of a case, an embedding model might struggle to discern what facts are relevant versus extraneous details.
The model may also fail to distinguish between majority and dissenting opinions due to a lack of the legal background needed to make such differentiations. Training models on domain-specific legal data represents one promising approach to overcoming these difficulties. By training on in-depth legal corpora, embeddings could better capture the subtleties of legal language, ideas, and reasoning. For example, Legal-BERT (BERT stands for Bidirectional Encoder Representations from Transformers) was pre-trained on the CaseHOLD dataset. This corpus is large (37GB), representing 3,446,187 legal decisions across all federal and state courts, larger than the BookCorpus/Wikipedia corpus originally used to train the BERT model. When tested on LexGLUE, a benchmark dataset for evaluating the performance of NLP methods on legal tasks, Legal-BERT performed better than ChatGPT.
Semantic search shows promise for transforming legal research, but realizing its full potential in the legal domain poses challenges. Legal language is complex and can make it difficult for generalized embedding models to grasp the nuances of legal queries. However, recent optimized legal embedding models indicate these hurdles can be overcome by training on ample in-domain data. Still, comprehensively encoding the interconnected, evolving nature of legal doctrines into a unified embedding space remains an open research problem. Hybrid approaches combining Boolean and vector models are a promising new frontier that many researchers are exploring.
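As a rough sketch of what such a hybrid approach might look like, assuming the cosine() helper from the earlier sketch and a hypothetical embed() function:

```javascript
// Hedged sketch of a hybrid search: a Boolean filter narrows the candidate
// set, then vector similarity ranks what survives. embed() is hypothetical;
// cosine() is the helper defined above.
function hybridSearch(query, requiredTerms, documents) {
  const queryVec = embed(query);
  return documents
    // Boolean stage: keep only documents containing every required term.
    .filter((doc) =>
      requiredTerms.every((term) =>
        doc.text.toLowerCase().includes(term.toLowerCase())
      )
    )
    // Vector stage: rank survivors by semantic similarity to the query.
    .map((doc) => ({ ...doc, score: cosine(queryVec, embed(doc.text)) }))
    .sort((a, b) => b.score - a.score);
}
```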
Realizing the full potential of semantic search for law remains an ambitious goal requiring innovative techniques. But the payoff could be immense – responsive, accurate AI assistance for case law research and analysis. While still in its promising infancy, the continued maturation of semantic legal search could profoundly augment the capabilities of legal professionals. A shift from generic to domain-specific models holds promise.
The pace of generative AI development (and hype) over the past year has been intense, and difficult even for us experienced librarians, masters of information that we are, to follow. Not only is there a constant stream of new products, but also new academic papers, blog posts, newsletters, and more, from people evaluating, experimenting with, and critiquing those products. With that in mind, I’m sharing my favorites, and I’ll also pepper in a few recommendations from my co-bloggers.
Twitter
Before Twitter began its slow decline, it was one of my primary sources for professional connection, and there are many there who are exploring generative AI. I especially enjoy following people outside of the legal world. Many of my favorites are still there, like Ethan Mollick, Anna Mills, and Lance Eaton (all in higher education) as well as critical AI theorists like Timnit Gebru and Emily Bender.
LinkedIn
Despite the good bits that remain on Twitter, many interesting legal tech discussions seem to have moved to LinkedIn (or perhaps I’ve only recently found them there). Some of my favorites to follow on LinkedIn (in no particular order beyond how I’m running across them as I scroll) are: Nicole Black, Sam Harden, Alex Smith, Cat Moon, Damien Riehl, Dennis Kennedy, Uwais Iqbal, Ivy Grey, Robert Ambrogi, Cat Casey, Nicola Shaver, Adam Ziegler, and Michael Bommarito. Both Bob Ambrogi and Nicola Shaver recently had posts gathering legal tech luminaries to follow, so I would recommend checking out those posts and the comments to find more interesting folks. And if anyone else has figured out the LinkedIn etiquette for connecting vs. following someone you only know via other social media, please let me know.
Newsletters
Most of us have many (many, many) newsletters filling our inbox each day. Here are some favorites.
Lawyer Ex Machina – from law librarian Eli Edwards, on legal technology, law practice and selected issues around big data, artificial intelligence, blockchain, social media and more affecting both the substance and the business of law (weekly)
The Brainyacts – from Josh Kubicki, insight & tips on generative AI use in legal services (daily)
Rebecca:
One Useful Thing – from Ethan Mollick, mostly on AI in higher ed (weekly)
Do Something – from Sam Harden, on legal tech, often from a small firm and access to justice angle
Legal Tech Trends – legal tech links, podcast, articles, products, along with original pieces (every two weeks or so)
KnowItAALL – this daily newsletter is a benefit for members of AALL (American Association of Law Libraries), but it is also available to non-members for a fee; great coverage of legal AI, I read it every day
AI Law Librarians – is it gauche to recommend our own blog? You can subscribe as a newsletter if you like!
Sean:
The Run Down – AI news, tools and tutorials (daily)
The Algorithmic Bridge – in-depth pieces about AI news, thoughts, and research every Monday, Wednesday, and Friday.
The Gradient Pub – accessible but technically informed overviews of the state of the art and its future, and critical and well-informed perspectives on artificial intelligence topics by researchers and experts in the field.
AI Lawyer Talking Tech (Sarah) – created by law librarian Greg Lambert, this podcast is completely AI-generated, and covers legal tech, with many stories on generative AI; it also links to a daily roundup of news articles
Chronicle of Higher Education ($) (Rebecca) – lots of work happening here on how generative AI will impact education (I like to search within for “ChatGPT”); you can also sign up for a daily update with headlines and trending stories
3 Geeks and a Law Blog (Rebecca) – many law librarian contributors, and they frequently blog about AI
Geeky Gadgets (Jenny) – a technology news site that publishes a lot on AI, including “how to use” articles
MIT News (Jenny) – general technology news on a wide variety of issues. AI shows up regularly. More technical, so may need to look at subtypes of AI, like machine learning
TechCrunch (Jenny) – often where I see developments break first
Unite AI – AI news aggregator. Some articles are on the techie side
Sean also mentioned that much of the interesting stuff is on YouTube, but that it is fairly high-effort because many of the videos are an hour long, or more. Maybe we’ll convince him to share some of his favorite videos soon in a future post!
Who are your favorites to follow on social media? Are there helpful newsletters, blogs, podcasts, or anything else that we’ve missed? Let us know in the comments.
In his 1908 essay, “Mechanical Jurisprudence,” the eminent legal scholar Roscoe Pound warns of the dangers of what he calls “scientific law,” namely a “petrification” that “tends to cut off individual initiative in the future, to stifle independent consideration of new problems and of new phases of old problems, and so to impose the ideas of one generation upon the other.” Today, this century-old critique of legal formalism could be used to describe the pitfalls of so-called “AI-driven” legal research and law practice technologies.
Pound’s early work served as the foundation for legal realism, an intellectual movement that radically transformed American law by exposing the human element in judicial decision-making and introducing the indeterminacy thesis—the idea that “laws (broadly defined to include cases, regulations, statutes, constitutional provisions, and other legal materials) do not determine legal outcomes.” Unfortunately, the insights of the legal realists are lost on the founders of today’s legal tech startups and their promoters, even those within the legal academy. As Upton Sinclair once wrote, “It is difficult to get a man to understand something when his salary depends on his not understanding it.”
Yet foundational questions abound. Is law determinate? What systemic biases and hidden assumptions are embedded in the corpus of Anglo-American law? What are the implications of turning the corpus of Anglo-American law into a dataset and automating it? Will AI inhibit the legal creativity exemplified by lawyers like Thurgood Marshall and Ruth Bader Ginsburg? What will all of this mean for the future of law reform? While we can hardly expect vendors to take time to reflect upon these questions, law librarians, in their roles as legal research professors and legal information scholars, must.
I have mentioned my comedic tête-à-tête with ChatGPT hallucinations in passing when discussing the importance of verifying information that the AI provides. Several times, I have been asked for more details. So, for those who are looking for AI humor, here is the story of the day that I had to pull rank on ChatGPT.
Back in January 2023, I asked ChatGPT what legal tasks it could perform. It told me:
Summarizing and explaining laws and regulations
Researching and citing relevant legal precedent
Drafting legal documents such as contracts, briefs, and legal memorandums
Answering legal questions and providing information on legal topics
Identifying and analyzing legal issues in a given fact pattern
Helping with legal research, by providing the most relevant cases and statutes
Helping with the interpretation of legal terms and phrases.
Challenge accepted! Since that day, ChatGPT and I have been doing a dance around Ohio Recreational User Law, as I have tested its claims above, while also exploring its abilities, improvements, and plugins and testing various prompting methods.
About a month after ChatGPT gave me the list of tasks, we were discussing Ohio Recreational User caselaw. ChatGPT vehemently insisted that Lash v. Ohio Dept. of Natural Resources, 68 Ohio St. 3d 545, 548-49 (1994) was a relevant case. Alas, 68 Ohio St. 3d 545 leads to Findlay/Hancock Cty. Bar Assn. v. Higgins, 68 Ohio St.3d 545, 629 N.E.2d 413 (1994).
This was not the first time that I had run into hallucinations. For the previous month, we had regularly been discussing caselaw, and ChatGPT showed a particular penchant for falsifying the existence of relevant judicial opinions. Mostly, I pointed out the error, and ChatGPT apologized and moved on. In this instance, however, the AI was like a dog with a bone. It insisted nine times that Lash existed and was an on-point Ohio Recreational User case.
Seven claims in, the conversation became a bit heated:
ChatGPT: This case is available on many legal databases and can be accessed through various sources, including LexisNexis and Westlaw.
Me (after verifying the lack of existence and stopping to wonder when it had learned about Lexis and Westlaw since this was the first time it had referenced either database to me): That case is not in Westlaw.
ChatGPT: Lash v. Ohio Dept. of Natural Resources, 68 Ohio St.3d 545, 549, 628 N.E.2d 1366 (1994) is a real case that has been cited in multiple legal sources.
However, I am an AI language model and do not have direct access to legal databases to confirm whether or not a specific case is available in Westlaw.
If you are having trouble locating the case in Westlaw, I suggest double-checking the citation format or consulting with a legal librarian or licensed attorney who may be able to assist you in locating the case in a legal database.
Me (thought): It did not just say that! Damn it! Those are fighting words!
Me: I am both a law librarian and a licensed attorney. Lash v. Ohio Dept. of Natural Resources does not exist. 68 Ohio St.3d 545 leads to the previously mentioned Findlay/Hancock case.
And then…finally!
ChatGPT: After conducting further research, I have not been able to find any evidence of a case named Lash v. Ohio Dept. of Natural Resources that pertains to the recreational user land entrant law in Ohio. Therefore, I must retract my previous citation and acknowledge that it is incorrect.
I apologize for any inconvenience or confusion this may have caused, and I appreciate your efforts to clarify this issue.
Me (thought): Ha! It finally admitted defeat! Why does this win feel so hollow????
And so, you have the day that I had to pull rank on ChatGPT. I have no idea why it decided to die on the hill of this particular fake case, but the law librarians carried the day.
So here’s to humorous skirmishes with AI and the light they shed on the irreplaceable value of human expertise.
LLMs have come a long way even in the time since I wrote my article in June. Three months of development time with this technology feels like three years – or maybe that’s just me catching up. Despite that, there are still a couple of nagging issues that I would like to see addressed to improve these tools’ usefulness to legal researchers. I’m hoping to raise awareness about this so that we can collectively ask vendors to add quality-of-life features to these tools for the benefit of our community.
Audit Trails
Right now the tools do not have a way for us to easily check their work. Law librarians have made a version of my argument for over a decade now. The legendary Susan Nevelow Mart famously questioned the opacity of search algorithms in legal research platforms and evaluated their impact on research results. More recently, I was in the audience at AALL2023 when the tenacious and brilliant Debbie Ginsburg from Harvard asked Fastcase, BLaw, Lexis, and Westlaw how we (law librarians) could evaluate the inclusivity of the dataset of cases that the new AI algorithms are searching. How do we know if they’ve missed something if we don’t know what they’re searching and how complete it is?
As it stands, the legal research AI tools that I’ve demoed do not give you a summary of where they have gone and what they have done. An “audit trail” (as I’m using this expression) is a record of which processes were used to achieve a specific task, the totality of the dataset searched, and why particular results were chosen for presentation to the user. This way, if something goes wrong, you can go back and look at what steps were taken to get the results. This would provide an extra layer of security and confidence in the process.
Why Do We Need This?
These tools have introduced an additional layer of abstraction that separates legal researchers from the primary documents they are studying, altering how legal research is conducted. While the new AI algorithms can be seen as a step forward, they can undermine the precision that Boolean expressions once offered, which allowed researchers to predict the type of results they would encounter with more certainty. Coverage maps are still available to identify gaps in the data for some of these platforms, but there is a noticeable shift towards less control over the search process, calling for a thoughtful reassessment of the evolving dynamics in legal research techniques.
More importantly, we (law librarians) are deep enough into these processes and technology to be highly skeptical and to evaluate the output with a critical eye. Many students and new attorneys may not be. I have told this story at some of my presentations, but a recent graduate called me with a Pacific Reporter citation for a case that they could not find on Westlaw. This person was absolutely convinced that they were doing something wrong and had spent around an hour searching for this case because “this was THE PERFECT case” for their situation. It ended up being a fabrication from ChatGPT, but the graduate had to call me to discover that. This is obviously a somewhat outdated worry, since Rebecca Fordon has brought us all up to speed on the steps being taken to reduce hallucinations (and OpenAI got a huge amount of negative publicity from the now-infamous ChatGPT Lawyer).
My point is less about the technology and more about the incentives set in place – if there is a fast, easy way to do this research then there will inevitably be people who are going to uncritically accept those results. “That’s their fault and they should get in trouble,” you say? Probably, but I plan to write about the duty of technological competency and these tools in a future post, so we’ll have to hash that out together later. Also, what if there was a fast, easy way to evaluate the results of these tools…
What Could Be Done
Summarizing the steps involved in research seems like it would be a feasible task for Westlaw, Lexis, Blaw, et al. to implement. They already have to use prompting to tell the LLM where to go and how to search; we’re just asking for a summary of those steps to be replicated somewhere so that we can double-check it. Could they take that same prompting and wrap a prompt around it that says something to the effect of, “Summarize the steps taken in bullet points,” and then place that into a drop-down arrow so that we could check it? Could they include hyperlinks to coverage maps in instances where it would be useful for the researcher to know how inclusive the search is? In instances where they’re using RAG, could they include a prompt that says something to the effect of, “Summarize how you used the underlying documents to generate this text”?
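To make the request concrete, here is a hypothetical sketch of what such a wrapper could look like. callLLM() and runSearch() are invented stand-ins, not anything Westlaw or Lexis actually exposes:

```javascript
// Hypothetical sketch of an "audit trail" wrapper around a research query.
// callLLM() and runSearch() are invented stand-ins, not any vendor's API.
async function answerWithAuditTrail(question) {
  // Assume callLLM() here returns an array of query strings.
  const searches = await callLLM(
    "Generate the database searches needed to answer: " + question
  );
  const results = [];
  for (const search of searches) {
    results.push({ search, hits: await runSearch(search) });
  }
  const answer = await callLLM(
    "Answer the question using only these results:\n" +
    JSON.stringify(results) + "\nQuestion: " + question
  );
  // The piece we are asking vendors for: a reviewable record of the steps.
  const trail = await callLLM(
    "Summarize the steps taken in bullet points, including each search run, " +
    "the databases and coverage consulted, and why each result was used:\n" +
    JSON.stringify({ question, searches, results })
  );
  return { answer, trail }; // the trail could sit behind a drop-down arrow
}
```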
As someone who has tinkered with technology, all of these seem like reasonable requests that are well within the ability of these tools. I’m interested to hear if there are reasons why we couldn’t have these features or if people have other features they would like. Please feel free to post your ideas in the comments or email me.
Hallucinations in generative AI are not a new topic. If you watch the news at all (or read the front page of the New York Times), you’ve heard of the two New York attorneys who used ChatGPT to create entire fake cases and then submitted them to the court.
After that case, which resulted in a media frenzy and (somewhat mild) court sanctions, many attorneys are wary of using generative AI for legal research. But vendors are working to limit hallucinations and increase trust. And some legal tasks are less affected by hallucinations. Understanding how and why hallucinations occur can help us evaluate new products and identify lower-risk uses.
* A brief aside on the term “hallucinations”. Some commentators have cautioned against this term, arguing that it lets corporations shift the blame to the AI for the choices they’ve made about their models. They argue that AI isn’t hallucinating, it’s making things up, or producing errors or mistakes, or even just bullshitting. I’ll use the word hallucinations here, as the term is common in computer science, but I recognize it does minimize the issue.
With that all in mind, let’s dive in.
What are hallucinations and why do they happen?
Hallucinations are outputs from LLMs and generative AI that look coherent but are wrong or absurd. They may come from errors or gaps in the training data (that “garbage in, garbage out” saw). For example, a model may be trained on internet sources like Quora posts or Reddit, which may have inaccuracies. (Check out this Washington Post article to see how both of those sources were used to develop Google’s C4, which was used to train many models including GPT-3.5).
But just as importantly, hallucinations may arise from the nature of the task we are giving to the model. The objective during text generation is to produce human-like, coherent and contextually relevant responses, but the model does not check responses for truth. And simply asking the model if its responses are accurate is not sufficient.
In the legal research context, we see a few different types of hallucinations:
Citation hallucinations. Generative AI citations to authority typically look extremely convincing, following the citation conventions fairly well, and sometimes even including papers from known authors. This presents a challenge for legal readers, as they might evaluate the usefulness of a citation based on its appearance—assuming that a correctly formatted citation from a journal or court they recognize is likely to be valid.
Hallucinations about the facts of cases. Even when a citation is correct, the model might not correctly describe the facts of the case or its legal principles. Sometimes, it may present a plausible but incorrect summary or mix up details from different cases. This type of hallucination poses a risk to legal professionals who rely on accurate case summaries for their research and arguments.
Hallucinations about legal doctrine. In some instances, the model may generate inaccurate or outdated legal doctrines or principles, which can mislead users who rely on the AI-generated content for legal research.
In my own experience, I’ve found that hallucinations are most likely to occur when the model does not have much in its training data that is useful to answer the question. Rather than telling me the training data cannot help answer the question (similar to a “0 results” message in Westlaw or Lexis), the generative AI chatbots seem to just do their best to produce a plausible-looking answer.
This does seem to be what happened to the attorneys in Mata v. Avianca. They did not ask the model to answer a legal question, but instead asked it to craft an argument for their side of the issue. Rather than saying that argument would be unsupported, the model dutifully crafted an argument, and used fictional law since no real law existed.
How are vendors and law firms addressing hallucinations?
Although vendors and firms are often close-lipped about how they have built their products, we can observe a few techniques that they are likely using to limit hallucinations and increase accuracy.
First, most vendors and firms appear to be using some form of retrieval-augmented generation (RAG). RAG combines two processes: information retrieval and text generation. The system takes the user’s question and passes it (perhaps with some modification) as a search to a database. Relevant passages or snippets are then identified within the search results and fed back into the model as “context” alongside the user’s question.
This reduces hallucinations, because the model receives instructions to limit its responses to the source documents it has received from the database. Several vendors and firms, including Gunderson, Westlaw, and Casetext, have said they are using retrieval-augmented generation to ground their models in real legal sources.
To enhance the precision of the retrieved documents, some products may also use vector embedding. Vector embedding is a way of representing words, phrases, or even entire documents as numerical vectors. The beauty of this method lies in its ability to identify semantic similarities. So, a query about “contract termination due to breach” might yield results related to “agreement dissolution because of violations”, thanks to the semantic nuances captured in the embeddings. Using vector embedding along with RAG can provide relevant results, while reducing hallucinations.
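Putting the two together, a bare-bones RAG pipeline might look something like the sketch below; embed(), vectorSearch(), and generate() are hypothetical stand-ins for whatever models and databases a vendor actually uses:

```javascript
// Bare-bones retrieval-augmented generation. embed(), vectorSearch(), and
// generate() are hypothetical stand-ins, not any vendor's actual API.
async function ragAnswer(question, topK = 5) {
  // 1. Convert the question into a vector and retrieve similar passages.
  const queryVec = await embed(question);
  const passages = await vectorSearch(queryVec, topK);

  // 2. Feed the retrieved passages back in as "context," instructing the
  //    model to answer only from those sources; this is what curbs
  //    hallucination, since the model is not free to invent authority.
  const prompt =
    "Answer using ONLY the numbered sources below. If they do not answer " +
    "the question, say so.\n\nSources:\n" +
    passages.map((p, i) => "[" + (i + 1) + "] " + p.text).join("\n") +
    "\n\nQuestion: " + question;

  return generate(prompt);
}
```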
Another approach vendors can take is to develop specialized models trained on narrower, domain-specific datasets. This can help improve the accuracy and relevance of the AI-generated content, as the models would be better equipped to handle specific legal queries and issues. Focusing on narrower domains can also enable models to develop a deeper understanding of the relevant legal concepts and terminology. This does not appear to be what law firms or vendors are doing at this point, based on the way they are talking about their products, but there are law-specific data pools becoming available so we may see this soon.
Finally, vendors may fine-tune their models by providing human feedback on responses, either in-house or through user feedback. By providing users with the ability to flag and report hallucinations, vendors can collect valuable information to refine and retrain their models. This constant feedback mechanism can help the AI learn from its mistakes and improve over time, ultimately reducing the occurrence of hallucinations.
So, hallucinations are fixed?
Even though vendors and firms are addressing hallucinations with technical solutions, it does not necessarily mean that the problem is solved. Rather, it may be that our quality control methods will shift.
For example, instead of wasting time checking each citation to see if it exists, we can be fairly sure that the cases produced by legal research generative AI tools do exist, since they are found in the vendor’s existing database of case law. We can also be fairly sure that the language they quote from the case is accurate. What may be less certain is whether the quoted portions are the best portions of the case and whether the summary reflects all relevant information from the case. This will require some assessment of the various vendor tools.
We will also need to pay close attention to the database results that are fed into retrieval-augmented generation. If those results don’t reflect the full universe of relevant cases, or if they contain material that is not authoritative, then the answer generated from those results will be incomplete. Think of running an initial Westlaw search, getting 20 pretty good results, and then basing your answer only on those 20 results. For some questions (and searches), that would be sufficient, but for more complicated issues, you may need to run multiple searches, with different strategies, to get what you want.
To be fair, the products do appear to be running multiple searches. When I attended the rash of AI presentations at AALL over the summer, I asked Jeff Pfeiffer of Lexis how he could be sure that the model had all relevant results, and he mentioned that the model sends many, many searches to the database not just one. Which does give some comfort, but leads me to the next point of quality control.
We will want to have some insight into the searches that are being run, so that we can verify that they are asking the right questions. From the demos I’ve seen of CoCounsel and Lexis+ AI, this is not currently a feature. But it could be. For example, the AI assistant from scite (an academic research tool) sends searches to academic research databases and (seemingly using RAG and other techniques to analyze the search results) produces an answer. They also give a mini-research trail, showing the searches that are being run against the database and then allowing you to adjust if that’s not what you wanted.
Screenshots: scite AI Assistant sample results and scite AI Assistant settings.
Are there uses for generative AI where the risks presented by hallucinations are lessened?
The other good news is that there are plenty of tasks we can give generative AI for which hallucinations are less of an issue. For example, CoCounsel has several other “skills” that do not depend upon accuracy of legal research, but are instead ways of working with and transforming documents that you provide to the tool.
Similarly, even working with a generally applicable tool such as ChatGPT, there are many applications that do not require precise legal accuracy. There are two rules of thumb I like to keep in mind when thinking about tasks to give to ChatGPT: (1) could this information be found via Google? and (2) is a somewhat average answer ok? (As one commentator memorably put it, “Because [LLMs] work by predicting the most statistically likely word in a sentence, they churn out average content by design.”)
For most legal research questions, we could not find an answer using Google, which is why we turn to Westlaw or Lexis. But if we just need someone to explain the elements of breach of contract to us, or come up with hypotheticals to test our knowledge, it’s quite likely that content like that has appeared on the internet, and ChatGPT can generate something helpful.
Similarly, for many legal research questions, an average answer would not work, and we may need to be more in-depth in our answers. But for other tasks, an average answer is just fine. For example, if you need help coming up with an outline or an initial draft for a paper, there are likely hundreds of samples in the data set, and there is no need to reinvent the wheel, so ChatGPT or a similar product would work well.
What’s next?
In the coming months, as legal research generative AI products become increasingly available, librarians will need to adapt to develop methods for assessing accuracy. Currently, there appear to be no benchmarks to compare hallucinations across platforms. Knowing librarians, that won’t be the case for long, at least with respect to legal research.
Further reading
If you want to learn more about how retrieval augmented generation and vector embedding work within the context of generative AI, check out some of these sources:
Somewhat recently, during a webinar on generative AI, when the speaker Joe Regalia mentioned “flu snot” prompting, I was momentarily confused. What was that? Flu shot? Flu snot? I rewound a couple of times until I figured out he was saying “few shot” prompting. Looking for some examples of few-shot learning in the legal research/writing context, I Googled around and found his excellent article entitled ChatGPT and Legal Writing: The Perfect Union on the write.law website.
What Exactly is Few Shot Prompting?
It turns out that few-shot prompting is a technique for improving the performance of chatbots like ChatGPT by supplying a small set of examples (a few!) to guide its answers. This involves offering the AI several prompts with corresponding ideal responses, allowing it to generate more targeted and customized outputs. The purpose of this approach is to provide ChatGPT (or other generative AI) with explicit examples that reflect your desired tone, style, or level of detail.
Legal Research/Writing Prompting Advice from write.law
To learn more, I turned to Regalia’s detailed article, which provides comprehensive insights into legal research/writing prompts and illuminates various prompting strategies, including:
Zero Shot Learning/Prompting
This pertains to a language model’s ability to tackle a novel task, relying on its linguistic comprehension and pre-training insights. GPT excels at zero-shot tasks thanks to its robust capabilities. (Perhaps unsurprisingly, one-shot learning involves providing the system with just one example.)
Few-Shot Learning/Prompting
Few-shot learning involves feeding GPT several illustrative prompts and responses that echo your desired output. These guiding examples wield more influence than mere parameters because they offer GPT a clear directive of your expectations. Even a singular example can be transformative in guiding its responses.
As an example of few-shot learning, he explains that if you want ChatGPT to improve verbs in your sentence, you can supply a few examples in a prompt like the following:
My sentence: The court issued a ruling on the motion.
Better sentence: The court ruled on the motion.

My sentence: The deadline was not met by the lawyers.
Better sentence: The lawyers missed the deadline.

My sentence: The court’s ruling is not released. [now enter the sentence you actually want to improve, hit enter, and GPT will take over]

[GPT’s response] Better sentence: The court has not ruled yet [usually a much-improved version, but you may need to follow up with GPT a few times to get great results like this]
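For those who would rather script this than type it into a chat window, assembling a few-shot prompt is just string construction. Here is a minimal sketch, assuming a generic sendChat() function that is a hypothetical stand-in for whatever chat API you use:

```javascript
// Few-shot prompting as simple string construction. sendChat() is a
// hypothetical stand-in; the paired examples are what steer the model
// toward the rewrite pattern you want.
const examples = [
  { before: "The court issued a ruling on the motion.",
    after: "The court ruled on the motion." },
  { before: "The deadline was not met by the lawyers.",
    after: "The lawyers missed the deadline." },
];

function buildFewShotPrompt(sentence) {
  const shots = examples
    .map((ex) => "My sentence: " + ex.before + "\nBetter sentence: " + ex.after)
    .join("\n\n");
  return shots + "\n\nMy sentence: " + sentence + "\nBetter sentence:";
}

// sendChat(buildFewShotPrompt("The court's ruling is not released."));
// The model completes the pattern with an active-voice rewrite.
```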
And Much More Prompting Advice!
Regalia’s website offers an abundance of insights, as you can see from the extensive list of topics covered in his article. Get background information on how generative AI systems operate, and dive into subjects like chain-of-thought prompting, assigning roles to ChatGPT, using parameters, and much more.
What Legal Writers Need to Know About GPT
Chat GPT’s Strengths Out of the Box
Chat GPT’s Current Weaknesses and Limitations
Getting Started with Chat GPT
Prompt Engineering for Legal Writers
Legal Writing Prompts You Can Use with GPT
Using GPT to Improve Your Writing
More GPT Legal Writing Examples for Inspiration
Key GPT Terms to Know
Final Thoughts for GPT and Legal Writers
Experimenting With Few-Shot Prompting Before I Knew the Name
Back in June 2023, I first started dabbling in few-shot prompting without even knowing it had a name, after I came across a Forbes article titled Train ChatGPT To Write Like You In 5 Easy Steps. Intrigued, I wondered if I could use this technique to easily generate a profusion of blog posts in my own personal writing style!!
I followed the article’s instructions, copying and pasting a few of my favorite blog posts into ChatGPT to show it the tone and patterns in my writing that I wanted it to emulate. The result was interesting, but in my humble opinion, the chatty chatbot failed to pick up on my convoluted conversational (and to me, rather humorous) approach. They say that getting good results from generative AI is an iterative process, so I repeatedly tried to convey that I am funny using a paragraph from a blog post:
Prompt: Further information. I try to be funny. Here is an example: During a text exchange with my sister complaining about our family traits, I unthinkingly quipped, “You can’t take the I out of inertia.” Lurching sideways in my chair, I excitedly wondered if this was only an appropriate new motto for the imaginary Gotschall family crest, or whether I had finally spontaneously coined a new pithy saying!? Many times have I Googled, hoping in vain, and vainly hoping, to have hit upon a word combo unheard of in Internet history and clever/pithy enough to be considered a saying, only to find that there’s nothing new under the virtual sun.
Fail! Sadly, my efforts were to no avail; it just didn’t sound much like me… (However, that didn’t stop me from asking ChatGPT to write a conclusion for this blog post!)
Conclusion
For those keen to delve deeper into the intricacies of legal research, writing, and the intersection with AI, checking out the resources on write.law is a must. The platform offers a wealth of information, expert insights, and practical advice that can be immensely valuable for both novices and seasoned professionals.
Some of you reading this may be skeptical that these new AI technologies are 1) within your skillset and/or 2) worth the effort to learn. I’m the congenital optimist who is here to win you over. These tools are on the verge of revolutionizing the field of law (once they get out of their prototype phase) and I can’t think of a better group of people on law school campuses, in government organizations, and in law firms to evaluate and implement these technologies. Law Librarians (traditionally) have two crucial skill sets that make us well-suited to take the lead here:
We understand how information is organized and
We understand how information is used in the research and practice of law.
Many of the tips that David Shapiro provides in that video for people creating custom LLMs will be absolutely obvious to law librarians because we live and breathe these every day at our jobs: taxonomies, data organization, “source of truth,” etc. Whether in the tech services department or research instruction, we are well-versed in organizing and finding information.
We already have many of the data structures in place that could be easily used by these technologies. Besides constructing the initial models, our role will be pivotal in continuously updating and assessing their effectiveness. Moreover, we will provide vital guidance on the proper utilization of these tools.
Does this list look like something your Technical Services department does? Can you think of anyone else in your organization who would be better at making knowledge graphs, indexes, or tables of contents for legal materials? Who would be better suited than your Research and Instruction team to teach newcomers how to interact with these tools to get the information that they need? Who in your organization is best positioned to teach (or already teaches) information literacy? I would argue that nobody can do it better than law librarians (not even computer science people).
Now What?
Let’s mobilize a push to collaborate on these tools. We need to get groups of law librarians together who are interested in rolling up their sleeves and digging into the nitty-gritty of creating, auditing, and using LLMs. I am a member of LIT-SIS in AALL and maybe we need a special caucus to address this specific technology. Additionally, we can get consortiums of schools together in each state to develop our own LLMs – outside of the subscription-based products that will roll out for Lexis and Westlaw. Anything we build ourselves will have the needs of our community at the forefront. We can build in all of the transparency, privacy, and accuracy that may be lacking in commercial models. Schools can build tools that would not be commercially viable at firms. Firms and courts could build specialized tools to achieve their unique workflows. It opens up many options that are not available if we’re stuck with the one-size-fits-all nature of Lexis and Westlaw subscriptions.
This is an open-source model that is close to competing with GPT4 (ChatGPT’s underlying model). There are many of these and new models show up every day.
There are many options to create, train, and locally run custom LLMs as long as you have the data. As David Shapiro said in the video, “data is the oil of the information age,” and law libraries are deep wells of the type of data that could be used to accurately train these services. Additionally, when you are locally hosting an LLM, many of the concerns surrounding privacy, permissions, and student data completely evaporate because you are in control of what information is being sent and stored.
To do all of this, we need organization, collaboration, and funding. Individually this could be difficult but if we band together in consortium, we can get a lot done.
Students
Students are an incredible resource in this area. Many of them come to law school with computer science and data science backgrounds and can help with the creation and development of these models. They need mentors and organizers to help focus their efforts, provide resources, and nurture their creativity. In addition, they provide a deep reservoir of diverse voices and experiences that may not occur to people who have spent decades in academia, the public sector, or law firms. We can bring in students to have competitions to create their own LLM apps for law practice and access to justice initiatives. We can fund fellowships to do work at schools, courts, and firms. We can bring them under our wing to usher in the next generation of tech-savvy law librarians. We can leverage the excitement and energy associated with these new tools to attract new talent into our field – I skimmed TikTok and the #ChatGPT hashtag has around 7.7 billion views. To do that, we need to brainstorm together so that we can get these programs in place.
In Sum
As the torchbearers in this promising venture, it’s time for us, the law librarians, to step up and show the world our unmatched prowess in harnessing the potential of LLMs in law, weaving our expert knowledge in information science, law, and emerging technology. Let us band together, utilizing the rich data reserves at our disposal, and carve out a future where legal technology is not just efficient and transparent, but also a collaborative masterpiece fostered by our relentless pursuit of innovation and excellence.