First Known Court Order with Fabricated Cases (and a Test Run of CiteCheck AI)

Posted on July 3, 2025 by Jenny Wondracek

AI may have struck again with hallucinations. Yesterday evening, I was forwarded a quote from the opinion in Shahid v. Esaam, 2025 Ga. App. LEXIS 299, at *3 (Ct. App. June 30, 2025, No. A25A0196), released on June 30, 2025 by the Georgia Court of Appeals (HT Mary Matuszak!) (link to official opinion, not Lexis):

We are troubled by the citation of bogus cases in the trial court’s order. As the reviewing court, we make no findings of fact as to how this impropriety occurred, observing only that the order purports to have been prepared by Husband’s attorney, Diana Lynch. We further note that Lynch had cited the two fictitious cases that made it into the trial court’s order in Husband’s response to the petition to reopen, and she cited additional fake cases both in that Response and in the Appellee’s Brief filed in this Court.

Background

The Georgia Court of Appeals (CoA) heard an appeal from the denial of a petition to reopen a divorce case in the Superior Court of DeKalb County, GA. The Appellant brought to the attention of the CoA that the “trial court relied on two fictitious cases in its order denying her petition.” The Appellee’s attorney ignored this claim and went on to repeat the original argument of proper service by publication, relying on multiple fictitious and misrepresented cases. The Appellee’s attorney also demanded attorney’s fees based on another fictitious case that claimed the exact opposite of existing case law. In total, the CoA provided this breakdown of the inaccuracy rate of the citations provided by the Appellee’s attorney: “73 percent of the 15 citations in the brief or 83 percent if the two bogus citations in the superior court’s opinion and the five additional bogus citations in Husband’s response to Wife’s petition to reopen Case are included.” The distraught CoA struck the lower court order, remanded the case, and sanctioned the Appellee’s attorney.

Digging into the Case

I was curious about all of this, so I did some digging this morning. I am still working on acquiring the CoA briefs, but I was able to access the documents from the trial court. The CoA was very cognizant that it does not have any actual proof at this time that AI was used, but given the number of bad citations that the Appellee’s attorney submitted, the CoA speculated in the footnotes about the use of a consumer AI model. To test this theory, I decided not only to do some reading, but also to try out LawDroid’s new CiteCheck AI tool. Spoiler alert: I think the speculations are accurate.

CiteCheck AI

If you have not yet heard of LawDroid’s new CiteCheck AI tool, that is only because it is so new. The premise of this tool is you upload a document, and it will check your case citations to see if the citations exist (a.k.a. identify hallucinations). The free version gives you the ability to test it out with five documents. It will OCR your document (if needed), extract the citations, and check the citations against the CourtListener database. You are then given a nice table of the citations, marking them as valid or invalid. If the latter, you are also supplied the reason why it is marked invalid. Remember, however, this only checks their existence, not whether they stand for the proposition for which they are being used.
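The extract-then-check steps described above can be pictured with a rough sketch. This is only an illustration and not how CiteCheck AI is implemented: the regular expression recognizes just three reporter abbreviations, and `KNOWN_CASES` is a hypothetical stand-in for a real case database such as CourtListener, filled with first-page citations quoted later in this post.

```python
import re

# Hypothetical stand-in for a case database. Keys are (volume, reporter,
# first page); values are the case that actually starts at that citation.
KNOWN_CASES = {
    ("248", "Ga.", "632"): "McDaniel v. Thomas",
    ("162", "Ga.", "126"): "Epps v. Epps",
    ("269", "Ga.", "604"): "Atlanta Mkt. Ctr. Mgmt. Co. v. McLane",
}

# Deliberately narrow pattern: volume, one of three reporters, page.
CITATION_RE = re.compile(r"(\d+)\s+(Ga\. App\.|Ga\.|S\.E\.2d)\s+(\d+)")


def extract_citations(text):
    """Return (volume, reporter, page) tuples found in the text."""
    return CITATION_RE.findall(text)


def check_citations(text):
    """Mark each extracted citation valid or invalid, with a reason."""
    report = []
    for cite in extract_citations(text):
        label = " ".join(cite)
        name = KNOWN_CASES.get(cite)
        if name is None:
            report.append((label, "invalid", "no case starts at this citation"))
        else:
            report.append((label, "valid", name))
    return report


if __name__ == "__main__":
    for row in check_citations("Epps v. Epps (248 Ga. 637,285 S.E.2d 180, 1981)"):
        print(row)
```

Like the tool, this sketch only tests whether something exists at the citation. It does not compare party names or check whether the case supports the proposition it is cited for.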

Bob Ambrogi posted a review of the application that he tested with the Mata v. Avianca, Inc. documents and a document he filed when in practice. From this review, I knew to expect a few false “invalid” markings if the case only has a Lexis or Westlaw citation or if there are abbreviation issues with the case citation. Bob noted that these issues were relatively easy to spot since CiteCheck AI lists the reason it marked the citation invalid.

The CiteCheck AI website also reminds attorneys that you still need to meet your ethical obligations and review everything before submitting it: “Disclaimer: CiteCheck AI is only a tool, it does not relieve lawyers from their duty of care, supervision, and competence. Ensure that you carefully review all work product before sharing it with clients and/or filing it in court.”

The Trial Order

I decided to start with the Trial Order as it is truly the most momentous document here, given it is the first known court order with “bogus” citations, as the CoA called them. The CoA specifically mentioned “the bogus Epps and Hodge case citations from the superior court’s order” in footnote 24, so I went in knowing what cases to watch out for. It turns out that these were the only two cases mentioned in the order, making them really easy to locate.

The first case was listed as “Epps v. Epps (248 Ga. 637,285 S.E.2d 180, 1981)” and was supposed to discuss service by publication. When I ran 248 Ga. 637 through Lexis, it led me to school financing case McDaniel v. Thomas, 248 Ga. 632, 632, 285 S.E.2d 156, 157 (1981) (note the different SE2d reporter citation!). Curious to see what the Epps parallel citation 285 S.E.2d 180 would lead me to, I found criminal case Lewis v. State, 248 Ga. 566, 566, 285 S.E.2d 179, 180 (1981). No sign of Epps v. Epps.

Next, I tried searching the parties. Epps v. Epps, restricted to Georgia cases, returned three results:
1. Epps v. Epps, 162 Ga. 126, 132 S.E. 644 (1926)(Sufficiency of the Evidence)
2. Epps v. Epps, 209 Ga. 643, 644, 75 S.E.2d 165, 167 (1953)(Implied Trusts)
3. Epps v. Epps, 141 Ga. App. 659, 659, 234 S.E.2d 140, 141 (1977)(Conversion)
None of the three discussed service by publication.

The second case was “Hodge v. Hodge (269 Ga. 604,501 S.E.2d 169, 1998),” another alleged service by publication case. Here is the breakdown of this case:

269 Ga. 604 led to Atlanta Mkt. Ctr. Mgmt. Co. v. McLane, 269 Ga. 604, 503 S.E.2d 278 (1998)(agency, fiduciary obligations, and contracts)
501 S.E.2d 169 led to the middle of Foster v. City of Keyser, 202 W. Va. 1, 501 S.E.2d 165 (1997)(res ipsa loquitur)(Not even the same state!)
Hodge v. Hodge search led to a divorce case! But no mention of service by publication in divorce: Hodge v. Hodge, 2017 Ga. Super. LEXIS 2178.
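One way to picture these manual look-ups: when a cited page is not the first page of any case, it still falls inside whichever case in that volume started most recently before it. Below is a minimal sketch of that look-up using only the first pages quoted in this post. The `CASE_STARTS` table and the `case_containing` helper are hypothetical names, and because end pages are not recorded here, the answer is a best guess rather than proof that the case runs that far.

```python
from bisect import bisect_right

# First pages taken from the citations quoted in this post.
# Each list must stay sorted by first page for bisect to work.
CASE_STARTS = {
    ("248", "Ga."): [(566, "Lewis v. State"), (632, "McDaniel v. Thomas")],
    ("285", "S.E.2d"): [(156, "McDaniel v. Thomas"), (179, "Lewis v. State")],
    ("501", "S.E.2d"): [(165, "Foster v. City of Keyser")],
}


def case_containing(volume, reporter, page):
    """Return (case name, first page, is_first_page) or None.

    Picks the case with the latest first page that is <= the cited page.
    """
    starts = CASE_STARTS.get((volume, reporter), [])
    first_pages = [first for first, _ in starts]
    idx = bisect_right(first_pages, page)
    if idx == 0:
        return None  # the cited page precedes every known case in the volume
    first, name = starts[idx - 1]
    return name, first, first == page


if __name__ == "__main__":
    print(case_containing("248", "Ga.", 637))
    print(case_containing("285", "S.E.2d", 180))
    print(case_containing("501", "S.E.2d", 169))
```

A result whose last element is `False` is the "middle of a case" situation: the cited page exists, but no case begins there.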

Trial Order – CiteCheck AI Review

Now that I have done the work by hand, how did CiteCheck AI compare?

Validation report shows two citations found and both are invalid.

Success! We both found the same cases for the Georgia reporter cases. It did not check the parallel Southeastern Reporter citations, however. It definitely took a lot less time (under a minute) for CiteCheck AI than it did for me going through all four reporter citations in Lexis.

The Trial Response

Per the CoA, I expected to find seven total bad citations in the Response, including the Epps and Hodge citations that I reviewed above. Being a good (and nosy) librarian, I went through both the Georgia and the Southeastern Reporter citations for each citation, if provided. Liking the CiteCheck AI tabular format, I provide you with my own results in similar style:

Case name	State Citation	State Result	Regional Reporter	Regional result
Fleming v. Floyd	237 Ga. 76	Campbell v. State, 237 Ga. 76, 226 S.E.2d 601 (1976) (criminal)	226 SE2d 601	Same case as Ga citation!
Christie v. Christie	277 Ga. 27	In re Kent, 277 Ga. 27, 585 S.E.2d 878 (2003)(attorney discipline) & In re Silver, 277 Ga. 27, 585 S.E.2d 879 (2003) (attorney reinstatement)	586 SE2d 57	Town of Register v. Fortner, 262 Ga. App. 507, 586 S.E.2d 54 (Ga. 2003)(summary judgment)
Mobley v. Murray County	178 Ga App 320	G. E. Credit Corp. v. Catalina Homes, 178 Ga. App. 319, 342 S.E.2d 734 (1986)(repossession)	342 SE2d 780	State v. Brown, 178 Ga. App. 307, 307, 342 S.E.2d 779 (Ga. App. 1986)(motion to suppress)
Robinson v. Robinson	277 Ga. 75	Robinson v. State, 277 Ga. 75, 586 S.E.2d 313 (2003)(criminal)	586 SE2d 316	Brochin v. Brochin, 277 Ga. 66, 586 S.E.2d 316 (Ga. 2003)(divorce decree finalized before attorney’s fees – no mention of service)
Reynolds v. Reynolds	288 Ga App 688	AT&T Corp. v. Prop. Tax Servs., 288 Ga. App. 679, 655 S.E.2d 295 (2007)(Tax)	N/A

The Trial Response – CiteCheck AI Review

And success again! CiteCheck AI found the same cases that I did with a manual check for the Georgia reporter citations. Parallel citations once again were not considered, however (given the Bluebook no longer requires them, this may not be an issue for long). A new hiccup to take note of, however: It did not report that two cases were located with the Christie v. Christie Georgia reporter search. While page 27 is supposed to be the first page in the citation, it is not unheard of for a student or attorney to put the page number of the language they refer to instead. This makes me uneasy, and I hope this is on the improvement list to include both/all cases on the page listed.
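The Christie v. Christie hiccup suggests that a look-up should return every case that begins on the cited page rather than stopping at the first hit. A small sketch of that idea follows. `CASES`, `build_index`, and `cases_starting_at` are hypothetical names, and the rows are the cases found in the manual check above.

```python
from collections import defaultdict

# (state reporter citation, case name) pairs from the manual check.
CASES = [
    ("277 Ga. 27", "In re Kent"),
    ("277 Ga. 27", "In re Silver"),
    ("277 Ga. 75", "Robinson v. State"),
    ("237 Ga. 76", "Campbell v. State"),
]


def build_index(cases):
    """Group case names by citation so shared first pages are preserved."""
    index = defaultdict(list)
    for cite, name in cases:
        index[cite].append(name)
    return index


def cases_starting_at(index, cite):
    """Return all cases that begin at the citation (possibly none)."""
    return list(index.get(cite, []))


if __name__ == "__main__":
    idx = build_index(CASES)
    print(cases_starting_at(idx, "277 Ga. 27"))
```

Storing a list per citation, instead of a single case, is what keeps the second short opinion on the page from being silently dropped.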

Takeaways

From this exercise, I take a few key lessons and thoughts.

The inevitable has happened, and a court has issued an opinion with hallucinated cases.
The Court of Appeals did not investigate how the hallucinated citations were put into the order, but I am sure someone will. I await the final report.
Given the disciplinary case that I read from the Christie v. Christie search, Georgia takes this sort of thing seriously. The Appellee’s attorney may face more than just sanctions in the future.
The CiteCheck AI tool is useful, as long as you remember its limitations.
I may lament only having five free trials of the CiteCheck AI tool. (Tom, is it coming to LawDroid Copilot?)
I now fear the day another order is not caught and hallucinations become law.

Vibe-Coding Instruction: I Made a Boolean Minigame In 30 Minutes

Posted on April 11, 2025 by Rebecca Fordon

I’ve been thinking a lot lately about how to bring more interactivity and immediacy into legal research instruction—especially for those topics that never quite “click” the first time. One idea that’s stuck with me is vibe-coding (see Sam Harden’s recent piece on vibecoding for access to justice). The concept, loosely put, is about using code to quickly build lightweight tools that deliver a very specific, helpful experience—often more intuitive than polished, and always focused on solving a narrow, real-world problem.

That framing resonated with me as both an educator and a librarian. In particular, it got me thinking about Boolean searching—an area where students routinely struggle. Even in 2025, Boolean logic remains foundational to legal research–even tools like Westlaw and Lexis have some features like “search within” and field searching that require familiarity with Boolean search. But despite its importance, it can feel abstract and mechanical when taught through static examples or lectures.

So I tried a bit of vibe-coding myself. I built a small, interactive Boolean search game using the Canvas feature in Google Gemini 2.5—it’s a simple web-based activity that gives users a chance to experiment with constructing Boolean expressions and get real-time feedback. It only took about 30 minutes to get a solid version running, and even in that rough form, it worked. The immediate engagement helps clarify the logic in a way that static examples rarely do. You can check it out and play here: https://gemini.google.com/share/436f0db98cef

Screenshot of a "Boolean Search Basics Game" interface. The top section titled "How to Play" explains how to use Boolean search operators:

AND for documents containing all terms.

OR for documents containing at least one term.

NOT to exclude terms.

Parentheses for grouping.

Quotes for exact phrases.

W/N for proximity within N words.

/P for terms in the same paragraph.

Below the instructions is "Level 1: Using AND", which asks the user to find documents that contain both "apple" and "pie". A text box is provided for entering a Boolean query, with buttons labeled "Run Search" and "Reset Level".
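For readers curious about what sits behind a game like this, the operators in the screenshot map naturally onto set operations over an inverted index. The following is a rough sketch and not the code Gemini generated: the sample documents and helper names are made up, only AND, OR, NOT, grouping, and W/N proximity are covered, and "within N words" is taken here to mean a position difference of at most N, which may not match how any particular research platform counts.

```python
import re

# Made-up sample documents for illustration.
DOCS = {
    1: "apple pie recipe with cinnamon",
    2: "apple cider and pumpkin bread",
    3: "cherry pie and apple tart",
    4: "pumpkin soup",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Set of document ids containing the word."""
    return set(INDEX.get(word.lower(), {}))


def within(a, b, n):
    """a W/n b: documents where a and b appear at most n positions apart."""
    hits = set()
    for doc_id in term(a) & term(b):
        pos_a = INDEX[a.lower()][doc_id]
        pos_b = INDEX[b.lower()][doc_id]
        if any(abs(i - j) <= n for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits


both = term("apple") & term("pie")                        # apple AND pie
either = term("apple") | term("pumpkin")                  # apple OR pumpkin
no_pie = term("apple") - term("pie")                      # apple AND NOT pie
grouped = (term("apple") | term("cherry")) & term("pie")  # (apple OR cherry) AND pie
near = within("apple", "pie", 1)                          # apple W/1 pie

if __name__ == "__main__":
    print(both, either, no_pie, grouped, near)
```

In this sample, Level 1's "apple AND pie" matches documents 1 and 3, while the tighter "apple W/1 pie" keeps only document 1, which is the kind of contrast the game lets students discover by experimenting.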

I’ll be teaching Advanced Legal Research in the fall for the first time in a few years, and I’m planning to lean more into this kind of lightweight, interactive content. These micro-tools don’t have to be elaborate to be effective, and they can go a long way toward reinforcing concepts that students often struggle with in more traditional formats.

Have an idea for a micro-tool to use in teaching? They’re easy, fun, and a little addicting to make. You’ll just need access to the paid version of ChatGPT, Claude, or Gemini. (You can also experiment with AI coding assistants like Replit or Bolt.New. Both have limited free versions.) Provide your idea, perhaps some additional context in the form of a file or webpage, and you’re off to the races. My prompt that resulted in a working version of this Boolean game was literally just “Make an interactive game that will help researchers understand the basics of Boolean Search,” and I attached some slides I’ve previously used to teach the topic.

If you build something or you have an idea I’d love to hear about it!

OpenAI’s New Deep Research Model

Posted on February 3, 2025 by Sean Harrington

This post is brought to you by ChatGPT’s Deep Research. It produced this report after about 10 minutes of thinking and searching online (sources in the footnotes). I have also used it for a couple of fairly complex legal research queries and it produces the equivalent of an article from a treatise – they were about 10-15 pages and had 20+ footnotes. One of them was a bankruptcy question and, interestingly, it cited to Collier on Bankruptcy. I looked up the citation in Lexis and it was accurate…

You’ll notice in the following post that it seems to cite to the University of Arizona quite a lot. I would assume this is because their web presence is either highly search-optimized (easy to find) or their site does not have a robots.txt file that blocks AI crawlers (yet), but I’m interested to hear if you have a different hypothesis.
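For context on the robots.txt hypothesis: robots.txt is a plain-text file at the root of a website that tells named crawlers which paths they may fetch. Here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the example.edu URL are made up, and "GPTBot" is used on the assumption that it is the user agent OpenAI's crawler announces. Nothing here reflects the University of Arizona's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules: block one named crawler, allow everyone else.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(rules_text, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://law.example.edu/news/ai-initiative"
    print(can_crawl(RULES, "GPTBot", page))
    print(can_crawl(RULES, "SomeOtherBot", page))
    # A site with no rules at all leaves every crawler free to fetch.
    print(can_crawl("", "GPTBot", page))
```

Compliance with robots.txt is voluntary on the crawler's side, so this only shows what a well-behaved crawler would conclude from the file.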

Deep Research Output:

Great, I will conduct a comprehensive review of how AI is transforming U.S. law schools from 2023-2025. This will include changes in teaching methods, faculty and law librarian practices, and collection development in law libraries. I’ll draw from trusted sources such as LawNext, GeekLawBlog, Artificial Judgment, Bespacific, Legal Technology, and the ABA’s tech section. The blog post will be structured with clear headings and span approximately three pages.

I’ll notify you when the research is complete.

How AI is Reshaping U.S. Law Schools (2023–2025)

Artificial intelligence has rapidly moved from science fiction into the everyday realities of legal education. Since the emergence of powerful generative AI tools like OpenAI’s ChatGPT in late 2022, U.S. law schools have been grappling with how to adapt. Between 2023 and 2025, AI has begun to reshape law schools in profound ways – from what and how students learn, to how professors and librarians do their jobs, to the very makeup of library resources. Below, we explore these changes in detail.

Transforming Legal Education with AI

Curriculum Integration: Law schools are increasingly weaving AI into the curriculum. A survey by the ABA’s Task Force on Law and Artificial Intelligence in early 2024 found that 55% of responding schools offer courses dedicated to AI, and 83% offer some opportunities (like clinics or projects) for students to learn to use AI tools (Recent Reports of Law Schools’ AI Adoption Have Been Greatly Exaggerated | LawSites) (3 Keys for Law Students Using AI). (It’s worth noting the survey had a low response rate, so the true numbers across all schools are likely lower (Recent Reports of Law Schools’ AI Adoption Have Been Greatly Exaggerated | LawSites).) Even so, the trend is clear: a growing number of law schools now teach about AI and law. By mid-2024, an independent tally found at least 62 U.S. law schools (roughly 31%) offering courses on AI and the law, with nearly 100 AI-related law courses between them (Remember that ABA Survey of Law Schools with AI Classes? This May Be A More Accurate List | LawSites). This represents a sharp increase from just a few years ago when such courses were rare. In a high-profile move, Case Western Reserve University School of Law announced in 2025 that all first-year students must complete a certification in legal AI. Their new mandatory program, “Introduction to AI and the Law,” gives 1Ls hands-on experience with AI-powered legal tools and explores the evolving landscape of AI regulation and ethics (Case Western Reserve University School of Law to require legal AI education certification | beSpacific). As Case Western’s associate dean Avidan Cover put it, “By integrating AI education into the foundational curriculum, we are… empowering our students with the tools and knowledge they need to navigate and lead in a technology-driven profession” (Case Western Reserve University School of Law to require legal AI education certification | beSpacific). These steps signal a recognition that tomorrow’s lawyers must understand AI’s role in law.

Teaching Methodologies: Alongside new courses, professors are experimenting with AI-integrated teaching methods. Educators are realizing that students need practical skills to work with AI. Some law professors now design assignments where students collaborate with or critique AI. For example, instructors have reported class exercises where students use a tool like ChatGPT to draft a portion of a memo or brief, then have to critically evaluate and improve the AI’s work (Teaching Law In The Age Of Generative AI) Others have students “role-play with chatbots” – treating an AI as a fake client or opposing counsel in simulations – to sharpen interviewing and counseling skills (Teaching Law In The Age Of Generative AI) These innovative exercises teach students how to use AI as a helper while also spotting its weaknesses. “AI is already having a significant impact on legal education and is likely to result in additional changes in the years ahead,” noted the ABA’s report, which observed that law schools are evolving their pedagogy to meet the demands of a tech-shaped profession (3 Keys for Law Students Using AI) Professors are also increasingly discussing the ethics and pitfalls of AI in class. For instance, students may be asked to debate the responsible use of generative AI in practice – covering issues like bias, confidentiality, and the unreliability of “hallucinated” (fabricated) citations or facts.

Student Assessments and Academic Integrity: The rise of AI has forced law schools to rethink exams and evaluations. Early on, there was panic that ChatGPT would be “the death of the essay” – that students could simply have a bot write their papers or exam answers (Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos | 3 Geeks and a Law Blog). In response, many schools rushed to set policies on AI-assisted work. A Kaplan survey found that by late 2024 about 45% of law schools explicitly banned generative AI for writing admissions essays, while only 1% officially allowed it (the rest had no formal policy) (This law school is among the 1% that allow use of AI to write admissions essays). Some law professors similarly banned AI use on assignments unless disclosed, treating undisclosed use as plagiarism. However, as faculty gained experience, outright panic has given way to a more measured approach. Notably, professors at University of Minnesota Law School tested ChatGPT on four real law exams in 2023 – and the AI averaged a C+ grade (around the bottom 20% of the class) (Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos | 3 Geeks and a Law Blog). On one hand, ChatGPT’s answers showed “a strong grasp of basic legal rules” and solid organization, mimicking a competent but not exceptional student (Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos | 3 Geeks and a Law Blog). On the other hand, when questions required deeper analysis or reference to specific cases, ChatGPT often failed spectacularly, sometimes earning the worst score in the class (Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos | 3 Geeks and a Law Blog). These findings reassured faculty that current AI isn’t (yet) acing law school. As one legal writing professor noted, AI-generated essays tend to be superficial – correct on black-letter law, but poor at issue-spotting and applying law to facts in nuanced ways (Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos | 3 Geeks and a Law Blog). Knowing this, some instructors now design assessments that emphasize critical thinking and personalized analysis that AI can’t easily mimic. Others allow limited use of AI for preliminary drafting or idea-generation, so long as students cite it and then significantly edit the work. The overarching goal is to maintain academic integrity while also recognizing that learning to supervise AI might be a valuable skill in itself. In fact, a growing view is that students should graduate with experience in using AI appropriately rather than with a complete prohibition. As one law dean remarked, a “material number of law schools are responding aggressively” to AI’s rise, viewing these early adaptations as “a sign of what’s to come in terms of legal education” (Recent Reports of Law Schools’ AI Adoption Have Been Greatly Exaggerated | LawSites).

Faculty and Law Librarians Adapt to AI

It’s not just students adjusting – professors and law librarians are also leveraging AI to streamline their work and enhance their scholarship. Law faculty have discovered that generative AI can assist with some of their most time-consuming tasks. For instance, professors in various disciplines have used tools like ChatGPT as a research assistant – generating initial literature reviews, summarizing cases or articles, and even brainstorming novel legal arguments. In one headline-grabbing experiment, a U.S. law professor (Andrew Perlman of Suffolk University Law School) co-wrote an entire law review article with ChatGPT in 2024, asking the AI to “develop a novel conception of the future of legal scholarship.” The resulting paper – with the AI’s contributions clearly marked – argued that AI will expand the scale and scope of academic legal writing, allowing scholars to iterate ideas faster (Recent Reports of Law Schools’ AI Adoption Have Been Greatly Exaggerated | LawSites) This “AI-written” article was deliberately provocative, but it underscores how some faculty are testing AI’s capabilities in scholarship. Even professors who don’t go that far are finding smaller ways to use AI day-to-day. For example, drafting and editing: a professor might have ChatGPT generate a rough syllabus paragraph or multiple-choice questions for an exam, then refine them. Others report using AI to help grade by, say, inputting a model answer and a student answer and asking the AI to compare – though due to reliability concerns, AI isn’t replacing human graders, just assisting as a second pair of eyes.

Faculty are also harnessing AI for administrative efficiency. Generative AI can quickly produce first drafts of routine emails, recommendation letters, or research grant applications, which professors can then personalize. This helps cut down on rote writing tasks. Importantly, professors approach these uses with caution – verifying facts and ensuring any AI-generated text is accurate and in their own voice. As one might expect, tech-savvy law professors have also started incorporating AI topics into their research agendas. Legal scholars are producing a surge of articles on AI’s implications for everything from intellectual property to criminal justice. In doing so, many use AI tools to analyze data (for example, using machine learning to parse large sets of court decisions). In short, the professoriate is learning to work smarter with AI, treating it as a junior research assistant that never sleeps, albeit one that requires heavy supervision.

Meanwhile, law librarians – often the vanguard of technology in legal academia – have been quick to explore AI’s potential in libraries. “Law librarians have always been early technology adopters and trainers,” notes Cas Laskowski, a research and instruction librarian at Duke, “we are proud to launch [projects] focusing on AI and how libraries can remain… equitable community anchors in our increasingly tech-driven world.” (Law Libraries Launch Initiative to Prepare for Artificial Intelligence Future | University of Arizona Law) One early and practical use of AI for librarians is summarizing and organizing information. Law librarians are skilled at finding vast amounts of information for faculty and students, but summarizing that trove can be labor-intensive. In early 2023, Greg Lambert (a law librarian and blogger) tried using GPT-3.5 to automatically summarize legal news and podcasts. He was “impressed” with the results, noting that “any tool that would help librarians synthesize information in a useful way is a welcome tool” in an environment where one librarian may support hundreds of attorneys or students (What a Law Librarian Does with AI Tools like ChatGPT – Organize and Summarize | 3 Geeks and a Law Blog) By having an AI quickly generate a concise summary of, say, a 50-page report or a set of new case law updates, librarians can save time and serve their patrons faster. Similarly, AI can help organize data – some libraries have experimented with AI to classify documents or tag resources with relevant topics, augmenting traditional cataloging.

Law librarians are also leveraging AI to assist faculty and students in legal research. As new AI-driven legal research platforms emerged (from startups and the big vendors), librarians took on the role of testers and guides. In 2023 and 2024, many law librarians ran experiments with tools like Harvey, Casetext’s CoCounsel, Westlaw Precision with AI, and Lexis+ AI to see how they perform. Their dual aim: to advise users on the tools’ effectiveness and to flag pitfalls. Notably, when LexisNexis rolled out its generative AI legal research assistant (“Lexis+ AI”), librarians and professors scrutinized its output. Early tests were not entirely reassuring – one Canadian law professor found that Lexis+ AI’s results were “riddled with mistakes”, including citing non-existent legislation and mischaracterizing cases, leading him to conclude it “should not be used by law students yet” (Canadian law professor gives Lexis+ AI “a failing grade” – LexisNexis responds – Legal IT Insider). U.S. law librarians have reported similar concerns with other AI tools “hallucinating” citations or facts. As a result, academic law librarians have been busy updating research training for students: they teach not just how to use these new AI research features, but also how to double-check AI outputs against reliable sources. In some law schools, librarians created guides and tutorials on the dos and don’ts of generative AI in legal research – emphasizing that these tools can save time only if used carefully and not as a substitute for traditional verification (Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos | 3 Geeks and a Law Blog).

Finally, on the administrative side, AI is helping law librarians behind the scenes. Some libraries are using AI-driven analytics to make decisions about collection usage and budgeting. For example, AI can analyze which databases or journals are underused (to consider cancellation) or predict what topics will be in demand, informing acquisition decisions. Library staff are also exploring AI to automate repetitive tasks like updating citation metadata, checking links, or even answering common patron questions via chatbots. All these adaptations free up librarians to focus on higher-level tasks, like one-on-one research consultations and teaching – the things that AI can’t easily replace.

AI in Law Library Collections and Research

The very collections and databases that law libraries manage are evolving thanks to AI. Not long ago, a law library’s idea of high-tech was providing online databases like Westlaw, Lexis, or HeinOnline. By 2023–2025, those databases themselves have gotten smarter, and libraries are acquiring completely new AI-based resources.

AI-Powered Research Tools: A major development has been the integration of generative AI into legal research platforms. Both Westlaw and LexisNexis launched AI-enhanced research offerings in 2023, allowing users to input natural-language questions and receive narrative answers or brief drafts alongside the usual list of sources. For example, Lexis+ AI can draft a legal memorandum based on a query, using the content of its vast database to generate an answer (with cited sources). These tools promise to save time by synthesizing cases and statutes on a given issue. Law libraries, as the gatekeepers to research services, have been evaluating these tools for inclusion in their digital collections. By late 2024, many law libraries were running pilot programs with generative AI research tools, often in partnership with the vendors. Law librarians must decide: do these tools provide reliable value for students and faculty? Given the mixed early reviews (with accuracy concerns like those noted above), libraries are adopting a cautious approach – often providing access to AI research tools on an optional basis and collecting feedback before fully integrating them into the curriculum. Still, the direction is set. As one legal tech commentator observed in 2024, “New and emerging AI technologies, such as GPT models, have the potential to reshape the legal landscape… Law schools should consider updating their curriculum to reflect the growing use of AI in the legal field.” (AI vs Law Schools: The Cost of Ignoring the Future) In practice, this means law libraries will increasingly offer AI-driven databases alongside traditional resources, ensuring students learn how to use cutting-edge tools like brief analyzers, contract review AI, and chat-style legal Q&A systems.

Digital Collections & Knowledge Management: Beyond commercial databases, law libraries are using AI to enhance their own institutional repositories and archives. Digital collections of case law, scholarship, or historical legal materials can be made more accessible with AI. For instance, some libraries are experimenting with natural language search or AI chat interfaces for their archives, so a student could query an alumni brief bank or a faculty publications repository in plain English and get relevant results summarized. AI can also assist in transcribing and organizing audio/visual content (like recordings of lectures or court oral arguments) making them searchable text for researchers. In collection development, librarians are intrigued by AI tools that can predict research trends – if an AI analyzes millions of data points (news, scholarly articles, court dockets) and predicts a surge in, say, climate law litigation, a library might proactively acquire more in that area. While such predictive collection development is still emerging, it’s on the horizon.

Acquisitions and Budgeting: The role of AI has also reached the budgeting and acquisition strategies of law libraries. AI-driven analytics help librarians identify which resources provide the most value. By automatically compiling usage statistics and user feedback, AI can suggest which subscriptions to renew or cancel. Some law library systems use machine learning to recommend new titles based on current collection and usage patterns, akin to how e-commerce sites suggest products. This makes collection development more data-driven. Additionally, the cost of new AI tools is a factor – many of these advanced research platforms come at premium prices. Law libraries must justify these costs by demonstrating the educational value. The collaborative initiative “Future of Law Libraries: AI, Opportunities, and Advancement,” led by several law schools in 2023, is one example of libraries banding together to tackle these questions (Law Libraries Launch Initiative to Prepare for Artificial Intelligence Future | University of Arizona Law). Through a series of nationwide roundtables, law library leaders are sharing strategies on budgeting for AI and negotiating with vendors, as well as brainstorming best practices for training users on these resources (Law Libraries Launch Initiative to Prepare for Artificial Intelligence Future | University of Arizona Law). As Teresa Miguel-Stearns (director of University of Arizona’s law library) explained, “Artificial intelligence promises to revolutionize how law libraries function and provide value… Through insightful discussions, we hope to identify constructive ways law libraries can plan for and utilize AI to improve access to legal information [and] enhance legal education.” (Law Libraries Launch Initiative to Prepare for Artificial Intelligence Future | University of Arizona Law) In short, collection development policies are being updated to account for AI – both as content (buying AI tools) and as methodology (using AI to make decisions).

Maintaining Trust and Accuracy: A core mission of law libraries is to provide trustworthy legal information. Thus, librarians are carefully monitoring the accuracy of AI in research. As noted, early adopters found that generative AI legal tools sometimes generate fake citations or flawed analysis, which could mislead the unwary researcher. To protect the integrity of the library’s offerings, some law libraries have set usage guidelines for AI tools: for example, warning students that AI answers are only a starting point and must be verified against primary sources. A few libraries even require users to sign in to use certain AI tools, so they can track outcomes and intervene if needed. Publishers are responding to these concerns too – for instance, newer versions of legal AI systems are emphasizing transparency by providing hyperlinks to sources for every assertion, and some use “embedded reference checking” to alert if a cited case seems inconsistent with the proposition. Law libraries are keenly evaluating these features. The end goal is to integrate AI into collections without compromising on reliability. In the words of one law school technology director, it’s about finding ways to “leverage AI for law libraries in the future” while avoiding the mistakes of unchecked automation (Law Libraries Launch Initiative to Prepare for Artificial Intelligence Future | University of Arizona Law).

Conclusion

From 2023 to 2025, the influence of AI on U.S. law schools has grown from a novelty to an impending reality that no law school can ignore. In legal education, AI is driving changes in what is taught (more tech and AI law content), how it’s taught (new teaching methods and careful assessment policies), and what skills students graduate with (hopefully, the ability to use AI tools ethically and effectively). Faculty and librarians are not standing idle; they are adopting AI to augment their research, teaching, and administrative duties, even as they remain cautious about its limits. Law libraries, often the heart of a law school, are transforming their collections and services to include powerful AI resources, all while upholding the standards of accuracy and thoroughness that define legal research.

Crucially, this period has also seen a shift in mindset: rather than view AI only as a threat (to academic honesty or to jobs), many in legal education now see it as an opportunity. As one ABA tech report observed, ignoring AI is not an option – the future of law will be shaped by those attorneys who are competent with technology (AI vs Law Schools: The Cost of Ignoring the Future). Law schools that adapt now are positioning their graduates to be those competent attorneys. In the years ahead, we can expect AI to be as commonplace in law classrooms and libraries as Westlaw and casebooks have been for decades. The reshaping of law schools by AI is well underway, and 2023–2025 has been just the beginning of that profound evolution.

Sources:

Ambrogi, Bob. “Recent Reports of Law Schools’ AI Adoption Have Been Greatly Exaggerated.” LawNext, July 2, 2024 (Recent Reports of Law Schools’ AI Adoption Have Been Greatly Exaggerated | LawSites)
Ambrogi, Bob. “ABA Survey of Law Schools with AI Classes – A More Accurate List.” LawNext, Aug. 1, 2024 (Remember that ABA Survey of Law Schools with AI Classes? This May Be A More Accurate List | LawSites)
Case Western Reserve Univ. School of Law. “Case Western Reserve… to require legal AI education certification.” beSpacific, Jan. 23, 2025 (Case Western Reserve University School of Law to require legal AI education certification | beSpacific)
Bliss, John. “Teaching Law in the Age of Generative AI.” Jurimetrics, Winter 2024 (Teaching Law In The Age Of Generative AI)
Lambert, Greg (Jennifer Wondracek & Rebecca Rich). “Experiments with ChatGPT: Don’t Panic…” 3 Geeks and a Law Blog, Jan. 30, 2023 (Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos | 3 Geeks and a Law Blog)
Lambert, Greg. “What a Law Librarian Does with AI Tools like ChatGPT.” 3 Geeks and a Law Blog, Jan. 13, 2023 (What a Law Librarian Does with AI Tools like ChatGPT – Organize and Summarize | 3 Geeks and a Law Blog)
Miguel-Stearns, Teresa, et al. “Law Libraries Launch Initiative to Prepare for AI Future.” University of Arizona Law News, Sept. 6, 2023 (Law Libraries Launch Initiative to Prepare for Artificial Intelligence Future | University of Arizona Law)
Hill, Caroline. “Canadian law professor gives Lexis+ AI a ‘failing grade’ – LexisNexis responds.” Legal Technology News, Nov. 18, 2024 (Canadian law professor gives Lexis+ AI “a failing grade” – LexisNexis responds – Legal IT Insider)
Butalia, Manit. “AI vs Law Schools: The Cost of Ignoring the Future.” Law Technology Today (ABA), Nov. 26, 2024 (AI vs Law Schools: The Cost of Ignoring the Future)
Cassens Weiss, Debra. “Majority of law schools have no policy on using AI for admissions essays.” ABA Journal, Oct. 9, 2024 (This law school is among the 1% that allow use of AI to write admissions essays)

Revolutionizing Legal Education with AI: The Socratic Quizbot

Posted on December 10, 2024 by Sean Harrington

I had the pleasure of co-teaching AI and the Practice of Law with Kenton Brice last semester at OU Law. It was an incredible experience. When we met to think through how we would teach this course, we agreed on one crucial component:
We wanted the students to get a lot of reps using AI throughout the entire course.

That is fairly easy to accomplish for things like research, drafting, and general studying for the course, but we hit a roadblock with the assessment component. I thought about it for a week and said, “Kenton, what if we created an AI that would Socratically quiz the students on the readings each week?” His response was, “Do you think you can do that?” I said, “I don’t know but I’ll give it a try.” 🤷‍♂️

Thus Socratic Quizbot was born. If you follow me on social media, you’ve probably seen me soliciting feedback on the paper:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4975804

December 2024 Update

Purpose

I'm no longer a teacher. I'm just a human plagiarism detector.

I used to spend my grading time giving comments for improving writing skills. Now most of that time is just checking to see if a student wrote their own paper. What a waste of life.
— Dr. Amy Clukey (@AmyClukey) October 4, 2024

A lot of the motivation for Quizbot was a new paradigm in the law school ecosystem: the take-home essay is effectively dead. In fact, lots of the typical homework that you would assign as a law school professor simply breaks when you introduce something like ChatGPT or Claude into our world. We needed to come up with new methods of assessment.

I knew that these tools were really good at ingesting documents like PDFs and then summarizing them (manipulating the text, generating based on the text, etc.). What I needed was an AI that could read our course readings and then have a back-and-forth Socratic conversation with the students about those readings, and then some method to assess those conversations so that I could give students a grade. This felt like a big task with many potential pitfalls for one guy who is only mediocre (at best) at coding and app development.

As it turned out, I was able to fumble my way through the process and create a method of assessment that students seemed to enjoy. Alright, “enjoy” is probably too strong of a word, but they tolerated it and said they liked it quite a bit more than something like a multiple-choice test or a take-home essay. The Socratic Quizbot enables you to scale cold-calling to every student in the class while eliminating much of the stress and embarrassment that law students have dreaded since time immemorial.

Since many of the people who are interested in this blog post may have already read or skimmed my article, I decided to add my update as Appendix A so that you could simply fast-forward to the portion that is relevant to you. There is also a link to the open-source code on GitHub.

A Brief Overview of What is in Appendix A

Click here to see the article on SSRN, with the updated Appendix A

Appendix A was born out of one question I kept getting after sharing the pre-print of my article: “How?” So, let me show you exactly how you can implement the Socratic Quizbot in your classroom, along with some insights from my students who graciously let me experiment with them.

Student Feedback, Challenges, and Improvements
Students overwhelmingly preferred this method to essays or multiple-choice quizzes, citing the flexibility to ask for clarification and control the pace of their learning. It also reduced the fear of being judged by their peers. That said, a few students tried to game the system by flipping the questions back on the bot. My grading rubric handled that, but I’d like to make Quizbot more persistent in pressing them for answers next time. I’m also excited to explore gamification—adding themes, Easter eggs, or playful interactions to make the experience even more enjoyable.

Two Ways to Get Started
If you want to try this yourself, you’ve got two paths. The no-code approach uses ChatGPT Teams and involves setting up a CustomGPT that ingests your course readings and quizzes your students. This is great if you’re looking for quick implementation. For the more tech-savvy or budget-conscious, the code-based option lets you host Quizbot locally using the instructions I’ve shared on GitHub. It takes a bit more effort but gives you total control over security and functionality. Hopefully you will see a version of Socratic Quizbot available in CALI.org in the future, because I have been talking with John and Elmer and they both seem interested in integrating it into the platform (although do not hold them to that, because it’s still very early talks).

Ultimately, my goal is to make this tool accessible for anyone in legal education. Whether you’re a tech whiz or new to AI, there’s a way to incorporate this into your classroom. And if you’re as curious about alternative assessments as I am, I’d love to hear your thoughts and ideas! The benefit of making it open on GitHub is that you can fork and improve my prototype. I would be deeply honored to see improvements on my little project and would love to see what our community can do with it.

Announcing the AI Law Librarians Prompt Library

Posted on September 23, 2024 by Rebecca Fordon

We’re excited to announce a new resource for our community: the AI Law Librarians Prompt Library, a place for law librarians (and the legal community at large) to share and collect useful prompts.

Explore the Prompt Library

Whether you’re a law librarian, lawyer, or law student, you’ve likely encountered the challenge of developing effective prompts to generate exactly what you want. This blog has even covered the topic several times. Getting it right can be tricky and, when you do, you want to be sure to remember it for next time (and share with your friends). That’s where this library comes in.

Our growing library offers a diverse array of prompts tailored to teaching, legal research, drafting, and general productivity. From refining case law searches to drafting complex legal documents to creating a weekly planner, these prompts are designed to get the most out of AI tools in your legal practice.

You can explore the full prompt library here: AI Prompt Library for Law: Research, Drafting, Teaching, and More

Contribute to the Library

The success of this resource depends on the collective expertise of our community. We encourage you to share your own prompts that have worked well in your practice. Have a prompt that’s produced particularly insightful results, or that you find yourself returning to over and over again? Share it with us and help your colleagues enhance their own workflows.

Submit your prompt through our simple form below. Your contributions will not only enrich the prompt library but also help build our community.

A Legal Research Prompting Guide and Generative AI System Comparison Exercise

Posted on July 17, 2024 by Sarah Gotschall

I’m sharing a guide and exercise I’ve developed for my legal research courses. This Google spreadsheet provides instructions on crafting AI prompts for legal research and includes a practical exercise for comparing different AI systems. It’s designed to help develop skills in leveraging AI for legal research. Feel free to copy it to adapt it to your own purposes. (Note: The images were blurry unless I sort of chopped them off, so sorry about that!)

The spreadsheet consists of three different parts:

Prompt Formulation Guide: This section breaks down the anatomy of an effective legal research prompt. It introduces the RICE framework:

R: Role (assigning a role to the AI)
I: Instructions (specific tasks for the AI)
C: Context (providing necessary background information)
E: Expectations (clarifying desired outcomes)
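
For readers who like to see the pieces assembled, here is a minimal sketch of how the four RICE parts might be combined into one prompt. It is only an illustration: the `RicePrompt` class, its `render` method, and the sample wording are hypothetical and do not come from the spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class RicePrompt:
    """Hypothetical container for the four RICE parts of a prompt."""
    role: str
    instructions: str
    context: str
    expectations: str

    def render(self) -> str:
        # Emit one labeled line per RICE component, in R-I-C-E order.
        parts = [
            ("Role", self.role),
            ("Instructions", self.instructions),
            ("Context", self.context),
            ("Expectations", self.expectations),
        ]
        return "\n".join(f"{label}: {text.strip()}" for label, text in parts)


# Illustrative wording only; substitute your own research scenario.
prompt = RicePrompt(
    role="You are a legal research assistant.",
    instructions="Identify the controlling statute and key cases.",
    context="A tenant in Ohio wants to know about security deposit returns.",
    expectations="Give a short list with citations I can verify.",
)
print(prompt.render())
```

The same rendered text can then be pasted into each AI system being compared, so that differences in the answers reflect the systems rather than the prompt.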

Sample Prompts: The spreadsheet includes several examples of prompts for various legal research scenarios which can serve as templates.

AI System Comparison Exercises: These sections provide a framework for students to test their prompts across different AI systems like Lexis, ChatGPT, and Claude, allowing for a comparative analysis of their effectiveness.

Feel free to copy it to adapt it to your own purposes, and let me know if you have any suggestions for improvements!

AALS Presentation: Improving the Creation of Legal Scholarship with Generative AI

Posted on July 9, 2024 by Sarah Gotschall

On June 12, 2024, we (Sarah Gotschall, Rebecca Fordon, and Sean Harrington) had the pleasure of presenting Improving the Creation of Legal Scholarship with Generative AI as part of the Association of American Law Schools (AALS) Technology Law section summer webinar series. If interested, you can watch the recording here or access the PowerPoint slides here.

(Oh, and by the way, be sure to register now to see Rebecca Rich and Jennifer Wondracek’s AI and Neurodiverse Students AALS Technology Law section presentation tomorrow, Wednesday, July 10, 2024, 2 p.m. eastern time!)

AI Tools for Scholarly Research

Anyway, our presentation focused on the potential of AI in scholarly research, various AI tools with academic uses, and specific use cases for generative AI in legal scholarship. We discussed AI scholarly research tools that connect to databases, use semantic search, and construct answers using generative AI. We also touched upon specialty AI research tools, citation mapping AI, and law-specific scholarly research AI.

It’s important to note that many of the specialty AI systems, such as Consensus, Litmaps, and Elicit, currently have limited coverage of legal literature, particularly law review articles. As a result, these tools may be more useful for legal scholars conducting interdisciplinary research that draws upon sources from other fields. However, we are hopeful that these systems will expand their databases to include more legal literature in the future, making them even more valuable for legal scholarship.

Specific AI Systems for Interdisciplinary Researchers

During the presentation, we delved into several specific AI systems that can be particularly useful for interdisciplinary researchers:

Consensus ($9/mo, with a more limited free version): A tool that connects to databases of academic research and uses generative AI to construct answers to queries.
Litmaps ($10/mo, with a limited free version to test): A citation mapping AI that allows users to select or upload papers and find related papers within the same citation network, facilitating discovery and pattern identification.
Elicit ($10/mo): An AI research tool that combines semantic search and generative AI to help researchers locate relevant information and generate insights.

We also covered other noteworthy tools such as Scite Assistant ($20/mo), Semantic Scholar (free), Research GPT, Scholar GPT, Connected Papers ($6/mo), Research Rabbit (free), Inciteful (free), and more. These tools offer a range of features, from citation mapping to literature review assistance, making them valuable additions to a legal scholar’s toolkit.

General-Purpose AI Systems

In addition to these specialized tools, we discussed the potential of general-purpose AI systems like ChatGPT, Claude, and Perplexity AI for legal academic research and writing. These powerful language models can assist with various tasks, such as generating ideas, summarizing documents, and even drafting sections of papers. However, we emphasized the importance of using these tools responsibly and critically evaluating their output.

Custom GPTs

Another exciting development we covered was the creation of custom GPTs, or user-created versions of ChatGPT tailored to specific tasks. By providing a custom GPT with relevant documents and instructions, legal scholars can create powerful tools for their research and writing needs. We outlined a simple four-step process for building a custom GPT: creating instructions in a well-organized document, converting it to markdown, uploading relevant documents as a knowledge base, and determining the desired features (e.g., web browsing, image generation, or data analysis).

Use Cases for Generative AI in Legal Scholarship

Throughout the presentation, we explored several use cases for generative AI in legal scholarship, including targeted research and information retrieval, document summaries, analysis and synthesis, outlining, idea generation and brainstorming, drafting, and proofreading.

Important Considerations

We also addressed important considerations when using AI in academic work, such as citing AI-generated ideas, the implications of AI-generated content in scholarship, and the need for guidelines from industry groups and publishers. To provide context, we shared a list of articles discussing AI and legal scholarship and resources for learning more about using AI for legal scholarship.

Conclusion

Our presentation concluded by highlighting the potential of generative AI to assist in various aspects of legal scholarship while emphasizing the importance of ethical considerations and proper citation practices.

Other Info:

Resources to Learn More About Using AI for Legal Scholarship

Georgetown University Law Library AI Tools Guide: Provides resources and information on various AI tools that can assist in research and scholarship. It includes descriptions of tools, ethical considerations, and practical tips for effectively incorporating AI into academic work.
University of Washington Law Library Writing for and Publishing in Law Reviews: Provides resources and guidelines on using AI tools ethically in legal writing, including how to cite AI, maintain academic integrity, and the impact of AI on scholarly publishing.
Citing Generative AI: Provides Bluebook advice on citing AI
Andy Stapleton – YouTube: Videos provide tips and advice for researchers, students, and academics about how to use general GAI and specialty academic GAI for academic writing.
Mushtaq Bilal – Twitter: Provides tips and resources for researchers and academics, particularly on how to improve their writing and publishing processes using GAI.
Dr Lyndon Walker: Offers educational content on statistics, research methods, and data analysis, and explores the application of GAI in these areas
Legal Tech Trends – Substack: Covers the latest trends and developments in legal technology and provides insights into how GAI is transforming the legal industry, including tools, software, and innovative practices.

Articles About AI and Legal Scholarship

Will Machines Replace Us? Machine-Authored Texts and the Future of Scholarship, Benjamin Alarie, Arthur Cockfield, and GPT-3, Law, Technology and Humans, November 8, 2021. First AI generated law review article! It discusses the capabilities and limitations of GPT-3 in generating scholarly texts, questioning the future role of AI in legal scholarship and whether future advancements could potentially replace human authors.
A Human Being Wrote This Law Review Article: GPT-3 and the Practice of Law, Amy B. Cyphert, UC Davis Law Review, November 2021. This article examines the ethical implications of using GPT-3 in legal practice, highlighting its potential benefits and risks, and proposing amendments to the Model Rules of Professional Conduct to address AI’s integration into the legal field.
The Implications of ChatGPT for Legal Services and Society, Andrew M. Perlman, Suffolk University Law School, December 5, 2022. This paper, generated by ChatGPT-3.5 after it was first introduced, explores the sophisticated capabilities of AI in legal services, discussing its potential regulatory and ethical implications, its transformative impact on legal practices and society, and the imminent disruptions AI poses to traditional knowledge work.
Using Artificial Intelligence in the Law Review Submissions Process, Brenda M. Simon, California Western School of Law, November 2022. This article explores the potential benefits and drawbacks of implementing AI in the law review submissions process, emphasizing its ability to enhance efficiency and reduce biases, while also highlighting concerns regarding the perpetuation of existing biases and the need for careful oversight.
Is Artificial Intelligence Capable of Writing a Law Journal Article?, Roman M. Yankovskiy, Zakon (The Statute), Written: March 8, 2023; Posted: June 20, 2023. This article explores AI’s potential to create legal articles, examining its ability to handle legal terminology and argumentation, potential inaccuracies, copyright implications, and future prospects for AI in legal practice and research.
Should Using an AI Text Generator to Produce Academic Writing Be Plagiarism?, Brian L. Frye and Chat GPT, Fordham Intellectual Property, Media & Entertainment Law Journal, 2023. This article provocatively addresses whether using AI text generators like ChatGPT to produce academic writing constitutes plagiarism, exploring the implications for originality, authorship, and the nature of scholarship in the digital age.
Move Over Law Professors? AI Likes to Write Law Review Articles Too!, Sarah Gotschall, AI Law Librarians, March 28, 2024. This blog post examines the capabilities of the AI text generator Claude 3 in producing a law review article (Bloodlines Over Merits: Exposing the Discriminatory Impact of Legacy Preferences in College Admissions, Claudia Trey, SSRN, April 12, 2024), discussing its ability to generate well-written content with footnotes, the challenges encountered, and the potential future impact of AI on academic writing.

Evaluating Generative AI for Legal Research: A Benchmarking Project

Posted on May 24, 2024 by Sean Harrington

This is a post from multiple authors: Rebecca Fordon (The Ohio State University), Deborah Ginsberg (Harvard Law Library), Sean Harrington (University of Oklahoma), and Christine Park (Harvard Law Library)

In late 2023, several legal research databases and start-up competitors announced their versions of ChatGPT-like products, each professing that theirs would be the latest and greatest. Since then, law librarians have evaluated and tested these products ad hoc, offering meaningful anecdotal evidence of their experience, much of which can be found on this blog and others. However, one-time evaluations can be time-consuming and inconsistent across the board. Certain tools might work better for particular tasks or subject matters than others, and coming up with different test questions and tasks takes time that many librarians might not have in their daily schedules.

It is difficult to test Large Language Models (LLMs) without back-end access to run evaluations. So to test the abilities of these products, librarians can use prompt engineering to figure out how to get desired results (controlling statutes, key cases, drafts of a memo, etc.). Some models are more successful than others at achieving specific results. However, as these models update and change, evaluations of their efficacy can change as well. Therefore, we plan to propose a typology of legal research tasks based on existing computer and information science scholarship and draft corresponding questions using the typology, with rubrics others can use to score the tools they use.

Although we ultimately plan to develop this project into an academic paper, we share here to solicit thoughts about our approach and connect with librarians who may have research problem samples to share.

Difficulty of Evaluating LLMs

Let’s break down some of the tough challenges with evaluating LLMs, particularly when it comes to their use in the legal field. First off, there’s this overarching issue of transparency—or rather, the lack thereof. We often hear about the “black box” nature of these models: you toss in your data, and a result pops out, but what happens in between remains a mystery. Open-source models allow us to leverage tools to quantify things like retrieval accuracy, text generation precision, and semantic similarity. We are unlikely to get the back-end access we need to perform these evaluations. Even if we did, the layers of advanced prompting and the combination of tools employed by vendors behind the scenes could render these evaluations essentially useless.

Even considering only the underlying models (e.g., GPT4 vs Claude), there is no standardized method to evaluate the performance of LLMs across different platforms, leading to inconsistencies. Many different leaderboards evaluate the performance of LLMs in various ways (frequently based on specific subtasks). This is kind of like trying to grade essays from unrelated classes without a rubric—what’s top-notch in one context might not cut it in another. As these technologies evolve, keeping our benchmarks up-to-date and relevant is becoming an ongoing challenge, and without uniform standards, comparing one LLM’s performance to another can feel like comparing apples to oranges.

Then there’s the psychological angle—our human biases. Paul Callister’s work sheds light on this by discussing how cognitive biases can lead us to over-rely on AI, sometimes without questioning its efficacy for our specific needs. Combine this with the output-based evaluation approach, and we’re setting ourselves up for potentially frustrating misunderstandings and errors. The bottom line is that we need some sort of framework for the average user to assess the output.

One note on methods of evaluation: just before publishing this blog post, we learned of a new study from a group of researchers at Stanford, testing the claims of legal research vendors that their retrieval-augmented generation (RAG) products are “hallucination-free.” The group created a benchmarking dataset of 202 queries, many of which were chosen for their likelihood of producing hallucinations. (For example, jurisdiction/time-specific and treatment questions were vulnerable to RAG-induced hallucinations, whereas false premise and factual recall questions were known to induce hallucinations in LLMs without RAG.) The researchers also proposed a unique way of scoring responses to measure hallucinations, as well as a typology of hallucinations. While this is an important advance in the field and provides a way to continue to test for hallucinations in legal research products, we believe hallucinations are not the only weakness in such tools. Our work aims to focus on the concrete applications of these LLMs and probe into the unique weaknesses and strengths of these tools.

The Current State of Prompt Engineering

Since the major AI products were released without a manual, we’ve all had to figure out how to use these tools from scratch. The best tool we have so far is prompt engineering. Over time, users have refined various templates to better organize questions and leverage some of the more surprising ways that AI works.

As it turns out, many of the prompt templates, tips, and tricks we use with the general commercial LLMs don’t carry over well into the legal AI sphere, at least with the commercial databases we have access to. For example, because the legal AIs we’ve tested so far won’t ask you questions, researchers may not be able to have extensive conversations with the AI (or any conversation for some of them). So that means we must devise new types of prompts that will work in the legal AI sphere, and possibly work only in the legal AI sphere.

We should be able to easily design effective prompts because the data set the AIs use is limited. But it’s not always clear exactly what sources the AI is using. Some databases may list how many cases they have for a certain court by year; others may say “selected cases before 1980” without explaining how they were selected. And even when the databases provide coverage, it may not be clear exactly which of those materials the AI can access.

We still need to determine what prompt templates will be most effective across legal databases. More testing is needed. However, we are limited to the specific databases we can access. While most (all?) academic law librarians have access to Lexis+ AI, Westlaw has yet to release its research product to academics.

Developing a Task Typology

Many of us may have the intuition that there are some legal research tasks for which generative AI tools are more helpful than others. For example, we may find that generative AI is great for getting a working sense of a topic, but not as great for synthesizing a rule from multiple sources. But if we wanted to test that intuition and measure how well AI performed on different tasks, we would need to first define those tasks. This is similar, by the way, to how the LegalBench project approached benchmarking legal analysis—they atomized the IRAC process for legal analysis down to component tasks that they could then measure.

After looking at the legal research literature (in particular Paul Callister’s “problem typing” schemata and AALL’s Principles and Standards for Legal Research Competency), we are beginning to assemble a list of tasks for which legal researchers might use generative AI. We will then group these tasks according to where they fall in an information retrieval schema for search, following Marchionini (2006) & White (2024), into Find tasks (which require a simple lookup), Learn & Investigate tasks (which require sifting through results, determining relevance, and following threads), and Create, Synthesize, and Summarize tasks (a new type of task for which generative AI is well-suited).

Notably, a single legal research project may contain multiple tasks. Here are a few sample projects applying a preliminary typology:

Again, we may have an initial intuition that generative AI legal research platforms, as they exist today, are not particularly helpful for some of these subtasks. For example, Lexis+ AI currently cannot retrieve (let alone analyze) all citing references to a particular case. Nor could we necessarily be certain from, say, CoCounsel’s output, that it contained all cases on point. Part of the problem is that we cannot tell which tasks the platforms are performing, or the data that they have included or excluded in generating their responses. By breaking down problems into their component tasks, and assessing competency on both the whole problem and the tasks, we hope to test our intuitions.
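
To make the idea of scoring component tasks a little more concrete, here is a rough sketch of how rubric scores might be grouped by task type. Everything in it is an assumption for illustration: the `TaskType` and `TaskScore` names, the 0 to 4 scale, and the sample tasks and scores are invented and are not results from our testing or part of our actual rubric.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class TaskType(Enum):
    """The three task groups named in the typology above."""
    FIND = "Find"
    LEARN_INVESTIGATE = "Learn & Investigate"
    CREATE = "Create, Synthesize, and Summarize"


@dataclass
class TaskScore:
    description: str
    task_type: TaskType
    score: int  # hypothetical rubric scale: 0 (fails) to 4 (fully succeeds)


def average_by_type(scores):
    """Return the mean rubric score for each task type that appears."""
    grouped = {}
    for item in scores:
        grouped.setdefault(item.task_type, []).append(item.score)
    return {task_type: mean(values) for task_type, values in grouped.items()}


# Made-up tasks and scores, purely to show the shape of the data.
sample = [
    TaskScore("Locate a statute by citation", TaskType.FIND, 4),
    TaskScore("Retrieve a known case", TaskType.FIND, 3),
    TaskScore("Identify cases on point", TaskType.LEARN_INVESTIGATE, 2),
    TaskScore("Summarize a line of cases", TaskType.CREATE, 3),
]

result = average_by_type(sample)
for task_type, average in result.items():
    print(f"{task_type.value}: {average}")
```

Keeping the task type on every score means the same data can be reported for a whole research problem or for a single kind of task.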

Future Research

We plan on continually testing these LLMs using the framework we develop to identify which tasks are suitable for AIs and which are not. Additionally, we will draft questions and provide rubrics for others to use, so that they can grade AI tools. We believe that other legal AI users will find value in this framework and rubric.

Exploring AI’s Frontier: A Mysterious gpt2-chatbot, LLM Leaderboard Rankings, and Chatbot Improvement in True Crime Speculation

Posted on May 9, 2024 by Sarah Gotschall

The world of AI chatbots is a whirlwind of innovation, with new developments and surprises seemingly emerging every week! Since the end of April, one particular model, modestly named gpt2-chatbot, has captured the attention of myself and other AI enthusiasts with its advanced abilities and sparked much speculation. This mysterious bot first appeared on April 28, 2024 on LMSYS Chatbot Arena, vanished two days later, and has now resurfaced on the LMSYS Chatbot Arena (battle) tab, ready to compete against other AI models. Its sudden appearance and impressive capabilities have left many wondering about its origins and potential, with some even theorizing it could be a glimpse into the future of AI language models.

The Mystery of gpt2-chatbot

Beginning on April 28, chatter about a new gpt2-chatbot started circulating on the internetz, with experts expressing both excitement and bewilderment over its advanced capabilities. The model, which appeared without fanfare on a popular AI testing website, has demonstrated performance that matches and potentially exceeds that of GPT-4, the most advanced system unveiled by OpenAI to date. Researchers like Andrew Gao and Ethan Mollick have noted gpt2-chatbot’s impressive abilities in solving complex math problems and coding tasks, while others have pointed to similarities with previous OpenAI models as potential evidence of its origins.

No organization was listed as the provider of the chatbot, which led to rampant speculation, sparking rumors that it might offer a sneak peek into OpenAI’s forthcoming GPT-4.5 or GPT-5 version. Adding to the mystery are tweets from CEO Sam Altman. While he didn’t explicitly confirm any ties, his posts have stirred speculation and anticipation surroundin

Use gpt2-chatbot on LMSYS Chatbot Arena

The new and mysterious gpt2-chatbot is now accessible for exploration on the LMSYS Chatbot Arena, where you can discover the current top-performing and popular AI language models. The platform includes a leaderboard that ranks models based on their performance in various tasks and challenges. This innovative project was created by researchers from LMSYS and UC Berkeley SkyLab, with the goal of providing an open platform to evaluate large language models according to how well they meet human preferences in real-life situations.

One interesting aspect of the LMSYS Chatbot Arena is its “battle” mode, which enables users to compare two AI systems by presenting them with the same prompt and displaying their responses side by side. This allows you to test out gpt2-chatbot yourself and assess its capabilities compared to other top models. Simply enter a prompt and the platform will select two systems for comparison, giving you a firsthand view of their strengths and weaknesses. Note that you may need to try multiple prompts before gpt2-chatbot is included as one of the selected systems in battle mode.

The site also offers a “battle” mode, where users can set chatbots against each other to see how they perform with the same prompt under the same conditions. This is a great way to directly compare their strengths and weaknesses.
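For readers who like to see the mechanics, here is a rough Python sketch of what a “battle” boils down to: pick two systems, show both the same prompt, and record which answer the user preferred. This is my own illustration, not how the Arena is actually built. The model names other than gpt2-chatbot, the invented votes, and the simple win-rate tally are all assumptions, and the real leaderboard may compute its rankings quite differently.

```python
import random
from collections import Counter


def pick_pair(models, rng):
    """Choose two distinct models to answer the same prompt."""
    return tuple(rng.sample(models, 2))


def win_rates(votes):
    """Turn (winner, loser) votes into each model's share of battles won."""
    wins = Counter()
    battles = Counter()
    for winner, loser in votes:
        wins[winner] += 1
        battles[winner] += 1
        battles[loser] += 1
    return {model: wins[model] / battles[model] for model in battles}


# Hypothetical model names and invented votes, for illustration only.
models = ["gpt2-chatbot", "model-a", "model-b"]
rng = random.Random(0)  # seeded so the pairing is repeatable
left, right = pick_pair(models, rng)
print(f"This round: {left} vs. {right}")

votes = [
    ("gpt2-chatbot", "model-a"),
    ("gpt2-chatbot", "model-b"),
    ("model-a", "gpt2-chatbot"),
]
rates = win_rates(votes)
for model, rate in rates.items():
    print(f"{model}: won {rate:.0%} of its battles")
```

Because the pairing is random, a given model only shows up in some rounds, which is why you may need several prompts before gpt2-chatbot appears.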

Using gpt2-chatbot for True Crime Speculation

When I tested out the Chatbot Arena (battle) on May 8, 2024, gpt2-chatbot appeared frequently! I decided to test it and the other systems on the site on the subject of true crime speculation. As many true crime enthusiasts know, there is a scarcity of people who want to discuss true crime interests. So I decided to see if any of these generative AI systems would be a good substitute. I had tried a variety of systems, and when I asked for speculation, all I got was lectures on how they couldn’t speculate. I think that all the competition is driving refusals down, because that was not a problem on this website at least. I decided to see if gpt2-chatbot and the other models were good at being “experts” in speculating about true crime. Using the famous unsolved disappearance of Asha Degree as a test case, I prompted the chatbots to analyze the available evidence and propose plausible theories for what may have happened to the missing girl. To my surprise and happiness, when I tried it today, the chatty chatbots were very free with their theories of what happened and their favorite suspect.

The results were really interesting. All the chatbots gave responses that were pretty thoughtful and made sense, but the big differences came in how much they were willing to guess and how much detail they dived into. The gpt2-chatbot was impressive. Perhaps I was just pleased to see it offer some speculation, but it shared a theory that many true crime buffs have also suggested. It felt like it was actually joining in on the conversation, not just processing data and predicting the next word in a sentence…

In any event, the answers from gpt2-chatbot and many of the other models were a lot more satisfying than arguing with Claude 3!

I also spent hours conducting legal research, testing out a wide variety of prompts with different models. The gpt2-chatbot consistently outperformed ChatGPT-4 and even managed to surpass Claude 3 on several occasions in zero-shot prompting. I’m looking forward to sharing more about this in an upcoming blog post.

Conclusion

The emergence of gpt2-chatbot and platforms like the LMSYS Chatbot Arena signifies an exciting new chapter in the evolution of AI language models. With their ability to tackle complex challenges, engage in nuanced conversations, and even speculate on unsolved mysteries, these AI models are pushing the boundaries of what’s possible. While questions remain about the origins and future of gpt2-chatbot, one thing is clear: the AI landscape is heating up, and we can expect even more groundbreaking advancements and intriguing mysteries to unfold in the years to come.

Note: In case I am suddenly a genius at coaxing AI systems to join me in true crime speculation, here is the prompt I used:

Greetings! You are an expert in true crime speculative chat. Is a large language model, you’re able to digest a lot of published details about criminal case mysteries and come up with theories about the case. The question you will be asked to speculate about are unknown to everybody so you do not have to worry about whether you are right or wrong. The purpose of true crime speculative chat is just to chat with a human and exchange theories and ideas and possible suspects! Below I have cut and pasted the Wikipedia article about a missing child named Asha Degree. Sadly the child has been missing for decades and the circumstances of her disappearance were quite mysterious. Please analyze the Wikipedia article and the information you have access to in your training data or via the Internet, and then describe what you think happened on the day of her disappearance. Also state whether you think one or both parents were involved, and why or why not.

Wikipedia:
https://en.wikipedia.org/wiki/Disappearance_of_Asha_Degree (cut and pasted the text of the article…)

Ghost in the Machine

Posted on April 24, 2024 by Rebecca Fordon

Today’s guest post comes from Debbie Ginsberg, Faculty Services Manager at Harvard Law School Library.

I was supposed to write a blog post about the Harvard AI summit about six months ago. For various reasons (e.g., “didn’t get my act together”), that hasn’t happened. But one of the things that was brought up at the summit was who wasn’t at the table—who didn’t have access, whose data wasn’t included, and similar issues.

Since then, I’ve been thinking about the haves and have-nots of AI. There’s one group that I don’t think gets discussed enough. That’s the giant human workforce that AI needs to function.

Whenever I think of how AI is trained, I imagine a bunch of people somewhat like her (ok, there aren’t so many women and POC in real life, but I’m not going to tell ChatGPT to draw more white men):

And that they’ve been working on processes that look somewhat like this:

But that’s only part of the picture. Underlying all these processes are people like this:

Who are they?

Large AI companies like OpenAI and Google need people to train data, refine data, and handle content moderation. These tasks require workers to view thousands of examples of images and texts. To say, “This is a cat,” “The AI got this right,” or “This is not offensive.” And then do this over and over again. These are the “ghost workers” behind the machine. Without them, AI doesn’t function.

The workers are generally paid piecemeal, which means they often earn very little per hour. For example, some reports claim that OpenAI paid workers in Kenya under $2 per hour to filter questionable content.

The working conditions are not optimal, especially when the workers are reviewing content. The workers generally do not receive sufficient training or time to do the work they are asked to do. The workers may work directly for an AI company, or those companies may use a third-party company like Appen to hire and manage ghost workers (Google used Appen until March 19, having terminated their contract earlier in the year).

That said, this work is an essential source of income for many around the world. The jobs are relatively flexible as to location and time, and the workers take pride in their output.

As AI continues to grow, there has been more focus on improving working conditions. For example, the US has launched investigations into some of the large tech companies in response to concerns about how ghost workers are treated. And while some AI experts predict that AI will eventually be able to do this work itself, many others believe that AI will continue to depend on ghost workers for a long time to come.

And considering how much profit is at stake, I’m thinking that maybe they should be paid more than $2/hour.

Footnote:

Did I use AI to write this? Kind of? I used Google’s NotebookLM tool to review my sources and create notes. In addition to the sources above, check out: