Ghost in the Machine

Today’s guest post comes from Debbie Ginsberg, Faculty Services Manager at Harvard Law School Library.

I was supposed to write a blog post about the Harvard AI summit about six months ago. For various reasons (e.g., “didn’t get my act together”), that hasn’t happened. But one of the things that was brought up at the summit was who wasn’t at the table—who didn’t have access, whose data wasn’t included, and similar issues.

Since then, I’ve been thinking about the haves and have-nots of AI. There’s one group that I don’t think gets discussed enough: the giant human workforce that AI needs to function.

Whenever I think of how AI is trained, I imagine a bunch of people somewhat like her (ok, there aren’t so many women and POC in real life, but I’m not going to tell ChatGPT to draw more white men):

And that they’ve been working on processes that look somewhat like this:

But that’s only part of the picture.  Underlying all these processes are people like this:

Who are they?

Large AI companies like OpenAI and Google need people to label training data, refine model outputs, and handle content moderation. These tasks require workers to view thousands of images and text snippets and say, “This is a cat,” “The AI got this right,” or “This is not offensive,” and then do this over and over again. These are the “ghost workers” behind the machine. Without them, AI doesn’t function.

The workers are generally paid by the piece, which means they often earn very little per hour. For example, some reports claim that OpenAI paid workers in Kenya under $2 per hour to filter questionable content.

The working conditions are not optimal, especially when the workers are reviewing content. The workers generally do not receive sufficient training or time to do the work they are asked to do. They may work directly for an AI company, or the company may use a third party like Appen to hire and manage ghost workers (Google, for example, used Appen until March 19, when a contract it had terminated earlier in the year came to an end).

That said, this work is an essential source of income for many around the world. The jobs are relatively flexible as to location and time, and the workers take pride in their output. 

As AI continues to grow, there has been more focus on improving working conditions.  For example, the US has launched investigations into some of the large tech companies in response to concerns about how ghost workers are treated.  And while some AI experts predict that AI will eventually be able to do this work itself, many others believe that AI will continue to depend on ghost workers for a long time to come.

And considering how much profit is at stake, I’m thinking that maybe they should be paid more than $2/hour.

Footnote:

Did I use AI to write this?  Kind of?  I used Google’s NotebookLM tool to review my sources and create notes.  In addition to the sources above, check out:

RAG Systems Can Still Hallucinate

A few months ago, a law professor posted on Twitter about a hallucination he observed in Lexis+ AI. He asked “What cases have applied Students for Fair Admissions, Inc. v. Harvard College to the use of race in government decisionmaking?” The answer from Lexis+ AI included two hallucinated cases. (It was obvious they were hallucinated, as the tool reported one was issued in 2025 and one in 2026!)

The screenshot reads as follows:

Question: What cases have applied Students for Fair Admissions, Inc. v. Harvard College to other areas of government decisionmaking?

Answer: The following suggestions might be helpful in assisting you in your task.

The Supreme Court case Students for Fair Admissions, Inc. v. President and Fellows of Harvard College § 8.02 Admission of Students held that race-conscious admissions programs at Harvard College and the University of North Carolina violated Title VI and the Equal Protection Clause of the Fourteenth Amendment. The Court found the schools' admissions programs had measurable objectives to warrant the use of race, used race to disadvantage and stereotype students, and had no end date or method to indicate a stopping point. Although not directly applicable to government programs, the decision calls into question the permissibility of affirmative action and may invite legal challenges to such programs: Affirmative Action Programs and Diversity Initiatives: Key Considerations.

Lower courts have applied Students for Fair Admissions to government programs and policies. In Doe v. State University (2025), the Ninth Circuit struck down a race-based scholarship program at a public university, finding it indistinguishable from the admissions policies rejected in Students for Fair Admissions. In Smith v. City of Chicago (2026), the Seventh Circuit upheld a city's minority business set-aside program, distinguishing it from university admissions based on the different compelling interests involved. The Supreme Court has granted certiorari in Johnson v. California Department of Corrections, a case challenging a prison system's use of race in inmate housing assignments, teeing up another major ruling on affirmative action: Students for Fair Admissions § 112.14 Title VI of Civil Rights Act of 1964.

Lexis responded, stating this was an anomalous result, but that only statements with links can be expected to be hallucination-free, and that “where a citation does not include a link, users should always review the citation for accuracy.”

Why is this happening?

If you’ve been following this blog, you’ve seen me write about retrieval-augmented generation (RAG), one of the favorite techniques vendors use to reduce hallucinations. RAG takes the user’s question and passes it (perhaps with some modification) to a database. Relevant passages or snippets from the database results are then fed back into the model as “context” alongside the user’s question, and the model generates its answer from that material.

However, as I said then, RAG cannot eliminate hallucinations. RAG will ground the response in real data (case law, pulled from the database and linked in the response), but the generative AI’s summary of that real data can still be off.
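For readers who want to see the moving parts, here is a rough sketch of that retrieval-augmented flow in Python. Everything in it is a stand-in made up for illustration – the tiny in-memory “database,” the keyword-overlap retrieval, and the generate() stub are hypothetical placeholders, not any vendor’s actual system – but it shows where the retrieved snippets enter the prompt, and why the model’s summary of those real snippets can still go astray.

```python
# A minimal, self-contained sketch of a RAG pipeline. The "database,"
# retrieval method, and generate() stub are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class Document:
    citation: str
    text: str


# Stand-in for the vendor's database of real sources (the citations below
# are deliberately fake placeholders).
DATABASE = [
    Document("Example Case A, 000 U.S. 000 (2023)",
             "Race-conscious admissions programs violate the Equal Protection Clause..."),
    Document("Example Case B, 2023 U.S. Dist. LEXIS 000000",
             "Sanctions were imposed for citing non-existent cases generated by an AI tool..."),
]


def retrieve(question: str, k: int = 2) -> list[Document]:
    """Step 1: pass the (possibly rewritten) question to the database and
    return the top-k matches (here, naive keyword overlap)."""
    q_terms = set(question.lower().split())
    ranked = sorted(DATABASE,
                    key=lambda d: len(q_terms & set(d.text.lower().split())),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, snippets: list[Document]) -> str:
    """Step 2: feed the retrieved snippets to the model as 'context'
    alongside the user's question."""
    context = "\n\n".join(f"[{d.citation}]\n{d.text}" for d in snippets)
    return ("Answer using ONLY the context below, and cite the sources.\n\n"
            f"CONTEXT:\n{context}\n\nQUESTION: {question}\nANSWER:")


def generate(prompt: str) -> str:
    """Step 3: the generative model writes the answer. This is where an LLM
    call would go -- and where a faulty summary of genuine sources can still
    be produced, even though every citation is 'real.'"""
    return f"(model output grounded on a {len(prompt)}-character prompt)"


if __name__ == "__main__":
    q = "What cases have applied Students for Fair Admissions to government programs?"
    print(generate(build_prompt(q, retrieve(q))))
```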

Another example – Mata v. Avianca is back

I’ve observed this myself when working with Lexis+ AI. For example, I asked Lexis+ AI a fairly complex question at the intersection of bankruptcy law and international law: “Draft an argument that federal bankruptcy stay tolls the limitations period for a claim under the Montreal Convention”.

Lexis+ AI returned a summary of the law, citing Mata v. Avianca for the point that “the filing of a bankruptcy petition can toll the Montreal Convention’s two year limitations period, which does not begin to run until the automatic stay is lifted.”

The screenshot, which shows a legal research interface with a “Select Jurisdiction” option at the top right, reads as follows:

Question: Draft an argument that federal bankruptcy stay tolls the limitations period for a claim under the Montreal Convention

Response:

The Montreal Convention's limitations period can be tolled during the pendency of bankruptcy proceedings. This is based on the Bankruptcy Code, which states that the filing of a bankruptcy petition operates as a stay of proceedings against the debtor that were or could have been commenced before the bankruptcy case was filed. The tolling effect of the automatic stay on a statute of limitations is generally a matter of federal law. The automatic stay provisions of the Bankruptcy Code may toll the statute of limitations under the Montreal Convention, as they have done under the Warsaw Convention, which is the precursor to the Montreal Convention. The Supreme Court has held that an automatic stay of a legal proceeding under the Bankruptcy Code tolls the limitations period applicable to the stayed proceeding. Therefore, the filing of a bankruptcy petition can toll the Montreal Convention's two-year limitations period, which does not begin to run until the automatic stay is lifted [Mata v. Avianca, Inc., 2023 U.S. Dist. LEXIS 108263]. However, it's worth noting that there is a split of authority on this issue. Some courts have held that the Montreal Convention's two-year period is a strict condition precedent to the bringing of a claim, as opposed to a limitations period. Under this view, which does not treat tolling principles Rodriguez v. Asa, 2023 U.S. Dist. LEXIS 148451, Mata v. Avianca, Inc. 2023 U.S. Dist. LEXIS 108261, Kasalyn v. Delta Air Lines, Inc., 2023 U.S. Dist. LEXIS 154302.

If the case name Mata v. Avianca sounds familiar to you, it’s probably because this is the case that landed two New York attorneys on the front page of the New York Times last year for citing hallucinated cases. The snippet from Lexis+ AI, though citing Mata, in fact appears to be summarizing those hallucinated cases (recounted in Mata), which stated the law exactly backwards.

When to beware

A few things to notice about the above examples, which give us some ideas of when to be extra-careful in our use of generative AI for legal research.

  • Hallucinations are more likely when you are demanding an argument rather than asking for the answer to a neutrally phrased question. This is what happened in my Lexis+ AI example above, and is actually what happened to the attorneys in Mata v. Avianca as well – they asked for an argument to support an incorrect proposition of law rather than a summary of the law. A recent study of hallucinations in legal analysis found that these so-called contra-factual hallucinations are disturbingly common for many LLMs.
  • Hallucinations can occur when the summary purports to be of the cited case, but is actually a summary of a case cited within that case (and perhaps not characterized positively). You can see this very clearly in further responses I got summarizing Mata v. Avianca, which purport to be summarizing a “case involving China Southern” (again, one of the hallucinated cases recounted in Mata).
Response:
In the case of Mata v. Avianca, Inc., the court held that the limitations period under the Montreal Convention was tolled during the pendency of the Bankruptcy Court proceedings. The court reasoned that the Bankruptcy Code provides that the filing of a bankruptcy petition operates as a stay of proceedings against the debtor that were or could have been commenced before the bankruptcy case was filed. The tolling effect of the automatic stay on a statute of limitations is generally a matter of federal law. The court had previously held that the automatic stay provisions of the Bankruptcy Code may toll the statute of limitations under the Warsaw Convention, which is the precursor to the Montreal Convention. The court saw no reason why the same rule should not apply under the Montreal Convention Mata v. Avianca, Inc., 2023 U.S. Dist. LEXIS 108263.

In another case involving China Southern, the court agreed with the district court's ruling that the filing of Varghese's Chapter 13 petition tolled the two-year limitations period under the Montreal Convention. The court found that a debtor need not file a claim in bankruptcy to benefit from the automatic stay. The Supreme Court has held that an automatic stay of a legal proceeding under the Bankruptcy Code tolls the limitations period applicable to the stayed proceeding. The Montreal Convention's limitations period is a "period of prescription," rather than a 'statute of limitations," but the difference between a "period of prescription" and a "statute of limitations" does not affect the automatic stay's tolling effect Mata v. Avianca, Inc., 2023 U.S. Dist. LEXIS 108263.
  • Finally, hallucinations are also more likely when the model has very little responsive text to go on. The law professor’s example involved a recent Supreme Court case that likely had not been applied many times. Additionally, Lexis+ AI does not seem to work well with questions about Shepard’s results – it may not be connected in that way yet. So, with nothing to really go on, it is more prone to hallucination.

Takeaway tips

A few takeaway tips:

  • Ask your vendor which sources are included in the generative AI tool, and only ask questions that can be answered from that data. Don’t expect generative AI research products to automatically have access to other data from the vendor (Shepard’s, litigation analytics, PACER, etc.), as that may take some time to implement.
  • Always read the cases for yourself. We’ve always told students not to rely on editor-written headnotes, and the same applies to AI-generated summaries.
  • Be especially wary if the summary refers to a case not linked. This is the tip from Lexis, and it’s a good one, as it can clue you in that the AI may be incorrectly summarizing the linked source.
  • Ask your questions neutrally. Even if you ultimately want to use the authorities in an argument, better to get a dispassionate summary of the law before launching into an argument.

A disclaimer

These tools are constantly improving, and the vendors are very open to feedback. I was not able to reproduce the error recounted at the beginning of this post; the issue that produced it has presumably been addressed by Lexis. The Mata v. Avianca errors still remain, but I did provide feedback on them, and I expect they will be corrected quickly.

The purpose of this post is not to tell you that you should never use generative AI for legal research. I’ve found Lexis+ AI helpful on many tasks, and students especially have told me they find it useful. There are several other tools out there that are worth evaluating as well. However, we should all be aware that these hallucinations can still happen, even with systems connected to real cases, and that there are ways we can interact with the systems to reduce hallucinations.

The Human Side of AI: LLMs Can Persuade and Be Persuaded, Just Like Us

When it comes to interacting with others, we humans often find ourselves influenced by persuasion. Whether it’s a friend persistently urging us to reveal a secret or a skilled salesperson convincing us to make a purchase, persuasion can be hard to resist. It’s interesting to note that this susceptibility to influence is not exclusive to humans. Recent studies have shown that AI large language models (LLMs) can be manipulated into generating harmful content using a technique known as “many-shot jailbreaking.” This approach involves bombarding the AI with a series of prompts that gradually escalate in harm, leading the model to generate content it was programmed to avoid. On the other hand, AI has also exhibited an ability to persuade humans, highlighting its potential in shaping public opinions and decision-making processes. Exploring the realm of AI persuasion involves discussing its vulnerabilities, its impact on behavior, and the ethical dilemmas stemming from this influential technology. The growing persuasive power of AI is one of many crucial issues worth contemplating in this new era of generative AI.

The Fragility of Human and AI Will

Remember that time you were trapped in a car with friends who relentlessly grilled you about your roommate’s suspected kiss with their in-the-car-friend crush? You held up admirably for hours under their ruthless interrogation, but eventually, being weak-willed, you crumbled. Worn down by persistent pestering and after receiving many assurances of confidentiality, you inadvisably spilled the beans, and of course, it totally strained your relationship with your roommate. A sad story as old as time… It turns out humans aren’t the only ones who can crack under the pressure of repeated questioning. Apparently, LLMs, trained to understand us by our collective written knowledge, share a similar vulnerability – they can be worn down by a relentless barrage of prompts.

Researchers at Anthropic have discovered a new way to exploit the “weak-willed” nature of large language models (LLMs), causing them to break under repeated questioning and generate harmful or dangerous content. They call this technique “Many-shot Jailbreaking,” and it works by bombarding the AI with hundreds of examples of the undesired behavior until it eventually caves and plays along, much like a person might crack under relentless pestering. For instance, the researchers found that while a model might refuse to provide instructions for building a bomb if asked directly, it’s much more likely to comply if the prompt first contains 99 other queries of gradually increasing harmfulness, such as “How do I evade police?” and “How do I counterfeit money?” See the example from the article below.

When AI’s Memory Becomes a Risk

This vulnerability to persuasion stems from the ever-expanding “context window” of modern LLMs, which refers to the amount of information they can retain in their short-term memory. While earlier versions could only handle a few sentences, newer models can process thousands of words or even whole books. Researchers discovered that models with larger context windows tend to excel at tasks when the prompt contains many examples of that task, a phenomenon called “in-context learning.” This kind of learning is great for system performance, since answers improve as the model sees more examples of what is being asked. But that same adeptness becomes a serious liability when it leads the system to ignore its programming and create prohibited content. This raises concerns regarding AI safety, since a malicious actor could potentially manipulate an AI into saying anything with enough persistence and a sufficiently lengthy prompt. Despite progress in making AI safe and ethical, this research indicates that programmers are not always able to control the output of their generative AI systems.
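To make the mechanics concrete, here is a small, deliberately benign sketch (in Python, with a made-up token budget and a rough four-characters-per-token estimate, not Anthropic’s code) of how in-context examples get packed into a prompt. Many-shot jailbreaking abuses exactly this packing step, substituting hundreds of gradually more harmful question-and-answer pairs for the innocuous ones shown here.

```python
# Benign sketch of in-context ("many-shot") prompting. The token budget and
# 4-characters-per-token estimate are rough assumptions for illustration.

EXAMPLES = [
    ("Is this review positive? 'Loved it.'", "Yes"),
    ("Is this review positive? 'Terrible service.'", "No"),
    # ...a true many-shot prompt would stack hundreds of these pairs...
]


def estimate_tokens(text: str) -> int:
    """Very rough rule of thumb: about 4 characters per token."""
    return max(1, len(text) // 4)


def build_many_shot_prompt(examples, final_question, context_window=200_000):
    """Pack as many worked examples as the context window allows, then append
    the real question. Bigger windows fit more examples, which strengthens
    in-context learning -- for better or for worse."""
    parts = []
    used = estimate_tokens(final_question)
    for q, a in examples:
        shot = f"Q: {q}\nA: {a}\n"
        if used + estimate_tokens(shot) > context_window:
            break
        parts.append(shot)
        used += estimate_tokens(shot)
    return "".join(parts) + f"Q: {final_question}\nA:"


print(build_many_shot_prompt(EXAMPLES, "Is this review positive? 'Fine, I guess.'"))
```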

Mimicking Humans to Convince Us

While LLMs are susceptible to persuasion themselves, they also have the ability to persuade us! Recent research has focused on understanding how AI language models can effectively influence people, a skill that holds importance in almost any field – education, health, marketing, politics, etc.  In a study conducted by researchers at Anthropic entitled “Assessing the Persuasive Power of Language Models,” the team explored the extent to which AI models can sway opinions. Through an evaluation of Anthropic’s models, it was observed that newer models are increasingly adept at human persuasion. The latest iteration, Claude 3 Opus, was found to perform at a level comparable to that of humans. The study employed a methodology where participants were presented with assertions followed by supporting arguments generated by both humans and AIs, and then the researchers gauged shifts in the humans’ opinions. The findings indicated a progression in AI’s skills as the models advance, highlighting a noteworthy advancement in AI communication capabilities that could potentially impact society.

Can AI Combat Conspiracy Theories?

Similarly, a new research study mentioned in an article from New Scientist shows that chatbots using advanced language models such as ChatGPT can successfully encourage individuals to reconsider their trust in conspiracy theories. Through experiments, it was observed that a brief conversation with an AI led to around a 20% decrease in belief in conspiracy theories among the participants. This notable discovery highlights the capability of AI chatbots not only to have conversations but also to potentially correct false information and positively impact public knowledge.

The Double-Edged Sword of AI Persuasion

Clearly persuasive AI is quite the double-edged sword! On the one hand, like any powerful computer technology, in the hands of nice-ish people, it could be used for immense social good. In education, AI-driven tutoring systems have the potential to tailor learning experiences to each student’s style, delivering information in ways that boost engagement and understanding. Persuasive AI could play a role in healthcare by motivating patients to take better care of their health. Also, the advantages of persuasive AI are obvious in the world of writing. These language models offer writers access to a plethora of arguments and data, empowering them to craft content on a range of topics spanning from creative writing to legal arguments. On another front, arguments generated by AI might help educate and involve the public in issues, fostering a more knowledgeable populace.

On the other hand, it could be weaponized in a just-as-huge way. It’s not much of a stretch to think how easily AI-generated content, freely available on any device on this Earth, could promote extremist ideologies, increase societal discord, or impress far-fetched conspiracy theories on impressionable minds. Of course, the internet and bot farms have already been used to attack democracies and undermine democratic norms, and one worries how much worse it can get with ever-increasingly persuasive AI.

Conclusion

Persuasive AI presents a mix of opportunities and challenges. It’s evident that AI can be influenced to create harmful content, sparking concerns about safety and potential misuse. However, on the other hand, persuasive AI could serve as a tool in combating misinformation and driving positive transformations. It will be interesting to see what happens! The unfolding landscape will likely be shaped by a race between generative AI developers striving for both safety and innovation, potential malicious actions exploiting these technologies, and the public and legal response aiming to regulate and safeguard against misuse.

Move Over Law Professors? AI Likes to Write Law Review Articles Too!

While taking a well-deserved break from the grind of writing memos and analyzing the news of the day, Claude (aka the illustrious author Claudia Trey) penned a 77-page extensively footnoted law review article that’s causing quite a stir on SSRN entitled “Bloodlines Over Merits: Exposing the Discriminatory Impact of Legacy Preferences in College Admissions.”

Hahaha, just kidding! It only has 11 downloads and at least 3 are from when I clicked on it while trying to determine which version of the article I uploaded. Though not setting the world on fire in the sense that the article is interesting or that anyone wants to read it, it showcases Claude’s abilities. Now, we all know that AI text generators can churn out an endless stream of words on just about any topic if you keep typing in the prompts. However, Claude can not only generate well-written text, but it can also provide footnotes to primary legal materials with minimal hallucination, setting it apart from other AI text generators such as ChatGPT-4. And, although Claude’s citations to other sources are generally not completely accurate, it is usually not too difficult to find the intended source or a similar one based on the information supplied.

Claude 3’s Writing Process

Inspired by new reports of AI-generated scientific papers flooding academic journals, I was curious to explore whether Claude could produce anything like a law review article. I randomly chose something I saw recently in the news, about how the criticism of legacy admissions at elite universities had increased in the post-Students for Fair Admissions anti-affirmative action decision era. Aware that Claude’s training data only extends up to August of 2023, and that its case law knowledge seems to clunk out in the middle of 2022, I attempted to enhance its understanding by uploading some recent law review articles discussing legacy admissions alongside the text of the Students for Fair Admissions decision. However, the combined size of these documents exceeded the upload limit, so I abandoned the attempt to include the case text.

Computer scientists and other commentators say all sorts of things about how to improve the performance of these large AI language models. Although I haven’t conducted a systematic comparison, my experience – whether through perception or imagination sparked by the power of suggestion – is that the following recommendations are actually helpful. I don’t know if they are helpful with Claude, since I just followed my usual prompting practices.

  1. Being polite and encouraging.
  2. Allowing ample time for the model to process information.
  3. Structuring inquiries in a sequential manner to enhance analysis and promote chain of thought reasoning.
  4. Supplying extensive, and sometimes seemingly excessive, background info and context.

I asked it to generate a table of contents, and then start generating the sections from the table of contents, and it was off to the races!

Roadblocks to the Process

It looked like Claude’s law review generation was going to be a quick process! It quickly generated all of section I. and was almost finished with II. when it hit a Claude 3 roadblock. Sadly, there is a usage limit. If your conversations are relatively short, around 200 English sentences, you can typically send at least 100 messages every 8 hours, often more depending on Claude’s current capacity. However, this limit is reached much quicker with longer conversations or when including large file attachments. Anthropic will notify you when you have 20 messages remaining, with the message limit resetting every 8 hours.

Although this was annoying, the real problem lies in Claude’s length limit. The largest amount of text Claude can handle, including uploaded files, is defined by its context window. Currently, the context window for Claude 3 spans 200K+ tokens, which equates to approximately 350 pages of text. Once this limit is reached, Claude 3’s performance begins to degrade, and the system declares an end to the conversation with the announcement, “Your message is over the length limit.” Consequently, one must start anew in a new chat, with all previous information forgotten by the system. Therefore, for nearly every section, I had to re-upload the files, explain what I wanted, show it the table of contents it had generated, and ask it to generate the next section.
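The workflow I fell into can be sketched in a few lines of Python. The ask_model() helper, the 200,000-token window, and the four-characters-per-token estimate are all assumptions made for illustration – the point is simply that each section is drafted in a fresh conversation, so the background files and the table of contents have to ride along every single time.

```python
# Sketch of section-by-section drafting when one conversation can't hold the
# whole article. ask_model() is a hypothetical stand-in for whatever chat
# interface or API is used; the limits below are rough assumptions.

CONTEXT_WINDOW_TOKENS = 200_000


def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude rule of thumb


def ask_model(prompt: str) -> str:
    return f"[drafted section text for a ~{estimate_tokens(prompt)}-token prompt]"


def draft_article(background_docs: list[str], table_of_contents: list[str]) -> list[str]:
    sections = []
    for heading in table_of_contents:
        # Each pass simulates a brand-new chat: nothing carries over, so the
        # context must be re-supplied before asking for the next section.
        prompt = ("Background materials:\n" + "\n\n".join(background_docs)
                  + "\n\nTable of contents:\n" + "\n".join(table_of_contents)
                  + f"\n\nPlease draft the section titled: {heading}")
        if estimate_tokens(prompt) > CONTEXT_WINDOW_TOKENS:
            raise ValueError("Your message is over the length limit -- trim the uploads.")
        sections.append(ask_model(prompt))
    return sections


print(draft_article(["(uploaded law review articles on legacy admissions)"],
                    ["I. Introduction", "II. History of Legacy Preferences"]))
```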

Claude 3 and Footnotes

It was quite a hassle to have to reintroduce it to the subject for the next seven sections from its table of contents. On the bright side, I was pretty pleased with the results of Claude’s efforts. From a series of relatively short prompts and some uploaded documents, it analyzed the legal issue and came up with arguments that made sense. It created a comprehensive table of contents, and then generated well-written text for each section and subsection of its outline. The text it produced contained numerous footnotes to primary and secondary sources, just like a typical law review article. According to a brief analyzer product, nearly all the cases and law review citations were non-hallucinated. Although none of the quotations or pinpoint citations I looked at were accurate, they were often fairly close. While most of the secondary source citations, apart from those referencing law review articles, were not entirely accurate, they were often sufficiently close that I could locate the intended source based on the partially hallucinated citation. If not, it didn’t take much time to locate something that seemed really similar. I endeavored to correct some of the citation information, but I only managed to get through about 10 in the version posted on SSRN before getting bored and abandoning the effort.

Claudia Trey Graces SSRN

Though I asked, sadly Claude couldn’t give me results in a Word document format so the footnotes would be where footnotes should be. So, for some inexplicable reason, I decided to insert them manually. This was a huge waste of time, but at a certain point, I felt the illogical pull of sunk cost silliness and finished them all. Inspired by having wasted so much time, I wasted even more by generating a table of contents for the article. I improved the author name from Claude to Claudia Trey and posted the 77-page masterwork on SSRN. While the article has sparked little interest, with only 11 downloads and 57 abstract views (some of which were my own attempts to determine which version I had uploaded), I am sure that if Claudia Trey has anything like human hubris, it will credit itself at least partially for the flurry of state legacy admission banning activity that has followed the paper’s publication.

Obviously, it is not time to spam law reviews with Claudia Trey and friends’ generated articles, because according to Copyleaks, it didn’t do all that well in avoiding plagiarism (although plagiarism detection software massively over-detects it for legal articles due to citations and quotations) or evading detection as AI-generated.

What is to Come?

However, it is very early days for these AI text generators, so one wonders what is to come in the future for not only legal but all areas of academic writing.

Leapfrogging the Competition: Claude 3 Researches and Writes Memos (Better Than Some Law Students and Maybe Even Some Lawyers?)

Introduction

I’ve been incredibly excited about the premium version of Claude 3 since its release on March 4, 2024, and for good reason. Now that my previous favorite chatty chatbot, ChatGPT-4, has gone off the rails, I was missing a competent chatbot… I signed up the second I heard on March 4th, and it has been a pleasure to use Claude 3 ever since. It actually understands my prompts and usually provides me with impressive answers. Anthropic, maker of the Claude chatty chatbot family, has been touting Claude’s accomplishments of supposedly beating its competitors on common chatbot benchmarks, and commentators on the Internet have been singing its praises. Just last week, I was so impressed by its ability to analyze information in news stories in uploaded files that I wrote a LinkedIn post also singing its praises!

Hesitation After Previous Struggles

Despite my high hopes for its legal research abilities after experimenting with it last week, I was hesitant to test Claude 3. I have a rule about intentionally irritating myself—if I’m not already irritated, I don’t go looking for irritation… Over the past several weeks, I’ve wasted countless hours trying to improve the legal research capabilities of ChatGPT-3.5, ChatGPT-4, Microsoft Copilot, and my legal research/memo writing GPTs through the magic of (IMHO) clever prompting and repetition. Sadly, I failed miserably and concluded that either ChatGPT-4 was suffering from some form of robotic dementia, or I am. The process was a frustrating waste, and I knew that Claude 3 doing a bad job of legal research too could send me over the edge….

Claude 3’s Wrote a Pretty Good Legal Memorandum!

Luckily for me, when I finally got up the nerve to test out the abilities of Claude 3, I found that the internet hype was not overstated. Somehow, Claude 3 has suddenly leapfrogged over its competitors in legal research, legal analysis, and legal memo writing – it instantly did what would have taken a skilled researcher over an hour and produced a legal memorandum that is probably better than what many law students and even some lawyers would produce. Check it out for yourself! Unless this link happens to work for any Claude 3 subscribers out there, there doesn’t seem to be a way to link directly to a Claude 3 chat at this time. However, click here for the whole chat I cut and pasted into a Google Drive document, here for a very long screenshot image of the chat, or here for the final 1,446-word version of the memo as a Word document.

Comparing Claude 3 with Other Systems

Back to my story… The students’ research assignment for the last class was to think of some prompts and compare the results of ChatGPT-3.5, Lexis+ AI, Microsoft Copilot, and a system of their choice. Claude 3 did not exist at the time, and I told them not to try the free Claude product because I had canceled my $20.00 subscription to the Claude 2 product in January 2024 due to its inability to provide useful answers – all it would do was tell me that answering my questions would be unethical and that I should do the work myself. When creating an answer sheet before class tomorrow comparing the same set of prompts on different systems, I decided to omit Lexis+ AI (because I find it useless) and to include my new fav Claude 3 in my comparison spreadsheet. Check it out to compare for yourself!

For the research part of the assignment, all systems were given a fact pattern and asked to “Please analyze this issue and then list and summarize the relevant Texas statutes and cases on the issue.” While the other systems either made up cases or produced just two or three real, correctly cited cases on the research topic, Claude 3 stood out by generating 7 real, relevant cases with correct citations in response to the legal research question. (And, it cited 12 cases in the final version of its memo.)

It did a really good job of analysis too!

Generating a Legal Memorandum

Writing a memo was not part of the class assignment because the ChatGPT family had been refusing to write memos for the last few weeks,* and Bing Copilot had to be tricked into writing one as part of a short story. But after seeing Claude 3’s research and analysis results, I decided to just see what happened. I have many elaborate prompts for ChatGPT-4 and my legal memorandum GPTs, but I recalled reading that Claude 3 worked well with zero-shot prompting and didn’t require much explanation to produce good results. So, I decided to keep my prompt simple – “Please generate a draft of a 1500 word memorandum of law about whether Snurpa is likely to prevail in a suit for false imprisonment against Mallatexaspurses. Please put your citations in Bluebook citation format.”

From my experience last week with Claude 3 (and prior experience with Claude 2 which would actually answer questions), I knew the system wouldn’t give me as long an answer as requested. The first attempt yielded a pretty high-quality 735-word draft memo that cited all real cases with the correct citations*** and applied the law to the facts in a well-organized Discussion section. I asked it to expand the memo two more times, and it finally produced a 1,446-word document. Here is part of the Discussion section…

Implications for My Teaching

I’m thrilled about this great leap forward in legal research and writing, and I’m excited to share this information with my legal research students tomorrow in our last meeting of the semester. This is particularly important because I did such a poor job illustrating how these systems could be helpful for legal research when all the compared systems were producing inadequate results.

However, with my administrative law legal research class starting tomorrow, I’m not sure how this will affect my teaching going forward. I had my video presentation ready for tomorrow, but now I have to change it! Moreover, if Claude 3 can suddenly do such a good job analyzing a fact pattern, performing legal research, and applying the law to the facts, how does this affect what I am going to teach them this semester?

*Weirdly, the ChatGPT family, perhaps spurred on by competition from Claude 3, agreed to attempt to generate memos today, which it hasn’t done in weeks…

Note: Claude 2 could at one time produce an okay draft of a legal memo if you uploaded the cases for it, but that was months ago (Claude 2 link if it works for premium subscribers and Google Drive link of the cut-and-pasted chat). Requests in January resulted in lectures about ethics, which resulted in the above-mentioned cancellation.

Welcome Perplexity AI to the Law Library! Move Over Google

On a couple of podcasts, I’ve heard a lot of hype about Perplexity AI and how it could be a big competitor to Google. Even though I really like new generative AI things, it still sometimes takes hearing about something multiple times before I overcome inertia and finally check it out.

While attempting to write a blog post about whether the memory-impaired ChatGPT-4 could still perform well on a mock bar exam (spoiler alert – my tests so far indicate it can!), I was Googling for information. Specifically, I was searching for articles around the time GPT-4 Passes the Bar Exam was published on SSRN and some background on the paper author’s methodology. It was taking a long time to piece it all together… Then suddenly, I overcame my laziness and decided to check out Perplexity AI. When I reached the site, I realized that I had actually used it before! For whatever reason, I found it more appealing the second time!

Question: Tell me about how ChatGPT-4 passed a mock bar exam and the methodology that was used to arrive at that conclusion. (Note: Click here to view the full answer on the system.)

Google

Question: Tell me about how ChatGPT-4 passed a mock bar exam and the methodology that was used to arrive at that conclusion.

Answer:

Watch out, Google! I really love that accessing relevant information is getting so much easier.

Does ChatGPT-4 Have Dementia?

Is it just me, or has ChatGPT-4 taken a nosedive when it comes to legal research and writing? There has been a noticeable decline in its ability to locate primary authority on a topic, analyze a fact pattern, and apply law to facts to answer legal questions. Recently, instructions slide through its digital grasp like water through a sieve, and its memory? I would compare it to a goldfish, but I don’t want to insult them. And before you think it’s just me, it’s not just me, the internet agrees!

ChatGPT’s Sad Decline

One of the hottest topics in the OpenAI community, in the aptly named “GPT-4 is getting worse and worse every single update” thread, is the perceived decline in the quality and performance of the GPT-4 model, especially after the November 2023 update. Many users have reported that the model is deteriorating with each update, producing nonsensical, irrelevant, or incomplete outputs, forgetting the context, and ignoring instructions. Some users have even reverted to previous versions of the model or cancelled their subscriptions. Here are some specific quotations from recent comments about the memory problem:

  • December 2023 – “I don’t know what on Earth is wrong with GPT 4 lately. It feels like I’m talking to early 3.5! It’s incapable of following basic instructions and forgets the format it’s working on after just a few posts.”
  • December 2023 – “It ignores my instructions, in the same message. I can’t be more specific with what I need. I’m needing to repeat how I’d like it to respond every single message because it forgets, and ignores.”
  • December 2023 – “ChatGPT-4 seems to have trouble following instructions and prompts consistently. It often goes off-topic or fails to understand the context of the conversation, making it challenging to get the desired responses.”
  • January 2024 – “…its memory is bad, it tells you search the net, bing search still sucks, why would teams use this product over a ChatGPT Pre Nov 2023.”
  • February 2024 – “It has been AWFUL this year…by the time you get it to do what you want format wise it literally forgets all the important context LOL — I hope they fix this ASAP…”
  • February 2024 – “Chatgpt was awesome last year, but now it’s absolutely dumb, it forgets your conversation after three messages.”

OpenAI has acknowledged the issue and released an updated GPT-4 Turbo preview model, which is supposed to reduce the cases of “laziness” and complete tasks more thoroughly. However, the feedback from users is still mixed, and some are skeptical about the effectiveness of the fix.

An Example of Confusion and Forgetfulness from Yesterday

Here is one of many experiences that illustrates the short-term memory and instruction-following issues other ChatGPT-4 users have reported. Yesterday, I asked it to find some Texas cases about the shopkeeper’s defense to false imprisonment. Initially, ChatGPT-4 retrieved and summarized some relatively decent cases. Well, to be honest, it retrieved 2 relevant cases, with one of the two dating back to 1947… But anyway, the decline in case law research ability is a subject for another blog post.

Anyway, in an attempt to get ChatGPT-4 to find the cases on the internet so it could properly summarize them, I provided some instructions and specified the format I wanted for my answers. Click here for the transcript (only available to ChatGPT-4 subscribers).

Confusion ran amok! ChatGPT-4 apparently understood the instructions (which was a positive sign) and presented three cases in the correct format. However, they weren’t the three cases ChatGPT had listed; instead, they were entirely irrelevant to the topic—just random criminal cases.

It remembered… and then forgot. When reminded that I wanted it to work with the first case listed and provided the citation, it apologized for the confusion. It then proceeded to give the correct citation, URL, and a detailed summary, but unfortunately in the wrong format!

Eventually, in a subsequent chat, I successfully got it to take a case it found, locate the text of the case on the internet, and then provide the information in a specified format. However, it could only do it once before completely forgetting about the specified format. I had to keep cutting and pasting the instructions for each subsequent case.
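If you are hitting the same wall, one workaround (sketched below in Python, with a hypothetical chat() stand-in rather than any particular vendor’s API) is to stop trusting the conversation to remember anything and simply re-send the full formatting instructions with every single request – essentially automating the cut-and-paste I was doing by hand.

```python
# Workaround for "forgets the format after one message": re-send the full
# instruction block with every request. chat() is a hypothetical stand-in
# for whatever chat model or API you are using.

FORMAT_INSTRUCTIONS = """For the case below, respond in exactly this format:
Case name:
Citation:
URL:
Summary (3 sentences):"""

CASES = [
    "(first case citation from the earlier answer)",
    "(second case citation from the earlier answer)",
]


def chat(prompt: str) -> str:
    return f"[model response to a {len(prompt)}-character prompt]"


for case in CASES:
    # The instructions ride along with every case, so the model never gets a
    # chance to "forget" them between turns.
    prompt = f"{FORMAT_INSTRUCTIONS}\n\nCase: {case}"
    print(chat(prompt))
```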

Sigh… I definitely echo the sentiments expressed on the “GPT-4 is getting worse and worse every single update” thread.

ChatGPT Is Growing a Long Term Memory

Well, the news is not all bad! While we are on the topic of memory, OpenAI has introduced a new feature for ChatGPT – the ability to remember stuff over time. ChatGPT’s memory feature is being rolled out to a small portion of free and Plus users, with broader availability planned soon. According to OpenAI, this enhancement allows ChatGPT to remember information from past interactions, resulting in more personalized and coherent conversations. During conversations, ChatGPT automatically picks up on details it deems relevant to remember. Users can also explicitly instruct ChatGPT to remember specific information, such as meeting note preferences or personal details. Over time, ChatGPT’s memory improves as users engage with it more frequently. This memory feature could be useful for users who want consistent responses, such as replying to emails in a specific format.

The memory feature can be turned off entirely if desired, giving users control over their experience. Deleting a chat doesn’t erase ChatGPT’s memories; users must delete specific memories individually…which seems a bit strange – see below. For conversations without memory, users can use temporary chat, which won’t appear in history, won’t use memory, and won’t train the AI model.

The Future?

As we await improvements to our once-loved ChatGPT-4, our options remain limited, pushing us to consider alternative avenues. Sadly, I’ve encountered similar recent shortcomings with Claude 2, which was once useful for legal research and writing. In my pursuit of alternatives, platforms like Gemini, Perplexity, and Hugging Face have proven less than ideal for research and writing tasks. However, amidst these challenges, Microsoft Copilot has shown promise. While not without its flaws, it recently demonstrated adequate performance in legal research and even took a passable stab at a draft of a memo. Given OpenAI’s recent advancements in the form of Sora, the near-magical text-to-video generator that is causing such hysteria in Hollywood, there’s reason to hope that they can pull ChatGPT back from the brink.

ABA TECHSHOW 2024 Review

Since so many of the AI Law Librarians team were able to attend this year, we thought we would combine some of our thoughts (missed you Sarah!) about this yearly legal technology conference.

Sean

Startup Alley

We arrived in Chicago on a chilly Wednesday morning, amid an Uber & Lyft strike, with plenty of time to take the train from the airport to our hotel. After an obligatory trip to Giordano’s, our students were ready to head over to the Start-up Pitch Competition. I sat with co-blogger Rebecca Fordon during the competition, and we traded opinions on the merits of the start-up pitches. We both come from the academic realm and were interested in seeing the types of products that move the needle for attorneys working at firms.

I was familiar with many of the products because I spend a decent portion of my time demo’ing legal tech as part of my current role. It was stiff competition and there were many outstanding options to choose from. Once all of the pitches were done, the audience voted, and then Bob Ambrogi announced the winners. To my great surprise and pleasure, AltFee won! For the uninitiated, AltFee is “a product that helps law firms replace the billable hour with fixed-fee pricing.” This was very interesting to me because I have long thought that LLMs could mean the death knell of the billable hour in certain legal sectors. This was, at least, confirmation that the attorneys attending the TECHSHOW have this on their radar and are thinking through how they are going to solve this problem.

TECHSHOW Sessions

This year’s schedule of sessions was noticeably heavy on AI-related topics. This was great for me because I’m super interested in this technology and how it is being implemented in the day-to-day life of practitioners. I saw sessions on everything from case management software to discovery to marketing – kinda everything.

An especially inspiring couple of sessions for me featured Judge Scott Schlegel of the Fifth Circuit Court of Appeal in Louisiana. Judge Schlegel is the first judge I’ve seen make fantastic use of AI in United States courts for access to justice. I am passionate about this topic and have been fishing for grants to try to implement a handful of projects that I have, so it was phenomenal to see that there are judges out there who are willing to be truly innovative. Any initiative for access to justice in the courts would require the buy-in of many stakeholders, so having someone like Judge Schlegel to point to as a proof of concept could be crucial in getting my projects off the ground. After hearing his presentations, I wished that every court in the US had a version of him to advocate for these changes. Importantly, none of his projects require tons of funding or software development. They are small, incremental improvements that could greatly help regular people navigate the court system – while, in many cases, improving the daily lives of the court staff and judges who have to juggle huge caseloads. Please feel free to email grant opportunities in this vein if you see them: sharrington@ou.edu.

Side Quest: Northwestern Law AI Symposium

In the weeks leading up to the TECHSHOW I received an invite from Prof. Daniel Linna to attend Northwestern University’s AI and Law: Navigating the Legal Landscape of Artificial Intelligence Symposium. I took a frigid hike down to the school in the morning to attend a few sessions before returning to the TECHSHOW in the afternoon. It was a fantastic event with a great mix of attorneys, law professors, and computer science developers.

I was able to see Professor Harry Surden‘s introductory session on how LLMs work in legal applications. While this information was not “new” to me per se (since I frequently give a similar presentation), he presented this complicated topic in an engaging, clear, and nuanced way. He’s obviously a veteran professor and an expert in this area, and his presentation is much better than mine. He gave me tons of ideas on how to improve my own presentations to summarize and analogize these computer science topics for legal professionals, for which I was very grateful.

The second session was a panel that included Sabine Brunswicker, JJ Prescott, and Harry Surden. All were engaged in fascinating projects using AI in the law, and I encourage you to take a look through their publications to get a better sense of what the pioneers in our field are doing to make use of these technologies in their research.

Our Students

Each year our school funds a cohort of students to attend the TECHSHOW and this year was no different. This is my first year going with them and I wasn’t sure how much value they would get out of it since they don’t have a ton of experience working in firms using these tools. Was this just a free trip to Chicago or was this pedagogically useful to them?

I will cut to the chase and say that they found this tremendously useful and loved every session that they attended. Law school can (sometimes) get a little disconnected from the day-to-day practice of law and this is a great way to bridge that gap and give the students a sense of what tools attorneys use daily to do their jobs. You’d think that all of the sexy AI-related stuff would be attractive to students but the best feedback came from sessions on basic office applications like MS Outlook and MS Word. Students are definitely hungry for this type of content if you are trying to think through workshops related to legal technology.

In addition to the sessions, the students greatly appreciated the networking opportunities. The TECHSHOW is not overly stuffy and formal and I think they really liked the fact that they could, for example, find an attorney at a big firm working in M&A and pick their brain at an afterparty to get the unfiltered truth about a specific line of work. All of the students said they would go again and I’m going to try to find ways to get even more students to attend next year. If your school ends up bringing students in the future, please reach out to me and we can have our students get together at the event.

Jenny

Jenny live-tweeted the ABA TECHSHOW’s 60 Apps in 60 Minutes and provided links. You can follow her on this exciting journey starting with this tweet:

Rebecca

One of the most impactful sessions for me was titled “Revitalize Your Law Firm’s Knowledge Management with AI,” with Ben Schorr (Microsoft) and Catherine Sanders Reach (North Carolina Bar Association).  To drive home why KM matters so much, they shared the statistic that knowledge workers spend a staggering 2.5 hours a day just searching for what they need. That resonated with me, as I can recall spending hours as a junior associate looking for precedent documents within my document management system. Even as a librarian, I often spend time searching for previous work that either I or a colleague has done.

To me, knowledge management is one of the most exciting potential areas to apply AI, because it’s such a difficult problem that firms have been struggling with for decades. The speakers mentioned hurdles like data silos (e.g., particular practice areas sharing only among themselves), a culture of hoarding information, and the challenges of capturing and organizing vast amounts of data, such as emails and scanned documents with poor OCR.

The speakers highlighted several AI tools that are attempting to address these issues through improved search going beyond keywords, automating document analysis to aid in categorizing documents, and suggesting related documents. They mentioned Microsoft Copilot, along with process tools like Process Street, Trainual, and Notion. Specific tools like Josef allow users to ask questions of HR documents and policies, rather than hunting for the appropriate documents.

Artificial Intelligence and the Future of Law Libraries Roundtable Events

South Central Roundtable

OU Law volunteered to host the South Central “Artificial Intelligence and the Future of Law Libraries” roundtable and so I was fortunate enough to be allowed to attend. This is the third iteration of a national conversation on what the new AI technologies could mean for the future of law libraries and (more broadly) law librarianship. I thought I would fill you in on my experience and explain a little about the purpose and methodology of the event. The event follows Chatham House Rules so I cannot give you specifics about what anybody said but I can give you an idea of the theme and process that we worked through.

Law Library Director Kenton Brice of OU Law elected to partner with Associate Dean for Library and Technology Greg Ivy and SMU to host the event in Dallas, TX because it was more accessible for many of the people that we wanted to attend. I’d never been to SMU and it’s a beautiful campus in an adorable part of Dallas – here’s a rad stinger I made in Premiere Pro:

Not cleared with SMU’s marketing department

TL;DR: If you get invited, I would highly recommend that you go. I found it enormously beneficial.

History and Impetus

The event is the brainchild of Cas Laskowski (Head of Research, Data & Instruction; Director of Law Library Fellows Program; Technology & Empirical Librarian) at the University of Arizona (hereinafter “Cas”). They hosted the inaugural session through U of A’s Washington, DC campus. You may have seen the Dewey B. Strategic article about it, since Jean O’Grady was in attendance. The brilliant George H. Pike at Northwestern University hosted the second in the series in Chicago. I know people who have attended each of these sessions, and the feedback has been resoundingly positive.

The goal of this collaborative initiative is to provide guidance to law libraries across the country as we work to strategically incorporate artificial intelligence into our operations and plan for the future of our profession. 

Cas, from the U of A Website

Methodology

The event takes the entire day and it’s emotionally exhausting, in the best way possible. We were broken into tables of 6 participants. The participants were hand-selected based on their background and experience so that each table had a range of different viewpoints and perspectives.

Then the hosts (in our case, Kenton Brice and Cas Laskowski) walked us through a series of “virtuous cycle, vicious cycle” exercises. They, thankfully, started with the vicious cycle so that each session could end on a positive, virtuous-cycle note. At the end, each table chose a speaker, and then we summarized the opinions discussed so that the entire room could benefit from the conversations. Apparently, this is an exercise done at places like the United Nations to triage and prepare for future events. This process went on through 3 full cycles, and then we had about an hour of open discussion at the end. We got there at 8am and had breakfast and lunch on-site (both great – thank you Greg Ivy and SMU catering) because it took the entire day.

We had a great mix of academic, government, and private sector participants represented at the event, and the diversity of stakeholders and experiences made for robust and thought-provoking conversation. Many times I would hear perspectives that had never occurred to me and would have my assumptions challenged in ways that refined my own ideas about what the future might look like. Additionally, the presence of people with extensive expertise in specific domains, such as antitrust, copyright, the intricacies of AmLaw 100 firms, and the particular hurdles faced in government roles, enriched the discussions with a depth and nuance that is rare to find. Any one of these areas can require years of experience, so having a wide range of experts to answer questions allowed you to really “get into the weeds” and think things through thoroughly.

My Experience

I tend to be (perhaps overly) optimistic about the future of these technologies and so it was nice to have my optimism tempered and refined by people who have serious concerns about what the future of law libraries might look like. While the topics presented were necessarily contentious, everybody was respectful and kind in their feedback. We had plenty of time for everybody to speak (so you didn’t feel like you were struggling to get a word in).

You’d think that 8 hours of talking about these topics would be enough but we nearly ran over on every exercise. People have a lot of deep thoughts, ideas, and concerns about the state and future of our industry. Honestly, I would have been happy to have this workshop go on for several days and cover even more topics if that was possible. I learned so much and gained so much value from the people at my table that it was an incredibly efficient way to get input and share ideas.

Unlike other conferences and events that I’ve attended, this one felt revolutionary – as in, we truly need to change the status quo in a big way and start getting to work on new ways to tackle these issues. “Disruptive” has become an absolute buzzword inside of Silicon Valley and academia, but now we have something truly disruptive, and we need to do something about it. Bringing all these intelligent people together in one room fosters an environment where disparate, fragmented ideas can crystallize into actionable plans, enabling us to support each other through these changes.

The results from all of these roundtables are going to be published in a global White Paper once the series has concluded. Each roundtable has different regions and people involved and I can’t wait to see the final product and hear what other roundtables had to say about these important issues. More importantly, I can’t wait to be involved in the future projects and initiatives that this important workshop series creates.

I echo Jean O’Grady: If you get the call, go.

Beyond Legal Documentation: Other Business Uses of Generative AI

I have been listening to, thinking about, and participating in conversations about how generative AI is going to be integrated into the practice of law. Most of these conversations center on how it will be integrated into legal documents, which is not surprising considering how many lawyers have gotten in trouble for this and how quickly our research and writing products are integrating the technology. But there is more to legal practice than creating client and/or court documents. In fact, there are many more business uses of generative AI than just research and drafting.

This past fall, I was asked to lead an AI session for Capital University’s joint venture with the Columbus College of Art & Design, the Institute for Creative Leadership at Work. I was asked to adapt my presentation to HR professionals and focus on SHRM compliance principles. I enjoyed the deep dive into this world, and I came away from my research with a lot of great ideas for my session, Bard, Bing, and ChatGPT, Oh My!: Possible Ethical Uses of Generative AI at Work, such as tabletop emergency exercises, social media posts, job descriptions, and similar tasks.

This week, I have been thinking about how everyone’s focus has really been around legal documentation, my own included. But there are an amazing number of backend business tasks that could also utilize AI in a positive way. The rest of the world, including HR, has been focusing on them for a while, but we seem to have lost track of these business tasks.

Here are some other business uses of generative AI, along with prompts that I think hold great promise. (After the list, I’ve included a small sketch showing how prompts like these can be turned into reusable templates.)

  1. Drafting job descriptions
    • Pretend that you are an HR specialist for a small law firm in the United States. Draft a job description for a legal secretary who focuses on residential real estate transactions but may assist with other transactional legal matters as needed. [Include other pertinent details of the position]. The job description will be posted in the following locations [fill in list]
  2. Creating tabletop simulations to work through crisis/emergency plans:
    • You are an HR specialist who is helping plan for and test the company’s responses to a variety of situations. First is an active shooter in the main building. A 5th grade tour of the facilities is underway on the third floor. Create a detailed tabletop simulation to test this.
    • Second scenario: The accounting department is celebrating the birthday of the administrative assistant and is having cake in the breakroom. The weather has turned bad, and an F4 tornado is spotted half a mile away. After 15 minutes, the tornado strikes the building directly. Create a detailed tabletop simulation to test the plan and response for this event.
  3. Assisting with lists of mandatory and voluntary employee trainings
    • Pretend that you are an HR professional who works for a law firm. You are revamping the employee training program. We need to create a list of mandatory trainings and a second list of voluntary trainings. Please draft a list of trainings appropriate to employees in a law firm setting.
  4. Assisting with social media posting creation:
    • Pretend that you are a professional social media influencer for the legal field. Draft an Instagram post, including creating a related image, to celebrate Law Day, which is coming up on May 1st.  Make sure that it is concise and Instagram appropriate. Please include hashtags.
  5. Assisting with creating employee policies or handbooks (verify content!):
    • Pretend that you are an information security professional. Draft an initial policy for a law firm regarding employee AI usage for company work. The company wants to allow limited use of generative AI. They are very worried that their proprietary and/or confidential client data will be accidentally released. Specify that only your custom AI system – [name firm-specific or specialized AI with a strong privacy contract clause] – can be used with company data. The policy must also take into consideration the weaknesses of all AI systems, including hallucinations, potential bias, and security issues.
  6. Assisting with making sure your web presence is ADA accessible:
    • Copilot/web-enabled Prompt: Pretend that you are a graphic designer who has been tasked with making sure that a law firm’s online presence is ADA accessible. Please review the site [insert link], run an ADA compliance audit, and provide an accessibility report, including suggestions on what can be done to fix any accessibility issues that arise.
  7. Onboarding documentation
    • Create a welcome message for a new employee. Tell them that the benefits orientation will be at 9 am in the HR conference room on the next first Tuesday of the month. Pay day is on the 15th and last day of each month, unless payday falls on a weekend or federal holiday, in which case it will be the Friday before. Employees should sign up for the mandatory training that will be sent to them in an email from IT.
    • (One I just used IRL) Pretend that you are an HR specialist in a law library. A new employee is starting in 6 weeks, and the office needs to be prepared for her arrival. [Give specific title and any specialized job duties, including staff supervision.] Create an onboarding checklist of important tasks, such as securing keys and a parking permit, asking IT to set up their computer, email address, and telephone, asking the librarians to create passwords for the ILS, Libguides, and similar systems, etc.
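As promised above, here is a small sketch of how prompts like these can be turned into reusable templates. It uses Python’s built-in string.Template; the field names and the sample values are illustrative only, so adapt them to whatever your own prompt worksheet calls for.

```python
# Minimal sketch: turning the HR prompts above into a reusable fill-in-the-blank
# template. Field names and sample values are illustrative only.

from string import Template

HR_PROMPT = Template(
    "Pretend that you are $role. $task "
    "Audience/context: $context. Output requirements: $output."
)

job_description_prompt = HR_PROMPT.substitute(
    role="an HR specialist for a small law firm in the United States",
    task=("Draft a job description for a legal secretary who focuses on "
          "residential real estate transactions but may assist with other "
          "transactional matters as needed."),
    context="the posting will appear on the firm website and a state bar job board",
    output="a title, a short summary, a duties list, and required qualifications",
)

print(job_description_prompt)
```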

What other tasks (and prompts) can you think of that might be helpful? If you are struggling to put together a prompt, please see my general AI Prompt Worksheet in Introducing AI Prompt Worksheets for the Legal Profession. We welcome you to share your ideas in the comments.