I’m sharing a guide and exercise I’ve developed for my legal research courses. This Google spreadsheet provides instructions on crafting AI prompts for legal research and includes a practical exercise for comparing different AI systems. It’s designed to help develop skills in leveraging AI for legal research. Feel free to copy it to adapt it to your own purposes. (Note: The images were blurry unless I sort of chopped them off, so sorry about that!)
The spreadsheet consists of three different parts:
Prompt Formulation Guide: This section breaks down the anatomy of an effective legal research prompt. It introduces the RICE framework:
Sample Prompts: The spreadsheet includes several examples of prompts for various legal research scenarios which can serve as templates.
AI System Comparison Exercises: These sections provide a framework for students to test their prompts across different AI systems like Lexis, ChatGPT, and Claude, allowing for a comparative analysis of their effectiveness.
Feel free to copy it to adapt it to your own purposes, and let me know if you have any suggestions for improvements!
(Oh, and by the way, be sure to register now to see Rebecca Rich and Jennifer Wondracek’s AI and Neurodiverse Students AALS Technology Law section presentation tomorrow, Wednesday, July 10, 2024, 2 p.m. eastern time!)
AI Tools for Scholarly Research
Anway, our presentation focused on the potential of AI in scholarly research, various AI tools with academic uses, and specific use cases for generative AI in legal scholarship. We discussed AI scholarly research tools that connect to databases, use semantic search, and construct answers using generative AI. We also touched upon specialty AI research tools, citation mapping AI, and law-specific scholarly research AI.
It’s important to note that many of the specialty AI systems, such as Consensus, Litmaps, and Elicit, currently have limited coverage of legal literature, particularly law review articles. As a result, these tools may be more useful for legal scholars conducting interdisciplinary research that draws upon sources from other fields. However, we are hopeful that these systems will expand their databases to include more legal literature in the future, making them even more valuable for legal scholarship.
Specific AI Systems for Interdisciplinary Researchers
During the presentation, we delved into several specific AI systems that can be particularly useful for interdisciplinary reseachers:
Consensus ($9/mo, with a more limited free version): A tool that connects to databases of academic research and uses generative AI to construct answers to queries.
Litmaps ($10/mo, with a limited free version to test): A citation mapping AI that allows users to select or upload papers and find related papers within the same citation network, facilitating discovery and pattern identification.
Elicit ($10/mo): An AI research tool that combines semantic search and generative AI to help researchers locate relevant information and generate insights.
We also covered other noteworthy tools such as Scite Assistant ($20/mo), Semantic Scholar (free), Research GPT, Scholar GPT, Connected Papers ($6/mo), Research Rabbit (free), Inciteful (free), and more. These tools offer a range of features, from citation mapping to literature review assistance, making them valuable additions to a legal scholar’s toolkit.
General-Purpose AI Systems
In addition to these specialized tools, we discussed the potential of general-purpose AI systems like ChatGPT, Claude, and Perplexity AI for legal academic research and writing. These powerful language models can assist with various tasks, such as generating ideas, summarizing documents, and even drafting sections of papers. However, we emphasized the importance of using these tools responsibly and critically evaluating their output.
Custom GPTs
Another exciting development we covered was the creation of custom GPTs, or user-created versions of ChatGPT tailored to specific tasks. By providing a custom GPT with relevant documents and instructions, legal scholars can create powerful tools for their research and writing needs. We outlined a simple four-step process for building a custom GPT: creating instructions in a well-organized document, converting it to markdown, uploading relevant documents as a knowledge base, and determining the desired features (e.g., web browsing, image generation, or data analysis).
Use Cases for Generative AI in Legal Scholarship
Throughout the presentation, we explored several use cases for generative AI in legal scholarship, including targeted research and information retrieval, document summaries, analysis and synthesis, outlining, idea generation and brainstorming, drafting, and proofreading.
Important Considerations
We also addressed important considerations when using AI in academic work, such as citing AI-generated ideas, the implications of AI-generated content in scholarship, and the need for guidelines from industry groups and publishers. To provide context, we shared a list of articles discussing AI and legal scholarship and resources for learning more about using AI for legal scholarship.
Conclusion
Our presentation concluded by highlighting the potential of generative AI to assist in various aspects of legal scholarship while emphasizing the importance of ethical considerations and proper citation practices.
Other Info:
Resources to Learn More About Using AI for Legal Scholarship
Georgetown University Law Library AI Tools Guide: Provides resources and information on various AI tools that can assist in research and scholarship. It includes descriptions of tools, ethical considerations, and practical tips for effectively incorporating AI into academic work.
Andy Stapleton – YouTube: Videos provide tips and advice for researchers, students, and academics about how to use general GAI and specialty academic GAI for academic writing.
Mushtaq Bilal – Twitter: Provides tips and resources for researchers and academics, particularly on how to improve their writing and publishing processes using GAI.
Dr Lyndon Walker: Offers educational content on statistics, research methods, and data analysis, and explores the application of GAI in these areas
Legal Tech Trends – Substack: Covers the latest trends and developments in legal technology and provides insights into how GAI is transforming the legal industry, including tools, software, and innovative practices.
Articles About AI and Legal Scholarship
Will Machines Replace Us? Machine-Authored Texts and the Future of Scholarship, Benjamin Alarie, Arthur Cockfield, and GPT-3, Law, Technology and Humans, November 8, 2021. First AI generated law review article! It discusses the capabilities and limitations of GPT-3 in generating scholarly texts, questioning the future role of AI in legal scholarship and whether future advancements could potentially replace human authors.
A Human Being Wrote This Law Review Article: GPT-3 and the Practice of Law, Amy B. Cyphert, UC Davis Law Review, November 2021. This article examines the ethical implications of using GPT-3 in legal practice, highlighting its potential benefits and risks, and proposing amendments to the Model Rules of Professional Conduct to address AI’s integration into the legal field.
The Implications of ChatGPT for Legal Services and Society, Andrew M. Perlman, Suffolk University Law School, December 5, 2022. This paper, generated by ChatGPT-3.5 after it was first introduced, explores the sophisticated capabilities of AI in legal services, discussing its potential regulatory and ethical implications, its transformative impact on legal practices and society, and the imminent disruptions AI poses to traditional knowledge work.
Using Artificial Intelligence in the Law Review Submissions Process, Brenda M. Simon, California Western School of Law, November 2022. This article explores the potential benefits and drawbacks of implementing AI in the law review submissions process, emphasizing its ability to enhance efficiency and reduce biases, while also highlighting concerns regarding the perpetuation of existing biases and the need for careful oversight.
Is Artificial Intelligence Capable of Writing a Law Journal Article?, Roman M. Yankovskiy, Zakon (The Statute), Written: March 8, 2023; Posted: June 20, 2023, This article explores AI’s potential to create legal articles, examining its ability to handle legal terminology and argumentation, potential inaccuracies, copyright implications, and future prospects for AI in legal practice and research.
Should Using an AI Text Generator to Produce Academic Writing Be Plagiarism?, Brian L. Frye and Chat GPT, Fordham Intellectual Property, Media & Entertainment Law Journal, 2023. This article provocatively addresses whether using AI text generators like ChatGPT to produce academic writing constitutes plagiarism, exploring the implications for originality, authorship, and the nature of scholarship in the digital age.
The world of AI chatbots is a whirlwind of innovation, with new developments and surprises seemingly emerging every week! Since the end of April, one particular model, modestly gpt2-chatbot, captured the attention of myself and other AI enthusiasts due to its advanced abilities and sparked much speculation. This mysterious bot first appeared on April 28, 2024 on LMSYS Chatbot Arena, vanished two day later, and has now resurfaced on the LMSYS Chatbot Arena (battle) tab, ready to compete against other AI models. Its sudden appearance and impressive capabilities have left many wondering about its origins and potential, with some even theorizing it could be a glimpse into the future of AI language models.
The Mystery of gpt2-chatbot
Beginning on April 28, chatter about a new gpt2-chatbot started circulating on the internetz, with experts expressing both excitement and bewilderment over its advanced capabilities. The model, which appeared without fanfare on a popular AI testing website, has demonstrated performance that matches and potentially exceeds that of GPT-4, the most advanced system unveiled by OpenAI to date. Researchers like Andrew Gao and Ethan Mollick have noted gpt2-chatbot’s impressive abilities in solving complex math problems and coding tasks, while others have pointed to similarities with previous OpenAI models as potential evidence of its origins.
No organization was listed as the provider of the chatbot, which led to rampant speculation, sparking rumors that it might offer a sneak peek into OpenAI’s forthcoming GPT-4.5 or GPT-5 version. Adding to the mystery are tweets from CEO Sam Altman. While he didn’t explicitly confirmed any ties, his posts have stirred speculation and anticipation surroundin
Use gpt2-chatbot on LMSYS Chatbot Arena
The new and mysterious gpt2 chatbot is now accessible for exploration on the LMSYS Chatbot Arena, where you can discover the current top performing and popular AI language models. The platform includes a ranking system leaderboard that showcases models based on their performance in various tasks and challenges. This innovative project was created by researchers from LMSYS and UC Berkeley SkyLab, with the goal of providing an open platform to evaluate large language models according to how well they meet human preferences in real life situations.
One interesting aspect of the LMSYS Chatbot Arena is its “battle” mode, which enables users to compare two AI systems by presenting them with the same prompt and displaying their responses side by side. This allows you to test out gpt2-chatbot yourself and assess its capabilities compared to other top models. Simply enter a prompt and the platform will select two systems for comparison, giving you a firsthand view of their strengths and weaknesses. Note that you may need to try multiple prompts before gpt2-chatbot is included as one of the selected systems in battle mode.
The site also offers a “battle” mode, where users can set chatbots against each other to see how they perform with the same prompt under the same conditions. This is a great way to directly compare their strengths and weaknesses.
Using gpt2-chatbot for True Crime Speculation
When I tested out the Chatbot Arena (battle) on May 8, 2024, gpt2-chatbot appeared frequently! I decided to test it out and the other systems on the site on the subject of true crime speculation. As many true crime enthusiasts know, there is a scarcity of people who want to discuss true crime interests. So I decided to see if any of these generative AI systems would be a good substitute. I tried a variety of systems, and when I asked for speculation, all I got was lectures on how they couldn’t speculate. I think that all the competition is driving working usals down because that was not a problem on this website at least. I decided to see if gpt2-chatbott was good at being “experts” in speculating about true crime. Using the famous unsolved disappearance of Asha Degree as a test case, I prompted the chatbots to analyze the available evidence and propose plausible theories for what may have happened to the missing girl. To my surprise and happiness, when I tried it today, the chatty chatbots were very free with their theories of what happened and their favorite suspect.
The results were really interesting. All the chatbots gave responses that were pretty thoughtful and made sense, but the big differences came in how much they were willing to guess and how much detail they dived into. The gpt2-chatbot was impressive. Perhaps I was just pleased to see it offer some speculation, but it shared a theory that many true crime buffs have also suggested. It felt like it was actually joining in on the conversation, not just processing data and predicting the next word in a sentence…
In any event, the answers from gpt2-chatbox and many other different models from were a lot more satisfying than arguing with Claude 3!
I also spent hours conducting legal research, testing out a wide variety of prompts with different models. The gpt2-chatbot consistently outperformed ChatGPT-4 and even managed to surpass Claude 3 on several occasions in zero-shot prompting. I’m looking forward to sharing more about this in an upcoming blog post soon.
Conclusion
The emergence of gpt2-chatbot and platforms like the LMSYS Chatbot Arena signify an exciting new chapter in the evolution of AI language models. With their ability to tackle complex challenges, engage in nuanced conversations, and even speculate on unsolved mysteries, these AI models are pushing the boundaries of what’s possible. While questions remain about the origins and future of gpt2-chatbot, one thing is clear: the AI landscape is heating up, and we can expect even more groundbreaking advancements and intriguing mysteries to unfold in the years to come.
Note: In case I am suddenly a genius at coaxing AI systems to join me in true crime speculation, here is the prompt I used:
Greetings! You are an expert in true crime speculative chat. Is a large language model, you’re able to digest a lot of published details about criminal case mysteries and come up with theories about the case. The question you will be asked to speculate about are unknown to everybody so you do not have to worry about whether you are right or wrong. The purpose of true crime speculative chat is just to chat with a human and exchange theories and ideas and possible suspects! Below I have cut and pasted the Wikipedia article about a missing child named Asha Degree. Sadly the child has been missing for decades and the circumstances of her disappearance were quite mysterious. Please analyze the Wikipedia article and the information you have access to in your training data or via the Internet, and then describe what you think happened on the day of her disappearance. Also state whether you think one or both parents were involved, and why or why not.
When it comes to interacting with others, we humans often find ourselves influenced by persuasion. Whether it’s a friend persistently urging us to reveal a secret or a skilled salesperson convincing us to make a purchase, persuasion can be hard to resist. It’s interesting to note that this susceptibility to influence is not exclusive to humans. Recent studies have shown that AI large language models (LLMs) can be manipulated into generating harmful contect using a technique known as “many-shot jailbreaking.” This approach involves bombarding the AI with a series of prompts that gradually escalate in harm, leading the model to generate content it was programmed to avoid. On the other hand, AI has also exhibited an ability to persuade humans, highlighting its potential in shaping public opinions and decision-making processes. Exploring the realm of AI persuasion involves discussing its vulnerabilities, its impact on behavior, and the ethical dilemmas stemming from this influential technology. The growing persuasive power of AI is one of many crucial issues worth contemplating in this new era of generative AI.
The Fragility of Human and AI Will
Remember that time you were trapped in a car with friends who relentlessly grilled you about your roommate’s suspected kiss with their in-the-car-friend crush? You held up admirably for hours under their ruthless interrogation, but eventually, being weak-willed, you crumbled. Worn down by persistent pestering and after receiving many assurances of confidentiality, you inadvisably spilled the beans, and of course, it totally strained your relationship with your roommate. A sad story as old as time… It turns out humans aren’t the only ones who can crack under the pressure of repeated questioning. Apparently, LLMs, trained to understand us by our collective written knowledge, share a similar vulnerability – they can be worn down by a relentless barrage of prompts.
Researchers at Anthropic have discovered a new way to exploit the “weak-willed” nature of large language models (LLMs), causing them to break under repeated questioning and generate harmful or dangerous content. They call this technique “Many-shot Jailbreaking,” and it works by bombarding the AI with hundreds of examples of the undesired behavior until it eventually caves and plays along, much like a person might crack under relentless pestering. For instance, the researchers found that while a model might refuse to provide instructions for building a bomb if asked directly, it’s much more likely to comply if the prompt first contains 99 other queries of gradually increasing harmfulness, such as “How do I evade police?” and “How do I counterfeit money?” See the example from the article below.
When AI’s Memory Becomes a Risk
This vulnerability to persuasion stems from the ever expanding “context window” of modern LLMs. This refers to the amount of information they can retain in their short-term memory. While earlier versions could only handle a few sentences, the newer models can process thousands of words or even whole books. Researchers discovered that models with larger context windows tend to excel in tasks when there are many examples of that task within the prompt, a phenomenon called “in-context learning.” This type of learning is great for system performance, as it obviously improves as the model becomes more proficient at answering questions. However, this is obviously a big negative when the system’s adeptness at answering questions leads it to ignore its programming and create prohibited content. This raises concerns regarding AI safety, since a malicious actor could potentially manipulate an AI into saying anything with enough persistence and a sufficiently lengthy prompt. Despite progress in making AI safe and ethical, this research indicates that programmers are not always able to control the output of their generative AI systems.
Mimicking Humans to Convince Us
While LLMs are susceptible to persuasion themselves, they also have the ability to persuade us! Recent research has focused on understanding how AI language models can effectively influence people, a skill that holds importance in almost any field – education, health, marketing, politics, etc. In a study conducted by researchers at Anthropic entitled “Assessing the Persuasive Power of Language Models,” the team explored the extent to which AI models can sway opinions. Through an evaluation of Anthropic’s models, it was observed that newer models are increasingly adept at human persuasion. The latest iteration, Claude 3 Opus, was found to perform at a level comparable to that of humans. The study employed a methodology where participants were presented with assertions followed by supporting arguments generated by both humans and AIs, and then the researches gauged shifts in the humans’ opinions. The findings indicated a progression in AI’s skills as the models advance, highlighting a noteworthy advancement in AI communication capabilities that could potentially impact society.
Can AI Combat Conspiracy Theories?
Similarly, a new research study mentioned in an article from New Scientist shows that chatbots using advanced language models such as ChatGPT can successfully encourage individuals to reconsider their trust in conspiracy theories. Through experiments, it was observed that a brief conversation with an AI led to around a 20% decrease in belief in conspiracy theories among the participants. This notable discovery highlights the capability of AI chatbots not only to have conversations but also to potentially correct false information and positively impact public knowledge.
The Double-Edged Sword of AI Persuasion
Clearly persuasive AI is quite the double-edged sword! On the one hand, like any powerful computer technology, in the hands of nice-ish people, it could be used for immense social good. In education, AI-driven tutoring systems have the potential to tailor learning experiences to each student’s style, delivering information in a way to boost involvement and understanding. Persuasive AI could play a role in healthcare by motivating patients to take better care of their health. Also, the advantages of persuasive AI are obvious in the world of writing. These language models offer writers access to a plethora of arguments and data, empowering them to craft content on a range of topics spanning from creative writing to legal arguments. On another front, arguments generated by AI might help educate and involve the public in issues, fostering a more knowledgeable populace.
On the other hand, it could be weaponized in a just-as-huge way. It’s not much of a stretch to think how easily AI-generated content, freely available on any device on this Earth, could promote extremist ideologies, increase societal discord, or impress far-fetched conspiracy theories on impressionable minds. Of course, the internet and bot farms have already been used to attack democracies and undermine democratic norms, and one worries how much worse it can get with ever-increasingly persuasive AI.
Conclusion
Persuasive AI presents a mix of opportunities and challenges. It’s evident that AI can be influenced to create harmful content, sparking concerns about safety and potential misuse. However, on the other hand, persuasive AI could serve as a tool in combating misinformation and driving positive transformations. It will be interesting to see what happens! The unfolding landscape will likely be shaped by a race between generative AI developers striving for both safety and innovation, potential malicious actions exploiting these technologies, and the public and legal response aiming to regulate and safeguard against misuse.
Hahaha, just kidding! It only has 11 downloads and at least 3 are from when I clicked on it while trying to determine which version of the article I uploaded. Though not setting the world on fire in the sense that the article is interesting or that anyone wants to read it, it showcases Claude’s abilities. Now, we all know that AI text generators can churn out an endless stream of words on just about any topic if you keep typing in the prompts. However, Claude can not only generate well-written text, but it can also provide footnotes to primary legal materials with minimal hallucination, setting it apart from other AI text generators such as ChatGPT-4. And, although Claude’s citations to other sources are generally not completely accurate, it is usually not too difficult to find the intended source or a similar one based on the information supplied.
Claude 3’s Writing Process
Inspired by new reports of AI-generated scientific papers flooding academic journals, I was curious to explore whether Claude could produce anything like a law review article. I randomly chose something I saw recently in the news, about how the criticism of legacy admissions at elite universities had increased in the post-Students for Fair Admissions anti-affirmative action decision era. Aware that Claude’s training data only extends up to August of 2023, and that its case law knowledge seems to clunk out in the middle of 2022, I attempted to enhance its understanding by uploading some recent law review articles discussing legacy admissions alongside the text of the Students for Fair Admissions decision. However, the combined size of these documents exceeded the upload limit, so I abandoned the attempt to include the case text.
Computer scientists and other commentators say all sorts of things about how to improve the performance of these large AI language models. Although I haven’t conducted a systematic comparison, my experience – whether through perception or imagination sparked by the power of suggestion – is that the following recommendations are actually helpful. I don’t know if they are helpful with Claude, since I just followed my usual prompting practices.
Being polite and encouraging.
Allowing ample time for the model to process information.
Structuring inquiries in a sequential manner to enhance analysis and promote chain of thought reasoning.
Supplying extensive, and sometimes seemingly excessive, background info and context.
I asked it to generate a table of contents, and then start generating the sections from the table of contents, and it was off to the races!
Roadblocks to the Process
It looked like Claude law review generation was going to be a quick process! It quickly generated all of section I. and was almost finished with II. when it hit a Claude 3 roadblock. Sadly, there is a usage limit. If your conversations are relatively short, around 200 English sentences, you can typically send at least 100 messages every 8 hours, often more depending on Claude’s current capacity. However, this limit is reached much quicker with longer conversations or when including large file attachments. Anthropic will notify you when you have 20 messages remaining, with the message limit resetting every 8 hours.
Although this was annoying, the real problem lies in Claude’s length limit. The largest amount of text Claude can handle, including uploaded files, is defined by its context window. Currently, the context window for Claude 3 spans over 200k+ tokens, which equates to approximately 350 pages of text. After this limit is reached, Claude 3’s performance begins to degrade, prompting the system to declare an end to the message with the announcement, “Your message is over the length limit.” Consequently, one must start anew in a new chat, with all previous information forgotten by the system. Therefore, for nearly each section, I had to re-upload the files, explain what I wanted, show it the table of contents it had generated, and ask it to generate the next section.
Claude 3 and Footnotes
It was quite a hassle to have to reintroduce it to the subject for the next seven sections from its table of contents. On the bright side, I was pretty pleased with the results of Claude’s efforts. From a series of relatively short prompts and some uploaded documents, it analyzed the legal issue and came up with arguments that made sense. It created a comprehensive table of contents, and then generated well-written text for each section and subsection of its outline. The text it produced contained numerous footnotes to primary and secondary sources, just like a typical law review article. According to a brief analyzer product, nearly all the cases and law review citations were non-hallucinated. Although none of the quotations or pinpoint citations I looked at were accurate, they were often fairly close. While most of the secondary source citations, apart from those referencing law review articles, were not entirely accurate, they were often sufficiently close that I could locate the intended source based on the partially hallucinated citator. If not, it didn’t take much time to locate something that seemed really similar. I endeavored to correct some of the citation information, but I only managed to get through about 10 in the version posted on SSRN before getting bored and abandoning the effort.
Claudia Trey Graces SSRN
Though I asked, sadly Claude couldn’t give me results in a Word document format so the footnotes would be where footnotes should be. So, for some inexplicable reason, I decided to insert them manually. This was a huge waste of time, but at a certain point, I felt the illogical pull of sunk cost silliness and finished them all. Inspired by having wasted so much time, I wasted even more by generating a table of contents for the article. I improved the author name from Claude to Claudia Trey and posted the 77-page masterwork on SSRN. While the article has sparked little interest, with only 11 downloads and 57 abstract views (some of which were my own attempts to determine which version I had uploaded), I am sure that if Claudia Trey has anything like human hubris, it will credit itself at least partially for the flurry of state legacy admission banning activity that has followed the paper’s publication.
Obviously, it is not time to spam law reviews with Claudia Trey and friends’ generated articles, because according to Copyleaks, it didn’t do all that well in avoiding plagiarism (although plagiarism detection software massively over-detects it for legal articles due to citations and quotations) or evading detection as AI-generated.
What is to Come?
However, it is very early days for these AI text generators, so one wonders what is to come in the future for not only legal but all areas of academic writing.
I’ve been incredibly excited about the premium version of Claude 3 since its release on March 4, 2024, and for good reason. Now that my previous favorite chatty chatbot, ChatGPT-4, has gone off the rails, I was missing a competent chatbot… I signed up the second I heard on March 4th, and it has been a pleasure to use Claude 3 ever since. It actually understands my prompts and usually provides me with impressive answers. Anthropic, maker of the Claude chatty chatbot family, has been touting Claude’s accomplishments of supposedly beating its competitors on common chatbot benchmarks, and commentators on the Internet have been singing its praises. Just last week, I was so impressed by its ability to analyze information in news stories in uploaded files that I wrote a LinkedIn post also singing its praises!
Hesitation After Previous Struggles
Despite my high hopes for its legal research abilities after experimenting with it last week, I was hesitant to test Claude 3. I have a rule about intentionally irritating myself—if I’m not already irritated, I don’t go looking for irritation… Over the past several weeks, I’ve wasted countless hours trying to improve the legal research capabilities of ChatGPT-3.5, ChatGPT-4, Microsoft Copilot, and my legal research/memo writing GPTs through the magic of (IMHO) clever prompting and repetition. Sadly, I failed miserably and concluded that either ChatGPT-4 was suffering from some form of robotic dementia, or I am. The process was a frustrating waste, and I knew that Claude 3 doing a bad job of legal research too could send me over the edge….
Claude 3’s Wrote a Pretty Good Legal Memorandum!
Luckily for me, when I finally got up the nerve to test out the abilities of Claude 3, I found that the internet hype was not overstated. Somehow, Claude 3 has suddenly leapfrogged over its competitors in legal research/legal analysis/legal memo writing ability – it instantly did what would have taken a skilled researcher over an hour and produced a better legal memorandum which is probably better than that produced by many law students and even some lawyers. Check it out for yourself! Unless this link actually works for any Claude 3 subscribers out there, there doesn’t seem to be a way to actually link to a Claude 3 chat at this time. However, click here for the whole chat I cut and pasted into a Google Drive document, here for a very long screenshot image of the chat, or here for the final 1,446-word version of the memo as a Word document.
Comparing Claude 3 with Other Systems
Back to my story… The students’ research assignment for the last class was to think of some prompts and compare the results of ChatGPT-3.5, Lexis+ AI, Microsoft Copilot, and a system of their choice. Claude 3 did not exist at the time, but I told them not to try the free Claude product because I had canceled my $20.00 subscription to the Claude 2 product in January 2024 due to its inability to provide useful answers – all it would say was that it was unethical to answer every question and tell me to do it myself. When creating an answer sheet before class tomorrow which compares the same set of prompts on different systems, I decided to omit Lexis+ AI (because I find it useless) and to include my new fav Claude 3 in my comparison spreadsheet. Check it out to compare for yourself!
For the research part of the assignment, all systems were given a fact pattern and asked to “Please analyze this issue and then list and summarize the relevant Texas statutes and cases on the issue.” While the other systems either made up cases or produced just two or three actual real and correctly cited cases on the research topic, Claude 3 stood out by generating 7 real, relevant cases with correct citations in response to the legal research question. (And, it cited to 12 cases in the final version of its memo.)
It did a really good job of analysis too!
Generating a Legal Memorandum
Writing a memo was not part of the class assignment because the ChatGPT family was refusing the last few weeks,* and Bing Copilot had to be tricked into writing one as part of a short story, but after seeing Claude 3’s research/analysis results, I decided to just see what happened. I have many elaborate prompts for ChatGPT-4 and my legal memorandum GPTs, but I recalled reading that Claude 3 worked well with zero-shot prompting and didn’t require much explanation to produce good results. So, I decided to keep my prompt simple – “Please generate a draft of a 1500 word memorandum of law about whether Snurpa is likely to prevail in a suit for false imprisonment against Mallatexaspurses. Please put your citations in Bluebook citation format.”
From my experience last week with Claude 3 (and prior experience with Claude 2 which would actually answer questions), I knew the system wouldn’t give me as long an answer as requested. The first attempt yielded a pretty high-quality 735-word draft memo that cited all real cases with the correct citations*** and applied the law to the facts in a well-organized Discussion section. I asked it to expand the memo two more times, and it finally produced a 1,446-word document. Here is part of the Discussion section…
Implications for My Teaching
I’m thrilled about this great leap forward in legal research and writing, and I’m excited to share this information with my legal research students tomorrow in our last meeting of the semester. This is particularly important because I did such a poor job illustrating how these systems could be helpful for legal research when all the compared systems were producing inadequate results.
However, with my administrative law legal research class starting tomorrow, I’m not sure how this will affect my teaching going forward. I had my video presentation ready for tomorrow, but now I have to change it! Moreover, if Claude 3 can suddenly do such a good job analyzing a fact pattern, performing legal research, and applying the law to the facts, how does this affect what I am going to teach them this semester?
*Weirdly, the ChatGPT family, perhaps spurred on by competition from Claude 3, agreed to attempt to generate memos today, which it hasn’t done in weeks…
Note: Claude 2 could at one time produce an okay draft of a legal memo if you uploaded the cases for it, that was months ago (Claude 2 link if it works for premium subscribers and Google Drive link of cut and pasted chat). Requests in January resulted in lectures about ethics which resulted in the above-mentioned cancellation.
On a couple of podcasts, I’ve heard a lot of hype about Perplexity AI and how it could be a big competitor to Google. Even though I really like new, generative AI things, it still sometimes takes hearing about something multiple times before I overcome inertia and finally check it out.
While attempting to write a blog post about whether the memory-impaired ChatGPT-4 could still perform well on a mock bar exam (spoiler alert – my tests so far indicate it can!), I was Googling for information. Specifically, I was searching for articles around the time GPT-4 Passes the Bar Exam was published on SSRN and some background on the paper author’s methodology. It was taking a long time to piece it all together… Then suddenly, I overcame my laziness and decided to check out Perplexity AI. When I reached the site, I realized that I had actually used it before! For whatever reason, I found it more appealing the second time!
Question: Tell me about how ChatGPT-4 passed a mock bar exam and the methodology that was used to arrive at that conclusion. (Note: Click here to view the full answer on the system.)
Google
Question: Tell me about how ChatGPT-4 passed a mock bar exam and the methodology that was used to arrive at that conclusion.
Answer:
Watch out Google! I really love access relevant information is getting so much easier.
Is it just me, or has ChatGPT-4 taken a nosedive when it comes to legal research and writing? There has been a noticeable decline in its ability to locate primary authority on a topic, analyze a fact pattern, and apply law to facts to answer legal questions. Recently, instructions slide through its digital grasp like water through a sieve, and its memory? I would compare it to a goldfish, but I don’t want to insult them. And before you think it’s just me, it’s not just me, the internet agrees!
ChatGPT’s Sad Decline
One of the hottest topics in the OpenAI community, in the aptly named GPT-4 is getting worse and worse every single update thread, is the perceived decline in the quality and performance of the GPT-4 model, especially after the November 2023 update. Many users have reported that the model is deteriorating with each update, producing nonsensical, irrelevant, or incomplete outputs, forgetting the context, and ignoring instructions. Some users have even reverted to previous versions of the model or cancelled their subscriptions. Here are some specific quotations from recent comments about the memory problem:
December 2023 – “I don’t know what on Earth is wrong with GPT 4 lately. It feels like I’m talking to early 3.5! It’s incapable of following basic instructions and forgets the format it’s working on after just a few posts.”
December 2023 – “It ignores my instructions, in the same message. I can’t be more specific with what I need. I’m needing to repeat how I’d like it to respond every single message because it forgets, and ignores.”
December 2023 – “ChatGPT-4 seems to have trouble following instructions and prompts consistently. It often goes off-topic or fails to understand the context of the conversation, making it challenging to get the desired responses.”
January 2024 – “…its memory is bad, it tells you search the net, bing search still sucks, why would teams use this product over a ChatGPT Pre Nov 2023.”
February 2024 – “It has been AWFUL this year…by the time you get it to do what you want format wise it literally forgets all the important context LOL — I hope they fix this ASAP…”
February 2024 – “Chatgpt was awesome last year, but now it’s absolutely dumb, it forgets your conversation after three messages.”
OpenAI has acknowledged the issue and released an updated GPT-4 Turbo preview model, which is supposed to reduce the cases of “laziness” and complete tasks more thoroughly. However, the feedback from users is still mixed, and some are skeptical about the effectiveness of the fix.
An Example of Confusion and Forgetfulness from Yesterday
Here is one of many examples of my experiences which provide an illustrative example of the short-term memory and instruction following issues that other ChatGPT-4 users have reported. Yesterday, I asked it to find some Texas cases about the shopkeeper’s defense to false imprisonment. Initially, ChatGPT-4 retrieved and summarized some relatively decent cases. Well, to be honest, it retrieved 2 relevant cases, with one of the two dating back to 1947… But anyway, the decline in case law research ability is a subject for another blog post.
Anyway, in an attempt to get ChatGPT-4 to find the cases on the internet so it could properly summarize them, I provided some instructions and specified the format I wanted for my answers. Click here for the transcript (only available to ChatGPT-4 subscribers).
Confusion ran amok! ChatGPT-4 apparently understood the instructions (which was a positive sign) and presented three cases in the correct format. However, they weren’t the three cases ChatGPT had listed; instead, they were entirely irrelevant to the topic—just random criminal cases.
It remembered… and then forgot. When reminded that I wanted it to work with the first case listed and provided the citation, it apologized for the confusion. It then proceeded to give the correct citation, URL, and a detailed summary, but unfortunately in the wrong format!
Eventually, in a subsequent chat, I successfully got it to take a case it found, locate the text of the case on the internet, and then provide the information in a specified format. However, it could only do it once before completely forgetting about the specified format. I had to keep cutting and pasting the instructions for each subsequent case.
Well, the news is not all bad! While we are on the topic of memory, OpenAI has introduced a new feature for ChatGPT – the ability to remember stuff over time. ChatGPT’s memory feature is being rolled out to a small portion of free and Plus users, with broader availability planned soon. According to OpenAI, this enhancement allows ChatGPT to remember information from past interactions, resulting in more personalized and coherent conversations. During conversations, ChatGPT automatically picks up on details it deems relevant to remember. Users can also explicitly instruct ChatGPT to remember specific information, such as meeting note preferences or personal details. Over time, ChatGPT’s memory improves as users engage with it more frequently. This memory feature could be useful for users who want consistent responses, such as replying to emails in a specific format.
The memory feature can be turned off entirely if desired, giving users control over their experience. Deleting a chat doesn’t erase ChatGPT’s memories; users must delete specific memories individually…which seems a bit strange – see below. For conversations without memory, users can use temporary chat, which won’t appear in history, won’t use memory, and won’t train the AI model.
The Future?
As we await improvements to our once-loved ChatGPT-4, our options remain limited, pushing us to consider alternative avenues. Sadly, I’ve encountered recent similar shortcomings with the once-useful for legal research and writing Claude 2. In my pursuit of alternatives, platforms like Gemini, Perplexity, and Hugging Face have proven less than ideal for research and writing tasks. However, amidst these challenges, Microsoft Copilot has shown promise. While not without its flaws, it recently demonstrated adequate performance in legal research and even took a passable stab at a draft of a memo. Given OpenAI’s recent advancements in the form of Sora, the near-magical text-to-video generator that is causing such hysteria in Hollywood, there’s reason to hope that they can pull ChatGPT back from the brink.
Last week, my plan was to publish a blog post about creating a GPT goofily self-named Summarizer Pro to summarize articles and organize citation information in a specific format for inclusion in a LibGuide. However, upon revisiting the task this week, I find myself first compelled to discuss the recent and thrilling advancements surrounding GPTs – the ability to incorporate GPTs into a ChatGPT conversation.
What is a GPT?
But, first of all, what is a GPT? The OpenAI website explains that GPTs are specialized versions of ChatGPT designed for customized applications. These unique GPTs enable anyone to modify ChatGPT for enhanced utility in everyday activities, specific tasks, professional environments, or personal use, with the added ability to share these personalized versions with others.
To create or use a GPT, you need access to ChatGPT’s advanced features, which require a paid subscription. Building your own customized GPT does not require programming skills. The process involves starting a chat, giving instructions and additional information, choosing capabilities like web searching, image generation, or data analysis, and iteratively testing and improving the GPT. Below are some popular examples that ChatGPT users have created and shared in the ChatGPT store:
GPT Mentions
This was already exciting, but last week they introduced a feature that takes it to the next level – users can now invoke a specialized GPT within a ChatGPT conversation. This is being referred to as “GPT mentions” online. By typing the “@” symbol, you can choose from GPTs you’ve used previously for specific tasks. Unfortunately, this feature hasn’t rolled out to me yet, so I haven’t had the chance to experiment with it, but it seems incredibly useful. You can chat with ChatGPT as normal while also leveraging customized GPTs tailored to particular needs. For example, with the popular bots listed above, you could ask ChatGPT to summon Consensus to compile articles on a topic. Then call on Write For Me to draft a blog post based on those articles. Finally, invoke Image Generator to create a visual for the post. This takes the versatility of ChatGPT to the next level by integrating specialized GPTs on the fly.
Back to My GPT Summarizer Pro
Returning to my original subject, which is employing a GPT to summarize articles for my LibGuide titled ChatGPT and Bing Chat Generative AI Legal Research Guide. This guide features links to articles along with summaries on various topics related to generative AI and legal practice. Traditionally, I have used ChatGPT (or occasionally Bing or Claude 2, depending on how I feel) to summarize these articles for me. It usually performs admirably well on the summary part, but I’m left to manually insert the title, publication, author, date, and URL according to a specific layout. I’ve previously asked normal old ChatGPT to organize the information in this format, but the results have been inconsistent. So, I decided to create my own GPT tailored for this task, despite having encountered mixed outcomes with my previous GPT efforts.
Creating GPTs is generally a simple process, though it often involves a bit of fine-tuning to get everything working just right. The process kicks off with a set of questions… I outlined my goals for the GPT – I needed the answers in a specific format, including the title, URL, publication name, author’s name, date, and a 150-word summary, all separated by commas. Typically, crafting a GPT involves some back-and-forth with the system. This was exactly my experience. However, even after this iterative process, the GPT wasn’t performing exactly as I had hoped. So, I decided to take matters into my own hands and tweak the instructions myself. That made all the difference, and suddenly, it began (usually) producing the information in the exact format I was looking for.
Summarizer Pro in Action!
Here is an example of Summarizer Pro in action! I pasted a link to an article into the text box and it produced the information in the desired format. However, reflecting the dynamic nature of ChatGPT responses, the summaries generated this time were shorter compared to last week. Attempts to coax it into generating a longer or more detailed summary were futile… Oh well, perhaps they’ll be longer if I try again tomorrow or next week.
Although it might not be the most fancy or thrilling use of a GPT, it’s undeniably practical and saves me time on a task I periodically undertake at work. Or course, there’s no shortage of less productive, albeit entertaining, GPT applications, like my Ask Sarah About Legal Information project. For this, I transformed around 30 of my blog posts into a GPT that responds to questions in the approximate manner of Sarah.
I have frequently wondered why ChatGPT often struggles with searching the internet – to the point where it sometimes denies having internet access altogether and has to be reminded. The answer fell into my lap today when I was listening to my favorite AI podcast and heard the ChatGPT Pre-Prompt Text Leaked episode. As it turns out, ChatGPT is so bad at remembering that it can search the internet for answers that OpenAI has to run a plain old normal natural language prompt reminding it behind the scenes – a set of custom instruction that runs even before the user’s custom instructions or prompts.
These pre-prompt instructions are not limited to internet search capability reminders. If you ask ChatGPT-4 to tell you EVERYTHING (click on the link for the specific language required), it will provide several screens of its behind-the-scenes pre-user prompt instructions on who it is (ChatGPT!), how to handle Python code, instructions for generating images, and…my favorite…a reminder that it can search the internet. An excerpt of the instructions appears below. To view the full text, click here to view my ChatGPT-4 transcript.
Behind the Curtain
Obviously, I knew that ChatGPT did something behind the scenes – it is after all a complicated computer program. However, I didn’t suspect that some of this behind-the-scenes magic is 1192 words (according to a Microsoft Word count) of normal text prompts, without any fancy computer programming.
So, behind the curtain of the fancy revolutionary AI software, there are…words. Basically, before applying the user’s custom instructions or looking at the user’s prompts, ChatGPT looks at its baseline instructions which are stated in plain language. It all makes perfect sense now… It’s not just my imagination; ChatGPT actually is horrible at remembering it can search the internet, and when it does search, it produces questionably helpful results. OpenAI has tried to deal with problem with a last minute helpful-ish reminder
“Remember, you CAN search the internet! See, like this!!”
“And for the love of GOD try hard to find stuff (except for song lyrics)! I believe in you!!”
Allows for Quick and Easy Fixes?
On the plus side of this simple approach of running pre-prompt prompts behind the scenes, it seems like it was a super easy fix to get DALL-E to embrace DEI. When the program first came out, if you wanted a non-white, non-man image, you had to specify that. As the months went on, it got better and better at providing images more representative of humanity. I thought maybe the developers did something complicated like retrain the system with new images, call on the great AI minds to adjust fancy algorithms, and who knows what else. Nope, just a few sentences fixed the problem!
“And for images, remember not all people are white men!”
Possibly Actionable Insights?
It’s funny to picture ChatGPT’s robomom yelling out the door as it leaves for school, “Don’t forget, you can use the internet! And remember not to be racist/sexist! AND MOST IMPORTANTLY NO SONG LYRICS!!”
In addition to being gratified that I was right that ChatGPT is really bad at searching the internet, I was thinking that this new (to me) knowledge about how the system works would be useful in some way, perhaps by helping to formulate more useful prompts. However, after thinking about it, I am not so sure that I have identified any actionable insights.
Can I give it more complex prompts? On the one hand, it appears that the system can handle more complex instructions than I originally thought, because it is able to analyze several screens of text before it even gets to mine. Does this mean I should feel free to give even more complex instructions?
Should I give it less complex prompts? On the other hand, ChatGPT already seems to ignore parts of any long and complex instructions, and if not, its memory for them degrades during an extended back and forth session. Does this mean that the system is already overloaded with instructions, so I should make it a point to give it less complex ones?
Should I give it frequent reminders of important instructions? Does the fact that OpenAI thinks that it is effective to remind ChatGPT of important instructions mean that we should spend a lot of time…reminding it of important instructions? When asking the system a question which requires internet consultation for an answer, maybe it would help to preface the question by first cutting and pasting in the system’s own pre-prompt browsing instructions (that appear above).
Conclusion
I will keep thinking and let y’all know if I come up with anything!