Benchmarking a Moving Target, or let’s run a hypo through 7 AIs and see what happens

Appendix – the results

2 thoughts on “Benchmarking a Moving Target, or let’s run a hypo through 7 AIs and see what happens”

  1. Excellent post! So many questions and thoughts are going through my head! Have you tried posting the same question to the various LLMs to see how the answers differ over time? How do we get other librarians to do similar experiments? Government law libraries are only given two-week trial periods for Lexis and Westlaw, so evaluation is difficult. Right now I’m concentrating my efforts on case summaries in free models. I do have a personal subscription to ChatGPT, and the visualizations of the summaries are impressive.

  2. Thank you for running this evaluation! I’d be curious to know whether you tested Westlaw’s latest AI-enabled research product, “Deep Research”? Or was this the older product, AI-Assisted Research?
