In this session, we will explore techniques for evaluating text generated by Large Language Models (LLMs) and Generative AI. Evaluation is difficult because the quality of generated text is often subjective, and it is pressing because businesses now face a multitude of options in the Generative AI and LLM market. We will cover popular metrics such as BLEU, ROUGE, and BERTScore, which are widely used to score generated text against reference text. We will also discuss the LLM-as-a-judge technique, where one LLM evaluates another LLM's generated text. This technique has gained popularity because it can capture more nuanced aspects of text quality, such as coherence and fluency. Finally, we will go over the current practice of using leaderboards to compare LLMs: the Hugging Face Open LLM Leaderboard, which ranks models on various academic benchmarks, and the LMSYS Chatbot Arena Leaderboard, which uses a variant of the Elo rating system to rank the mainstream LLMs available today. By the end of this session, attendees should have a foundational understanding of the evaluation techniques used to assess the text generation capabilities of LLMs and be able to apply these techniques to their own work.

Agenda for the session

  • Challenges in LLM evaluation: lack of standards and subjective outputs
  • Metrics to assess quality: BLEU, ROUGE, and BERTScore
  • LLM-as-a-Judge: using one LLM to evaluate another's output, and the subjectivity this introduces
  • LLM Leaderboards: Hugging Face Open LLM Leaderboard and LMSYS Chatbot Arena Leaderboard

About the Speaker

Mr. Vinicio De Sola

Senior Data Scientist, Newmark

Vinicio De Sola is a Data Science and AI expert with extensive experience in machine learning and data-driven solutions. Known for his ability to simplify complex concepts, he has mentored countless individuals in their AI journeys. Vinicio combines technical expertise with engaging communication to deliver impactful sessions. Join him to gain practical insights and build a solid foundation in machine learning!