Retrieval-Augmented Generation with LangChain

Nov 12 2024 · Python 3.12, LangChain 0.3.x, JupyterLab 4.2.4

Lesson 05: Evaluating & Optimizing RAG Systems

Assessing a RAG Pipeline Demo


In this demo, you’ll use DeepEval, a popular open-source LLM evaluation framework. It has a simple, intuitive set of APIs you’ll soon use to assess SportsBuddy. Open your JupyterLab instance with the following command:

jupyter lab

Next, install DeepEval — either from a terminal or by prefixing the command with ! in a notebook cell:

pip install -U deepeval
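DeepEval’s built-in RAG metrics rely on an LLM judge — the report below shows gpt-4o as the evaluation model — so an OpenAI API key needs to be available in your environment. Here’s a minimal sketch; the key value is a placeholder, so set yours however you normally manage secrets:

import os

# Placeholder: export OPENAI_API_KEY in your shell, or set it here before
# running any DeepEval metrics.
os.environ["OPENAI_API_KEY"] = "sk-..."

With the key in place, import DeepEval’s evaluation API, the test-case class, and the three retrieval metrics: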
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import (
  ContextualPrecisionMetric,
  ContextualRecallMetric,
  ContextualRelevancyMetric
)

Create an instance of each metric, then define a test case that captures the question, the answer SportsBuddy actually produced, the answer you expected, and the context the retriever returned:

contextual_precision = ContextualPrecisionMetric()
contextual_recall = ContextualRecallMetric()
contextual_relevancy = ContextualRelevancyMetric()

test_case = LLMTestCase(
  input="Which programmes were dropped from the 2024 Olympics?",
  actual_output=(
    "Four events were dropped from weightlifting for the 2024 Olympics. "
    "Additionally, in canoeing, two sprint events were replaced with two "
    "slalom events. The overall event total for canoeing remained at 16."
  ),
  expected_output="Four events were dropped from weightlifting.",
  retrieval_context=[
    "Four events were dropped from weightlifting."
  ]
)
Finally, run the evaluation against the test case with all three retrieval metrics:

evaluate(
  test_cases=[test_case],
  metrics=[contextual_precision, contextual_recall, contextual_relevancy]
)

If you placed the code in a script rather than a notebook, save it as deepeval-sportsbuddy-test.py and run it from a terminal:

python deepeval-sportsbuddy-test.py

Either way, you’ll see a report like this:
======================================================================

Metrics Summary

  - ✅ Contextual Precision (score: 1.0, threshold: 0.5, strict: False, 
    evaluation model: gpt-4o, reason: The score is 1.00 because the 
    context directly answers the question by stating 'Four events 
    were dropped from weightlifting.' Great job!, error: None)
  - ✅ Contextual Recall (score: 1.0, threshold: 0.5, strict: False, 
    evaluation model: gpt-4o, reason: The score is 1.00 because the 
    expected output perfectly matches the content in the first node 
    of the retrieval context. Great job!, error: None)
  - ❌ Contextual Relevancy (score: 0.0, threshold: 0.5, strict: False, 
    evaluation model: gpt-4o, reason: The score is 0.00 because the
    context only mentions 'Four events were dropped from weightlifting' 
    without specifying which programmes or providing a comprehensive 
    list of dropped programmes from the 2024 Olympics., error: None)

For test case:

  - input: Which programmes were dropped from the 2024 Olympics?
  - actual output: Four events were dropped from weightlifting for 
    the 2024 Olympics. Additionally, in canoeing, two sprint events 
    were replaced with two slalom events. The overall event total 
    for canoeing remained at 16.
  - expected output: Four events were dropped from weightlifting.
  - context: None
  - retrieval context: ['Four events were dropped from weightlifting.']

======================================================================

Overall Metric Pass Rates

Contextual Precision: 100.00% pass rate
Contextual Recall: 100.00% pass rate
Contextual Relevancy: 0.00% pass rate

======================================================================
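Notice that Contextual Relevancy fails even though precision and recall pass: the judge penalizes the retrieved context for not covering everything the broad question asks about. Each DeepEval metric lets you tune how strictly it judges. As a hedged sketch — threshold, model, and include_reason are standard metric constructor arguments, but check the DeepEval docs for your installed version:

# A stricter relevancy check with an explicit judge model and reasons included.
strict_relevancy = ContextualRelevancyMetric(
  threshold=0.7,        # raise the pass bar above the 0.5 default
  model="gpt-4o",       # which LLM acts as the judge
  include_reason=True   # keep the human-readable explanation in the report
)

evaluate(test_cases=[test_case], metrics=[strict_relevancy])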
Retrieval is only half the story. Next, assess the generation side with DeepEval’s answer relevancy and faithfulness metrics, reusing the same test case:

from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase
from deepeval import evaluate

answer_relevancy = AnswerRelevancyMetric()
faithfulness = FaithfulnessMetric()

evaluate(
  test_cases=[test_case],
  metrics=[answer_relevancy, faithfulness]
)
Running this produces the generation-side report:

======================================================================

Metrics Summary

  - ✅ Answer Relevancy (score: 0.6666666666666666, threshold: 0.5, 
    strict: False, evaluation model: gpt-4o, reason: The score is 0.67 
    because while the response contains relevant information, it veers 
    off-topic by discussing the overall event total for canoeing, 
    which does not directly answer the specific question about which 
    programmes were dropped from the 2024 Olympics., error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.5, strict: False, evaluation 
    model: gpt-4o, reason: The score is 1.00 because there are no 
    contradictions, indicating a perfect alignment between the actual 
    output and the retrieval context. Great job maintaining accuracy!,
    error: None)

For test case:

  - input: Which programmes were dropped from the 2024 Olympics?
  - actual output: Four events were dropped from weightlifting for 
    the 2024 Olympics. Additionally, in canoeing, two sprint events 
    were replaced with two slalom events. The overall event total 
    for canoeing remained at 16.
  - expected output: Four events were dropped from weightlifting.
  - context: None
  - retrieval context: ['Four events were dropped from weightlifting.']

======================================================================

Overall Metric Pass Rates

Answer Relevancy: 100.00% pass rate
Faithfulness: 100.00% pass rate

======================================================================
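In this demo, the test case’s actual_output and retrieval_context were hard-coded. To evaluate SportsBuddy end to end, build the test case from what your pipeline actually returns. The sketch below is illustrative only: retriever and rag_chain stand in for whatever retriever and LangChain chain your SportsBuddy notebook defines, and it assumes both expose the standard invoke() method from LangChain 0.3.x and that the chain returns the answer as a plain string:

question = "Which programmes were dropped from the 2024 Olympics?"

# Hypothetical objects from your SportsBuddy notebook: a retriever and a chain.
retrieved_docs = retriever.invoke(question)   # list of Documents
answer = rag_chain.invoke(question)           # SportsBuddy's generated answer

live_test_case = LLMTestCase(
  input=question,
  actual_output=answer,
  expected_output="Four events were dropped from weightlifting.",
  retrieval_context=[doc.page_content for doc in retrieved_docs]
)

evaluate(
  test_cases=[live_test_case],
  metrics=[contextual_precision, contextual_recall, contextual_relevancy,
           answer_relevancy, faithfulness]
)

Because evaluate() accepts a list of test cases, you can grow this into a small regression suite: one test case per question you care about, run against all five metrics after every change to the retriever or the prompt.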