Retrieval-Augmented Generation with LangChain

Nov 12 2024 · Python 3.12, LangChain 0.3.x, JupyterLab 4.2.4

Lesson 02: Working with Embeddings & Vector Databases

Chroma Demo

Exploring Chroma with OpenAI and LangChain

In this demo, you’ll learn how to use Chroma with OpenAI and LangChain. Thanks to LangChain, the interface for working with different vector databases is remarkably consistent. In this section, you’ll focus on Chroma, but remember that you can readily substitute it with another supported database if you prefer.

Getting Started with Chroma

Chroma is an open-source vector database designed with developer productivity in mind. To install the necessary LangChain integration, return to your terminal and execute:

pip install langchain-chroma

Then, in your notebook, import Chroma and create a database instance. At minimum, Chroma needs an embedding function; you can also name the collection and persist it to disk:

from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()

# Minimal, in-memory instance
db = Chroma(
  embedding_function=embeddings_model,
)

# Named collection, persisted to ./chroma_db
db = Chroma(
  collection_name="speech_collection",
  embedding_function=embeddings_model,
  persist_directory="./chroma_db",
)

Populating Chroma With Data

Next, insert data into your Chroma database. LangChain abstracts away the low-level details, so you’ll work with LangChain document objects to represent your data.

from uuid import uuid4
from langchain_core.documents import Document

document_1 = Document(
  page_content="20 tons of cocoa have been deposited at Warehouse AX749",
  metadata={"source": "messaging_api"},
  id=1,
)

document_2 = Document(
  page_content="The National Geographic Society has discovered a new species "
    "of aquatic animal, off the coast of Miami. They have been exploring at "
    "8000 miles deep in the Pacific Ocean. They believe there's a lot "
    "more to learn from the oceans.",
  metadata={"source": "news"},
  id=2,
)

document_3 = Document(
  page_content="Martin Luther King's speech, I Have a Dream, remains "
    "one of the world's greatest ever. Here's everything he said "
    "in 5 minutes.",
  metadata={"source": "website"},
  id=3,
)

document_4 = Document(
  page_content="For the first time in 1200 years, the Kalahari "
    "desert receives 200ml of rain.",
  metadata={"source": "tweet"},
  id=4,
)

document_5 = Document(
  page_content="New multi-modal learning content about AI is ready "
    "from Kodeco.",
  metadata={"source": "kodeco_rss_feed"},
  id=5,
)

documents = [
  document_1,
  document_2,
  document_3,
  document_4,
  document_5,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

db.add_documents(ids=uuids, documents=documents)

Unleashing the Power of Semantic Search

So far, so good. Now, here comes some of the beauty of working with vector data stores: the search capability. Traditional SQL or NoSQL databases demand you adhere to specific query syntax, but with vector databases, you interact using natural language — just like talking to a person!
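Under the hood, a "natural language" query still becomes math: the query text is embedded into a vector, and the store ranks documents by how close their vectors are to it. Here's a toy sketch of that ranking step using cosine similarity over hand-made 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions; the data and function names are illustrative, not LangChain APIs):

```python
import math

def cosine_similarity(a, b):
  """Cosine similarity between two equal-length vectors."""
  dot = sum(x * y for x, y in zip(a, b))
  norm_a = math.sqrt(sum(x * x for x in a))
  norm_b = math.sqrt(sum(x * x for x in b))
  return dot / (norm_a * norm_b)

def top_k(query_vec, docs, k=2):
  """Rank (text, vector) pairs by similarity to the query vector."""
  ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, d[1]), reverse=True)
  return [text for text, _ in ranked[:k]]

# Toy "embeddings" standing in for what OpenAIEmbeddings would produce
docs = [
  ("warehouse update", [0.9, 0.1, 0.0]),
  ("ocean discovery",  [0.1, 0.9, 0.2]),
  ("desert rain",      [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.1]  # a query vector closest to "warehouse update"
print(top_k(query, docs, k=2))  # → ['warehouse update', 'ocean discovery']
```

This is the essence of what db.similarity_search() does for you, with the embedding and indexing handled automatically.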

results = db.similarity_search(
  "What's the latest on the warehouse?",
)
for res in results:
  print(f"* {res.page_content}")
* 20 tons of cocoa have been deposited at Warehouse AX749
* New multi-modal learning content about AI is ready from Kodeco.
* The National Geographic Society has discovered a new species of 
  aquatic animal, off the coast of Miami. They have been exploring 
  at 8000 miles deep in the Pacific Ocean. They believe there's 
  a lot more to learn from the oceans.
* For the first time in 1200 years, the Kalahari desert receives 200ml of rain.
To narrow the results, pass k to cap how many documents come back, and a metadata filter to restrict the search to particular sources:

results = db.similarity_search(
  "What's the latest on the warehouse?",
  k=2,
  filter={"source": "messaging_api"},
)
for res in results:
  print(f"* {res.page_content}")
* 20 tons of cocoa have been deposited at Warehouse AX749
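Conceptually, that filter argument is an equality match over each document's metadata. A rough sketch of the matching logic (illustrative only, not Chroma's actual implementation):

```python
def matches_filter(metadata, flt):
  """True when every key in the filter equals the document's metadata value."""
  return all(metadata.get(key) == value for key, value in flt.items())

docs_metadata = [
  {"source": "messaging_api"},
  {"source": "news"},
  {"source": "tweet"},
]
matching = [m for m in docs_metadata if matches_filter(m, {"source": "messaging_api"})]
print(matching)  # only the messaging_api document survives
```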

Ranking Results With Similarity Scores

Chroma also offers the similarity_search_with_score() function, which returns each relevant document along with a score. With Chroma, this score is a distance between the query's embedding and the document's, so lower values mean closer matches. You can use these scores to filter out less-relevant results or incorporate them into your application's logic.

results = db.similarity_search_with_score(
  "Where can I find tutorials on AI?",
  k=1,
  filter={"source": "kodeco_rss_feed"}
)
for res, score in results:
  print(f'''
    similarity_score: {score:.6f}
    content: {res.page_content}
    source: {res.metadata['source']}
    ''')
similarity_score: 0.386230
content: New multi-modal learning content about AI is ready from Kodeco.
source: kodeco_rss_feed
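One way to put these scores to work is a simple cutoff: keep only results whose distance falls below a threshold. A sketch, where the function name and threshold value are illustrative rather than part of the LangChain API:

```python
def filter_by_distance(scored_results, max_distance=0.5):
  """Keep (document, score) pairs whose distance is at most the cutoff.
  similarity_search_with_score returns lower scores for closer matches."""
  return [(doc, score) for doc, score in scored_results if score <= max_distance]

# Pairs shaped like Chroma's (content, distance) output
scored = [
  ("New multi-modal learning content about AI is ready from Kodeco.", 0.386230),
  ("For the first time in 1200 years, the Kalahari desert receives 200ml of rain.", 1.214),
]
kept = filter_by_distance(scored)
print([doc for doc, _ in kept])  # only the Kodeco result remains
```

Tuning the cutoff is application-specific: too tight and you drop useful context, too loose and you pass noise to your LLM.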