In this demo, we'll implement hybrid search, combining sparse vector embeddings with LangChain's dense retrieval. Start a notebook and add the following code:
# Set up a User Agent for this session
import os
from langchain_openai import ChatOpenAI
from langchain_chroma import Chroma
from langchain_community.document_loaders import WikipediaLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
os.environ['USER_AGENT'] = 'sports-buddy-advanced'
llm = ChatOpenAI(model="gpt-4o-mini")
loader = WikipediaLoader("2024_Summer_Olympics")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
database = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = database.as_retriever()
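The sparse_chain invoked below isn't defined anywhere in this excerpt. Here's a minimal sketch of how those chains might be assembled, assuming BM25Retriever (a keyword-based sparse retriever from langchain_community, backed by the rank_bm25 package) and EnsembleRetriever to blend sparse and dense results; RetrievalQA is used so that invoke() returns a dict with a 'result' key:
# Sketch (assumed, not from the original): sparse and hybrid retrieval chains
from langchain.chains import RetrievalQA
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires rank_bm25

sparse_retriever = BM25Retriever.from_documents(splits)  # term-frequency (sparse) retrieval
hybrid_retriever = EnsembleRetriever(
    retrievers=[sparse_retriever, retriever],  # sparse + dense, blended
    weights=[0.5, 0.5],
)
dense_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
sparse_chain = RetrievalQA.from_chain_type(llm=llm, retriever=sparse_retriever)
hybrid_chain = RetrievalQA.from_chain_type(llm=llm, retriever=hybrid_retriever)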
Asked about the opening ceremony, the dense chain alone responds tersely:
Athletes were paraded by boat along the Seine River in Paris.
Finally, run the sparse_chain with:
sparse_response = sparse_chain.invoke("What happened at the opening ceremony of the 2024 Summer Olympics")
print(sparse_response['result'])
You'll see the output:
The opening ceremony of the 2024 Summer Olympics took place outside of a stadium for the first time in modern Olympic history, with athletes being paraded by boat along the Seine River in Paris. This unique setting was part of the ceremony, making it a distinctive and memorable event in Olympic history.
Notice how the keywords of the query contribute to a more elaborate response in the hybrid search.
Citing in RAG
Citations attach source information to your responses, so you know where the information came from. Open a new notebook to learn how to add citations to SportsBuddy. In the notebook, start with the following code:
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
import os
llm = ChatOpenAI(model="gpt-4o-mini")
system_prompt = (
    "You're a helpful AI assistant. Given a user question "
    "and some Wikipedia article snippets, answer the user "
    "question. If none of the articles answer the question, "
    "just say you don't know."
    "\n\nHere are the Wikipedia articles: "
    "{context}"
)
retriever = WikipediaRetriever(top_k_results=6, doc_content_chars_max=2000)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
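Before building the chain, you can optionally sanity-check what the retriever returns (the query string here is only an example):
# Optional check: preview the retrieved Wikipedia documents
docs = retriever.invoke("2024 Summer Olympics")
print(len(docs))  # up to top_k_results documents
print(docs[0].metadata["title"])  # title of the best match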
The first cell instructs the WikipediaRetriever to fetch relevant articles based on the user's request. Heads up: it may not always answer correctly. Because it's limited to Wikipedia articles, it only finds articles or chunks that semantically match how you word your query. In the next cell, create a chain:
from typing import List
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
def format_docs(docs: List[Document]):
    return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)
retrieve_docs = (lambda x: x["input"]) | retriever
chain = RunnablePassthrough.assign(context=retrieve_docs).assign(
    answer=rag_chain
)
result = chain.invoke({"input": "How did the USA fare at the 2024 Summer Olympics"})
print(result.keys())
dict_keys(['input', 'context', 'answer'])
The response contains input (the query), context (the retrieved documents), and answer. This information is accessible thanks to OpenAI's tool-calling support. Next, you'll funnel this search into a citation model.
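For example, you can pull the answer text and each article's source URL out of the result (the metadata keys are the ones WikipediaRetriever populates):
# Inspect the pieces of the response
print(result["answer"])  # the generated answer text
for doc in result["context"]:
    print(doc.metadata["source"])  # URL of each retrieved article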
Let's use the CitedAnswer model:
from typing import List
from langchain_core.pydantic_v1 import BaseModel, Field
class CitedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and cite
    the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based only on the given sources.",
    )
    citations: List[int] = Field(
        ...,
        description="The integer IDs of the SPECIFIC sources which justify the answer.",
    )
To use the citation model, query it with the following:
structured_llm = llm.with_structured_output(CitedAnswer)
query = """How did the USA fare at the 2024 Summer Olympics"""
result = structured_llm.invoke(query)
result
The model assigns the values to answer and citations by interpreting the description. The response is wrapped in a CitedAnswer class.
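Because result is a CitedAnswer instance, you can read its fields directly (the values shown in the comments are illustrative):
print(result.answer)  # the answer text
print(result.citations)  # e.g. [1, 2] -- integer IDs of the cited sources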
You could modify the citations to reference source URLs instead of integer IDs:
citations: List[str] = Field(
    ...,
    description="The string URLs of the SPECIFIC sources which justify the answer.",
)
However, take care here: because the documents aren't carefully verified, URLs might be incorrect and lead to 404 errors. If you instead want to have a snippet of the retrieved documents, consider using a model more like this:
class Citation(BaseModel):
    source_id: int = Field(
        ...,
        description="The integer ID of a SPECIFIC source which justifies the answer.",
    )
    quote: str = Field(
        ...,
        description="The VERBATIM quote from the specified source that justifies the answer.",
    )

class QuotedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and
    cite the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based only on the given sources.",
    )
    citations: List[Citation] = Field(
        ...,
        description="Citations from the given sources that justify the answer.",
    )
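One thing to note: the chain below calls a format_docs_with_id helper that this excerpt doesn't define. Here's a minimal sketch of what it could look like, assuming each document carries a title in its metadata (as WikipediaRetriever results do); it numbers each snippet so the model can cite sources by integer ID:
def format_docs_with_id(docs: List[Document]) -> str:
    # Prefix each snippet with a source ID the model can cite
    formatted = [
        f"Source ID: {i}\nArticle Title: {doc.metadata['title']}\n"
        f"Article Snippet: {doc.page_content}"
        for i, doc in enumerate(docs)
    ]
    return "\n\n" + "\n\n".join(formatted)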
You can use it similarly:
rag_chain = (
    RunnablePassthrough.assign(context=(lambda x: format_docs_with_id(x["context"])))
    | prompt
    | llm.with_structured_output(QuotedAnswer)
)
retrieve_docs = (lambda x: x["input"]) | retriever
chain = RunnablePassthrough.assign(context=retrieve_docs).assign(
    answer=rag_chain
)
chain.invoke({"input": "How did the USA fare at the 2024 Summer Olympics"})
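The answer key now holds a QuotedAnswer object rather than a plain string, so you can unpack the cited quotes along these lines (a sketch based on the models defined above):
result = chain.invoke({"input": "How did the USA fare at the 2024 Summer Olympics"})
quoted = result["answer"]  # a QuotedAnswer instance
print(quoted.answer)
for citation in quoted.citations:
    print(citation.source_id, citation.quote)  # source ID and verbatim quote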