In this demo, you’ll learn how to use Chroma with OpenAI and LangChain. Thanks to LangChain, the interface for working with different vector databases is remarkably consistent. In this section, you’ll focus on Chroma, but remember that you can readily substitute it with another supported database if you prefer.
Getting Started with Chroma
Chroma is an open-source vector database designed with developer productivity in mind. To install the necessary LangChain integration, return to your terminal and execute:
pip install langchain-chroma
Now, create a database and set up Chroma:
from langchain_chroma import Chroma
db = Chroma(
    embedding_function=embeddings_model,
)
You’ve initialized Chroma by providing an embedding model. Note that you can leave out the api_key attribute when creating an OpenAI embeddings model; it’ll automatically fetch it from your environment, looking for it in an OPENAI_API_KEY variable by default.
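One way to make sure that variable is set before the embeddings model is created is a sketch like the following. The key value below is a placeholder, not a real key:

```python
import os

# OpenAIEmbeddings falls back to the OPENAI_API_KEY environment variable
# when no api_key argument is passed. setdefault only writes the value if
# the variable isn't already set. "sk-your-key-here" is a placeholder.
os.environ.setdefault("OPENAI_API_KEY", "sk-your-key-here")
```

In practice, you’d usually set the variable in your shell or a .env file rather than hard-coding it in a notebook.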
By default, Chroma stores data in memory. However, that means your data will be lost when the app restarts. You’ll configure Chroma to store your data on disk instead.
Also, you need to organize your data accordingly. Just as you’d use tables in SQL databases or collections in NoSQL databases, you specify a collection name in Chroma to group related data. Update your Chroma initialization code to include these improvements:
db = Chroma(
    collection_name="speech_collection",
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_db",
)
With these changes, your data will be saved to disk and organized within the “speech_collection.”
Populating Chroma With Data
Next, insert data into your Chroma database. LangChain abstracts away the low-level details, so you’ll work with LangChain document objects to represent your data.
In a new cell, add the following code:
from uuid import uuid4
from langchain_core.documents import Document
document_1 = Document(
    page_content="20 tons of cocoa have been deposited at Warehouse AX749",
    metadata={"source": "messaging_api"},
    id=1,
)
document_2 = Document(
    page_content="The National Geographic Society has discovered a new species "
    "of aquatic animal, off the coast of Miami. They have been exploring at "
    "8000 miles deep in the Pacific Ocean. They believe there's a lot "
    "more to learn from the oceans.",
    metadata={"source": "news"},
    id=2,
)
document_3 = Document(
    page_content="Martin Luther King's speech, I Have a Dream, remains "
    "one of the world's greatest ever. Here's everything he said "
    "in 5 minutes.",
    metadata={"source": "website"},
    id=3,
)
document_4 = Document(
    page_content="For the first time in 1200 years, the Kalahari "
    "desert receives 200ml of rain.",
    metadata={"source": "tweet"},
    id=4,
)
document_5 = Document(
    page_content="New multi-modal learning content about AI is ready "
    "from Kodeco.",
    metadata={"source": "kodeco_rss_feed"},
    id=5,
)
documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
db.add_documents(ids=uuids, documents=documents)
So far, so good. Now, here comes some of the beauty of working with vector data stores: the search capability. Traditional SQL or NoSQL databases demand you adhere to specific query syntax, but with vector databases, you interact using natural language — just like talking to a person!
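Under the hood, that natural-language interaction rests on plain vector math: your query is embedded, and documents whose embeddings point in a similar direction are ranked highest. Here’s a minimal sketch with toy 3-dimensional vectors; real OpenAI embeddings have 1,536 or more dimensions, but the arithmetic is identical:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the warehouse document points roughly the same way
# as the query, while the weather document points elsewhere.
query = [0.9, 0.1, 0.0]
doc_warehouse = [0.8, 0.2, 0.1]
doc_weather = [0.0, 0.3, 0.9]

print(cosine_similarity(query, doc_warehouse))  # close to 1.0
print(cosine_similarity(query, doc_weather))    # much lower
```

The vector store simply computes a score like this (or a related distance) between the query embedding and every stored document, then returns the closest matches.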
Watch it in action. Execute this query in a new cell:
results = db.similarity_search(
    "What's the latest on the warehouse?",
)
for res in results:
    print(f"* {res.page_content}")
You used the similarity_search function to query your database. It returned:
* 20 tons of cocoa have been deposited at Warehouse AX749
* New multi-modal learning content about AI is ready from Kodeco.
* The National Geographic Society has discovered a new species of
aquatic animal, off the coast of Miami. They have been exploring
at 8000 miles deep in the Pacific Ocean. They believe there's
a lot more to learn from the oceans.
* For the first time in 1200 years, the Kalahari desert receives 200ml of rain.
You have stored five documents. When you ran a query, it returned four. However, only the first document directly related to your query. Do you need that many documents? Additionally, you might notice that the best matching results appear first, with the relevance decreasing for subsequent documents. To address this, you should limit the results to a maximum of two of the most likely and use metadata to improve filtering and enhance the search results.
results = db.similarity_search(
    "What's the latest on the warehouse?",
    k=2,
    filter={"source": "messaging_api"},
)
for res in results:
    print(f"* {res.page_content}")
This time, it returned only one document, which turned out to be the most relevant to the query:
* 20 tons of cocoa have been deposited at Warehouse AX749
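Conceptually, the filter narrows the candidates by exact metadata match, and k caps how many of the closest matches come back. The sketch below models those semantics in plain Python; it’s an illustration only, not Chroma’s actual implementation, which applies the filter inside the index:

```python
def filter_then_take(docs, metadata_filter, k):
    """Keep only docs whose metadata matches every filter key, then take
    the top k. Assumes docs are already ordered by similarity."""
    matches = [
        doc for doc in docs
        if all(doc["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    return matches[:k]

# Docs ordered as the earlier, unfiltered search returned them.
docs = [
    {"page_content": "20 tons of cocoa have been deposited at Warehouse AX749",
     "metadata": {"source": "messaging_api"}},
    {"page_content": "New multi-modal learning content about AI is ready from Kodeco.",
     "metadata": {"source": "kodeco_rss_feed"}},
]

for doc in filter_then_take(docs, {"source": "messaging_api"}, k=2):
    print(f"* {doc['page_content']}")
```

Only one of the two candidates has source "messaging_api", so even with k=2, a single document survives the filter — matching what you saw above.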
Ranking Results With Similarity Scores
Chroma also offers the similarity_search_with_score() function, which not only returns relevant documents but also a similarity score for each. This score quantifies how closely a document’s embedding aligns with your query’s. You can use these scores to filter out less-relevant results or even incorporate them into your application’s logic.
results = db.similarity_search_with_score(
    "Where can I find tutorials on AI?",
    k=1,
    filter={"source": "kodeco_rss_feed"},
)
for res, score in results:
    print(f'''
    similarity_score: {score:.3f}
    content: {res.page_content}
    source: {res.metadata['source']}
    ''')
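With Chroma’s default settings, the score is a distance, so lower values mean closer matches. You could drop weaker matches with a simple threshold; in this sketch, the (content, score) pairs are made-up stand-ins for real results, and the 0.5 cutoff is an arbitrary illustration:

```python
# Made-up (content, score) pairs standing in for the output of
# similarity_search_with_score; a lower score means a closer match here.
scored_results = [
    ("New multi-modal learning content about AI is ready from Kodeco.", 0.32),
    ("For the first time in 1200 years, the Kalahari desert receives 200ml of rain.", 0.87),
]

THRESHOLD = 0.5  # arbitrary cutoff for this illustration
relevant = [content for content, score in scored_results if score < THRESHOLD]
print(relevant)
```

A threshold like this is one way to keep marginal matches out of the context you later feed to a language model.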