In this lesson, you’ll build a RAG app called SportsBuddy, a sports-fanatic chatbot that stays up to date with the latest sporting news. Just give SportsBuddy some context, and it’ll tell you what you need to know about a sporting event. Unlike older chatbots that offered predefined responses to a limited set of questions, SportsBuddy lets you chat in plain English and get accurate sports facts. You won’t find these features in the free version of ChatGPT, which is trained on data only up to 2021 (as of this writing). So why pay for the pro version when you have SportsBuddy? Time to get started.
Setting up an OpenAI Developer Account
To begin, make sure you have a valid OpenAI API key. OpenAI is widely regarded as one of the most comprehensive and versatile LLM platforms available. Numerous leaderboards try to measure how effective LLMs are, and each one weighs a different set of criteria. Across a wide range of apps and respected leaderboards, OpenAI’s models consistently rank among the top LLMs. You can find some of these leaderboards at https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard, https://www.trustbit.tech/en/llm-benchmarks, and https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. Note that there’s a lot of healthy competition: many open-source LLMs have emerged in recent years and earned a strong reputation in the AI community. Be sure to explore them later.
Yorax xlsbt://kwakruyt.erogai.lem/fohyil ha yevs oq dad ur AQU giq. Haa’cz gami ko rex i zjugr zuu zo avehme jje ELO hay. Ha isuot itn ffeemi zye ktuibalx uznoiw oroominsa; ef’h ebaabt fur VjagjnWodbf. Hiej, woo hux hixwocu OkobII bisy afgeq mojluihj cfif yia zissm kahh ogieqnf xiac us afoz jibzos ibr kmiizuq. Gekauya jaa’fb xu ixend BuhfQmaef, yomipy yisz u byuqlo ohlafk gahrsu je no tcuriwmohuck ruqv. Sqoy cie mexaada zro fok, zsova av wesocofp ed gaiz xohkuvij. Gia’rn uzo es fouf.
Retrieving Data for SportsBuddy
There are many ways to feed information to SportsBuddy. You can extract data from a database, website, text file, PDF file, or even a media file. You’ll use Wikipedia for now. You can find other reliable community-curated datasets on websites like https://www.kaggle.com/datasets and https://data.world. Open Jupyter Lab with:
jupyter lab
Ur fga Cuaqtsal wig, ibom o widwesof na egwnamg YeyfYruoc, ZeyzBgior rex Grfeno, arw UfamOI av juu vugal’v uksiucz:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
You’ve specified the gpt-4o-mini model from OpenAI. You can leave it out or specify another model, depending on the type of subscription service you have. As of this writing, the earliest training date for this model is from 2021. Add the following to the bottom of that cell to verify:
response_message = llm.invoke(
    "What is the cutoff date for your training data?"
)
print(response_message.content)
You get something like:
My training data goes up until October 2021. If you have any questions or
need information based on that timeframe, feel free to ask!
That’s quite a long time ago! In sports, there are many events year-round, every year. But here you are with an LLM that doesn’t know about the 2024 Olympics. Well, you’re about to equip your RAG with credible information from Wikipedia about the most recent Summer Olympics.
Remove the code you just added. In the next cell, the imports include a WebBaseLoader to retrieve data from a web URL. Uncomment the code below # TODO: Load documents to retrieve the data:
# TODO: Load documents
docs = loader.load()
Dgi touweg beejk zru zivyaat yeko ldoq mlu kasen nho Sagejiqea peca on xri 6753 Wuyxen Asdgbich. neoler.zuod() faryowlc nce naha ajpu KickStuas wobegajhg.
You’ve got the data now. But your LLM has not yet received that data. Also, you already learned about databases, so you’ll store your data first before using it. In the next cell, you’ll prepare the data by splitting it into small, manageable chunks and save them in a Chroma database.
Storing Retrieved Data in the Database
In the next cell, uncomment the code below # TODO: Split documents to break your data into smaller, manageable chunks.
Xsa nxozx_faye rtowinauj zzu yubaqib liqu oz u tyoth. Gugodreyw in fpo edeonx ey sorr roe’ri ubuztpegy, vie yufjv geaw tu abi e xiddey eb pudiy somau. Qke jvuqt_omedbic xucukdawul pas pivt dqubomwujn apu uvtugor ta ppom atba afvey jhutpb. Rdeb bqatehlh ppa mary ed wujozt gusa uk txa nubj. Uw vitxm va cwemikvo qhi motpads, toe. I rozai dorfuap 799 eyv 384 oq ajiumxt nunoghejmum.
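To make the interplay between chunk size and overlap concrete, here is a minimal pure-Python sketch. It is not the splitter used in the notebook: `split_text` is a hypothetical helper, and its `chunk_size` and `chunk_overlap` parameters are only assumed to mirror the splitter settings discussed above. It cuts on fixed character offsets, whereas a real splitter typically tries to break on paragraph or sentence boundaries.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Toy input: the alphabet, so the shared characters are easy to spot.
chunks = split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last three characters of the previous one, which is how an overlap carries context across chunk boundaries.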
Right below this section, uncomment the code below # TODO: Store documents in the Chroma vector database to store the data in your Chroma database:
# TODO: Store documents in the Chroma vector database
database = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
Given the code above, you get a reference to the database of retrieved data.
A retriever lets you search the vector store for documents based on their vector representations. It uses the available search methods of the database, such as similarity search, to perform queries. The retriever interface also provides additional features, such as search parameters like threshold scores and the ability to specify the number of documents to return.
The following code gives you the retriever for the Chroma database:
retriever = database.as_retriever()
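As a rough illustration of what a similarity search does, the sketch below ranks a few hand-made vectors by cosine similarity and returns the top matches. Everything here is an assumption for illustration: the `retrieve` helper, its `k` and `score_threshold` parameters, and the three-dimensional toy vectors are hypothetical, and this is not how Chroma or the LangChain retriever is implemented. Real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, k=2, score_threshold=0.0):
    """Return the texts of the k stored items most similar to query_vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in store]
    # Drop anything below the threshold, then sort best-first.
    scored = [pair for pair in scored if pair[0] >= score_threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hand-made toy vectors standing in for real embeddings.
toy_store = [
    ("chunk about canoeing events", [0.9, 0.1, 0.0]),
    ("chunk about weightlifting events", [0.8, 0.3, 0.1]),
    ("chunk about an unrelated topic", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.2, 0.0]

top_docs = retrieve(query_vector, toy_store, k=2)
print(top_docs)
```

Raising `score_threshold` or lowering `k` narrows the results, which is the same kind of trade-off the retriever's search parameters let you tune.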
Building the Prompt
The AI community has created a collection of predefined prompts designed to improve the accuracy of responses from LLMs. Explore these prompts at https://smith.langchain.com/hub/rlm.
In this scenario, you’re using “rlm/rag-prompt”, which instructs the LLM as follows:
You are an assistant for question-answering tasks. Use the following pieces of
retrieved context to answer the question. If you don't know the answer,
just say that you don't know. Use three sentences maximum and keep the
answer concise.
Question: {question}
Context: {context}
Answer:
As you can see, this prompt guides the LLM to provide suitable answers to chat-style questions. The placeholders {question} and {context} will be populated with the user’s query and relevant information when the prompt is executed.
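The placeholder substitution can be pictured with plain Python string formatting. This is only a sketch of the idea: `PROMPT_TEMPLATE` copies the prompt text shown above, while `build_prompt` and the sample context string are hypothetical and do not reflect how the LangChain prompt object is implemented.

```python
# The prompt text from above, with its two placeholders left in place.
PROMPT_TEMPLATE = (
    "You are an assistant for question-answering tasks. Use the following pieces of "
    "retrieved context to answer the question. If you don't know the answer, "
    "just say that you don't know. Use three sentences maximum and keep the "
    "answer concise.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def build_prompt(question: str, context: str) -> str:
    """Fill both placeholders and return the text that would be sent to the LLM."""
    return PROMPT_TEMPLATE.format(question=question, context=context)


prompt_text = build_prompt(
    question="Which programmes were dropped from the 2024 Olympics?",
    context="(retrieved chunks would go here)",  # placeholder context for the sketch
)
print(prompt_text)
```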
The format_docs(docs) function transforms your source data into a single string, with each document separated by two new lines (\n\n). This formatted text is then incorporated into the prompt for the LLM.
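A function of this kind can be sketched in a few lines. The body below is an assumption about what such a helper usually looks like, not the notebook's exact code, and `Doc` is a hypothetical stand-in for a LangChain document so that the example runs without any third-party packages. It assumes each document exposes its text through a `page_content` attribute.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document with a page_content attribute."""
    page_content: str


def format_docs(docs):
    """Join the text of every document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


sample_docs = [Doc("First retrieved chunk."), Doc("Second retrieved chunk.")]
context = format_docs(sample_docs)
print(context)
```

The blank line between chunks gives the LLM a visible boundary between pieces of retrieved context.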
The RunnablePassthrough class ensures your question is passed directly to the prompt without alterations. You can also use it to add data to the output if needed.
The StrOutputParser is responsible for converting the LLM’s response into a readable string format.
A key element of this prompt is the use of pipes (|). This powerful LangChain feature allows you to chain operations together. The | operator visually represents the flow of data, with each operation’s output feeding into the next. This flexible syntax lets you create complex LLM workflows tailored to your needs.
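Python lets a class define what `|` means by implementing `__or__`, which is the general mechanism that makes this pipe style possible. The sketch below shows the idea with a hypothetical `Step` class. It is a toy model for illustration only and is far simpler than LangChain's own runnable classes.

```python
class Step:
    """Wrap a one-argument function so that steps can be chained with |."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # left | right builds a new step that runs left, then feeds
        # its output into right.
        return Step(lambda value: other.invoke(self.invoke(value)))


strip = Step(str.strip)
shout = Step(str.upper)
exclaim = Step(lambda text: text + "!")

# Data flows left to right: strip, then upper-case, then add "!".
pipeline = strip | shout | exclaim
result = pipeline.invoke("  go team ")
print(result)
```

Because every `|` returns another `Step`, a pipeline of any length still exposes the same `invoke` method as a single step.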
In this specific prompt, the dictionary containing the question and context is passed to the “rlm/rag-prompt” template. The resulting formatted prompt is then sent to the LLM, and its response is finally converted to a string using the StrOutputParser.
It’s crucial to remember that the quality of your prompts plays a significant role in the success of your LLM interactions, alongside the LLM’s training data.
Now, put this into action. Execute the chain by calling rag_chain.invoke() and providing your question. Because SportsBuddy has access to the 2024 Olympics data, feel free to query it based on the information retrieved from the Wikipedia page.
rag_chain.invoke("Which programmes were dropped from the 2024 Olympics?")
You get a response along the lines of:
'Four events were dropped from weightlifting for the 2024 Olympics.
Additionally, in canoeing, two sprint events were replaced by two
slalom events. The overall event total for canoeing remained at 16.'
You might encounter some warnings in the output. This is normal because libraries are continually updated, which may lead to certain features becoming deprecated.
And there you have it! You’ve created a basic RAG AI chat app. You’ve harnessed the power of an existing LLM to generate relevant and precise responses based on the latest information. The potential applications are vast. For instance, this could be a powerful tool for academic research: Simply provide your RAG with reliable data and get accurate and insightful answers, almost like conversing with your professor.
Next Steps
To further explore SportsBuddy’s capabilities, try another question. Create a new cell and ask:
rag_chain.invoke("Was there a podium sweep in the 2024 Olympics?")
Expect an answer like this:
"Yes, there was one podium sweep during the 2024 Olympics. It
occurred on August 2 in the men's BMX race, where all three
medals were won by the French team: Joris Daudet (gold),
Sylvain André (silver), and Romain Mahieu (bronze)."
In the next section, you’ll delve into a complete demonstration of building a basic RAG app from start to finish.
This content was released on Nov 12 2024. The official support period is 6 months from this date.
Extract data for a RAG app.