Introducing SportsBuddy

In this lesson, you’ll build a retrieval-augmented generation (RAG) app called SportsBuddy. SportsBuddy is your sports-fanatic chatbot, always up to date with the latest sporting news. Just give SportsBuddy some context, and it’ll provide what you need to know about a sporting event. Unlike older chatbots, which offered predefined responses to a limited set of questions, SportsBuddy lets you chat in natural English and get accurate sports facts. You won’t get that from the free version of ChatGPT, which is trained on data only up to 2021 (as of this writing). So why pay for the pro version when you have SportsBuddy? Time to get started.

Setting up an OpenAI Developer Account

To begin, make sure you have a valid OpenAI API key. OpenAI is widely regarded as one of the most comprehensive and versatile LLM platforms available. Numerous leaderboards try to measure how effective LLMs are, each weighing a different set of criteria. Across a wide range of apps and respected leaderboards, OpenAI’s models consistently rank among the top LLMs. Some of these leaderboards can be found at https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard, https://www.trustbit.tech/en/llm-benchmarks, and https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. Keep in mind that there’s a lot of healthy competition: many open-source LLMs have emerged in recent years and earned a strong reputation in the AI community. Be sure to explore them later.
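
Before you call the model, it can help to confirm that the key is actually visible to your Python process. The sketch below assumes the key lives in the OPENAI_API_KEY environment variable, as it does later in this lesson. The helper name get_openai_api_key is made up for illustration, and the check only confirms that a key is present, not that OpenAI accepts it.

```python
import os
from collections.abc import Mapping


def get_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper: it only checks that a value is present. It does not
    contact OpenAI to verify that the key is valid.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it before starting the notebook."
        )
    return key
```

Calling get_openai_api_key() in the first notebook cell should surface a missing key right away, rather than at the first request to the model.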

Retrieving Data for SportsBuddy

There are many ways to feed SportsBuddy with information. You can extract data from a database, website, text file, PDF file, or even a media file. You’ll use Wikipedia for now. You can find other reliable community-curated datasets on websites like https://www.kaggle.com/datasets and https://data.world. Open Jupyter Lab with:

jupyter lab

pip install langchain langchain_community langchain_chroma
pip install -qU langchain-openai

export OPENAI_API_KEY="<insert-your-api-key-here>"

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
response_message = llm.invoke(
    "What is the cutoff date for your training data?"
)

print(response_message.content)

My training data goes up until October 2021. If you have any questions or
need information based on that timeframe, feel free to ask!

# TODO: Load documents
docs = loader.load()
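
To make the loading step more concrete, here is a minimal sketch of what a loader does: it reads a source and returns a list of documents, each holding text plus some metadata. SimpleDocument and TextFolderLoader are hypothetical stand-ins written with the standard library only. They are not the loader this lesson uses, and the folder of .txt files is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class TextFolderLoader:
    """Hypothetical loader that reads every .txt file in one folder."""

    def __init__(self, folder: str):
        self.folder = Path(folder)

    def load(self) -> list[SimpleDocument]:
        documents = []
        # Sort the paths so the documents come back in a predictable order.
        for path in sorted(self.folder.glob("*.txt")):
            text = path.read_text(encoding="utf-8")
            documents.append(SimpleDocument(text, {"source": str(path)}))
        return documents
```

Whatever the source is, the rest of the pipeline only needs that list of documents, so a website or PDF loader could be swapped in without touching the later steps.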

Storing Retrieved Data in the Database

In the next cell, uncomment the code below the # TODO: Split documents comment to break your data into smaller, more manageable chunks:

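# TODO: Split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
# Apply the splitter to the loaded documents to produce the chunks stored below
splits = text_splitter.split_documents(docs)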
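
To see what chunk_size and chunk_overlap mean, consider this simplified sketch. It slides a fixed-size window over the text, and each new window starts chunk_size - chunk_overlap characters after the previous one. This is not how RecursiveCharacterTextSplitter works internally, since that class prefers to break on separators such as paragraphs and lines. The function name split_with_overlap is made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks that share chunk_overlap characters.

    Simplified illustration only: real splitters usually try to break on
    natural boundaries instead of cutting at exact character offsets.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# prints ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence cut at a chunk boundary still appears whole in a neighboring chunk, at the cost of storing some text twice.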
# TODO: Store documents in the Chroma vector database
database = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = database.as_retriever()
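
The idea behind a vector database and its retriever can be sketched without any external service. In the toy version below, word counts stand in for real embeddings and cosine similarity ranks the stored chunks against a query. Chroma and OpenAIEmbeddings use dense learned vectors instead, so treat this as an illustration of the retrieval idea only. The names embed, cosine_similarity, and ToyVectorStore are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real embeddings are dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store that keeps each chunk next to its vector."""

    def __init__(self, chunks: list[str]):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]
```

A real embedding model can also match text that shares meaning but not exact words, which this word-count version cannot do.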

Building the Prompt

The AI community maintains a collection of predefined prompts designed to improve the accuracy of LLM responses. Explore these prompts at https://smith.langchain.com/hub/rlm.

You are an assistant for question-answering tasks. Use the following pieces of
  retrieved context to answer the question. If you don't know the answer, 
  just say that you don't know. Use three sentences maximum and keep the 
  answer concise.
Question: {question} 
Context: {context} 
Answer:
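
A template like the one above has two placeholders, {question} and {context}, which are filled in just before the text is sent to the model. The following sketch does that with plain string formatting. RAG_PROMPT and build_prompt are hypothetical names, and joining the retrieved chunks with blank lines is an assumption rather than something the lesson specifies.

```python
RAG_PROMPT = (
    "You are an assistant for question-answering tasks. Use the following "
    "pieces of retrieved context to answer the question. If you don't know "
    "the answer, just say that you don't know. Use three sentences maximum "
    "and keep the answer concise.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Fill the template with the user's question and the retrieved chunks."""
    # Assumption: chunks are separated by blank lines inside the context slot.
    context = "\n\n".join(chunks)
    return RAG_PROMPT.format(question=question, context=context)
```

The model only ever sees the filled-in text, so the quality of the retrieved chunks largely determines the quality of the answer.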

rag_chain.invoke("Which programmes were dropped from the 2024 Olympics?")

'Four events were dropped from weightlifting for the 2024 Olympics.
  Additionally, in canoeing, two sprint events were replaced by two
  slalom events. The overall event total for canoeing remained at 16.'
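
Conceptually, a call like rag_chain.invoke(...) runs three steps in order: retrieve relevant chunks, fill the prompt, and ask the model. The sketch below wires those steps together in plain Python with stub components, so it runs without an API key. It is not the lesson's LangChain chain, and make_rag_chain and the fake_* functions are hypothetical names.

```python
from typing import Callable


def make_rag_chain(
    retrieve: Callable[[str], list[str]],
    build_prompt: Callable[[str, list[str]], str],
    llm: Callable[[str], str],
) -> Callable[[str], str]:
    """Compose retrieval, prompt building, and generation into one callable."""

    def invoke(question: str) -> str:
        chunks = retrieve(question)              # 1. fetch relevant context
        prompt = build_prompt(question, chunks)  # 2. fill the prompt template
        return llm(prompt)                       # 3. let the model answer

    return invoke


# Stub components so the sketch runs without a model or a database.
def fake_retrieve(question: str) -> list[str]:
    return ["Four events were dropped from weightlifting."]


def fake_build_prompt(question: str, chunks: list[str]) -> str:
    return f"Question: {question}\nContext: {' '.join(chunks)}\nAnswer:"


def fake_llm(prompt: str) -> str:
    # A real model would generate text; this stub only reports the prompt size.
    return f"(stub answer based on {len(prompt)} prompt characters)"


ask = make_rag_chain(fake_retrieve, fake_build_prompt, fake_llm)
print(ask("Which programmes were dropped from the 2024 Olympics?"))
```

Because each step is just a function, you can replace the stubs one at a time with a real retriever, prompt, and model.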

Next Steps

To explore SportsBuddy’s capabilities further, try another question. Create a new cell and ask:

rag_chain.invoke("Was there a podium sweep in the 2024 Olympics?")

"Yes, there was one podium sweep during the 2024 Olympics. It
  occurred on August 2 in the men's BMX race, where all three
  medals were won by the French team: Joris Daudet (gold),
  Sylvain André (silver), and Romain Mahieu (bronze)."