Multimodal Integration with OpenAI

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 01: Introduction to Multimodal AI

Using OpenAI API

In this demo, you’ll explore how to use the OpenAI API in your Python projects. The OpenAI API lets developers integrate powerful language models, such as GPT-4, into their apps. You’ll also learn how to make text generation API requests, handle responses, use structured outputs with the help of Pydantic, and make image generation API requests.

To get started, you’ll need an OpenAI API key. Head over to the OpenAI signup page at platform.openai.com/signup to create an account. If you already have an account, simply log in.

Once you’re signed in, navigate to the “Billing” page. Click the Add to credit balance button and purchase some OpenAI credits. For this course, $10 will be more than enough. After adding credits, you need to generate your API key. For that, go to the API keys page at platform.openai.com/api-keys. Click + Create new secret key. This opens a modal window. Click Create secret key, and you’ll receive your API key.

Once you have your API key, create a file named .env at the root of the project.

After that, add the following line:

OPENAI_API_KEY=your-api-key-here

Make sure to replace your-api-key-here with your actual API key.

Before you can start using the OpenAI API, you must install the necessary libraries and set up your environment.

First, open lesson-1-starter_project.ipynb from the Materials repo, under the Lesson 01 Starter folder, in JupyterLab. Next, install the OpenAI Python client library, along with the Pydantic, python-dotenv, Matplotlib, Pillow, and requests libraries.

Add the following code to the first cell of your notebook:

# Install dependencies
!pip install openai pydantic python-dotenv matplotlib Pillow requests

Next, to authenticate your requests, update the second cell of your notebook with the following code:

# Load the OpenAI library
from openai import OpenAI

# Set up relevant environment variables
from dotenv import load_dotenv

load_dotenv()

# Create the OpenAI connection object
client = OpenAI()

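The load_dotenv() call reads your .env file and exposes OPENAI_API_KEY as an environment variable, which the OpenAI client picks up automatically. If you’d like a quick, optional sanity check that the key was actually loaded before making any requests, something like the following works (the error message is just an example):

# Optional: confirm the API key was loaded from .env
import os

if os.getenv("OPENAI_API_KEY") is None:
    raise RuntimeError("OPENAI_API_KEY not found. Check your .env file.")
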
Start by making a simple request to the OpenAI API to check the grammar of a sentence. You’ll use the client.chat.completions.create method to generate a response from the model.

# Make an API request
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an English grammar checker."},
        {
            "role": "user",
            "content": "Check the grammar: 'Alice eat an apple every day.'"
        }
    ]
)

# Print the response
print(completion.choices[0].message)

In this example, you send a text prompt to the model, which generates a response based on the provided input. You’ll get a result like the following:

ChatCompletionMessage(content="The correct sentence should be: 'Alice eats
  an apple every day.'", refusal=None, role='assistant', function_call=None,
  tool_calls=None)
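
The printed value is the full ChatCompletionMessage object. If you only want the reply text itself, you can read the message’s content attribute directly:

# Print just the text of the reply
print(completion.choices[0].message.content)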

Sometimes, you might want to structure the outputs from the OpenAI API into a specific format. This is where Pydantic comes in handy. Pydantic lets you define data models and validate structured data.

Define a simple Pydantic model for a response:

# Import Pydantic
from pydantic import BaseModel

# Define data structure
class GrammarChecking(BaseModel):
    wrong_sentence: str
    correct_sentence: str
    is_correct: bool

# Make an API request
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an English grammar checker."},
        {"role": "user", "content": "Check the grammar: 'Alice eat an apple
          every day.'"},
    ],
    response_format=GrammarChecking,
)

# Print the response
print(completion.choices[0].message)

With this setup, you can easily ensure the structure of the data returned by the API.

Run the code and see the result:

ParsedChatCompletionMessage[GrammarChecking](content='{"wrong_sentence":
  "Alice eat an apple every day.","correct_sentence":"Alice eats an apple
  every day.","is_correct":false}', refusal=None, role='assistant',
  function_call=None, tool_calls=[], parsed=GrammarChecking(
  wrong_sentence='Alice eat an apple every day.', correct_sentence=
  'Alice eats an apple every day.', is_correct=False))

With this approach, you can easily extract the information you need because the response follows the JSON schema you’ve defined, unlike the first request, where you had to parse a raw string.
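
For example, you can read fields straight off the parsed Pydantic object instead of decoding the JSON string yourself:

# Access the validated Pydantic object
result = completion.choices[0].message.parsed
print(result.correct_sentence)  # Alice eats an apple every day.
print(result.is_correct)        # False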

But you’re hungry for more. You don’t want to deal with text only; you want to explore multimodal AI, which handles more than just text. Why not try generating an image of a cat?

You can do this by making a request to the image generation API. Here’s how:

# Import the libraries for downloading and displaying the image
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import requests

# Creating a multimodal AI API request
response = client.images.generate(
    model="dall-e-3",
    prompt="A cat holding a sign 'Welcome to the Multimodal AI Module'"
)

# Downloading the image
image_url = response.data[0].url
image_response = requests.get(image_url)
img = Image.open(BytesIO(image_response.content))

# Displaying the image
plt.imshow(img)
plt.axis('off')
plt.show()

You’ll see an image like the following: a cat holding a sign with “Welcome to the Multimodal AI Module” written on it.
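
If you’d like to keep the generated image around, you can save it to disk with Pillow; the filename here is just an example:

# Save the generated image locally (example filename)
img.save("multimodal_cat.png")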

Don’t sweat it if some of the code looks alien to you. In the next lessons, you’ll learn how to craft multimodal AI code yourself.

Multimodal AI is more than just text and images. You’ll also deal with audio later on. So, buckle up and get ready for the next lesson!
