Multimodal Integration with OpenAI

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 01: Introduction to Multimodal AI

Using OpenAI API

In this demo, you’ll explore how to use the OpenAI API in your Python projects. The OpenAI API lets developers integrate powerful language models, such as GPT-4, into their apps. You’ll also learn how to make text generation API requests, handle responses, use structured outputs with the help of Pydantic, and make image generation API requests.

To get started, you’ll need an OpenAI API key. Head over to the OpenAI signup page at platform.openai.com/signup to create an account. If you already have an account, simply log in.

Once you’re signed in, navigate to the “Billing” page. Click the Add to credit balance button and purchase some OpenAI credits. For this course, $10 will be more than enough. After adding credits, you need to generate your API key. For that, go to the API keys page at platform.openai.com/api-keys. Click + Create new secret key. This opens a modal window. Click Create secret key, and you’ll receive your API key.

Once you have your API key, create a file named .env at the root of the project.

After that, add the following line:

OPENAI_API_KEY=your-api-key-here

Make sure to replace your-api-key-here with your actual API key.

Before you can start using the OpenAI API, you must install the necessary libraries and set up your environment.

First, open lesson-1-starter_project.ipynb from the Materials repo, under the Lesson 01 Starter folder, in JupyterLab. Next, install the OpenAI Python client library, along with the Pydantic, python-dotenv, Matplotlib, Pillow, and requests libraries.

Add the following code to the first cell of your notebook:

# Install dependencies
!pip install openai pydantic python-dotenv matplotlib Pillow requests

Next, to authenticate your requests, update the second cell of your notebook with the following code:

# Load the OpenAI library
from openai import OpenAI

# Set up relevant environment variables
from dotenv import load_dotenv

load_dotenv()

# Create the OpenAI connection object
client = OpenAI()

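The load_dotenv() call reads your .env file and exposes OPENAI_API_KEY as an environment variable, which the OpenAI client picks up automatically. If you’d like a quick, optional sanity check that the key was actually loaded before making any requests, something like the following works (the error message is just an example):

# Optional: confirm the API key was loaded from .env
import os

if os.getenv("OPENAI_API_KEY") is None:
    raise RuntimeError("OPENAI_API_KEY not found. Check your .env file.")
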
Start by making a simple request to the OpenAI API to check the grammar of a sentence. You’ll use the client.chat.completions.create method to generate a response from the model.

# Make an API request
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are an English grammar checker."},
        {
            "role": "user",
            "content": "Check the grammar: 'Alice eat an apple every day.'"
        }
    ]
)

# Print the response
print(completion.choices[0].message)

In this example, you send a text prompt to the model, which generates a response based on the provided input. You’ll get a result like the following:

ChatCompletionMessage(content="The correct sentence should be: 'Alice eats
  an apple every day.'", refusal=None, role='assistant', function_call=None,
  tool_calls=None)
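
The printed value is the full ChatCompletionMessage object. If you only want the reply text itself, you can read the message’s content attribute directly:

# Print just the text of the reply
print(completion.choices[0].message.content)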

Sometimes, you might want to structure the outputs from the OpenAI API into a specific format. This is where Pydantic comes in handy. Pydantic lets you define data models and validate structured data.

Define a simple Pydantic model for a response:

# Import Pydantic
from pydantic import BaseModel

# Define data structure
class GrammarChecking(BaseModel):
    wrong_sentence: str
    correct_sentence: str
    is_correct: bool

# Make an API request
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an English grammar checker."},
        {"role": "user", "content": "Check the grammar: 'Alice eat an apple
          every day.'"},
    ],
    response_format=GrammarChecking,
)

# Print the response
print(completion.choices[0].message)

With this setup, you can easily ensure the structure of the data returned by the API.

Run the code and see the result:

ParsedChatCompletionMessage[GrammarChecking](content='{"wrong_sentence":
  "Alice eat an apple every day.","correct_sentence":"Alice eats an apple
  every day.","is_correct":false}', refusal=None, role='assistant',
  function_call=None, tool_calls=[], parsed=GrammarChecking(
  wrong_sentence='Alice eat an apple every day.', correct_sentence=
  'Alice eats an apple every day.', is_correct=False))

With this approach, you can easily extract the information you need because the response follows the JSON schema you’ve defined, unlike the first request, where you had to parse a raw string.
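
For example, you can read fields straight off the parsed Pydantic object instead of decoding the JSON string yourself:

# Access the validated Pydantic object
result = completion.choices[0].message.parsed
print(result.correct_sentence)  # Alice eats an apple every day.
print(result.is_correct)        # False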

But you’re hungry for more. You don’t want to deal with text only; you want to explore multimodal AI, which handles more than just text. Why not try generating an image of a cat?

You can do this by making a request to the image generation API. Here’s how:

# Import the libraries for downloading and displaying the image
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import requests

# Creating a multimodal AI API request
response = client.images.generate(
    model="dall-e-3",
    prompt="A cat holding a sign 'Welcome to the Multimodal AI Module'"
)

# Downloading the image
image_url = response.data[0].url
image_response = requests.get(image_url)
img = Image.open(BytesIO(image_response.content))

# Displaying the image
plt.imshow(img)
plt.axis('off')
plt.show()

You’ll see an image like the following: a cat holding a sign with “Welcome to the Multimodal AI Module” written on it.
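
If you’d like to keep the generated image around, you can save it to disk with Pillow; the filename here is just an example:

# Save the generated image locally (example filename)
img.save("multimodal_cat.png")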

Don’t sweat it if some of the code looks alien to you. In the next lessons, you’ll learn how to craft multimodal AI code yourself.

Multimodal AI is more than just text and images. You’ll also deal with audio later on. So, buckle up and get ready for the next lesson!
