Multimodal Integration with OpenAI

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 02: Image Analysis with GPT-4 Vision

Making API Requests


This lesson explores how to use the OpenAI API in your Python projects. The OpenAI API lets you integrate powerful language models, such as GPT-4, into your applications. You’ll learn how to make API requests, handle responses, and use structured outputs with the help of Pydantic.

# Load the OpenAI library
from openai import OpenAI

# Set up relevant environment variables
from dotenv import load_dotenv

load_dotenv()

# Create the OpenAI connection object
client = OpenAI()
# Show images in JupyterLab

# Import necessary libraries
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
# Set image URL
ramen_image_url = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/"
    "e/ec/Shoyu_ramen%2C_at_Kasukabe_Station_%282014.05.05%29_1.jpg/"
    "1280px-Shoyu_ramen%2C_at_Kasukabe_Station_%282014.05.05%29_1.jpg"
)

# Fetch the image from the URL and fail early on an HTTP error
response = requests.get(ramen_image_url)
response.raise_for_status()
img = Image.open(BytesIO(response.content))

# Display the image
plt.figure(figsize=(12, 8))
plt.imshow(img)
plt.axis('off')
plt.show()
# Use an image URL when analyzing an image with GPT-4 Vision

# Text prompt
prompt = "How many calories are in this food?"

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice)
Choice(finish_reason='stop', index=0, logprobs=None,
  message=ChatCompletionMessage(content="The image
  shows a bowl of ramen, ... and serving sizes.",
  refusal=None, role='assistant', function_call=None,
  tool_calls=None))
# Extract the content
print(choice.message.content)
The image shows a bowl of ramen, which typically includes noodles, broth,
slices of pork, vegetables, and garnishes. The calorie content can vary
significantly based on the specific ingredients and portion sizes. On
average, a typical serving of ramen similar to the one in the image might
contain approximately 400-600 calories. Here is a rough breakdown:

- Noodles: 200-300 calories
- Broth (depending on type and amount): 50-150 calories
- Pork slices: 100-150 calories
- Vegetables and garnishes (scallions, seaweed, narutomaki): 20-50
  calories

Please note that these values are approximate and can vary based on
specific recipes and serving sizes.
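
The printed Choice carries more than the text: `finish_reason` says why generation stopped, and `refusal` is set when the model declines to answer. With `max_tokens=300`, a long answer can be cut off, in which case `finish_reason` is "length" instead of "stop". The following is a minimal sketch of a response check; `extract_text` is a hypothetical helper, not part of the OpenAI library, and the stand-in objects only mimic the fields shown in the output above so the snippet runs without an API call.

```python
from types import SimpleNamespace


def extract_text(choice):
    """Return the assistant's text, or raise if the model refused."""
    message = choice.message
    # `refusal` is populated when the model declines to answer.
    if getattr(message, "refusal", None):
        raise ValueError(f"Model refused: {message.refusal}")
    text = message.content or ""
    # "length" means the reply hit max_tokens and was cut off.
    if choice.finish_reason == "length":
        text += " [truncated: raise max_tokens for the full answer]"
    return text


# Stand-ins that mimic the fields of a real Choice object (assumption:
# used here only so the example runs offline).
complete = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(content="About 400-600 calories.", refusal=None),
)
cut_off = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content="Noodles: 200-300", refusal=None),
)

print(extract_text(complete))
print(extract_text(cut_off))
```

In a notebook, the same helper could be called as `extract_text(response.choices[0])` on a real response.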
# Convert an image file to a base64-encoded string
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

image_path = "images/fried_rice.png"
base64_image = encode_image(image_path)
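
A local image reaches the API as a data URL, and the prefix of that URL names the image's MIME type. The sketch below derives the MIME type from the file extension with the standard library's `mimetypes` module, so the prefix follows the file format. `to_data_url` is a hypothetical helper introduced for illustration; it is not part of the lesson's code or the OpenAI library.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Encode a local image as a data URL whose MIME type matches the file."""
    # guess_type works from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

Calling `to_data_url("images/fried_rice.png")` would return a string that starts with `data:image/png;base64,` and can be passed as the `url` value of an `image_url` entry.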
# Show the image in JupyterLab
img = Image.open(image_path)

plt.imshow(img)
plt.axis('off')
plt.show()
Image of fried rice

# Upload the base64 encoded image to the OpenAI API server

# Text prompt
prompt = "How many calories are in this food?"

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            # The MIME type matches the PNG file encoded above
            "url": f"data:image/png;base64,{base64_image}",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice)
Choice(finish_reason='stop', index=0, logprobs=None,
  message=ChatCompletionMessage(content='It is
  difficult to provide an...calorie content.', refusal=None,
  role='assistant', function_call=None, tool_calls=None))
# Extract the content
print(choice.message.content)
It is difficult to provide an exact calorie count for the food in the
image without knowing the specific quantities and exact ingredients
used. However, I can provide an approximate calorie breakdown based
on typical portions of the ingredients shown:

- **Fried rice (1 cup):** Approximately 200-250 calories
- **Fried egg (1 large):** Approximately 90-100 calories
- **Sausage slices (1 sausage):** Approximately 100-150 calories
- **Lettuce and cucumbers (small amount):** Approximately 10-20
  calories
- **Lime wedges (2 slices):** Approximately 5 calories

Summing these estimates together, the total calorie count is approximately
405-525 calories for the plate shown. Please note that cooking methods
and variations in portion size can significantly affect the actual calorie
content.
# Creating an API request consisting of two images

# Text prompt
prompt = "Which food has fewer calories?"

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            # The MIME type matches the PNG file encoded above
            "url": f"data:image/png;base64,{base64_image}",
          },
        },
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice)
Choice(finish_reason='stop', index=0, logprobs=None,
  message=ChatCompletionMessage(content="It's
  difficult to determine...portion sizes.", refusal=None,
  role='assistant', function_call=None, tool_calls=None))
# Extract the content
print(choice.message.content)
It's difficult to determine the exact calorie count without specific
measurements and detailed nutritional information, but generally
speaking:

1. The dish with rice, fried egg, sausage, and vegetables is likely
to be higher in calories, especially due to the presence of sausages
and fried egg, which are typically calorie-dense.

2. The bowl of ramen may have fewer overall calories, but this can
vary widely depending on the ingredients used. Ramen can have a
high calorie count as well, particularly if it contains fatty
pork, rich broth, and noodles.

Given the images and typical ingredient usage, the rice dish
(first image) is likely to have a higher calorie count than the
ramen (second image). However, actual calorie content can vary
based on the specific preparation methods and portion sizes.
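
All three requests in this lesson share one message shape: a single user message whose content list holds one text part followed by one `image_url` part per image. The model refers to the images in the order they appear ("first image", "second image"), so the list order matters. The following sketch wraps that shape in a hypothetical helper, `build_vision_messages`; the name and the placeholder URLs are assumptions made for illustration.

```python
def build_vision_messages(prompt, image_urls):
    """Build a one-message chat payload with a text part and N image parts."""
    content = [{"type": "text", "text": prompt}]
    # Each image, whether a web URL or a base64 data URL, becomes one part.
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]


# Placeholder values (assumptions); substitute a real data URL and image URL.
messages = build_vision_messages(
    "Which food has fewer calories?",
    ["data:image/png;base64,AAAA", "https://example.com/ramen.jpg"],
)
print(len(messages[0]["content"]))
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`, alongside `model` and `max_tokens` as in the requests above.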