Multimodal Integration with OpenAI

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 02: Image Analysis with GPT-4 Vision

Demo of Controlling Image Fidelity & Using Results

This lesson provides code examples for two tasks: controlling image fidelity with the detail parameter when making API requests to OpenAI’s GPT-4 Vision model, and extracting structured information from the model’s responses.

# Use the detail parameter when analyzing an image with GPT-4 Vision

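# Note: client and ramen_image_url are not defined in this snippet;
# they are assumed to come from an earlier part of the lesson.

# Text prompt
prompt = "How many calories are in this food?"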

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
            "detail": "low"
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice.message.content)

Output:

The dish shown in the image is a bowl of ramen. The caloric content of a
bowl of ramen can vary based on the ingredients and portion size, but on
average:

- A typical bowl of ramen (including broth, noodles, pork, vegetables,
and toppings) usually contains between 400 to 800 calories.

This estimate can vary significantly depending on factors such as the
type and amount of noodles, the richness of the broth, the size of the
serving, and additional toppings.
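
The request above hard-codes "detail": "low" inside the message. As a minimal sketch, the message could be assembled by a small helper so the fidelity setting is chosen in one place and checked before any request is sent. The helper names build_image_part and build_user_message, and the example.com URL, are hypothetical and not part of the lesson or the OpenAI SDK. The three accepted values ("low", "high", "auto") follow OpenAI's vision documentation, which describes "auto" as the default.

```python
from typing import Any

# Values the "detail" field accepts according to OpenAI's vision documentation.
ALLOWED_DETAIL = ("low", "high", "auto")


def build_image_part(url: str, detail: str = "auto") -> dict[str, Any]:
    """Hypothetical helper: build the image entry of a message's content list."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    return {"type": "image_url", "image_url": {"url": url, "detail": detail}}


def build_user_message(prompt: str, url: str, detail: str = "auto") -> dict[str, Any]:
    """Hypothetical helper: combine a text prompt and one image into a user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            build_image_part(url, detail),
        ],
    }


# Placeholder URL for illustration only; the lesson uses ramen_image_url instead.
message = build_user_message(
    "How many calories are in this food?",
    "https://example.com/ramen.jpg",
    detail="low",
)
print(message["content"][1]["image_url"]["detail"])
```

The resulting dictionary has the same shape as the entry in the messages list above, so it could be passed as messages=[message] to client.chat.completions.create. Switching between "low" and "high" then becomes a one-argument change.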

# Extracting specific information when analyzing an image with GPT-4 Vision

from pydantic import BaseModel

class FoodCalories(BaseModel):
    total_calories: str
    analysis: str

# Use JSON format to make extracting information easier

# Text prompt
prompt = "How many calories are in this food?"

# Model
openai_model = "gpt-4o-2024-08-06"

# Creating an API request
response = client.beta.chat.completions.parse(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
            "detail": "low"
          },
        },
      ],
    }
  ],
  response_format=FoodCalories,
  max_tokens=300,
)

choice = response.choices[0]
print(choice.message.content)

Output:

{
  "total_calories":"Approximately 400-500 calories",
  "analysis":"This bowl of ramen likely contains noodles, broth, pork,
  green onions, bamboo shoots, seaweed, and fish cake. The broth and
  noodles contribute the most to the calorie count, while the toppings
  like pork and the egg add additional calories."
}
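
Because total_calories is declared as a string, the model returns free text such as "Approximately 400-500 calories", which still needs a little processing before it can be used in calculations. The sketch below shows one way to do that with only the standard library. The function calorie_range is a hypothetical helper, and the raw string is a shortened copy of the output above with the line wraps removed. When using the SDK's parse helper, choice.message.parsed should also hold a FoodCalories instance, so the json.loads step here is only needed when working from the raw text.

```python
import json
import re
from typing import Optional

# Shortened copy of the JSON output shown above (line wraps removed).
raw = (
    '{"total_calories": "Approximately 400-500 calories", '
    '"analysis": "This bowl of ramen likely contains noodles, broth, pork, '
    'green onions, bamboo shoots, seaweed, and fish cake."}'
)


def calorie_range(total_calories: str) -> Optional[tuple[int, int]]:
    """Hypothetical helper: pull (low, high) out of text like '400-500 calories'."""
    # Match runs of digits, allowing thousands separators such as "1,200".
    matches = re.findall(r"\d[\d,]*", total_calories)
    numbers = [int(m.replace(",", "")) for m in matches]
    if not numbers:
        return None  # No number found; the caller decides how to handle this.
    return min(numbers), max(numbers)


data = json.loads(raw)
result = calorie_range(data["total_calories"])
if result is not None:
    low, high = result
    print(f"Estimated range: {low} to {high} calories")
```

An alternative design would be to declare numeric fields in the Pydantic model itself, so the model returns numbers directly and no text parsing is needed. That changes the schema the lesson uses, so it is mentioned here only as an option.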