Multimodal Integration with OpenAI

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 02: Image Analysis with GPT-4 Vision

Demo of Controlling Image Fidelity & Using Results

This lesson provides coding examples for controlling image fidelity when making API requests to OpenAI’s GPT-4 Vision model and extracting structured information from the model’s responses.

Using the detail parameter helps you balance the accuracy of the image analysis against the processing time. Here's how you can set the parameter to low for faster processing:

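# Use the detail parameter when analyzing an image with GPT-4 Vision

from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# ramen_image_url is assumed to be defined earlier in the lesson

# Text prompt
prompt = "How many calories are in this food?"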

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
            "detail": "low"
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice.message.content)

Here, the "detail": "low" setting instructs the model to process the image more quickly, using fewer resources, which can be beneficial for faster responses and cost savings when precision is not the primary concern.

The dish shown in the image is a bowl of ramen. The caloric content of a
bowl of ramen can vary based on the ingredients and portion size, but on
average:

- A typical bowl of ramen (including broth, noodles, pork, vegetables,
and toppings) usually contains between 400 to 800 calories.

This estimate can vary significantly depending on factors such as the
type and amount of noodles, the richness of the broth, the size of the
serving, and additional toppings.
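To compare fidelity settings side by side, the request above can be wrapped in a small helper. The sketch below is an illustration rather than part of the lesson: `build_vision_messages` is a hypothetical helper name and the URL is a placeholder. It only builds the `messages` payload, so it runs without an API key. The values `low`, `high`, and `auto` are the ones the `detail` field accepts.

```python
# Hypothetical helper (not from the lesson): builds the messages payload
# for a vision request with a chosen detail level.
VALID_DETAIL_LEVELS = ("low", "high", "auto")


def build_vision_messages(prompt, image_url, detail="auto"):
    """Return a chat messages list with one text part and one image part."""
    if detail not in VALID_DETAIL_LEVELS:
        raise ValueError(
            f"detail must be one of {VALID_DETAIL_LEVELS}, got {detail!r}"
        )
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": image_url, "detail": detail},
                },
            ],
        }
    ]


# Placeholder URL, used only for illustration
example_url = "https://example.com/ramen.jpg"
question = "How many calories are in this food?"

low_messages = build_vision_messages(question, example_url, detail="low")
high_messages = build_vision_messages(question, example_url, detail="high")

print(low_messages[0]["content"][1]["image_url"]["detail"])
print(high_messages[0]["content"][1]["image_url"]["detail"])
```

Passing `messages=low_messages` or `messages=high_messages` to `client.chat.completions.create` would then let you compare both the answers and the token counts reported in `response.usage` for each setting.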

# Extracting specific information when analyzing an image with GPT-4 Vision

from pydantic import BaseModel

class FoodCalories(BaseModel):
    total_calories: str
    analysis: str

# Use JSON format to make extracting information easier

# Text prompt
prompt = "How many calories are in this food?"

# Model
openai_model = "gpt-4o-2024-08-06"

# Creating an API request
response = client.beta.chat.completions.parse(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
            "detail": "low"
          },
        },
      ],
    }
  ],
  response_format=FoodCalories,
  max_tokens=300,
)

choice = response.choices[0]
print(choice.message.content)

{
  "total_calories":"Approximately 400-500 calories",
  "analysis":"This bowl of ramen likely contains noodles, broth, pork,
  green onions, bamboo shoots, seaweed, and fish cake. The broth and
  noodles contribute the most to the calorie count, while the toppings
  like pork and the egg add additional calories."
}

The model gpt-4o-2024-08-06 should be used when working with structured outputs. The schema is passed to the response_format parameter.
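Because the response follows a fixed schema, its fields can be read directly in code. The following is a minimal sketch that uses an abbreviated copy of the JSON above as input. `parse_calorie_range` is a hypothetical helper, and it assumes the model phrases the estimate as a single number or a low-high range without thousands separators, which is not guaranteed. It uses only the standard library, so it runs without an API key.

```python
import json
import re

# Abbreviated copy of the JSON content returned above
content = (
    '{"total_calories": "Approximately 400-500 calories", '
    '"analysis": "This bowl of ramen likely contains noodles, broth, pork."}'
)


def parse_calorie_range(text):
    """Return (low, high) integers found in text, or None if no number is present."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if not numbers:
        return None
    return min(numbers), max(numbers)


result = json.loads(content)
calorie_range = parse_calorie_range(result["total_calories"])
print(calorie_range)  # (400, 500)
```

When the request goes through `client.beta.chat.completions.parse`, the SDK also exposes the validated object as `choice.message.parsed`, so `choice.message.parsed.total_calories` should give the same string without a manual `json.loads`.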
