Multimodal Integration with OpenAI

5 lessons · 1 hr, 37 mins

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 02: Image Analysis with GPT-4 Vision

Making API Requests

This lesson explores how to use the OpenAI API in your Python projects. The OpenAI API allows you to integrate powerful language models, such as GPT-4, into your applications. You’ll learn the process of making API requests, handling responses, and using structured outputs with the help of Pydantic.

To interact with the OpenAI API, you need an API key. If you’ve already followed the instructions in the previous lesson, you should have an API key stored in a .env file. If not, please follow the instructions in the previous lesson to obtain an API key.

# Load the OpenAI library
from openai import OpenAI

# Set up relevant environment variables
from dotenv import load_dotenv

load_dotenv()

# Create the OpenAI connection object
client = OpenAI()
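
By default, `OpenAI()` reads the key from the `OPENAI_API_KEY` environment variable, which `load_dotenv()` fills in from the .env file. If the key is missing, the failure can be confusing, so a small check before creating the client may help. The following is a minimal sketch; `require_api_key` is a hypothetical helper written for this example, not part of the OpenAI library.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error.

    require_api_key is a hypothetical helper for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it to your .env file "
            "and call load_dotenv() first."
        )
    return key


# Example with an explicit mapping instead of the real environment
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

In a notebook, you would call `require_api_key()` with no arguments right after `load_dotenv()`.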

# Show images in Jupyter Lab

# Import necessary libraries
import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt

# Set image URL
ramen_image_url = (
    "https://upload.wikimedia.org/wikipedia/commons/thumb/"
    "e/ec/Shoyu_ramen%2C_at_Kasukabe_Station_%282014.05.05%29_1.jpg/"
    "1280px-Shoyu_ramen%2C_at_Kasukabe_Station_%282014.05.05%29_1.jpg"
)

# Fetch the image from the URL
response = requests.get(ramen_image_url)
img = Image.open(BytesIO(response.content))

# Display the image
plt.figure(figsize=(12, 8))
plt.imshow(img)
plt.axis('off')
plt.show()

First, you define the URL of the image you want to analyze.
Then, you use requests.get() to download the image data.
After that, you create an image object using the Python Imaging Library (PIL) by opening the downloaded data with Image.open() and BytesIO().
Then, you set up a figure using plt.figure(), specifying the figure size.
Afterward, you display the image with plt.imshow().
Then, you remove the axes using plt.axis('off').
Finally, you render the image in your Jupyter Lab with plt.show().
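
The download steps above assume the request succeeded and returned image data. If the server sends back an error page instead, `Image.open()` fails with a less obvious message. As a rough sketch, you could inspect the first bytes of the payload before opening it. `sniff_image_format` is a hypothetical helper, and it only knows the PNG and JPEG signatures.

```python
from typing import Optional

# Leading bytes ("magic numbers") that identify two common image formats
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return "png" or "jpeg" if data starts with a known signature."""
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# An HTML error page is not an image, so the check returns None
print(sniff_image_format(b"<html>Not Found</html>"))
# Bytes that begin with the JPEG signature are reported as "jpeg"
print(sniff_image_format(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
```

With `requests`, you might call `response.raise_for_status()` first and then pass `response.content` to this check before handing it to `Image.open()`.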

That ramen looks delicious, but because you’re on a diet, you need to know the calorie count before deciding whether to eat it. You’ll use the GPT-4 Vision API to analyze it. It’s important to note that the GPT-4 Vision API uses the same approach as the standard OpenAI text generation API. The key difference is that instead of sending only text, you include an image URL in your request as well.

# Use an image URL when analyzing an image with GPT-4 Vision

# Text prompt
prompt = "How many calories are in this food?"

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice)

First, you define the prompt you want to send to the model. In this case, you’re asking about the calorie content of the food in the image.
Then, you specify the model you want to use, which is “gpt-4o” for GPT-4 with vision capabilities.
Next, you create an API request using the client.chat.completions.create() method. This method takes several parameters, including the model, messages, and max_tokens.
Finally, you print the Choice object to see the response from the model.

client.chat.completions.create() is designed to interact with GPT models capable of processing both text and image inputs. This function requires a messages parameter, which is a list containing a single dictionary. That dictionary has two key-value pairs, "role" and "content", and the "content" list holds the inputs:

The text input is provided as a dictionary with "type": "text" and the actual prompt in the text field.

The image input is provided as a dictionary with "type": "image_url". The image URL is nested in another dictionary under the image_url key. This is what distinguishes an image analysis request from a text generation request.

The max_tokens parameter is set to 300, limiting the generated response’s length.
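
The message structure described above can be assembled by a small function, which keeps the nesting in one place. This is only a sketch: `build_vision_messages` is a hypothetical helper, and the URL in the example is a placeholder.

```python
from typing import Any, Dict, List


def build_vision_messages(prompt: str, image_urls: List[str]) -> List[Dict[str, Any]]:
    """Build a single user message with one text part and N image parts."""
    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt}]
    for url in image_urls:
        # Each image is its own part; the URL is nested under "image_url"
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]


messages = build_vision_messages(
    "How many calories are in this food?",
    ["https://example.com/ramen.jpg"],  # placeholder URL
)
print(messages[0]["content"][1]["type"])  # image_url
```

The returned list can be passed as the `messages` argument of `client.chat.completions.create()`.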

From the code above, you get the Choice object printed:

Choice(finish_reason='stop', index=0, logprobs=None,
  message=ChatCompletionMessage(content="The image
  shows a bowl of ramen, ... and serving sizes.",
  refusal=None, role='assistant', function_call=None,
  tool_calls=None))

# Extract the content
print(choice.message.content)

The image shows a bowl of ramen, which typically includes noodles, broth,
slices of pork, vegetables, and garnishes. The calorie content can vary
significantly based on the specific ingredients and portion sizes. On
average, a typical serving of ramen similar to the one in the image might
contain approximately 400-600 calories. Here is a rough breakdown:

- Noodles: 200-300 calories
- Broth (depending on type and amount): 50-150 calories
- Pork slices: 100-150 calories
- Vegetables and garnishes (scallions, seaweed, narutomaki): 20-50
  calories

Please note that these values are approximate and can vary based on
specific recipes and serving sizes.

After making the request, you receive a response object. The actual content of the response is contained in the choice.message.content field. You can print this to see the model’s analysis.
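
Because the request caps the output with `max_tokens`, a long answer can be cut off. In that case the Choice object reports `finish_reason` as `"length"` rather than `"stop"`. The sketch below shows one way to surface that when extracting the text. `extract_text` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in for a real Choice.

```python
from types import SimpleNamespace


def extract_text(choice) -> str:
    """Return the message text, flagging output cut off by max_tokens."""
    # content can be None, so fall back to an empty string
    text = choice.message.content or ""
    if choice.finish_reason == "length":
        text += "\n[response truncated: consider raising max_tokens]"
    return text


# Stand-in for a Choice object, for illustration only
fake_choice = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content="The image shows a bowl of"),
)
print(extract_text(fake_choice))
```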

The output from the GPT-4 Vision API can vary. Even when given the same image and prompt multiple times, you might receive slightly different responses. You can adjust this randomness level using the temperature parameter. The value is between 0 and 2, with lower values producing more deterministic results and higher values introducing more randomness.
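
As a sketch of how the temperature setting could be wired into a request, the helper below collects the request arguments and rejects values outside the 0 to 2 range before anything is sent. `request_kwargs` is a hypothetical name, and the defaults are assumptions chosen for this example.

```python
from typing import Any, Dict, List


def request_kwargs(
    model: str,
    messages: List[Dict[str, Any]],
    temperature: float = 0.0,
    max_tokens: int = 300,
) -> Dict[str, Any]:
    """Collect chat completion arguments, validating the temperature."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


kwargs = request_kwargs("gpt-4o", [{"role": "user", "content": "Hi"}], temperature=0.2)
print(kwargs["temperature"])  # 0.2
```

You would then call `client.chat.completions.create(**kwargs)`. A low temperature makes answers more repeatable, though identical output is still not guaranteed.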

To send an image file in a request, you first convert it into a base64-encoded string. This encoding ensures the image can be easily embedded in a text-based format like JSON. You can do this using Python’s base64 library.

# Convert an image to a base64 encoded image
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

image_path = "images/fried_rice.png"
base64_image = encode_image(image_path)

This function reads the image in binary mode and then encodes it to a base64 string. The .decode('utf-8') method ensures that the encoded data is returned as a string rather than as bytes.
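
To see what the encoding does, here is a small round-trip check on a few bytes. The bytes are a stand-in for real image data, used only for illustration. It also shows the size cost: base64 produces four characters for every three input bytes.

```python
import base64

# Stand-in for image bytes: the PNG signature plus ten arbitrary bytes
raw = b"\x89PNG\r\n\x1a\n" + bytes(range(10))
encoded = base64.b64encode(raw).decode("utf-8")

# Decoding recovers the original bytes exactly
assert base64.b64decode(encoded) == raw

# 18 input bytes become 24 base64 characters
print(len(raw), len(encoded))  # 18 24
```

Keep this overhead in mind for large images, since the encoded string travels inside the request body.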

# Show the image in Jupyter Lab
img = Image.open(image_path)

plt.imshow(img)
plt.axis('off')
plt.show()

Image of fried rice

To include the image in the API request, use the image_url field, formatted as a data: URL with the base64-encoded image prefixed by the image’s MIME type, for example "data:image/png;base64," for a PNG file.
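
The prefix of a data: URL should match the file’s actual type. One way to avoid hard-coding it is to guess the MIME type from the file name with Python’s `mimetypes` module. The sketch below assumes the file extension is trustworthy; `to_data_url` is a hypothetical helper, and the demo file holds placeholder bytes rather than a real image.

```python
import base64
import mimetypes
import os
import tempfile


def to_data_url(image_path: str) -> str:
    """Encode a file as a data: URL, guessing the MIME type from its name."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file name: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demonstrate with a throwaway file; the bytes are placeholders, not a real PNG
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.png")
    with open(path, "wb") as f:
        f.write(b"abc")
    print(to_data_url(path))  # data:image/png;base64,YWJj
```

The returned string can be used directly as the "url" value inside the image_url dictionary.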

# Upload the base64 encoded image to the OpenAI API server

# Text prompt
prompt = "How many calories are in this food?"

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/png;base64,{base64_image}",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice)

You get the Choice object printed:

Choice(finish_reason='stop', index=0, logprobs=None,
  message=ChatCompletionMessage(content='It is
  difficult to provide an...calorie content.', refusal=None,
  role='assistant', function_call=None, tool_calls=None))

# Extract the content
print(choice.message.content)

It is difficult to provide an exact calorie count for the food in the
image without knowing the specific quantities and exact ingredients
used. However, I can provide an approximate calorie breakdown based
on typical portions of the ingredients shown:

- **Fried rice (1 cup):** Approximately 200-250 calories
- **Fried egg (1 large):** Approximately 90-100 calories
- **Sausage slices (1 sausage):** Approximately 100-150 calories
- **Lettuce and cucumbers (small amount):** Approximately 10-20
  calories
- **Lime wedges (2 slices):** Approximately 5 calories

Summing these estimates together, the total calorie count is approximately
405-525 calories for the plate shown. Please note that cooking methods
and variations in portion size can significantly affect the actual calorie
content.

To send more than one image in a request, you can send several dictionaries with the "image_url" field. Add and run the following code that sends an image with a URL and a base64-encoded image:

# Creating an API request consisting of two images

# Text prompt
prompt = "Which food has fewer calories?"

# Model
openai_model = "gpt-4o"

# Creating an API request
response = client.chat.completions.create(
  model=openai_model,
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": prompt},
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/png;base64,{base64_image}",
          },
        },
        {
          "type": "image_url",
          "image_url": {
            "url": ramen_image_url,
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

choice = response.choices[0]
print(choice)

You get the Choice object printed:

Choice(finish_reason='stop', index=0, logprobs=None,
  message=ChatCompletionMessage(content="It's
  difficult to determine...portion sizes.", refusal=None,
  role='assistant', function_call=None, tool_calls=None))

# Extract the content
print(choice.message.content)

It's difficult to determine the exact calorie count without specific
measurements and detailed nutritional information, but generally
speaking:

1. The dish with rice, fried egg, sausage, and vegetables is likely
to be higher in calories, especially due to the presence of sausages
and fried egg, which are typically calorie-dense.

2. The bowl of ramen may have fewer overall calories, but this can
vary widely depending on the ingredients used. Ramen can have a
high calorie count as well, particularly if it contains fatty
pork, rich broth, and noodles.

Given the images and typical ingredient usage, the rice dish
(first image) is likely to have a higher calorie count than the
ramen (second image). However, actual calorie content can vary
based on the specific preparation methods and portion sizes.
