Multimodal Integration with OpenAI

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 05: Building a Multimodal AI App

Demo of Generating Situational Prompts & Images


In this demo, you’ll create functions that generate situational prompts and matching scenery images, then add speech recognition and synthesis.

# Function to generate a situational prompt for practicing English
# (assumes an OpenAI `client` object was created earlier in the notebook)
def generate_situational_prompt(seed_prompt=""):
    # Define additional prompt instructions
    additional_prompt = """
    Then create an initial response to the person. If the situation
      is "ordering coffee in a cafe.", then the initial response will
      be, "Hello, what would you like to order?". Separate the initial
      situation and the initial response with a line containing "====".
      Something like:
        "You're ordering coffee in a cafe.
        ====
        'Hello, there. What would you like to order?'"
        Limit the output to 1 sentence.
    """

In this initial part of the function, you set up the additional instructions that guide the generation of situational prompts. The additional_prompt variable provides a template for the type of response you expect.

Next, handle the input seed_prompt and construct the full prompt accordingly.

    # Check if a seed prompt is provided and create the seed
    # phrase accordingly
    if seed_prompt:
        seed_phrase = f"""Generate a second-person POV situation
          for practicing English with this seed prompt: {seed_prompt}.
        {additional_prompt}"""
    else:
        seed_phrase = f"""Generate a second-person POV situation
          for practicing English, like meeting your parents-in-law,
          etc.
        {additional_prompt}"""

Here, you check whether a seed_prompt was provided. If so, you incorporate it into your seed_phrase. Otherwise, you use a default prompt for generating a situation.

    # Use GPT to generate a situation for practicing English
    response = client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {"role": "system", "content": "You are a creative writer. "
          "Very very creative."},
        {"role": "user", "content": seed_phrase}
      ]
    )

In this section, you call the GPT model to generate the situational prompt. You pass in the seed_phrase along with a role specification for the system and user.

    # Extract and return the situation and the initial response
    # from the response
    message = response.choices[0].message.content

    # Return the generated message
    return message

# Test the function to generate a situational prompt
generate_situational_prompt()

# Test the function to generate a situational prompt with a seed prompt
generate_situational_prompt("comics exhibition")
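
The seed-phrase branching inside generate_situational_prompt can be checked without calling the API. The sketch below pulls that logic into a hypothetical helper, build_seed_phrase, which is not part of the original demo; the wording of the phrases follows the demo, and the helper name and its parameters are assumptions made for illustration.

```python
def build_seed_phrase(seed_prompt="", additional_prompt=""):
    """Return the user message that would be sent to the chat model.

    Hypothetical helper: it mirrors the if/else branch in
    generate_situational_prompt but makes no API call.
    """
    base = "Generate a second-person POV situation for practicing English"
    if seed_prompt:
        # A seed narrows the scenario to the learner's chosen topic
        opening = f"{base} with this seed prompt: {seed_prompt}."
    else:
        # No seed: fall back to the generic example used in the demo
        opening = f"{base}, like meeting your parents-in-law, etc."
    return f"{opening}\n{additional_prompt}".strip()


print(build_seed_phrase("comics exhibition"))
print(build_seed_phrase())
```

Keeping the string construction separate from the network call makes it easier to inspect exactly what the model receives for each seed.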

# Generate an image based on the situational prompt

# Import necessary libraries for image processing and display
import requests
from PIL import Image
from io import BytesIO

def generate_situation_image(dalle_prompt):
    # Generate an image using the DALL-E 3 model with the provided prompt
    response = client.images.generate(
      model="dall-e-3", # Specify the model to use
      prompt=dalle_prompt, # The prompt describing the image to generate
      size="1024x1024", # Specify the size of the generated image
      n=1, # Number of images to generate
    )

    # Retrieve the URL of the generated image
    image_url = response.data[0].url

    # Download the image from the URL
    image_response = requests.get(image_url, timeout=30)
    image_response.raise_for_status()  # Fail early on an HTTP error

    # Open the image using PIL
    img = Image.open(BytesIO(image_response.content))

    # Return the image object
    return img

# Import matplotlib for displaying images
import matplotlib.pyplot as plt

# Function to display an image in the cell
def display_image(img):
    plt.imshow(img)
    plt.axis('off')
    plt.show()

# Combine the functions to generate a situational prompt and
# its matching image
full_response = generate_situational_prompt("cafe")
initial_situation_prompt = full_response.split('====')[0].strip()
print(initial_situation_prompt)
img = generate_situation_image(initial_situation_prompt)
display_image(img)

In short, you get the situational prompt with the seed prompt “cafe”. But the situational prompt includes more than just a situation. It also has the initial response from a person in that situation. In this example, the initial response would be a greeting from an employee in the cafe. But to generate an image representing the situation, you don’t need that initial response. So, you have to take it out first. The code, full_response.split('====')[0].strip(), splits the full response on the delimiter ==== and takes the first part (which is the initial situation prompt). The strip() method is used to remove any leading or trailing whitespace from the string.
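
The split on ==== assumes the model followed the requested format. As a minimal sketch, the hypothetical helper split_situation below (not part of the original demo) separates the situation from the initial response in one step and falls back gracefully when the delimiter is missing. Stripping the surrounding quotes is an assumption based on the example format given in the prompt.

```python
DELIMITER = "===="


def split_situation(full_response, delimiter=DELIMITER):
    """Split a model reply into (situation, initial_response).

    Hypothetical helper; returns an empty initial response when the
    model did not include the delimiter.
    """
    situation, found, reply = full_response.partition(delimiter)
    if not found:
        # The model ignored the format: treat everything as the situation
        return full_response.strip(), ""
    # Drop whitespace and the quotes that the prompt's example wraps
    # around the text
    return situation.strip().strip("\"'"), reply.strip().strip("\"'")


# Made-up sample reply in the format the prompt asks for
sample = (
    "You're ordering coffee in a cafe.\n"
    "====\n"
    "'Hello, what would you like to order?'"
)
situation, reply = split_situation(sample)
print(situation)
print(reply)
```

With a helper like this, the situation can go to the image function and the reply to the speech function without indexing the split result twice.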

# Play the audio file

# Import necessary libraries for audio processing and display
import librosa
from IPython.display import Audio, display

# Function to play a speech file
def play_speech(file_path):
    # Load the audio file using librosa
    y, sr = librosa.load(file_path)

    # Create an Audio object for playback
    audio = Audio(data=y, rate=sr, autoplay=True)

    # Display the audio player
    display(audio)

This function, play_speech, uses the librosa library to load an audio file from the provided file_path. It then creates an Audio object with the loaded data and sample rate, enabling autoplay. Finally, it uses the display function from IPython to show an audio player in JupyterLab, allowing users to listen to the audio.

# Function to generate speech from a text prompt
def speak_prompt(speech_prompt, autoplay=True,
  speech_file_path="speech.mp3"):
    # Generate speech from the text prompt using TTS
    response = client.audio.speech.create(
      model="tts-1",
      voice="alloy",
      input=speech_prompt
    )

    # Save the synthesized speech to the specified path
    response.stream_to_file(speech_file_path)

    # Sometimes you want to play the speech automatically,
    # sometimes you do not
    if autoplay:
        # Play the synthesized speech
        play_speech(speech_file_path)

This function, speak_prompt, uses a text-to-speech (TTS) model to generate speech from the provided speech_prompt. The generated speech is saved to a specified file path. If autoplay is set to True, the function will automatically play the synthesized speech using the play_speech function.
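
The demo keeps replies short, but longer text may exceed the input limit of the speech endpoint, which the OpenAI API reference gave as 4096 characters at the time of writing. The sketch below shows one way to split text into pieces before passing each to speak_prompt. The helper chunk_for_tts and the constant TTS_MAX_CHARS are hypothetical names, and the sentence-splitting rule is a simple assumption rather than anything the demo prescribes.

```python
import re

# Assumed limit, taken from the API reference at the time of writing
TTS_MAX_CHARS = 4096


def chunk_for_tts(text, max_chars=TTS_MAX_CHARS):
    """Split text into pieces of at most max_chars, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the current chunk
            current = candidate
        else:
            # Start a new chunk with this sentence
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_for_tts("One. Two. Three.", max_chars=10))
```

Each chunk could then be synthesized to its own file, for example by passing a different speech_file_path per chunk.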

# Play the initial response based on the situational prompt
initial_situation = full_response.split('====')[1].strip()
speak_prompt(initial_situation)

# Function to transcribe speech from an audio file
def transcript_speech(speech_filename="my_speech.wav"):
    with open(speech_filename, "rb") as audio_file:
        # Transcribe the audio file using the Whisper model
        transcription = client.audio.transcriptions.create(
          model="whisper-1",
          file=audio_file,
          response_format="json",
          language="en"
        )
    # Return the transcribed text
    return transcription.text

# Transcribe the audio
transcripted_text = transcript_speech("audio/cappuccino.m4a")

# Print the transcribed text
print(transcripted_text)

# Function to create a conversation history
def creating_conversation_history(history, added_response):
    history = f"""{history}
====
'{added_response}'
"""
    return history

# Create and print the conversation history
history = creating_conversation_history(full_response, transcripted_text)
print(history)
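
Because the history is a single string with ==== separators, it can be handy to turn it back into individual turns, for example to display a transcript. The sketch below uses a hypothetical helper, history_to_turns, that is not part of the original demo. The speaker labels and the assumption that the AI partner speaks first and the speakers then alternate are illustrative only.

```python
def history_to_turns(history, delimiter="===="):
    """Return (situation, turns) parsed from a history string.

    Hypothetical helper; turns is a list of (speaker, text) tuples.
    """
    parts = [part.strip().strip("\"'") for part in history.split(delimiter)]
    parts = [part for part in parts if part]  # Ignore empty segments
    if not parts:
        return "", []
    situation, spoken = parts[0], parts[1:]
    # Assumption: the AI partner speaks first and speakers then alternate
    speakers = ("partner", "learner")
    turns = [(speakers[i % 2], text) for i, text in enumerate(spoken)]
    return situation, turns


# Made-up history in the format built by creating_conversation_history
sample_history = (
    "You're ordering coffee in a cafe.\n"
    "====\n"
    "'Hello, what would you like to order?'\n"
    "====\n"
    "'A cappuccino, please.'\n"
)
situation, turns = history_to_turns(sample_history)
print(situation)
for speaker, text in turns:
    print(f"{speaker}: {text}")
```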

# Function to generate a conversation based on the conversation history
def generate_conversation_from_history(history):
    prompt = """Continue conversation from a person based on this
      conversation history and end it with '\n====\n'.
      Limit it to max 3 sentences.
      This is the history:"""
    response = client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {"role": "system", "content": "You are a creative writer. "
          "Very very creative."},
        {"role": "user", "content": f"{prompt}\n{history}"}
      ]
    )
    # Extract and return the generated conversation
    message = response.choices[0].message.content
    return message

This function, generate_conversation_from_history, continues a conversation based on a given history. It constructs a prompt that instructs GPT to continue the conversation and limits the response to a maximum of three sentences. The generated response is then extracted and returned.

# Generate and print the conversation based on the history
conversation = generate_conversation_from_history(history)
print(conversation)

# Combine the conversation history with the new conversation
combined_history = history + "\n====\n" + conversation

# Print the combined history
print(combined_history)
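
The prompt in generate_conversation_from_history asks the model to end its reply with a ==== line, and the code above joins the history and the reply with another ====. The combined history may therefore contain two delimiter lines in a row, and the text passed to speak_prompt may include the delimiter. A minimal sketch of one way to normalize this follows. The helpers strip_trailing_delimiter and append_turn are hypothetical and not part of the original demo.

```python
DELIMITER = "===="


def strip_trailing_delimiter(text, delimiter=DELIMITER):
    """Remove a delimiter the model appended to the end of its reply."""
    cleaned = text.strip()
    if cleaned.endswith(delimiter):
        cleaned = cleaned[: -len(delimiter)].rstrip()
    return cleaned


def append_turn(history, reply, delimiter=DELIMITER):
    """Join history and reply with exactly one delimiter line between them."""
    history = strip_trailing_delimiter(history, delimiter)
    reply = strip_trailing_delimiter(reply, delimiter)
    return f"{history}\n{delimiter}\n{reply}"


# Made-up sample data in the demo's format
sample_history = "Situation.\n====\n'Hi'\n====\n'A cappuccino, please.'\n"
sample_reply = "'Sure, one cappuccino coming up!'\n====\n"
print(append_turn(sample_history, sample_reply))
```

The cleaned reply from strip_trailing_delimiter is also a better input for speech synthesis than the raw reply, since it no longer carries the separator.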

# Generate a scenery image based on the combined history
dalle_prompt = (
    "Generate a scenery based on this conversation: "
    + combined_history
)
img = generate_situation_image(dalle_prompt)

# Display the generated image
display_image(img)

# Generate and play the prompt based on the new conversation
speak_prompt(conversation)
