Multimodal Integration with OpenAI

Nov 14 2024 · Python 3.12, OpenAI 1.52, JupyterLab, Visual Studio Code

Lesson 04: Speech Recognition & Synthesis

Demo of Designing a Basic Voice Interaction Feature in an App


Now, you want to combine speech recognition and synthesis to create a simple language tutor app. This app will process recorded speech, check whether the grammar is correct, and provide feedback using synthesized speech.

# Define a function to transcribe the recorded speech
def transcript_speech(speech_filename="my_speech.m4a"):
    with open(speech_filename, "rb") as audio_file:
        # Open the audio file and transcribe using the Whisper model
        transcription = client.audio.transcriptions.create(
          model="whisper-1",
          file=audio_file,
          response_format="json",
          language="en"
        )
    # Return the transcribed text
    return transcription.text
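With `response_format="json"`, the transcription endpoint returns a small JSON object whose only field is `text`, which the SDK exposes as `transcription.text`. The raw payload looks roughly like this (the sentence itself is invented for illustration):

```python
import json

# Example of the raw JSON body behind a transcription response
raw_body = '{"text": "I goes to the library every days."}'
payload = json.loads(raw_body)
print(payload["text"])
```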
# Check the grammar of the transcribed text
def check_grammar(english_text):
    # Use GPT to check and correct the grammar of the input text
    response = client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {"role": "system", "content": "You are an English grammar expert."},
        {"role": "user", "content": f"Fix the grammar: {english_text}"}
      ]
    )
    # Extract and return the corrected grammar message
    message = response.choices[0].message.content
    return message
# Provide spoken feedback using TTS
def tell_feedback(grammar_feedback, speech_file_path="feedback_speech.mp3"):
    # Generate speech from the grammar feedback using TTS
    response = client.audio.speech.create(
      model="tts-1",
      voice="alloy",
      input=grammar_feedback
    )

    # Save the synthesized speech to the specified path
    # (stream_to_file works here, though newer SDK versions prefer
    # client.audio.speech.with_streaming_response)
    response.stream_to_file(speech_file_path)
    # Play the synthesized speech
    play_speech(speech_file_path)
# Implement the grammar feedback application
def grammar_feedback_app(speech_filename):
    # Transcribe the recorded speech
    transcription = transcript_speech(speech_filename)
    print(transcription)
    # Check and correct the grammar of the transcription
    feedback = check_grammar(transcription)
    print(feedback)
    # Provide spoken feedback using TTS
    tell_feedback(feedback)
# Set the audio file. Use the audio sample or record the
# audio yourself and place the file here.
wrong_grammar_audio = "audio/grammar-wrong.mp3"
# Play the grammatically wrong audio file
play_speech(wrong_grammar_audio)
# Run the grammar feedback application
grammar_feedback_app(wrong_grammar_audio)
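If you want to sanity-check the app's control flow without spending API credits, one option is to temporarily swap in stub versions of the three helpers. The stubs below keep the same names and signatures as the real functions; the sample sentences are invented:

```python
# Stub versions of the three helpers: same names and signatures,
# but no network calls -- useful for verifying the pipeline wiring.
def transcript_speech(speech_filename="my_speech.m4a"):
    return "She go to school yesterday."

def check_grammar(english_text):
    return "She went to school yesterday."

spoken = []

def tell_feedback(grammar_feedback, speech_file_path="feedback_speech.mp3"):
    # Record what would have been synthesized instead of playing audio
    spoken.append(grammar_feedback)

def grammar_feedback_app(speech_filename):
    transcription = transcript_speech(speech_filename)
    feedback = check_grammar(transcription)
    tell_feedback(feedback)
    return transcription, feedback

result = grammar_feedback_app("audio/grammar-wrong.mp3")
```

Once the flow looks right, drop the stubs and the real functions run unchanged.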