Building the User Interface with Gradio

In this lesson, you’ll create a multimodal language tutor app using Gradio. The app simulates conversational scenarios so that users can practice their English interactively. It displays images, plays audio prompts, and lets users respond with recorded speech. After each response, it updates the conversation, generates a new image, and provides audio feedback based on the user’s input.

App Overview

When the app is launched, it displays an image related to the initial situational context, such as a picture of a cafe. An audio prompt plays, such as “Welcome to Cute Cafe. What would you like to order?” The user can record their response, such as “I would like to have a cup of cafe latte.” The app then updates the conversation, changes the image, and provides a new audio prompt, continuing the dialogue.

Image of the Tutor app in the beginning

Image of the Tutor app after giving audio input

Key Components

Here are the key components:

Initialization: Start with a seed prompt to generate the initial situational context and corresponding image.
User Interaction: Record the user’s speech response via the microphone.
Conversation Update: Transcribe the recorded speech to text, update the conversation history, and generate new responses.
Visual and Audio Feedback: Update the displayed image and play the new audio prompt based on the updated conversation.

Inputs and Outputs

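The app takes one input, the user’s recorded speech, and produces three outputs: the updated image, the transcribed text, and a new audio prompt.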
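As a rough illustration, the sketch below shows one way a microphone input could be wired to image, text, and audio outputs with Gradio Blocks. It assumes Gradio 4.x, where `gr.Audio` accepts `sources=[...]`. The names `respond` and `build_demo` and all component labels are hypothetical, and the handler is only a placeholder rather than the lesson’s actual implementation.

```python
def respond(audio_path):
    """Placeholder handler returning (image, transcript, audio) for the three outputs."""
    if audio_path is None:
        return None, "No recording received.", None
    # A real handler would transcribe the audio and generate new media here.
    return None, f"Received recording: {audio_path}", None


def build_demo():
    # Imported lazily so respond() can be exercised without Gradio installed.
    import gradio as gr

    with gr.Blocks(title="Language Tutor") as demo:
        # Outputs: scene image, tutor audio prompt, and transcript of the user's speech.
        image_out = gr.Image(label="Scene")
        prompt_audio = gr.Audio(label="Tutor prompt", autoplay=True)
        transcript_out = gr.Textbox(label="Transcript")

        # Input: the user's recorded response, passed to the handler as a file path.
        user_audio = gr.Audio(sources=["microphone"], type="filepath", label="Your response")
        submit = gr.Button("Submit")

        submit.click(
            fn=respond,
            inputs=[user_audio],
            outputs=[image_out, transcript_out, prompt_audio],
        )
    return demo


# To try it locally (assumes Gradio is installed): build_demo().launch()
```

The order of the values returned by the handler has to match the order of the components in `outputs`, which is why `respond` returns the image first, then the transcript, then the audio.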

Flow of the Program

Initialization:
- Generate an initial situational description and image based on a seed prompt.
User Interaction:
- The user records an audio response.
- Transcribe the audio to text.
Conversation Update:
- Update the conversation history with the new user input.
- Generate a new conversation response.
- Update the combined history for future interactions.
Visual and Audio Feedback:
- Generate a new image based on the updated context.
- Generate and play a new audio prompt from the conversation response.
Outputs:
- Display the updated image and transcribed text.
- Display the new audio response.
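The flow above can be sketched in plain Python. This is a minimal outline under stated assumptions, not the lesson’s code: `transcribe`, `generate_reply`, `generate_image`, and `synthesize_speech` are hypothetical stand-ins that return labeled strings, where a real app would call speech-to-text, chat, image generation, and text-to-speech models.

```python
from dataclasses import dataclass, field


@dataclass
class TutorState:
    """Conversation turns so far, plus the image currently on display."""
    history: list = field(default_factory=list)
    image: str = ""


# Hypothetical stand-ins for model calls; each returns a labeled string.
def transcribe(audio_path):
    return f"[transcript of {audio_path}]"


def generate_reply(history):
    return f"[tutor reply after {len(history)} turns]"


def generate_image(context):
    return f"[image for: {context}]"


def synthesize_speech(text):
    return f"[audio of: {text}]"


def initialize(seed_prompt):
    """Initialization: build the opening situation and image from a seed prompt."""
    state = TutorState()
    state.history.append(("tutor", seed_prompt))
    state.image = generate_image(seed_prompt)
    return state


def step(state, audio_path):
    """Run one round: user interaction, conversation update, then feedback."""
    user_text = transcribe(audio_path)          # user interaction
    state.history.append(("user", user_text))   # conversation update
    reply = generate_reply(state.history)
    state.history.append(("tutor", reply))
    state.image = generate_image(reply)         # visual feedback
    reply_audio = synthesize_speech(reply)      # audio feedback
    return state.image, user_text, reply_audio  # outputs
```

A session would call `initialize` once with the seed prompt and then call `step` for every recording the user submits.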

State Preservation

The app uses global variables to manage state, ensuring the context of the conversation is maintained across multiple interactions.
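A minimal sketch of this pattern, assuming the state consists of a turn list and the text used for the latest image. The names `conversation_history`, `current_image_prompt`, `update_state`, and `snapshot` are hypothetical and chosen only for illustration.

```python
# Module-level (global) state: it survives between calls to the handler.
conversation_history = []
current_image_prompt = ""


def update_state(user_text, tutor_reply):
    """Record one exchange and remember the text used for the next image."""
    # 'global' is required because current_image_prompt is rebound below;
    # the list is only mutated in place, so it needs no declaration.
    global current_image_prompt
    conversation_history.append({"speaker": "user", "text": user_text})
    conversation_history.append({"speaker": "tutor", "text": tutor_reply})
    current_image_prompt = tutor_reply
    return len(conversation_history)


def snapshot():
    """Return a copy of the history together with the current image prompt."""
    return list(conversation_history), current_image_prompt
```

Global variables are simple, but in a Gradio app served to several people they are shared by every user. Gradio’s `gr.State` component is the per-session alternative if each user needs a separate conversation.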
