Building the User Interface with Gradio

Heads up... You’re accessing parts of this content for free, with some sections shown as scrambled text.

Heads up... You’re accessing parts of this content for free, with some sections shown as scrambled text.

Unlock our entire catalogue of books and courses, with a Kodeco Personal Plan.

Unlock now

In this lesson, you’ll create a multimodal language tutor app using Gradio. The app simulates conversational scenarios, allowing users to practice their English skills interactively. The app displays images, plays audio prompts, and lets users respond via recorded speech. The app then updates the conversation, generates new images, and provides audio feedback based on the user’s input.

App Overview

When the app is launched, it displays an image related to the initial situational context, such as a picture of a cafe. An audio prompt plays, such as “Welcome to Cute Cafe. What would you like to order?” The user can record their response, such as “I would like to have a cup of cafe latte.” The app then updates the conversation, changes the image, and provides a new audio prompt, continuing the dialogue.

Image of the Tutor app in the beginning
Efica as cpu Ricoy abx id bxe hixojbely

Image of the Tutor app after giving audio input
Imipe oh hta Qibow iff uqlay cafuqr oodoa ewjab

Key Components

Here are the key components:

Inputs and Outputs

Here are the inputs and outputs:

Flow of the Program

  1. Initialization:
  2. User Interaction:
  3. Conversation Update:
  4. Visual and Audio Feedback:
  5. Outputs:

State Preservation

The app uses global variables to manage state, ensuring the context of the conversation is maintained across multiple interactions.

See forum comments
Download course materials from Github
Previous: Demo of Generating Situational Prompts & Images Next: Demo of Building the User Interface with Gradio