Now, you want to combine speech recognition and synthesis to create a simple language tutor app. This app will process recorded speech, check whether its grammar is correct, and provide feedback using synthesized speech.
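The snippets below assume an OpenAI client named client is already configured, as in the previous demo. If you're starting from a fresh notebook, a minimal setup sketch (assuming your API key is stored in the OPENAI_API_KEY environment variable) looks like this:

# Minimal client setup sketch (not shown in this demo's code);
# assumes the OPENAI_API_KEY environment variable holds your API key.
from openai import OpenAI

client = OpenAI()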
Define a function to transcribe the recorded speech using the Whisper model:
# Define a function to transcribe the recorded speech
def transcript_speech(speech_filename="my_speech.m4a"):
    # Open the audio file and transcribe it using the Whisper model
    with open(speech_filename, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="json",
            language="en"
        )
    # Return the transcribed text
    return transcription.text
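You can sanity-check this function on its own, assuming a recording named my_speech.m4a (the default) exists on disk:

# Quick check: transcribe the default recording, my_speech.m4a
print(transcript_speech())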
Then, define a function to check the grammar of the transcribed text using OpenAI’s GPT model:
# Check the grammar of the transcribed text
def check_grammar(english_text):
    # Use GPT to check and correct the grammar of the input text
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an English grammar expert."},
            {"role": "user", "content": f"Fix the grammar: {english_text}"}
        ]
    )
    # Extract and return the corrected grammar message
    message = response.choices[0].message.content
    return message
In this function, you use the GPT model to check and correct the grammar of the input text. The client.chat.completions.create method sends the input text to the GPT model along with a prompt that instructs the model to act as an English grammar expert. The response from GPT contains the corrected text, which is then extracted and returned by the function.
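As a quick, illustrative check, you can call the function directly. The sample sentence below is the one used in the test audio later in this demo; the exact wording GPT returns may vary:

# Illustrative call; the corrected wording returned by GPT may vary
print(check_grammar("My mother don't like to eat at night."))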
After that, define a function to generate spoken feedback using the text-to-speech capability:
# Provide spoken feedback using TTS
def tell_feedback(grammar_feedback, speech_file_path="feedback_speech.mp3"):
    # Generate speech from the grammar feedback using TTS
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=grammar_feedback
    )
    # Save the synthesized speech to the specified path
    response.stream_to_file(speech_file_path)
    # Play the synthesized speech
    play_speech(speech_file_path)
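Note that tell_feedback calls the play_speech helper defined in the previous demo. If you don't have it in your notebook, here is a minimal sketch using pydub; this is an assumption rather than the course's implementation, and it requires pydub plus ffmpeg to be installed:

# Hypothetical play_speech sketch using pydub (requires pydub and ffmpeg);
# the previous demo may implement this helper differently.
from pydub import AudioSegment
from pydub.playback import play

def play_speech(speech_file_path):
    # Load the audio file and play it on the default output device
    audio = AudioSegment.from_file(speech_file_path)
    play(audio)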
Finally, put everything together in a function that handles the entire process, from recorded audio to presenting spoken feedback:
# Implement the grammar feedback application
def grammar_feedback_app(speech_filename):
    # Transcribe the recorded speech
    transcription = transcript_speech(speech_filename)
    print(transcription)
    # Check and correct the grammar of the transcription
    feedback = check_grammar(transcription)
    print(feedback)
    # Provide spoken feedback using TTS
    tell_feedback(feedback)
In this function, you:
Transcribe the Recorded Speech: The transcript_speech function is called with speech_filename to transcribe the speech from the audio file.
Check and Correct the Grammar: The transcribed text is passed to the check_grammar function to check and correct its grammar.
Provide Spoken Feedback: The corrected text is then passed to the tell_feedback function to create and play a spoken version of the feedback using text-to-speech.
To test the grammar feedback app, you have to give a grammatically incorrect audio file to the app. To create an audio file for speech input, you can use the Voice Recorder app on Windows, QuickTime on macOS, or a similar recording app on Linux. You can refer back to the instructions section on how to do this if you need help.
Once recorded, place the audio file in the audio folder and update the wrong_grammar_audio variable accordingly. Alternatively, you can use a provided audio sample containing a grammatically incorrect sentence, "My mother don't like to eat at night", for testing purposes.
# Set the audio file. Use the audio sample or record the
# audio yourself and place the file here.
wrong_grammar_audio = "audio/grammar-wrong.mp3"
You can play it first to confirm that this audio file has a grammatically incorrect sentence.
# Play the grammatically wrong audio file
play_speech(wrong_grammar_audio)
Run the application and get the grammar feedback:
# Run the grammar feedback application
grammar_feedback_app(wrong_grammar_audio)
Bea’po gop maos pev hi ela Hdaykul zen xcaeky goxiqhimiut igc mcgxjiyem es ig ivq. Pedu ub vo pga lays sexfasw zem lris mugtuw’g yuzfmowoam.