To set up your development environment for using the OpenAI API, please refer to
Lesson 1: Introduction to Multimodal AI. This lesson covers installing necessary libraries and configuring your environment.
You also need to install additional libraries for this lesson. Add the following code to your notebook:
# Install additional dependencies for this lesson
!pip install librosa
The librosa library is for handling audio files.
Like previous lessons, you need to authenticate your API requests, and the code for that is already included in the Starter notebook for this lesson:
# Load the OpenAI library
from openai import OpenAI
# Set up relevant environment variables
# Make sure OPENAI_API_KEY=... exists in .env
from dotenv import load_dotenv
load_dotenv()
# Create the OpenAI connection object
client = OpenAI()
OpenAI’s Whisper model is a powerful tool for speech recognition. First, you need to prepare the audio files. You can either record audio directly using your computer’s microphone or download free sample audio files from Pixabay.
Add the following code to download and load an audio file using the librosa library:
# Download and load an audio file using librosa
# Import libraries
import requests
import io
import librosa
from IPython.display import Audio, display
# URL of the sample audio file
speech_download_link = "https://cdn.pixabay.com/download/audio/2022/03/10/audio_a8e603753c.mp3?filename=self-destruct-sequence-31505.mp3"
# Local path where the audio file will be saved
save_path = "audio/self-destruct-sequence.mp3"
# Download the audio file
response = requests.get(speech_download_link)
if response.status_code == 200:
    audio_data = io.BytesIO(response.content)
    # Save the audio file locally
    with open(save_path, 'wb') as file:
        file.write(response.content)
    # Load the audio file using librosa
    y, sr = librosa.load(audio_data)
    # Display the audio file so it can be played
    audio = Audio(data=y, rate=sr, autoplay=True)
    display(audio)
response = requests.get(speech_download_link)
if response.status_code == 200:
    audio_data = io.BytesIO(response.content)
You send a GET request to the URL and check if the download was successful (status code 200). If successful, you store the audio data in a byte stream.
Save the audio file locally:
with open(save_path, 'wb') as file:
    file.write(response.content)
This step saves the downloaded audio data to a file on your local system.
Finally, you create an audio player using the loaded audio data and display it, allowing you to play the audio directly in Jupyter Lab.
Next, extract the logic to play the audio file into a separate function because you’ll use it multiple times:
# Function to play the audio file
def play_speech(file_path):
    # Load the audio file using librosa
    y, sr = librosa.load(file_path)
    # Create an Audio object for playback
    audio = Audio(data=y, rate=sr, autoplay=True)
    # Display the audio player
    display(audio)
Now, it’s time to transcribe the audio file using the Whisper model. Add the following code to your Jupyter Lab:
# Transcribe the audio file using the Whisper model
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file using the Whisper model
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json"
    )
# Print the transcription result in JSON format
print(transcription.json())
# Print only the transcribed text
print(transcription.text)
You can also get a more detailed transcription with timestamps for each word:
# Retrieve the detailed information with timestamps
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file with word-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"]
    )
Then, you can look at the verbose JSON result.
# Print the detailed information for each word timestamp
import json
json_result = transcription.json()
print(json_result)
json_object = json.loads(json_result)
print(json_object["text"])
# Print the detailed information for words
# Print the detailed information for each word
print(transcription.words)
# Print the detailed information for the first two words
print(transcription.words[0])
print(transcription.words[1])
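If you want to work with the timestamps programmatically, you can loop over the word entries. Here is a minimal sketch, assuming each entry exposes word, start, and end fields, which is what word-level verbose JSON output typically contains:
# Loop over the word-level timestamps (assumes each entry has `word`,
# `start`, and `end` attributes)
for word in transcription.words:
    print(f"{word.word}: {word.start:.2f}s - {word.end:.2f}s")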
You can also obtain segment-level timestamps for the transcription. Pass the segment value to the timestamp_granularities parameter:
# Retrieve the detailed information with segment-level timestamps
with open(save_path, "rb") as audio_file:
    # Transcribe the audio file with segment-level timestamps
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["segment"]
    )
To print the detailed information for the first two segments, use the following code:
# Print the detailed information for the first two segments
print(transcription.segments[0])
print(transcription.segments[1])
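Each segment carries its own start time, end time, and text, so you can turn the result into a simple timeline. The following is a minimal sketch, assuming each segment exposes start, end, and text fields, which is what segment-level verbose JSON output typically contains:
# Print a simple timeline of the transcription (assumes each segment has
# `start`, `end`, and `text` attributes)
for segment in transcription.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")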
Now, load and play another audio file:
# Load & play kodeco-speech.mp3 audio file
# Path to another audio file
ai_programming_audio_path = "audio/kodeco-speech.mp3"
# Play the audio file
play_speech(ai_programming_audio_path)
You might hear the words Kodeco and RayWenderlich mentioned. Next, transcribe the speech again. This time, use the text response format, which is simpler than the JSON response format. The returned result is just the transcription text.
# Transcribe the audio file with `text` response format
with open(ai_programming_audio_path, "rb") as audio_file:
    # Transcribe the audio file to text
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# Print the transcribed text
print(transcription)
Notice that the transcription isn’t perfect. Kodeco and RayWenderlich are misspelled. You can guide the transcription process with the prompt parameter to improve accuracy.
# Transcribe the audio file with a prompt to improve accuracy
with open(ai_programming_audio_path, "rb") as audio_file:
    # Transcribe the audio file with a prompt to improve accuracy
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="text",
        prompt="Kodeco,RayWenderlich"
    )
# Print the transcribed text
print(transcription)
Now, the transcription should be more accurate. The prompt parameter helps guide the transcription, making it particularly useful for correcting specific words or continuing a previous context. In this case, the prompt ensures that words like Kodeco and RayWenderlich are transcribed correctly.
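For example, if you ever transcribe a long recording in chunks, you could pass the previous chunk’s transcript as the prompt so that names and spellings stay consistent across chunks. The sketch below only illustrates the idea; the chunk file paths are hypothetical:
# Hypothetical sketch: carry context from one chunk to the next via the prompt
previous_text = ""
for chunk_path in ["audio/part-1.mp3", "audio/part-2.mp3"]:  # hypothetical files
    with open(chunk_path, "rb") as audio_file:
        chunk_text = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="text",
            prompt=previous_text  # previous transcript guides this chunk
        )
    previous_text = chunk_text
    print(chunk_text)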
Aside from transcription, you can also translate the audio file directly to English. Currently, only English is supported.
First, listen to the Japanese audio file:
# Load & play japanese-speech.mp3 audio file
# The speech in Japanese: いらっしゃいませ。ラーメン屋へようこそ。何をご注文なさいますか?
# Path to the Japanese audio file
japanese_audio_path = "audio/japanese-speech.mp3"
# Play the Japanese audio file
play_speech(japanese_audio_path)
To translate, use the client.audio.translations.create method. The model, file, and response_format parameters work the same as in the client.audio.transcriptions.create method. Add the following code to your Jupyter Lab:
# Translate the Japanese audio to English text
with open(japanese_audio_path, "rb") as audio_file:
    # Translate the Japanese audio to English text
    translation = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="text"
    )
# Print the translated text
print(translation)
The translated text should be: “Welcome. Welcome to the ramen shop. What would you like to order?”. The Whisper model can translate audio in any supported language into English text, making it a versatile tool for multilingual apps.
To create synthesized speech, you can use the client.audio.speech.with_streaming_response.create method with the context manager, as shown below:
# Generate speech from text using OpenAI's TTS model
# Path to save the synthesized speech
speech_file_path = "audio/learn-ai.mp3"
# Generate speech from text using OpenAI's TTS model
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
) as response:
    # Save the synthesized speech to the specified path
    response.stream_to_file(speech_file_path)
The model parameter is set to tts-1, specifying the text-to-speech model to be used. This model is optimized for speed. You can use another model, tts-1-hd, if you care more about the quality. The voice parameter is set to alloy, which determines the voice characteristics such as tone and accent. You have other choices, like echo, fable, onyx, nova, and shimmer. Finally, the input parameter contains the text that you want to convert to speech: “Would you like to learn AI programming? We have many AI programming courses that you can choose.”
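For instance, if you’d rather trade some speed for higher audio quality and try a different voice, you could swap in tts-1-hd and nova. This is only a sketch; the output path audio/learn-ai-hd.mp3 is an arbitrary name chosen for this example:
# Sketch: higher-quality model with a different voice (output path is arbitrary)
with client.audio.speech.with_streaming_response.create(
    model="tts-1-hd",
    voice="nova",
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
) as response:
    response.stream_to_file("audio/learn-ai-hd.mp3")
# Play the result with the helper defined earlier
play_speech("audio/learn-ai-hd.mp3")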
Now, play the synthesized speech:
# Play the synthesized speech
play_speech(speech_file_path)
Nice! You’ve created synthesized speech.
If you don’t want to use the context manager, you can use the client.audio.speech.create method to create synthesized speech. Generate speech again. This time, you experiment with another voice and speed:
# Generate speech with a different voice and slower speed
response = client.audio.speech.create(
    model="tts-1",
    voice="echo",
    speed=0.6,
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
)
# Save the synthesized speech to the specified path
response.stream_to_file(speech_file_path)
# Play the synthesized speech
play_speech(speech_file_path)
Notice that the voice is now echo, which has a different tone than alloy. Also, the speed is set to 0.6, making the speech slower. If you want to make the speech faster, you can set the speed to a value greater than 1.
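For example, a speed of 1.5 makes the speech noticeably faster. Here is a minimal sketch using the streaming variant recommended below; the output path is an arbitrary name for this example:
# Sketch: faster speech with a speed greater than 1 (output path is arbitrary)
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="echo",
    speed=1.5,
    input="Would you like to learn AI programming? We have many AI programming courses that you can choose."
) as response:
    response.stream_to_file("audio/learn-ai-fast.mp3")
play_speech("audio/learn-ai-fast.mp3")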
However, if you use the client.audio.speech.create method, you’ll get this warning:
DeprecationWarning: Due to a bug, this method doesn't actually stream the response content, `.with_streaming_response.method()` should be used instead
  response.stream_to_file(speech_file_path)
Therefore, it’s better to use the client.audio.speech.with_streaming_response.create method with the context manager to avoid this warning.