YouTube Video Transcription with Whisper

theitguy · June 11, 2026

Hey there, tech enthusiasts! In this Whisper tutorial, we'll dive into the world of audio transcription using Python. We'll be working with the Pytube library to download and convert YouTube video audio into an MP4 file, and then use Whisper to transcribe the audio into text.

First things first, let's install the Pytube library. Open your terminal and run the following command:

Code:

pip install pytube

Now that Pytube is installed, let's move on to the next step.

Next, we need to import Pytube and provide the link to the YouTube video we want to transcribe. The following code downloads the video's audio-only stream as an MP4 file:

Import Pytube

Code:

# Importing the Pytube library
import pytube

# Reading the YouTube link
video = "https://www.youtube.com/watch?v=x7X9w_GIm1s"
data = pytube.YouTube(video)

# Selecting the audio-only stream and downloading it as an MP4 file
audio = data.streams.get_audio_only()
audio.download()

The download is saved in your current directory and named after the video title. In our case, the file is named "Python in 100 Seconds.mp4".
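Since the file name depends on the video title, it can be handy to capture the path that download() returns instead of typing the name by hand later. Here is a minimal sketch of that idea; the download_audio helper and its output_dir parameter are names made up for this example and are not part of Pytube:

```python
from pathlib import Path


def download_audio(url: str, output_dir: str = ".") -> Path:
    """Download the audio-only stream of a YouTube video and return its path.

    download_audio and output_dir are hypothetical names for this sketch.
    """
    # Imported inside the function so the helper can be defined
    # even before Pytube is installed.
    import pytube

    stream = pytube.YouTube(url).streams.get_audio_only()
    # Stream.download() returns the path of the file it wrote, so there is
    # no need to reconstruct the name from the video title.
    saved_path = stream.download(output_path=output_dir)
    return Path(saved_path)
```

Calling download_audio("https://www.youtube.com/watch?v=x7X9w_GIm1s") should then give you a path you can pass straight to Whisper.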

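Now, it's time to convert the audio into text using Whisper. We'll start by installing and importing the Whisper library. On PyPI the package is named openai-whisper, not whisper:

pip install openai-whisper

Alternatively, install the latest version straight from GitHub (the leading ! is notebook syntax, so drop it in a regular terminal). Either way, Whisper needs ffmpeg installed on your system to read audio files.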

Code:

!pip install git+https://github.com/openai/whisper.git -q

Code:

import whisper

Next, we'll load the model. We'll use the "base" model for this tutorial, but you can find more information about the other models in the Whisper repository. Each one trades accuracy against speed and the compute it needs.
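To make the tradeoff a bit more concrete, here is a small sketch that picks the largest model under a parameter budget. The parameter counts are the approximate figures from the Whisper README as I remember them, so double-check them there, and largest_model_within is just a made-up helper for illustration, not something Whisper provides. You can also call whisper.available_models() to see which names your installed version accepts.

```python
# Approximate parameter counts in millions, taken from the Whisper README.
# Treat these as rough figures; they are an assumption of this sketch.
MODEL_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}


def largest_model_within(budget_m: int) -> str:
    """Return the largest model whose parameter count fits the budget.

    Hypothetical helper: parameter count is only a rough proxy for
    how much speed and memory a model costs.
    """
    candidates = [
        name for name, params in MODEL_PARAMS_M.items() if params <= budget_m
    ]
    if not candidates:
        raise ValueError(f"no model has {budget_m}M parameters or fewer")
    return max(candidates, key=MODEL_PARAMS_M.__getitem__)
```

Bigger models are generally more accurate but slower, so on a laptop without a GPU it usually makes sense to start small and only move up if the transcript quality is not good enough.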

Finally, we'll transcribe the audio file using the following code:

transcript

Code:

model = whisper.load_model("base")
text = model.transcribe("Python in 100 Seconds.mp4")

And that's it! The result is a dictionary, and the transcribed text is stored under its 'text' key, so we can print it out:

Code:

# Printing the transcription
print(text['text'])
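Besides 'text', the dictionary returned by transcribe() also has a 'segments' list, where each segment carries 'start' and 'end' times in seconds plus its own 'text'. If you want a timestamped transcript, something like the sketch below should do. The format_timestamp and format_segments helpers are names I made up for this example:

```python
def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into MM:SS, dropping the fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(result: dict) -> list[str]:
    """Build one '[start - end] text' line per segment of a Whisper result.

    Hypothetical helper; it only relies on the 'segments' entries having
    'start', 'end', and 'text' keys.
    """
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines
```

With the variable from the tutorial, you would loop over format_segments(text) and print each line.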
