Exploring OpenAI Whisper for Speech to Text Generation

2024.04.12


Hi, this is Charu from Classmethod. In this hands-on blog, we'll explore how to set up OpenAI Whisper and integrate it into your projects. We will focus on Whisper's Speech to Text (STT) feature.

Key Features:

It produces high-quality transcriptions.

It works with 96+ different languages.

The best part is that the model is open source (MIT license) and completely free to run on your own machine.

Basic Installation

To begin, we will install 5 different packages. Don't worry, we will go through this step by step and it will not take much time. Also, you might have a few of these already installed ;)

1. Install Python:

You need to install Python on your system. You can do that by following the steps in this link.

To confirm the installation, type the following command to check the installed version:

python -V

2. Install PyTorch:

Next, you need to install PyTorch through this link. It is an ML library that Whisper runs on. To get the right build for your computer, scroll down the provided link and select your configuration. In my case, the configuration looks like this:

Once you have made all your selections, copy the provided command and run it in your terminal.
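To confirm that PyTorch installed correctly, a quick check like the one below can help. This is a minimal sketch and not part of the original setup: the helper name `describe_torch` is made up for illustration, and it relies only on `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch() -> str:
    """Return a one-line summary of the local PyTorch install (hypothetical helper)."""
    # find_spec returns None when the package is missing,
    # so this check does not raise ImportError.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch  # imported lazily, only when it is available

    # Whisper can run on the CPU, but a CUDA GPU is much faster for large models.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"PyTorch {torch.__version__} found, default device: {device}"


print(describe_torch())
```

If the script reports that PyTorch is missing, re-run the install command from the PyTorch site inside the same environment you plan to use for Whisper.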

3. Package Manager:

Let's download our package manager. If you are using Windows, download Chocolatey; if you are using a Mac, download Homebrew.

4. FFMPEG

Now, use your package manager to download a package called ffmpeg.

If you are using Windows, then download it using the following command:

choco install ffmpeg

And if you are using Mac, then use the following command:

brew install ffmpeg
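Whisper calls ffmpeg to decode audio files, so it is worth checking that the executable is visible on your PATH before moving on. The snippet below is a small sketch using only Python's standard library; `find_ffmpeg` is a hypothetical helper name.

```python
import shutil


def find_ffmpeg():
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


path = find_ffmpeg()
if path is None:
    print("ffmpeg was not found on PATH; re-run the install command or restart the terminal")
else:
    print(f"ffmpeg found at {path}")
```

If ffmpeg was installed just now and is still not found, opening a new terminal window usually refreshes the PATH.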

5. Whisper

Now for the last installation: Whisper itself. To install it, run the following command:

pip install openai-whisper

Note: If it does not work for you, try running it in a virtual environment. To do this, run the following commands:

For Mac:

python -m venv path/to/venv
source path/to/venv/bin/activate
python -m pip install openai-whisper

For Windows:

python -m venv path\to\venv
path\to\venv\Scripts\activate.bat
python -m pip install openai-whisper

Explore Whisper

Congratulations! We have now finished the installation of all the prerequisites.

It's time to run your code now. Open your favorite code editor and type the following code:

import whisper

model = whisper.load_model("large")
result = model.transcribe("Audio_File.wav")
print(result["text"])

Make sure to enter the path of your audio file in place of Audio_File.wav. It need not be in WAV format; it can also be an MP3 file.

When you run the code, the generated text is printed to the console.
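Besides `result["text"]`, the dictionary returned by `transcribe` also contains a `"segments"` list whose entries carry `start`, `end` (in seconds) and `text`. The sketch below shows one way to print the transcript with timestamps. The helper names (`format_timestamp`, `format_segments`, `transcribe_with_timestamps`) and the `MM:SS.mmm` layout are my own choices for illustration, and the sample segments are made-up data, not real Whisper output.

```python
def format_timestamp(seconds):
    """Format a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, ms = divmod(total_ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments):
    """Turn Whisper-style segment dicts into '[start -> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]


def transcribe_with_timestamps(audio_path, model_name="base"):
    """Transcribe a file and return timestamped lines (needs openai-whisper installed)."""
    import whisper  # imported here so the formatting helpers work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return format_segments(result["segments"])


# Made-up segments, only to show the output format without needing an audio file.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 75.5, "text": " Let's get started."},
]
print("\n".join(format_segments(sample)))
```

With a real recording, calling `transcribe_with_timestamps("Audio_File.wav")` returns the same kind of lines for your own audio.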

At the time of writing, the command-line tool uses the 'small' model by default, while `whisper.load_model` in Python needs the model name passed explicitly. I used the 'large' model for higher accuracy. The key point is that the larger the model, the longer the processing time and the higher the accuracy of the results. You have 5 different model sizes to choose from: tiny, base, small, medium, and large. To know more about them, go to this link.
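One practical way to choose a size is to pick the largest model that fits in your GPU memory. The sketch below assumes the approximate VRAM figures listed in the Whisper README at the time of writing (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for large); treat them as rough guidance. `pick_model` is a hypothetical helper, not part of the Whisper API.

```python
# Approximate VRAM needs in GB per model size, taken from the Whisper README
# at the time of writing; these are rough figures, not exact limits.
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(available_gb):
    """Return the largest model whose approximate VRAM need fits (hypothetical helper)."""
    fitting = [name for name, need in APPROX_VRAM_GB.items() if need <= available_gb]
    # Dicts keep insertion order, so the last fitting entry is the largest model.
    # Fall back to "tiny" when nothing fits.
    return fitting[-1] if fitting else "tiny"


for gb in (1, 4, 12):
    print(gb, "GB ->", pick_model(gb))
```

The returned name can be passed straight to `whisper.load_model`. On a CPU-only machine the smaller models are usually the more comfortable choice because of processing time.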

To run whisper in your terminal, go to the folder which contains the audio file and type:

whisper Audio_File.wav

It will automatically detect the language and generate the text.

To specify the model or language in the command, use the following flags:

whisper Audio_File.wav --language Japanese --model large

You can even translate the text into English with the following command. Unfortunately, Whisper cannot translate into any language other than English.

whisper Audio_File.wav --language Japanese --task translate
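The same options are available from Python by passing `language` and `task` as keyword arguments to `model.transcribe`. The following is a minimal sketch of that mapping; `build_options` and `run` are hypothetical helper names, and "ja" is the language code for Japanese.

```python
def build_options(language=None, translate=False):
    """Build keyword arguments for model.transcribe (hypothetical helper)."""
    # task="translate" produces English output; "transcribe" keeps the spoken language.
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Leave the language out to let Whisper detect it automatically.
        options["language"] = language
    return options


def run(audio_path, model_name="large", **kwargs):
    """Transcribe or translate a file (needs openai-whisper installed)."""
    import whisper  # imported here so build_options works without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, **build_options(**kwargs))["text"]


# Python equivalent of: whisper Audio_File.wav --language Japanese --task translate
print(build_options(language="ja", translate=True))
```

Calling `run("Audio_File.wav", language="ja", translate=True)` would then return the English translation as a string.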

If you want to know all the different flags used by whisper, type:

whisper --help

Conclusion:

With the basic implementation complete, feel free to experiment with different Whisper models and audio inputs. Explore Whisper's capabilities and iterate on your implementation to suit your specific use case.

Thank you for reading!

Happy Learning:)
