Exploring OpenAI Whisper for Speech to Text Generation

Hi, this is Charu from Classmethod. In this hands-on blog, we'll explore how to implement OpenAI Whisper and integrate it into your projects. We will be focusing on the Speech to Text (STT) feature of Whisper in this blog.

Key Features:

  • It produces highly accurate transcriptions.
  • It supports 96+ languages.
  • The best part is that it is completely free to use.
Basic Installation

    To begin, we will install 5 different packages. Don't worry, we will go through this step by step and it will not take much time. Also, you might have a few of these installed already ;)

    1. Install Python:

    You need to install Python on your system. You can do that by following the steps in this link.

    To confirm the installation, type the following command to check the installed version:

    python -V

    2. Install PyTorch:

    Next, you need to install PyTorch through this link. It is an ML library. To run it on your computer, scroll down on the linked page and select your configuration. In my case, the configuration looks like this:

    Once you have made all your selections, copy the provided command and run it in your terminal.

    3. Package Manager:

    Let's download our package manager. If you are using Windows, download Chocolatey; if you are using Mac, download Homebrew.

    4. FFMPEG

    Now, use your package manager to install a package called ffmpeg.

    If you are using Windows, install it with the following command:

    choco install ffmpeg

    And if you are using Mac, then use the following command:

    brew install ffmpeg

    5. Whisper

    Now for the last installation: we will finally install Whisper. To install Whisper, run the following command:

    pip install openai-whisper

    Note: If it does not work for you, try running it in a virtual environment. To do this, run the following commands:

    For Mac:

    python -m venv path/to/venv
    source path/to/venv/bin/activate
    python -m pip install openai-whisper

    For Windows:

    python -m venv path\to\venv
    path\to\venv\Scripts\activate.bat
    python -m pip install openai-whisper

    Explore Whisper

    Congratulations! We have now finished the installation of all the prerequisites.

    It's time to run your code now. Open your favorite code editor and type the following code:

    import whisper
    
    model = whisper.load_model("large")
    result = model.transcribe("Audio_File.wav")
    print(result["text"])

    Make sure to enter the path of your audio file in place of Audio_File.wav. It does not need to be in WAV format; it can also be an MP3 file.

    Now, when you run the code, you can view the generated text.
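    Beyond result["text"], the transcribe call also returns the detected language and per-segment timestamps. Here is a minimal sketch of walking that result; the field names follow the Whisper README, and the sample dict below is made up purely for illustration:

```python
def format_segments(result):
    """Turn a Whisper-style result dict into timestamped lines."""
    lines = []
    for seg in result["segments"]:
        lines.append(f"[{seg['start']:6.2f}s -> {seg['end']:6.2f}s] {seg['text'].strip()}")
    return lines

# A made-up result dict mimicking the shape Whisper returns, for illustration only:
sample = {
    "text": " Hello world. This is a test.",
    "language": "en",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello world."},
        {"start": 2.5, "end": 4.0, "text": " This is a test."},
    ],
}

for line in format_segments(sample):
    print(line)
```

    In your own script, you would pass the dict returned by model.transcribe(...) instead of the sample above.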

    By default it uses the 'small' model, but I used the 'large' model for higher accuracy. The key point is that the larger the model, the longer the processing time and the more accurate the results. You have 5 different models to choose from. To know more about them, go to this link.
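    For a quick reference, the five models and their rough sizes can be summed up in a small table. The figures below are approximate numbers from the openai/whisper README and may change between releases:

```python
# Approximate figures from the openai/whisper README (ballpark values, not guarantees).
WHISPER_MODELS = {
    # name: (parameters, relative speed vs. 'large')
    "tiny":   ("39 M",   "~32x"),
    "base":   ("74 M",   "~16x"),
    "small":  ("244 M",  "~6x"),
    "medium": ("769 M",  "~2x"),
    "large":  ("1550 M", "1x"),
}

for name, (params, speed) in WHISPER_MODELS.items():
    print(f"{name:<8} {params:<8} {speed}")
```

    Rule of thumb: start with 'small' to check your pipeline, then move up to 'large' if the transcript quality is not good enough.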

    To run Whisper in your terminal, go to the folder which contains the audio file and type:

    whisper Audio_File.wav

    It will automatically detect the language and generate the text.

    To specify the model or language in the command, use the following flags:

    whisper Audio_File.wav --language Japanese --model large

    You can even translate the text into English with the following command. Unfortunately, you cannot translate the text into any language other than English.

    whisper Audio_File.wav --language Japanese --task translate

    If you want to know all the different flags used by Whisper, type:

    whisper --help
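    If you have many recordings, you can drive the same CLI from Python. Below is a minimal sketch; build_whisper_cmd is a hypothetical helper I'm introducing here, and it simply assembles the flags shown above:

```python
from pathlib import Path

def build_whisper_cmd(audio_path, model="small", language=None, task=None):
    """Assemble an argument list for the whisper CLI (hypothetical helper)."""
    cmd = ["whisper", str(audio_path), "--model", model]
    if language:
        cmd += ["--language", language]
    if task:
        cmd += ["--task", task]
    return cmd

if __name__ == "__main__":
    # Preview the command for every MP3 in the current folder.
    # Swap print(...) for subprocess.run(cmd, check=True) to actually run them.
    for audio in sorted(Path(".").glob("*.mp3")):
        cmd = build_whisper_cmd(audio, model="large")
        print(" ".join(cmd))
```

    Keeping command construction separate from execution makes it easy to preview what will run before committing to a long batch of large-model transcriptions.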

    Conclusion:

    With the basic implementation complete, feel free to experiment with different Whisper models and audio inputs. Explore Whisper's capabilities and iterate on your implementation to suit your specific use case.

    Thank you for reading!

    Happy Learning:)