Exploring OpenAI Whisper for Speech to Text Generation

Hi, this is Charu from Classmethod. In this hands-on blog, we'll explore how to implement OpenAI Whisper and integrate it into your projects. We will be focusing on the Speech to Text (STT) feature of Whisper in this blog.

Key Features:

  • It produces highly accurate transcriptions.
  • It supports 96+ languages.
  • The best part is that it is completely free to use.
Basic Installation

    To begin, we will install 5 different packages. Don't worry, we will go through this step by step and it will not take much time. Also, you might have a few of these installed already ;)

    1. Install Python:

    You need to install Python on your system. You can do that by following the steps in this link.

    To confirm the installation, type the following command to check the installed version:

    python -V

    2. Install PyTorch:

    Next, you need to install PyTorch through this link. It is an ML library. To run it on your computer, scroll down on the linked page and select your configuration. In my case, the configuration looks like this:

    Once you have made all your selections, copy the provided command and run it in your terminal.

    3. Package Manager:

    Let's download our package manager. If you are using Windows, download Chocolatey; if you are using Mac, download Homebrew.

    4. FFMPEG

    Now, use your package manager to install a package called ffmpeg.

    If you are using Windows, install it with the following command:

    choco install ffmpeg

    And if you are using Mac, then use the following command:

    brew install ffmpeg

    5. Whisper

    Now for the last installation: we will finally install Whisper. To install Whisper, run the following command:

    pip install openai-whisper

    Note: If it does not work for you, try running it in a virtual environment. To do this, run the following commands:

    For Mac:

    python -m venv path/to/venv
    source path/to/venv/bin/activate
    python -m pip install openai-whisper

    For Windows:

    python -m venv path\to\venv
    path\to\venv\Scripts\activate.bat
    python -m pip install openai-whisper

    Explore Whisper

    Congratulations! We have now finished the installation of all the prerequisites.

    It's time to run your code now. Open your favorite code editor and type the following code:

    import whisper
    
    model = whisper.load_model("large")
    result = model.transcribe("Audio_File.wav")
    print(result["text"])

    Make sure to enter the path of your audio file in place of Audio_File.wav. It does not need to be in WAV format; it can also be an MP3 file.

    Now, when you run the code, you can view the generated text.
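    Beyond result["text"], the transcribe call also returns the detected language and per-segment timestamps. Here is a minimal sketch of walking that result; the field names follow the Whisper README, and the sample dict below is made up purely for illustration:

```python
def format_segments(result):
    """Turn a Whisper-style result dict into timestamped lines."""
    lines = []
    for seg in result["segments"]:
        lines.append(f"[{seg['start']:6.2f}s -> {seg['end']:6.2f}s] {seg['text'].strip()}")
    return lines

# A made-up result dict mimicking the shape Whisper returns, for illustration only:
sample = {
    "text": " Hello world. This is a test.",
    "language": "en",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " Hello world."},
        {"start": 2.5, "end": 4.0, "text": " This is a test."},
    ],
}

for line in format_segments(sample):
    print(line)
```

    In your own script, you would pass the dict returned by model.transcribe(...) instead of the sample above.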

    By default it uses the 'small' model, but I used the 'large' model for higher accuracy. The key point is that the larger the model, the longer the processing time and the more accurate the results. You have 5 different models to choose from. To know more about them, go to this link.
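    For a quick reference, the five models and their rough sizes can be summed up in a small table. The figures below are approximate numbers from the openai/whisper README and may change between releases:

```python
# Approximate figures from the openai/whisper README (ballpark values, not guarantees).
WHISPER_MODELS = {
    # name: (parameters, relative speed vs. 'large')
    "tiny":   ("39 M",   "~32x"),
    "base":   ("74 M",   "~16x"),
    "small":  ("244 M",  "~6x"),
    "medium": ("769 M",  "~2x"),
    "large":  ("1550 M", "1x"),
}

for name, (params, speed) in WHISPER_MODELS.items():
    print(f"{name:<8} {params:<8} {speed}")
```

    Rule of thumb: start with 'small' to check your pipeline, then move up to 'large' if the transcript quality is not good enough.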

    To run Whisper in your terminal, go to the folder which contains the audio file and type:

    whisper Audio_File.wav

    It will automatically detect the language and generate the text.

    To specify the model or language in the command, use the following flags:

    whisper Audio_File.wav --language Japanese --model large

    You can even translate the text into English with the following command. Unfortunately, you cannot translate the text into any language other than English.

    whisper Audio_File.wav --language Japanese --task translate

    If you want to know all the different flags used by Whisper, type:

    whisper --help
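    If you have many recordings, you can drive the same CLI from Python. Below is a minimal sketch; build_whisper_cmd is a hypothetical helper I'm introducing here, and it simply assembles the flags shown above:

```python
from pathlib import Path

def build_whisper_cmd(audio_path, model="small", language=None, task=None):
    """Assemble an argument list for the whisper CLI (hypothetical helper)."""
    cmd = ["whisper", str(audio_path), "--model", model]
    if language:
        cmd += ["--language", language]
    if task:
        cmd += ["--task", task]
    return cmd

if __name__ == "__main__":
    # Preview the command for every MP3 in the current folder.
    # Swap print(...) for subprocess.run(cmd, check=True) to actually run them.
    for audio in sorted(Path(".").glob("*.mp3")):
        cmd = build_whisper_cmd(audio, model="large")
        print(" ".join(cmd))
```

    Keeping command construction separate from execution makes it easy to preview what will run before committing to a long batch of large-model transcriptions.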

    Conclusion:

    With the basic implementation complete, feel free to experiment with different Whisper models and audio inputs. Explore Whisper's capabilities and iterate on your implementation to suit your specific use case.

    Thank you for reading!

    Happy Learning:)