
Whisper documentation

Whisper installation prerequisites

Install Python 3.10 (newer versions may also work)

  • Windows: https://www.python.org/downloads/windows/
  • macOS: https://www.python.org/downloads/macos/

Install Rust

bash
pip install setuptools-rust

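Note that setuptools-rust is a Python build helper, not the Rust toolchain itself. If a Rust compiler is also needed, download and run the rustup installer (an .exe file on Windows): https://forge.rust-lang.org/infra/other-installation-methods.html#rustup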

Install ffmpeg

shell
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

Install the Whisper package

shell
pip install -U openai-whisper

Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

pip install git+https://github.com/openai/whisper.git

To update the package to the latest version of this repository, please run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

Available models

tiny, base, small, medium, large, turbo

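| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~10x           |
| base   | 74 M       | base.en            | base               | ~1 GB         | ~7x            |
| small  | 244 M      | small.en           | small              | ~2 GB         | ~4x            |
| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
| turbo  | 809 M      | N/A                | turbo              | ~6 GB         | ~8x            |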

The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models. Additionally, the turbo model is an optimized version of large-v3 that offers faster transcription speed with a minimal degradation in accuracy.
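As a rough illustration of how the table can guide model choice, the sketch below lists the models whose approximate VRAM requirement fits a given budget. The helper name `modelsThatFit` and the `englishOnly` option are made up for this example; the numbers are copied from the table and are approximate.

```javascript
// Figures copied from the table above (approximate VRAM in GB).
const MODELS = [
  { size: "tiny", vramGB: 1, hasEnglishOnly: true },
  { size: "base", vramGB: 1, hasEnglishOnly: true },
  { size: "small", vramGB: 2, hasEnglishOnly: true },
  { size: "medium", vramGB: 5, hasEnglishOnly: true },
  { size: "large", vramGB: 10, hasEnglishOnly: false },
  { size: "turbo", vramGB: 6, hasEnglishOnly: false },
];

// Hypothetical helper: return the model names that fit within the VRAM budget.
// With englishOnly set, only sizes that have an .en variant are returned.
function modelsThatFit(vramBudgetGB, { englishOnly = false } = {}) {
  return MODELS
    .filter((m) => m.vramGB <= vramBudgetGB)
    .filter((m) => !englishOnly || m.hasEnglishOnly)
    .map((m) => (englishOnly ? `${m.size}.en` : m.size));
}

console.log(modelsThatFit(6));
console.log(modelsThatFit(2, { englishOnly: true }));
```

Which of the fitting models to pick still depends on the trade-off between speed and accuracy described above.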

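Command-line usage

On Windows, the model files that Whisper downloads are usually stored under C:\Users\Believer\.cache\whisper (Believer being the user name in this example). To transcribe, put the prepared audio files in any new folder and run the following commands from that folder.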

1. Specify the model: The following command will transcribe speech in audio files, using the turbo model:

whisper audio.flac audio.mp3 audio.wav --model turbo

2. Specify the audio language: The default setting (which selects the turbo model in current releases) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

whisper japanese.wav --language Japanese

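3. Translate: Adding --task translate will translate the speech into English. The turbo model is not trained for translation, so select one of the other multilingual models, such as medium:

whisper japanese.wav --model medium --language Japanese --task translate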

4. Help: Run the following to view all available options:

whisper --help
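When the commands above need to be driven from a script, the options can be assembled programmatically. The following is a minimal sketch, assuming the `whisper` executable is on PATH; `buildWhisperArgs` and `runWhisper` are hypothetical helper names, and `--output_format` is one of the options listed by `whisper --help`.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: turn an options object into whisper CLI arguments.
function buildWhisperArgs(audioFiles, { model, language, task, outputFormat } = {}) {
  const args = [...audioFiles];
  if (model) args.push("--model", model);
  if (language) args.push("--language", language);
  if (task) args.push("--task", task);
  if (outputFormat) args.push("--output_format", outputFormat);
  return args;
}

// Hypothetical helper: run the whisper CLI synchronously and return its exit code.
// Assumes `whisper` is installed and reachable through PATH.
function runWhisper(audioFiles, options) {
  const result = spawnSync("whisper", buildWhisperArgs(audioFiles, options), {
    stdio: "inherit",
  });
  if (result.error) throw result.error;
  return result.status;
}

// Print the command that would be executed, without running it.
const translateArgs = buildWhisperArgs(["japanese.wav"], {
  model: "medium",
  language: "Japanese",
  task: "translate",
});
console.log(["whisper", ...translateArgs].join(" "));
```

Calling `runWhisper(["audio.mp3"], { model: "turbo" })` would then start the actual transcription.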

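Additional notes

  • The large model gives higher recognition accuracy for Chinese.

  • For English-only applications, the .en models (tiny.en, base.en, small.en, medium.en) tend to perform better.

  • Running a task also generates subtitle files in several formats.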

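Subtitle translation script

With the --task translate option, Whisper can only translate non-English speech into English; it cannot translate into any other language. To get subtitle files in a specific target language, you have to handle the translation yourself.

Below is a simple demo for translating an .srt subtitle file: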

js
const fs = require("fs");
// Placeholder module: swap in whichever translation client you actually use.
const myTranslateFunc = require('my-translate-func');

const targetLang = 'zh-CN';
const inputFilePath = "english_speech/speech_9.24.srt"; // path of the input .srt file
const outputFilePath = `translated_${targetLang}_${Date.now()}.srt`; // path of the output .srt file

async function translateSRT(filePath) {
  const srtContent = fs.readFileSync(filePath, "utf8");
  const translatedLines = [];

  // Split on LF or CRLF so Windows line endings do not leave stray "\r" characters
  const lines = srtContent.split(/\r?\n/);
  for (const line of lines) {
    if (line && !line.includes("-->")) {
      // Non-empty line that is not a timestamp line
      if (!isNaN(line.trim())) {
        // Purely numeric line: treat it as a cue index and keep it unchanged
        translatedLines.push(line);
      } else {
        try {
          const res = await myTranslateFunc(line, { to: targetLang }); // translate into the target language
          translatedLines.push(res.text);
        } catch (error) {
          console.error(`Translation error for line: ${line}`, error);
          translatedLines.push(`原:${line}`); // on failure, keep the original text ("原" marks it as untranslated)
        }
      }
    } else {
      translatedLines.push(line); // keep timestamp lines and blank lines as they are
    }
  }

  return translatedLines.join("\n");
}

async function main() {
  const translatedSRT = await translateSRT(inputFilePath);
  fs.writeFileSync(outputFilePath, translatedSRT);
  console.log(`Translated SRT saved to ${outputFilePath}`);
}

main().catch(console.error);
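The demo above works line by line, so a subtitle whose text is purely numeric (for example "2024") is treated as a cue index and never translated. One possible alternative, sketched here under the assumption that cues are separated by blank lines as in a typical .srt file, is to parse the file into cue objects first. `parseSrt` and `formatSrt` are hypothetical helper names, not part of Whisper.

```javascript
// Parse SRT text into cues of the form { index, time, text }.
function parseSrt(srtContent) {
  return srtContent
    .replace(/\r\n/g, "\n") // normalize Windows line endings
    .trim()
    .split(/\n{2,}/) // cues are separated by one or more blank lines
    .filter((block) => block.trim() !== "")
    .map((block) => {
      // First line is the index, second the timestamps, the rest is subtitle text
      const [index = "", time = "", ...textLines] = block.split("\n");
      return { index: index.trim(), time: time.trim(), text: textLines.join("\n") };
    });
}

// Serialize cues back into SRT text.
function formatSrt(cues) {
  return cues.map((c) => `${c.index}\n${c.time}\n${c.text}`).join("\n\n") + "\n";
}

const sample =
  "1\n00:00:00,000 --> 00:00:02,000\nHello\n\n" +
  "2\n00:00:02,000 --> 00:00:04,000\n2024\n";
const cues = parseSrt(sample);
console.log(cues[1].text); // the numeric line is kept as subtitle text
```

With this structure, only the `text` field of each cue would be passed to the translation function, and multi-line subtitles can be translated as one unit.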

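Use cases

  • Meeting minutes: use Whisper to extract the text from the audio, then have GPT generate the final summary.
  • Video subtitles: first extract the audio from the video with editing software, then let Whisper transcribe it and generate subtitle files (any subtitle translation can be handled with the script above).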
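Since ffmpeg is already installed as a prerequisite, the audio extraction step of the subtitle workflow could also be scripted instead of done in editing software. The sketch below only builds and prints the commands; `buildExtractAudioArgs` is a hypothetical helper, and the mono 16 kHz WAV output is an assumption chosen for this example.

```javascript
const path = require("path");

// Hypothetical helper: build ffmpeg arguments that extract the audio track
// of a video into a WAV file placed next to it.
function buildExtractAudioArgs(videoPath) {
  const { dir, name } = path.parse(videoPath);
  const audioPath = path.join(dir, `${name}.wav`);
  // -vn drops the video stream, -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz
  const args = ["-i", videoPath, "-vn", "-ac", "1", "-ar", "16000", audioPath];
  return { audioPath, args };
}

const { audioPath, args } = buildExtractAudioArgs("meeting.mp4");
console.log(`ffmpeg ${args.join(" ")}`);
console.log(`whisper ${audioPath} --output_format srt`);
```

Running the two printed commands in order would produce the audio file first and then an .srt subtitle file from it.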

Released under the MIT License.