# Whisper-Powered Subtitle Synchronization
**A smart subtitle synchronization tool powered by OpenAI's Whisper.**
This tool automatically detects and fixes desynchronized subtitles by listening to the audio track of your media. Unlike standard tools that only apply a fixed time shift, this project detects **Non-Linear Drift**, **Framerate Mismatches**, and **Variable Speed** issues, applying an "Elastic" correction map to perfectly align subtitles from start to finish.
Designed to work as a standalone CLI tool or a **Bazarr** post-processing script.
> [!NOTE]
> Generative AI has been used during the development of this project.
---
## Installation
### 1. Prerequisites
* **Python 3.9+**
* **FFmpeg:** Must be installed and accessible in your system PATH.
  * *Linux:* `sudo apt install ffmpeg`
  * *Windows:* Download binaries and add them to PATH.
### 2. Clone & Install
```bash
git clone <url of this repo>
cd <repo folder>

# (Optional) Create a virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt
```
---
## Configuration
All settings live in `config.py`. You can tweak them to balance speed against accuracy; the most important are:
```python
SYNC_CONFIG = {
    "device": "cpu",          # Use "cuda" if you have an NVIDIA GPU
    "compute_type": "int8",   # Use "float16" for GPU
    "sample_count": 25,       # How many points to check (higher = more accurate curve)
    "scan_duration_sec": 60,  # Length of each audio chunk to transcribe (higher = more data, slower)
    "correction_method": "auto",  # "auto", "constant", or "force_elastic"
}
```
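As a rough illustration of how `sample_count` and `scan_duration_sec` interact, the sketch below computes evenly spaced checkpoint start times for a media file. `checkpoint_offsets` is a hypothetical helper for illustration only, not part of the project's code:

```python
def checkpoint_offsets(media_duration_sec: float, sample_count: int,
                       scan_duration_sec: float) -> list[float]:
    """Evenly spaced start times (in seconds) for the audio chunks.

    The last chunk is placed so it still fits inside the file.
    Hypothetical helper, shown only to illustrate the config values.
    """
    usable = max(media_duration_sec - scan_duration_sec, 0.0)
    step = usable / max(sample_count - 1, 1)
    return [round(i * step, 2) for i in range(sample_count)]

# A 1-hour file with the defaults above yields 25 checkpoints,
# roughly one every 2.5 minutes:
offsets = checkpoint_offsets(3600, 25, 60)
# offsets[0] == 0.0, offsets[-1] == 3540.0
```

Raising `sample_count` gives the drift curve more anchor points at the cost of more Whisper transcriptions per file.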
---
## How It Works
1. **Extract:** The tool extracts small audio chunks (e.g., 60 seconds) at regular intervals (checkpoints) throughout the media file.
2. **Transcribe:** It uses Whisper to transcribe the speech in those chunks.
3. **Match:** It fuzzy-matches the transcribed text against the subtitle file to pair each *actual* timestamp with its *subtitle* timestamp.
4. **Analyze:**
   - If offsets are stable, apply a **Global Offset**.
   - If offsets drift linearly, apply **Linear Regression** (slope correction).
   - If offsets are chaotic, generate an **Elastic Map** (piecewise interpolation).
5. **Apply:** The subtitles are rewritten with the corrected timings.
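The analysis step can be sketched as follows. This is an illustrative reimplementation of the idea, not the project's actual code; the function name `build_correction` and the `jitter_threshold` value are assumptions:

```python
import statistics

def build_correction(sub_times, real_times, jitter_threshold=0.2):
    """Pick a correction strategy from (subtitle time, detected time) pairs.

    Returns a function mapping an original subtitle timestamp (seconds)
    to a corrected one. Illustrative sketch only.
    """
    offsets = [r - s for s, r in zip(sub_times, real_times)]
    if max(offsets) - min(offsets) < jitter_threshold:
        # Stable offsets: one global shift.
        shift = statistics.mean(offsets)
        return lambda t: t + shift

    # Check for linear drift with a least-squares fit: real = a*sub + b.
    mean_s, mean_r = statistics.mean(sub_times), statistics.mean(real_times)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sub_times, real_times))
    var = sum((s - mean_s) ** 2 for s in sub_times)
    a, b = cov / var, mean_r - (cov / var) * mean_s
    residual = max(abs(a * s + b - r) for s, r in zip(sub_times, real_times))
    if residual < jitter_threshold:
        return lambda t: a * t + b

    # Chaotic offsets: elastic map via piecewise-linear interpolation
    # between checkpoints, extended flatly beyond the first/last one.
    def elastic(t):
        if t <= sub_times[0]:
            return real_times[0] + (t - sub_times[0])
        if t >= sub_times[-1]:
            return real_times[-1] + (t - sub_times[-1])
        for (s0, r0), (s1, r1) in zip(zip(sub_times, real_times),
                                      zip(sub_times[1:], real_times[1:])):
            if s0 <= t <= s1:
                frac = (t - s0) / (s1 - s0)
                return r0 + frac * (r1 - r0)
    return elastic
```

For example, checkpoints at subtitle times `[0, 100, 200]` detected at `[0, 104, 208]` fit a clean 1.04 slope (a framerate-mismatch signature), so the linear branch is chosen and a subtitle at 50 s is moved to 52 s.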
---
## Usage
### Command Line (Manual)
You can run the script manually by mimicking the Bazarr argument format:
```bash
python main.py \
  episode="/path/to/movie.mkv" \
  episode_name="My Movie" \
  subtitles="/path/to/subs.srt" \
  episode_language="English" \
  subtitles_language="English"
```
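Since the arguments arrive as `key="value"` tokens, a minimal way to read them looks like this. This is a sketch of the idea; the shipped `main.py` may parse them differently:

```python
import sys

def parse_bazarr_args(argv):
    """Turn Bazarr-style key="value" tokens into a dict (illustrative sketch)."""
    params = {}
    for token in argv:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without an "=" sign
            params[key] = value.strip('"')
    return params

# e.g. python main.py episode="/path/to/movie.mkv" ...
args = parse_bazarr_args(sys.argv[1:])
```

Note that the shell usually strips the double quotes before the script sees them; the `strip('"')` handles the case where they survive (as with some process launchers).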
### Integration with Bazarr
> [!CAUTION]
> Not yet tested.
This tool is designed to be a "Custom Script" in Bazarr.
1. Go to **Bazarr > Settings > Subtitles > Post-Processing**.
2. Enable **"Execute a custom script"**.
3. **Command:**
   ```bash
   python /path/to/script/main.py
   ```
4. **Arguments:**
   ```text
   episode="{{episode}}" episode_name="{{episode_name}}" subtitles="{{subtitles}}" episode_language="{{episode_language}}" subtitles_language="{{subtitles_language}}"
   ```
*(Note: Bazarr passes these variables automatically.)*