- index.ts: plugin entry with definePluginEntry + registerSpeechProvider
- speech-provider.ts: full SpeechProviderPlugin implementation
  - resolveConfig from messages.tts.providers.fish-audio
  - parseDirectiveToken for voice, model, speed, latency, temperature, top_p
  - listVoices merging official + user's own voices
  - synthesize with format-aware output (opus for voice-note, mp3 otherwise)
  - stub Talk Mode (resolveTalkConfig/resolveTalkOverrides)
- tts.ts: raw fishAudioTTS() fetch + listFishAudioVoices()
  - streaming chunked → buffer, error body included in exceptions
  - parallel voice listing with graceful partial failure
- speech-provider.test.ts: voice ID validation tests
- openclaw.plugin.json: speechProviders contract
- package.json: peer dep on openclaw >=2026.3.0

# Fish Audio Speech Plugin for OpenClaw

A speech provider plugin that integrates [Fish Audio](https://fish.audio) TTS with OpenClaw.

## Features

- **Fish Audio S2-Pro / S1 / S2** model support
- **Dynamic voice listing**: your own cloned voices plus the official Fish Audio voices
- **Format-aware output**: Opus for voice notes (Telegram, WhatsApp), MP3 otherwise
- **Inline directives**: switch voice, speed, model, and latency mid-message
- **No core changes required**: standard `SpeechProviderPlugin` extension

## Installation

```bash
openclaw plugins install @openclaw/fish-audio-speech
```

## Configuration

In your `openclaw.json`:

```json
{
  "messages": {
    "tts": {
      "provider": "fish-audio",
      "providers": {
        "fish-audio": {
          "apiKey": "your-fish-audio-api-key",
          "voiceId": "8a2d42279389471993460b85340235c5",
          "model": "s2-pro",
          "latency": "normal",
          "speed": 1.0
        }
      }
    }
  }
}
```

### Config Options

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `apiKey` | string | none | **Required.** Fish Audio API key |
| `voiceId` | string | `8a2d42...` | Reference ID of the voice to use |
| `model` | string | `s2-pro` | TTS model (`s2-pro`, `s1`, `s2`) |
| `latency` | string | `normal` | Latency mode (`normal`, `balanced`, `low`) |
| `speed` | number | none | Prosody speed (0.5–2.0) |
| `temperature` | number | none | Sampling temperature (0–1) |
| `topP` | number | none | Top-p sampling (0–1) |
| `baseUrl` | string | `https://api.fish.audio` | API base URL |

### Environment Variable

You can also set the API key via the `FISH_AUDIO_API_KEY` environment variable:

```bash
FISH_AUDIO_API_KEY=your-key
```

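The options and the environment-variable fallback described above can be illustrated with a minimal sketch. It is not the plugin's actual `resolveConfig`: the `FishAudioConfig` type and the `pickEnum`, `pickNumber`, and `resolveFishAudioConfig` helpers are hypothetical names, and the precedence (config value first, then `FISH_AUDIO_API_KEY`) is an assumption. No `voiceId` default is filled in, because the table only shows it in abbreviated form.

```typescript
// Hypothetical types and helper names; the plugin's real resolveConfig may differ.
interface FishAudioConfig {
  apiKey: string;
  voiceId?: string; // the documented default is abbreviated, so none is assumed here
  model: string;
  latency: string;
  speed?: number;
  temperature?: number;
  topP?: number;
  baseUrl: string;
}

type RawConfig = Partial<Record<keyof FishAudioConfig, unknown>>;

// Returns the fallback when the value is unset, otherwise checks it against the allowed list.
function pickEnum(name: string, value: unknown, allowed: string[], fallback: string): string {
  if (value === undefined) return fallback;
  if (typeof value !== "string" || allowed.indexOf(value) === -1) {
    throw new Error(`${name} must be one of: ${allowed.join(", ")}`);
  }
  return value;
}

// Leaves optional numbers unset, otherwise enforces the documented range.
function pickNumber(name: string, value: unknown, min: number, max: number): number | undefined {
  if (value === undefined) return undefined;
  if (typeof value !== "number" || Number.isNaN(value) || value < min || value > max) {
    throw new Error(`${name} must be a number between ${min} and ${max}`);
  }
  return value;
}

function resolveFishAudioConfig(
  raw: RawConfig,
  env: { FISH_AUDIO_API_KEY?: string } = {},
): FishAudioConfig {
  // Assumption: an explicit config value takes precedence over the environment variable.
  const apiKey =
    typeof raw.apiKey === "string" && raw.apiKey !== "" ? raw.apiKey : env.FISH_AUDIO_API_KEY;
  if (!apiKey) {
    throw new Error("Fish Audio API key is required (config apiKey or FISH_AUDIO_API_KEY)");
  }
  return {
    apiKey,
    voiceId: typeof raw.voiceId === "string" ? raw.voiceId : undefined,
    model: pickEnum("model", raw.model, ["s2-pro", "s1", "s2"], "s2-pro"),
    latency: pickEnum("latency", raw.latency, ["normal", "balanced", "low"], "normal"),
    speed: pickNumber("speed", raw.speed, 0.5, 2.0),
    temperature: pickNumber("temperature", raw.temperature, 0, 1),
    topP: pickNumber("topP", raw.topP, 0, 1),
    baseUrl: typeof raw.baseUrl === "string" ? raw.baseUrl : "https://api.fish.audio",
  };
}
```

Passing the environment as a plain object keeps the sketch testable; a real caller would likely hand in `process.env`.
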
## Directives

Use inline directives in your messages to control TTS per-message:

```
[[tts:voice=<ref_id>]]     Switch voice
[[tts:speed=1.2]]          Prosody speed (0.5–2.0)
[[tts:model=s1]]           Model override
[[tts:latency=low]]        Latency mode
[[tts:temperature=0.7]]    Sampling temperature
[[tts:top_p=0.8]]          Top-p sampling
```

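To make the directive syntax concrete, here is a rough sketch of how `[[tts:key=value]]` tokens could be pulled out of a message. It is only an illustration: `parseTtsDirectives` and `TtsOverrides` are hypothetical names, the plugin's own `parseDirectiveToken` hook may behave differently, and leaving unknown or malformed directives in the text is an assumption.

```typescript
// Hypothetical result shape; field names follow the documented config options.
interface TtsOverrides {
  voice?: string;
  model?: string;
  latency?: string;
  speed?: number;
  temperature?: number;
  topP?: number;
}

// Matches tokens such as [[tts:speed=1.2]] and captures the key and the value.
const DIRECTIVE = /\[\[tts:([a-z_]+)=([^\]\s]+)\]\]/g;

function parseTtsDirectives(message: string): { text: string; overrides: TtsOverrides } {
  const overrides: TtsOverrides = {};
  const stripped = message.replace(DIRECTIVE, (token: string, key: string, value: string) => {
    switch (key) {
      case "voice":
        overrides.voice = value;
        return "";
      case "model":
        overrides.model = value;
        return "";
      case "latency":
        overrides.latency = value;
        return "";
      case "speed":
      case "temperature":
      case "top_p": {
        const parsed = Number(value);
        // Assumption: a non-numeric value leaves the token untouched in the text.
        if (Number.isNaN(parsed)) return token;
        if (key === "speed") overrides.speed = parsed;
        else if (key === "temperature") overrides.temperature = parsed;
        else overrides.topP = parsed;
        return "";
      }
      default:
        // Unknown keys are left in place rather than silently dropped.
        return token;
    }
  });
  // Collapse the whitespace left behind by removed tokens.
  return { text: stripped.replace(/\s+/g, " ").trim(), overrides };
}
```

For example, `parseTtsDirectives("[[tts:speed=1.2]] Hello [[tts:model=s1]] world")` yields the text `Hello world` with `speed` set to `1.2` and `model` set to `s1`.
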
## Voice Listing

The plugin dynamically lists available voices via `/tts voices`:
- **Official Fish Audio voices** (~38 voices)
- **Your own cloned/trained voices** (marked with "(mine)")

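The merge of the two voice sources can be sketched as follows. This is an assumption-laden illustration rather than the plugin's code: `Voice`, `listVoicesMerged`, and the two fetcher parameters are hypothetical, and tolerating the failure of one source (while failing only when both do) is one plausible reading of "graceful partial failure".

```typescript
// Hypothetical voice shape; the real API response carries more fields.
interface Voice {
  id: string;
  name: string;
}

async function listVoicesMerged(
  fetchOfficial: () => Promise<Voice[]>,
  fetchMine: () => Promise<Voice[]>,
): Promise<Voice[]> {
  // Run both requests in parallel and keep whichever ones succeed.
  const [official, mine] = await Promise.allSettled([fetchOfficial(), fetchMine()]);

  if (official.status === "rejected" && mine.status === "rejected") {
    throw new Error("Fish Audio voice listing failed for both sources");
  }

  const merged = new Map<string, Voice>();
  if (official.status === "fulfilled") {
    for (const voice of official.value) merged.set(voice.id, voice);
  }
  if (mine.status === "fulfilled") {
    // The user's own voices win on ID collisions and are labelled "(mine)".
    for (const voice of mine.value) {
      merged.set(voice.id, { id: voice.id, name: `${voice.name} (mine)` });
    }
  }
  return Array.from(merged.values());
}
```

Taking the fetchers as parameters keeps the sketch independent of the actual HTTP calls, which are not described in this README.
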
## Output Format

The plugin automatically selects the output format based on the channel:
- **Voice note channels** (Telegram, WhatsApp, Matrix, Feishu) → Opus
- **All other channels** → MP3

Both formats set `voiceCompatible: true`, so Fish Audio output can be delivered as a native voice note.

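The selection rule above amounts to a small lookup. The sketch below assumes the channel arrives as a lowercase-comparable string identifier; `pickOutputFormat` and the channel names in `VOICE_NOTE_CHANNELS` are hypothetical, and the real plugin may instead receive a voice-note hint from OpenClaw.

```typescript
type AudioFormat = "opus" | "mp3";

// Assumed channel identifiers, taken from the channel names listed in this README.
const VOICE_NOTE_CHANNELS = new Set(["telegram", "whatsapp", "matrix", "feishu"]);

function pickOutputFormat(channel: string): { format: AudioFormat; voiceCompatible: boolean } {
  const format = VOICE_NOTE_CHANNELS.has(channel.toLowerCase()) ? "opus" : "mp3";
  // Both formats are flagged as voice compatible, as stated above.
  return { format, voiceCompatible: true };
}
```
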
## Requirements

- OpenClaw ≥ 2026.3.0
- Fish Audio API key ([get one here](https://fish.audio))

## License

MIT