sync: match upstream PR #56891 at rebase 2026-03-30
Brings Gitea mirror up to date with the current state of the
openclaw/openclaw PR branch, including all fixes from Codex review:

- Namespaced directive keys (fishaudio_*/fish_* prefixes only)
- Strict latency directive validation with warnings
- Code quality cleanup, s2 model removal
- Contract and directive parsing tests
- README updated with prefixed directive docs

Source: Conan-Scott/openclaw@9787ef6e (feat/fish-audio-speech-provider)
112
README.md
@@ -1,97 +1,51 @@
-# Fish Audio Speech Plugin for OpenClaw
+# Fish Audio Speech
 
-A speech provider plugin that integrates [Fish Audio](https://fish.audio) TTS with OpenClaw.
+Bundled [Fish Audio](https://fish.audio) TTS speech provider for OpenClaw.
 
 ## Features
 
-- **Fish Audio S2-Pro / S1 / S2** model support
-- **Dynamic voice listing** — your own cloned voices + official Fish Audio voices
-- **Format-aware output** — opus for voice notes (Telegram, WhatsApp), mp3 otherwise
-- **Inline directives** — switch voice, speed, model, and latency mid-message
-- **No core changes required** — standard `SpeechProviderPlugin` extension
-
-## Installation
-
-```bash
-openclaw plugins install @openclaw/fish-audio-speech
-```
+- Fish Audio S2-Pro and S1 model support
+- Dynamic voice listing (user's own cloned/trained voices via `self=true`)
+- Format-aware output: opus for voice notes (Telegram, WhatsApp), mp3 otherwise
+- Inline directives: voice, speed, model, latency, temperature, top_p
+- `voiceCompatible: true` for both formats
 
 ## Configuration
 
-In your `openclaw.json`:
-
-```json
+```json5
 {
-  "messages": {
-    "tts": {
-      "provider": "fish-audio",
-      "providers": {
+  messages: {
+    tts: {
+      provider: "fish-audio",
+      providers: {
         "fish-audio": {
-          "apiKey": "your-fish-audio-api-key",
-          "voiceId": "8a2d42279389471993460b85340235c5",
-          "model": "s2-pro",
-          "latency": "normal",
-          "speed": 1.0
-        }
-      }
-    }
-  }
+          apiKey: "your-fish-audio-api-key",
+          voiceId: "reference-id-of-voice",
+          model: "s2-pro", // s2-pro | s1
+          latency: "normal", // normal | balanced | low
+          // speed: 1.0, // 0.5–2.0 (optional)
+          // temperature: 0.7, // 0–1 (optional)
+          // topP: 0.8, // 0–1 (optional)
+        },
+      },
+    },
+  },
 }
 ```
 
-### Config Options
-
-| Field | Type | Default | Description |
-|-------|------|---------|-------------|
-| `apiKey` | string | — | **Required.** Fish Audio API key |
-| `voiceId` | string | — | **Required.** Reference ID of the voice to use |
-| `model` | string | `s2-pro` | TTS model (`s2-pro`, `s1`, `s2`) |
-| `latency` | string | `normal` | Latency mode (`normal`, `balanced`, `low`) |
-| `speed` | number | — | Prosody speed (0.5–2.0) |
-| `temperature` | number | — | Sampling temperature (0–1) |
-| `topP` | number | — | Top-p sampling (0–1) |
-| `baseUrl` | string | `https://api.fish.audio` | API base URL |
-
-### Environment Variable
-
-You can also set the API key via environment variable:
-
-```bash
-FISH_AUDIO_API_KEY=your-key
-```
+Environment variable fallback: `FISH_AUDIO_API_KEY`.
 
 ## Directives
 
-Use inline directives in your messages to control TTS per-message:
+All directive keys are provider-prefixed to avoid dispatch collisions with
+bundled providers (OpenAI, ElevenLabs) that claim generic keys like `voice`
+and `model`. Both `fishaudio_*` and shorter `fish_*` aliases are accepted.
 
 ```
-[[tts:voice=<ref_id>]]     Switch voice
-[[tts:speed=1.2]]          Prosody speed (0.5–2.0)
-[[tts:model=s1]]           Model override
-[[tts:latency=low]]        Latency mode
-[[tts:temperature=0.7]]    Sampling temperature
-[[tts:top_p=0.8]]          Top-p sampling
+[[tts:fishaudio_voice=<ref_id>]]     Switch voice (or fish_voice)
+[[tts:fishaudio_speed=1.2]]          Prosody speed 0.5–2.0 (or fish_speed)
+[[tts:fishaudio_model=s1]]           Model override (or fish_model)
+[[tts:fishaudio_latency=low]]        Latency mode (or fish_latency)
+[[tts:fishaudio_temperature=0.7]]    Sampling temperature (or fish_temperature)
+[[tts:fishaudio_top_p=0.8]]          Top-p sampling (or fish_top_p)
 ```
-
-## Voice Listing
-
-The plugin dynamically lists available voices via `/tts voices`:
-- **Official Fish Audio voices** (~38 voices)
-- **Your own cloned/trained voices** (marked with "(mine)")
-
-## Output Format
-
-The plugin automatically selects the best format based on the channel:
-- **Voice note channels** (Telegram, WhatsApp, Matrix, Feishu) → Opus
-- **All other channels** → MP3
-
-Both formats set `voiceCompatible: true` — Fish Audio output works cleanly as native voice notes.
-
-## Requirements
-
-- OpenClaw ≥ 2026.3.0
-- Fish Audio API key ([get one here](https://fish.audio))
-
-## License
-
-MIT
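
Reviewer note on the directive change: the README hunk above says that only `fishaudio_*` and `fish_*` keys are claimed, and the commit message mentions strict latency validation with warnings. A minimal TypeScript sketch of that rule follows. It is not the plugin's parser: `parseFishDirectives`, `FishOverrides`, the warning strings, and the choice to drop invalid values are all hypothetical. Only the `[[tts:key=value]]` syntax, the key names, and the value ranges come from the README text.

```typescript
// Hypothetical sketch: names, warning text, and drop-on-invalid behavior are assumptions.
type Latency = "normal" | "balanced" | "low";

interface FishOverrides {
  voice?: string;
  model?: string;
  latency?: Latency;
  speed?: number;
  temperature?: number;
  topP?: number;
}

interface NumericSpec {
  field: "speed" | "temperature" | "topP";
  min: number;
  max: number;
}

// Ranges as documented in the README (speed 0.5 to 2.0, temperature and top_p 0 to 1).
const NUMERIC_DIRECTIVES: { [key: string]: NumericSpec } = {
  speed: { field: "speed", min: 0.5, max: 2.0 },
  temperature: { field: "temperature", min: 0, max: 1 },
  top_p: { field: "topP", min: 0, max: 1 },
};

function isLatency(value: string): value is Latency {
  return value === "normal" || value === "balanced" || value === "low";
}

function parseFishDirectives(text: string): { overrides: FishOverrides; warnings: string[] } {
  const overrides: FishOverrides = {};
  const warnings: string[] = [];
  const pattern = /\[\[tts:([a-z_]+)=([^\]]*)\]\]/g;
  let match: RegExpExecArray | null;

  while ((match = pattern.exec(text)) !== null) {
    const rawKey = match[1];
    const value = match[2].trim();

    // Only provider-prefixed keys are claimed; generic keys such as `voice`
    // are skipped so other providers can handle them.
    const prefixed = /^(?:fishaudio|fish)_(.+)$/.exec(rawKey);
    if (prefixed === null) continue;
    const key = prefixed[1];

    if (key === "voice") {
      overrides.voice = value;
    } else if (key === "model") {
      overrides.model = value;
    } else if (key === "latency") {
      // Strict validation: unknown modes are dropped and reported.
      if (isLatency(value)) overrides.latency = value;
      else warnings.push('invalid latency "' + value + '"');
    } else if (Object.prototype.hasOwnProperty.call(NUMERIC_DIRECTIVES, key)) {
      const spec = NUMERIC_DIRECTIVES[key];
      const n = Number(value);
      if (value !== "" && isFinite(n) && n >= spec.min && n <= spec.max) {
        overrides[spec.field] = n;
      } else {
        warnings.push("invalid " + key + ' "' + value + '"');
      }
    } else {
      warnings.push('unknown directive "' + rawKey + '"');
    }
  }
  return { overrides, warnings };
}
```

Under these assumptions, a message containing `[[tts:fish_voice=abc123]]` and `[[tts:voice=other]]` yields `voice: "abc123"`, because the unprefixed key is ignored, and `[[tts:fishaudio_latency=turbo]]` produces a warning instead of an override.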
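
Reviewer note on configuration: the updated README documents the config fields, their allowed values, and `FISH_AUDIO_API_KEY` as a fallback for the API key. The sketch below shows one way those rules could be checked when the config is loaded. All names (`resolveFishAudioConfig`, `FishAudioConfigInput`, `checkRange`) and the error messages are hypothetical. The defaults `s2-pro` and `normal` are taken from the Config Options table that this commit removes, so treat them as assumptions about current behavior. The environment is passed in as a plain object so the example stays self-contained.

```typescript
// Hypothetical sketch: type names, function names, and error text are illustrative.
type Model = "s2-pro" | "s1";
type LatencyMode = "normal" | "balanced" | "low";

interface FishAudioConfigInput {
  apiKey?: string;
  voiceId?: string;
  model?: string;
  latency?: string;
  speed?: number;
  temperature?: number;
  topP?: number;
}

interface FishAudioConfig {
  apiKey: string;
  voiceId: string;
  model: Model;
  latency: LatencyMode;
  speed: number | undefined;
  temperature: number | undefined;
  topP: number | undefined;
}

function isModel(value: string): value is Model {
  return value === "s2-pro" || value === "s1";
}

function isLatencyMode(value: string): value is LatencyMode {
  return value === "normal" || value === "balanced" || value === "low";
}

// Optional numeric fields pass through when unset and must sit inside [min, max] when set.
function checkRange(name: string, value: number | undefined, min: number, max: number): number | undefined {
  if (value === undefined) return undefined;
  if (!isFinite(value) || value < min || value > max) {
    throw new Error(name + " must be between " + min + " and " + max);
  }
  return value;
}

function resolveFishAudioConfig(
  input: FishAudioConfigInput,
  env: { [name: string]: string | undefined }
): FishAudioConfig {
  // The config value wins; the environment variable is only a fallback.
  const apiKey = input.apiKey || env["FISH_AUDIO_API_KEY"];
  if (!apiKey) throw new Error("apiKey is required (config or FISH_AUDIO_API_KEY)");
  if (!input.voiceId) throw new Error("voiceId is required");

  const model = input.model === undefined ? "s2-pro" : input.model;
  if (!isModel(model)) throw new Error("unsupported model: " + model);

  const latency = input.latency === undefined ? "normal" : input.latency;
  if (!isLatencyMode(latency)) throw new Error("unsupported latency: " + latency);

  return {
    apiKey: apiKey,
    voiceId: input.voiceId,
    model: model,
    latency: latency,
    speed: checkRange("speed", input.speed, 0.5, 2.0),
    temperature: checkRange("temperature", input.temperature, 0, 1),
    topP: checkRange("topP", input.topP, 0, 1),
  };
}
```

In this sketch `s2` is rejected as a model, in line with the "s2 model removal" item in the commit message. Whether the real plugin rejects it or falls back to a default is not stated in the diff.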
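
Reviewer note on output format: the Features list keeps the "format-aware output" bullet (opus for voice notes, mp3 otherwise, `voiceCompatible: true` for both) while the longer Output Format section is removed. The selection rule can be sketched as below. The function name, the lowercase channel identifiers, and the return shape are hypothetical. The channel list combines the two channels named in the new README (Telegram, WhatsApp) with the two extra ones from the removed section (Matrix, Feishu), so the last two may no longer apply.

```typescript
// Hypothetical sketch: channel ids and the return shape are assumptions.
type AudioFormat = "opus" | "mp3";

interface OutputChoice {
  format: AudioFormat;
  voiceCompatible: boolean;
}

// Channels that deliver audio as native voice notes, per the README text.
const VOICE_NOTE_CHANNELS: string[] = ["telegram", "whatsapp", "matrix", "feishu"];

function selectOutputFormat(channel: string | undefined): OutputChoice {
  const isVoiceNote =
    channel !== undefined && VOICE_NOTE_CHANNELS.indexOf(channel.toLowerCase()) !== -1;
  // Both formats are flagged voice-compatible, as the README states.
  return { format: isVoiceNote ? "opus" : "mp3", voiceCompatible: true };
}
```

Keeping the channel list in one constant means a change like the one in this diff, where the documented channels shrink from four to two, would touch a single line.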