Overview
VRseBuilder’s Text-to-Speech (TTS) system converts text strings into AudioClip assets for VoiceOvers in your story. The system is designed to be extensible: you can integrate any TTS service (Google Cloud TTS, Amazon Polly, ElevenLabs, or a custom local solution) by implementing a provider interface.
This guide walks through creating a custom TTS provider, configuring it, and making it available in VRseBuilder.
How TTS Providers Work
TTS providers act as adapters between VRseBuilder and external speech synthesis services. When the system needs to generate a VoiceOver, it calls your provider’s conversion methods, which handle the actual audio generation (typically via API requests or local processing).
Two interfaces are available in VRseBuilder.Core.Systems.TextToSpeech:
| Interface | Use Case |
|---|---|
| ITextToSpeechProvider | Basic single-language TTS |
| IVRBTextToSpeechProvider | Multi-language support and batch processing (recommended) |
The IVRBTextToSpeechProvider interface extends the base interface with language-specific and batch conversion methods, making it the better choice for most implementations.
1. Create the Provider Class
Create a new C# script in your project. Place it in a Providers folder under Runtime so that VoiceOvers generate correctly when using LiveLink.
Important: Your provider class must be under a Runtime assembly. This ensures VO generation works when editing via LiveLink.
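As a rough starting point, a provider script might look like the sketch below. The class name MyCustomTTSProvider, the interface, the namespace, and the four member names come from this guide; the exact parameter lists and return types are assumptions, as are the MyProject.Providers namespace and the "en" default, so check them against the interface definition in your VRseBuilder version before relying on them.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using UnityEngine;
using VRseBuilder.Core.Systems.TextToSpeech;

namespace MyProject.Providers // hypothetical namespace; use your own
{
    // Must live in a Runtime assembly (no UnityEditor references).
    public class MyCustomTTSProvider : IVRBTextToSpeechProvider
    {
        // Configuration handed over by the system in SetConfig.
        private TextToSpeechConfiguration _configuration;

        // Step 2: store the configuration for later use.
        public void SetConfig(TextToSpeechConfiguration configuration)
        {
            _configuration = configuration;
        }

        // Step 3: default-language conversion delegates to the
        // language-specific method. "en" is an assumed default.
        public Task<AudioClip> ConvertTextToSpeech(string text)
        {
            return ConvertTextToSpeechForLanguage(text, "en");
        }

        // Step 3: language-specific conversion. Signature is assumed.
        public Task<AudioClip> ConvertTextToSpeechForLanguage(string text, string languageCode)
        {
            // Placeholder: call your TTS service or local engine here.
            Debug.LogWarning($"[MyCustomTTSProvider] No backend wired up yet ({languageCode}).");
            return Task.FromResult<AudioClip>(null);
        }

        // Step 3: batch conversion. Signature is assumed; this version
        // simply converts the texts one after another.
        public async Task<List<AudioClip>> ConvertMultipleTextToSpeech(List<string> texts, string languageCode)
        {
            var clips = new List<AudioClip>(texts.Count);
            foreach (string text in texts)
            {
                clips.Add(await ConvertTextToSpeechForLanguage(text, languageCode));
            }
            return clips;
        }
    }
}
```

If the compiler reports missing or mismatched interface members, adjust the signatures to what the interface actually declares; the overall structure should stay the same.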
2. Implement SetConfig
Store the configuration object passed by the system. This object contains global settings like API keys, cache directory paths, and other provider-specific values.
3. Implement Conversion Methods
Implement the core methods that generate audio from text.
ConvertTextToSpeech (Default Language)
This method handles single-text conversion using a default language. Typically, delegate to the language-specific method.
ConvertTextToSpeechForLanguage
This method converts text for a specific language code (e.g., "en", "es", "de").
ConvertMultipleTextToSpeech (Batch Processing)
Implement batch processing for generating multiple clips. VRseBuilder’s core logic checks whether an AudioClip already exists before calling the provider, so only missing clips trigger generation.
Tip: Add caching logic using _configuration.StreamingAssetCacheDirectoryName to avoid redundant API calls.
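One way to approach the caching tip is to derive a stable file name from the language code and the text, so the same request always maps to the same file on disk. The helper below is a minimal sketch of that idea in plain C#. The class name TtsCachePaths, the folder layout, and the .mp3 extension are all hypothetical, and this is not how VRseBuilder itself lays out its cache.

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; not part of VRseBuilder.
public static class TtsCachePaths
{
    // Builds a deterministic file name from the language code and text:
    // a SHA-256 hash in lowercase hex plus an assumed ".mp3" extension.
    public static string GetCacheFileName(string text, string languageCode)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(languageCode + "\n" + text));
            var hex = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex + ".mp3";
        }
    }

    // Combines root, cache directory, language, and file name into a path.
    public static string GetCacheFilePath(string rootPath, string cacheDirectoryName,
                                          string languageCode, string text)
    {
        return Path.Combine(rootPath, cacheDirectoryName, languageCode,
                            GetCacheFileName(text, languageCode));
    }

    // True when audio for this text/language pair is already on disk.
    public static bool IsCached(string rootPath, string cacheDirectoryName,
                                string languageCode, string text)
    {
        return File.Exists(GetCacheFilePath(rootPath, cacheDirectoryName, languageCode, text));
    }
}
```

Inside a provider, rootPath would typically be Application.streamingAssetsPath and cacheDirectoryName would be _configuration.StreamingAssetCacheDirectoryName. When IsCached returns true, the provider can load the file from disk instead of calling the remote API.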
4. Add Configuration Settings
If your provider requires custom settings (API keys, voice IDs, endpoint URLs), add them to the shared configuration class.
Define Your Settings
Open Assets/VRseBuilder/_Core/Runtime/Systems/TextToSpeech/Runtime/TextToSpeechConfiguration.cs and add public fields for your settings.
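The added fields might look like the excerpt below. The field names, default values, header text, and example URL are placeholders, and the class declaration is repeated only for context: its base class is an assumption, so keep whatever declaration your copy of the file already has.

```csharp
using UnityEngine;

// Excerpt only. Leave the existing declaration and fields untouched;
// the ScriptableObject base class shown here is an assumption.
public class TextToSpeechConfiguration : ScriptableObject
{
    // ... existing fields ...

    // The header text is what the new fields are grouped under
    // in the settings UI.
    [Header("My Custom TTS")]
    public string MyCustomApiKey;                 // hypothetical field
    public string MyCustomVoiceId = "default";    // hypothetical field
    public string MyCustomEndpointUrl = "https://api.example.com/v1/tts"; // placeholder URL
}
```

A provider would then read these values through the configuration object it stored in SetConfig, for example _configuration.MyCustomApiKey.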
Access Settings in Your Provider
Reference these fields through the configuration object you stored in SetConfig.
Configure via Project Settings
- Navigate to Edit → Project Settings → VRseBuilder → Text To Speech
- Your new fields appear under the header you defined
- Enter your API keys and other settings
Note: You can also edit settings directly on the TextToSpeechConfiguration asset at Assets/VRseBuilder/Resources/TextToSpeechConfiguration.asset, but using Project Settings is the recommended approach.
5. Implement Web API Requests
For providers that use external APIs, use UnityWebRequest with DownloadHandlerAudioClip to fetch audio.
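The sketch below shows one way such a request could be structured. The UnityWebRequest, UploadHandlerRaw, and DownloadHandlerAudioClip calls are standard Unity APIs (request.result needs Unity 2020.2 or newer). Everything about the remote service is hypothetical: the class name MyCustomTtsClient, the JSON body shape, the Bearer authorization header, and the MPEG audio format all depend on your TTS service. Debug.LogError is used because VRseLogger's API is not covered in this guide.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical helper; adapt body, headers, and audio type to your service.
public static class MyCustomTtsClient
{
    [Serializable]
    private class TtsRequest // assumed JSON body shape
    {
        public string text;
        public string language;
    }

    public static async Task<AudioClip> RequestClipAsync(
        string endpointUrl, string apiKey, string text, string languageCode)
    {
        try
        {
            string json = JsonUtility.ToJson(new TtsRequest { text = text, language = languageCode });

            using (var request = new UnityWebRequest(endpointUrl, UnityWebRequest.kHttpVerbPOST))
            {
                request.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(json));
                request.downloadHandler = new DownloadHandlerAudioClip(endpointUrl, AudioType.MPEG);
                request.SetRequestHeader("Content-Type", "application/json");
                request.SetRequestHeader("Authorization", "Bearer " + apiKey);

                // Yield until the request finishes instead of blocking.
                UnityWebRequestAsyncOperation operation = request.SendWebRequest();
                while (!operation.isDone)
                {
                    await Task.Yield();
                }

                if (request.result != UnityWebRequest.Result.Success)
                {
                    Debug.LogError($"[MyCustomTtsClient] Request failed: {request.error}");
                    return null;
                }

                return DownloadHandlerAudioClip.GetContent(request);
            }
        }
        catch (Exception e)
        {
            // Return null on failure so the caller can handle a missing clip.
            Debug.LogError($"[MyCustomTtsClient] Unexpected error: {e.Message}");
            return null;
        }
    }
}
```

A provider's ConvertTextToSpeechForLanguage method could await this helper, passing the endpoint and key read from the stored configuration object.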
6. Select Your Provider
The TextToSpeechProviderFactory automatically discovers all classes implementing ITextToSpeechProvider. No manual registration is required.
After compiling your script:
- Navigate to Edit → Project Settings → VRseBuilder → Text To Speech
- Open the Provider dropdown
- Select your provider (MyCustomTTSProvider)
Limitations
The VO Preview panel (available via StoryEditWindows → VO Preview in version 0.6.1+) currently does not support custom TTS providers. It only supports changing settings for:
- OpenAITextToSpeechProvider
- VRBAPITextToSpeechProvider
Best Practices
| Practice | Recommendation |
|---|---|
| Error Handling | Wrap API calls in try-catch blocks, log the error, and return null on failure so a failed request does not crash VO generation |
| Async Pattern | Use async/await for web requests to avoid blocking the main thread |
| Logging | Use VRseLogger (recommended) or Debug.Log to track request status |
| Reference Implementations | Review OpenAITextToSpeechProvider.cs and VRBAPITextToSpeechProvider.cs for working examples |