Overview

VRseBuilder’s Text-to-Speech (TTS) system converts text strings into AudioClip assets for VoiceOvers in your story. The system is extensible: you can integrate any TTS service (Google Cloud TTS, Amazon Polly, ElevenLabs, or a custom local solution) by implementing a provider interface. This guide walks through creating a custom TTS provider, configuring it, and making it available in VRseBuilder.

How TTS Providers Work

TTS providers act as adapters between VRseBuilder and external speech synthesis services. When the system needs to generate a VoiceOver, it calls your provider’s conversion methods, which handle the actual audio generation (typically via API requests or local processing). Two interfaces are available in VRseBuilder.Core.Systems.TextToSpeech:
Interface | Use Case
ITextToSpeechProvider | Basic single-language TTS
IVRBTextToSpeechProvider | Multi-language support and batch processing (recommended)
The IVRBTextToSpeechProvider interface extends the base interface with language-specific and batch conversion methods, making it the better choice for most implementations.

1. Create the Provider Class

Create a new C# script in your project. Place it in a Providers folder under Runtime to ensure VoiceOvers generate correctly when using LiveLink.
using System.Threading.Tasks;
using UnityEngine;
using VRseBuilder.Core.Systems.TextToSpeech;

public class MyCustomTTSProvider : IVRBTextToSpeechProvider
{
    private TextToSpeechConfiguration _configuration;

    // Implementation follows...
}
Important: Your provider class must be under a Runtime assembly. This ensures VO generation works when editing via LiveLink.

2. Implement SetConfig

Store the configuration object passed by the system. This object contains global settings like API keys, cache directory paths, and other provider-specific values.
public void SetConfig(TextToSpeechConfiguration configuration)
{
    _configuration = configuration;
}

3. Implement Conversion Methods

Implement the core methods that generate audio from text.

ConvertTextToSpeech (Default Language)

This method converts a single text using a default language. Typically, it delegates to the language-specific method with a default language code such as "en".
public async Task<AudioClip> ConvertTextToSpeech(string text)
{
    return await ConvertTextToSpeechForLanguage(text, "en");
}

ConvertTextToSpeechForLanguage

This method converts text for a specific language code (e.g., "en", "es", "de").
public async Task<AudioClip> ConvertTextToSpeechForLanguage(string text, string language)
{
    if (string.IsNullOrWhiteSpace(text))
        return null;

    return await FetchAudioFromService(text, language);
}

ConvertMultipleTextToSpeech (Batch Processing)

Implement batch processing for generating multiple clips. VRseBuilder’s core logic checks whether an AudioClip already exists before calling the provider, so only missing clips trigger generation.
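public async Task<AudioClip[]> ConvertMultipleTextToSpeech(string[] texts, string[] languages)
{
    // Fail with a clear message instead of an IndexOutOfRangeException inside the loop
    if (texts == null || languages == null || texts.Length != languages.Length)
        throw new System.ArgumentException("texts and languages must be non-null and have the same length.");

    var audioClips = new AudioClip[texts.Length];

    // Convert one entry at a time; texts[i] is synthesized in languages[i]
    for (int i = 0; i < texts.Length; i++)
    {
        audioClips[i] = await ConvertTextToSpeechForLanguage(texts[i], languages[i]);
    }

    return audioClips;
}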
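The loop above awaits each conversion before starting the next one. If your service tolerates parallel requests, a bounded-concurrency helper along the following lines may shorten large batches. This is a minimal sketch, not part of VRseBuilder: `TtsBatch`, `ConvertAllAsync`, and the default limit of 4 are hypothetical names and values chosen for illustration, and the helper is kept generic so it has no Unity dependency.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not a VRseBuilder API): runs conversions with a concurrency cap.
public static class TtsBatch
{
    public static async Task<T[]> ConvertAllAsync<T>(
        string[] texts,
        string[] languages,
        Func<string, string, Task<T>> convert,
        int maxConcurrency = 4)
    {
        if (texts == null) throw new ArgumentNullException(nameof(texts));
        if (languages == null) throw new ArgumentNullException(nameof(languages));
        if (convert == null) throw new ArgumentNullException(nameof(convert));
        if (texts.Length != languages.Length)
            throw new ArgumentException("texts and languages must have the same length.");
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new Task<T>[texts.Length];
            for (int i = 0; i < texts.Length; i++)
                tasks[i] = RunGatedAsync(gate, texts[i], languages[i], convert);

            // WhenAll returns results in the same order as the input arrays
            return await Task.WhenAll(tasks);
        }
    }

    private static async Task<T> RunGatedAsync<T>(
        SemaphoreSlim gate, string text, string language,
        Func<string, string, Task<T>> convert)
    {
        // Wait for a free slot, run the conversion, then always release the slot
        await gate.WaitAsync();
        try { return await convert(text, language); }
        finally { gate.Release(); }
    }
}
```

Inside a provider, the batch method could then reduce to `return await TtsBatch.ConvertAllAsync<AudioClip>(texts, languages, ConvertTextToSpeechForLanguage);`. Whether parallel requests are appropriate depends on your service's rate limits, so treat the concurrency cap as something to tune.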
Tip: Add caching logic using _configuration.StreamingAssetCacheDirectoryName to avoid redundant API calls.
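One way to approach the caching mentioned in the tip is to derive a stable file name from the inputs, so the same text, language, and voice always map to the same cached file. The sketch below is an illustration only: `TtsCacheKey` and `BuildFileName` are hypothetical names, and the ".mp3" extension is an assumption about the audio format your service returns.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not a VRseBuilder API): maps TTS inputs to a cache file name.
public static class TtsCacheKey
{
    public static string BuildFileName(string text, string language, string voiceId, string extension = ".mp3")
    {
        // Newline separators keep the fields distinct, assuming language and voiceId contain no newlines
        string key = language + "\n" + voiceId + "\n" + text;

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(key));

            // Hex-encode the 32-byte hash into a 64-character, file-system-safe name
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2"));

            return sb.ToString() + extension;
        }
    }
}
```

A provider could combine this name with the directory from `_configuration.StreamingAssetCacheDirectoryName` and check whether the file exists before calling the service. How that directory name resolves to a full path is not specified here, so confirm it against the reference implementations before relying on it.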

4. Add Configuration Settings

If your provider requires custom settings (API keys, voice IDs, endpoint URLs), add them to the shared configuration class.

Define Your Settings

Open Assets/VRseBuilder/_Core/Runtime/Systems/TextToSpeech/Runtime/TextToSpeechConfiguration.cs and add public fields for your settings:
// In TextToSpeechConfiguration.cs

[Header("My Custom Provider Settings")]
public string MyCustomTTSAPIEndpoint = "https://api.example.com/tts";
public string MyCustomApiKey = "your_api_key_here";
public string MyCustomVoiceId = "default_voice";

Access Settings in Your Provider

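Reference these fields through the stored configuration object. The snippet below only shows where the settings are read; step 5 provides a full implementation of this method:
private async Task<AudioClip> FetchAudioFromService(string text, string language)
{
    string apiKey = _configuration.MyCustomApiKey;
    string endpoint = _configuration.MyCustomTTSAPIEndpoint;

    // Use these values in your request...
    // Placeholder so the method compiles until the request logic from step 5 is added
    throw new System.NotImplementedException();
}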

Configure via Project Settings

  1. Navigate to Edit → Project Settings → VRseBuilder → Text To Speech
  2. Your new fields appear under the header you defined
  3. Enter your API keys and other settings
Note: You can also edit settings directly on the TextToSpeechConfiguration asset at Assets/VRseBuilder/Resources/TextToSpeechConfiguration.asset, but using Project Settings is the recommended approach.

5. Implement Web API Requests

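For providers that use external APIs, use UnityWebRequestMultimedia.GetAudioClip, which creates a UnityWebRequest with a DownloadHandlerAudioClip attached, to fetch audio. The example below sends the text and language as query parameters; the parameter names are placeholders, so adapt them (and the HTTP method) to your service.
using UnityEngine.Networking;

private async Task<AudioClip> FetchAudioFromService(string text, string language)
{
    // Hypothetical query parameters ("text", "lang"); match them to your service's API
    string url = _configuration.MyCustomTTSAPIEndpoint
        + "?text=" + UnityWebRequest.EscapeURL(text)
        + "&lang=" + UnityWebRequest.EscapeURL(language);

    // GetAudioClip issues a GET request and decodes the response as MPEG audio
    using (var request = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.MPEG))
    {
        // Set authorization headers if required
        request.SetRequestHeader("Authorization", "Bearer " + _configuration.MyCustomApiKey);

        var operation = request.SendWebRequest();

        // Yield until the request finishes so the main thread is not blocked
        while (!operation.isDone)
            await Task.Yield();

        if (request.result != UnityWebRequest.Result.Success)
        {
            Debug.LogError($"TTS Request Failed: {request.error}");
            return null;
        }

        return DownloadHandlerAudioClip.GetContent(request);
    }
}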

6. Select Your Provider

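The TextToSpeechProviderFactory automatically discovers all classes implementing ITextToSpeechProvider. Because IVRBTextToSpeechProvider extends that interface, providers implementing it are discovered as well. No manual registration is required. After compiling your script: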
  1. Navigate to Edit → Project Settings → VRseBuilder → Text To Speech
  2. Open the Provider dropdown
  3. Select your provider (MyCustomTTSProvider)
Your provider is now active and will be used for VoiceOver generation.

Limitations

The VO Preview panel (available via StoryEditWindows → VO Preview in version 0.6.1+) currently does not support custom TTS providers. It only supports changing settings for:
  • OpenAITextToSpeechProvider
  • VRBAPITextToSpeechProvider
Custom providers must be configured through Project Settings.

Best Practices

Practice | Recommendation
Error Handling | Wrap API calls in try-catch blocks and return null on failure to prevent crashes
Async Pattern | Use await for web requests to avoid blocking the main thread
Logging | Use VRseLogger (recommended) or Debug.Log to track request status
Reference Implementations | Review OpenAITextToSpeechProvider.cs and VRBAPITextToSpeechProvider.cs for working examples
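
The error-handling recommendation above can be sketched as a small wrapper that turns any exception into a null result and hands the exception to a logging callback. `TtsSafeCall` and `OrNullAsync` are hypothetical names used for illustration, not VRseBuilder APIs. The wrapper is generic so it carries no Unity dependency.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not a VRseBuilder API): converts failures into null results.
public static class TtsSafeCall
{
    public static async Task<T> OrNullAsync<T>(Func<Task<T>> operation, Action<Exception> onError = null)
        where T : class
    {
        try
        {
            // Exceptions thrown synchronously or during the await are both caught below
            return await operation();
        }
        catch (Exception ex)
        {
            // Report the failure (for example via a logger) and fall back to null
            onError?.Invoke(ex);
            return null;
        }
    }
}
```

In a provider this might look like `await TtsSafeCall.OrNullAsync(() => FetchAudioFromService(text, language), ex => Debug.LogError(ex.Message));`. Note that this sketch also swallows cancellation exceptions, which may not be what you want if your provider supports cancelling requests.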