Add Your Own TTS Provider for VoiceOvers

Overview

VRseBuilder’s Text-to-Speech (TTS) system converts text strings into AudioClip assets for VoiceOvers in your story. The system is extensible: you can integrate any TTS service (Google Cloud TTS, Amazon Polly, ElevenLabs, or a custom local solution) by implementing a provider interface. This guide walks through creating a custom TTS provider, configuring it, and making it available in VRseBuilder.

How TTS Providers Work

TTS providers act as adapters between VRseBuilder and external speech synthesis services. When the system needs to generate a VoiceOver, it calls your provider’s conversion methods, which handle the actual audio generation (typically via API requests or local processing). Two interfaces are available in VRseBuilder.Core.Systems.TextToSpeech:

| Interface | Use Case |
| --- | --- |
| `ITextToSpeechProvider` | Basic single-language TTS |
| `IVRBTextToSpeechProvider` | Multi-language support and batch processing (recommended) |

The IVRBTextToSpeechProvider interface extends the base interface with language-specific and batch conversion methods, making it the better choice for most implementations.

1. Create the Provider Class

Create a new C# script in your project. Place it in a Providers folder under Runtime to ensure VoiceOvers generate correctly when using LiveLink.

```csharp
using System.Threading.Tasks;
using UnityEngine;
using VRseBuilder.Core.Systems.TextToSpeech;

public class MyCustomTTSProvider : IVRBTextToSpeechProvider
{
    private TextToSpeechConfiguration _configuration;

    // Implementation follows...
}
```

Important: Your provider class must be under a Runtime assembly. This ensures VO generation works when editing via LiveLink.

2. Implement SetConfig

Store the configuration object passed by the system. This object contains global settings like API keys, cache directory paths, and other provider-specific values.

```csharp
public void SetConfig(TextToSpeechConfiguration configuration)
{
    _configuration = configuration;
}
```

3. Implement Conversion Methods

Implement the core methods that generate audio from text.

ConvertTextToSpeech (Default Language)

This method handles single-text conversion using a default language. Typically, delegate to the language-specific method.

```csharp
public async Task<AudioClip> ConvertTextToSpeech(string text)
{
    return await ConvertTextToSpeechForLanguage(text, "en");
}
```

ConvertTextToSpeechForLanguage

This method converts text for a specific language code (e.g., "en", "es", "de").

```csharp
public async Task<AudioClip> ConvertTextToSpeechForLanguage(string text, string language)
{
    if (string.IsNullOrWhiteSpace(text))
        return null;

    return await FetchAudioFromService(text, language);
}
```
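Language codes may reach this method in more than one form ("en", "en-US", "EN_gb"). If your service expects a bare primary language code, normalizing the input once keeps requests consistent. The sketch below assumes that convention; `LanguageCodes.Normalize` is a hypothetical helper, not part of VRseBuilder, and some APIs want full locale tags such as en-US instead, so check your service's documentation.

```csharp
using System;

// Hypothetical helper, not part of VRseBuilder. Assumes the target TTS service
// expects a lowercase primary language subtag such as "en" or "de".
public static class LanguageCodes
{
    public static string Normalize(string language, string fallback = "en")
    {
        if (string.IsNullOrWhiteSpace(language))
            return fallback;

        // Accept both "en-US" and "en_US" style tags and keep only the primary subtag.
        string primary = language.Trim().Split('-', '_')[0];

        return primary.Length == 0 ? fallback : primary.ToLowerInvariant();
    }
}
```

With this helper in place, the last line of ConvertTextToSpeechForLanguage would become `return await FetchAudioFromService(text, LanguageCodes.Normalize(language));`.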

ConvertMultipleTextToSpeech (Batch Processing)

Implement batch processing for generating multiple clips. VRseBuilder’s core logic checks whether an AudioClip already exists before calling the provider, so only missing clips trigger generation.

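```csharp
public async Task<AudioClip[]> ConvertMultipleTextToSpeech(string[] texts, string[] languages)
{
    // Each text is paired with the language at the same index.
    if (texts.Length != languages.Length)
        throw new System.ArgumentException("texts and languages must have the same length.");

    var audioClips = new AudioClip[texts.Length];

    // Convert one at a time; an entry stays null if its text is empty or its request fails.
    for (int i = 0; i < texts.Length; i++)
    {
        audioClips[i] = await ConvertTextToSpeechForLanguage(texts[i], languages[i]);
    }

    return audioClips;
}
```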
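The loop above waits for each request before starting the next. If your service permits concurrent requests, you can overlap them while capping how many run at once to stay within rate limits. The following is a generic sketch in plain C#: `BatchRunner`, `RunAsync`, and `maxConcurrency` are hypothetical names, and `convertOne` stands in for ConvertTextToSpeechForLanguage. Results keep their input order, so index i still lines up with texts[i].

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs one async conversion per item, with at most
// maxConcurrency conversions in flight, and returns results in input order.
public static class BatchRunner
{
    public static async Task<TResult[]> RunAsync<TResult>(
        string[] texts,
        string[] languages,
        Func<string, string, Task<TResult>> convertOne,
        int maxConcurrency = 4)
    {
        if (texts == null) throw new ArgumentNullException(nameof(texts));
        if (languages == null) throw new ArgumentNullException(nameof(languages));
        if (texts.Length != languages.Length)
            throw new ArgumentException("texts and languages must have the same length.");
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        var results = new TResult[texts.Length];
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new Task[texts.Length];
            for (int i = 0; i < texts.Length; i++)
                tasks[i] = ConvertWithGateAsync(gate, results, i, texts[i], languages[i], convertOne);

            // Completes when every item has finished; rethrows the first failure, if any.
            await Task.WhenAll(tasks);
        }
        return results;
    }

    private static async Task ConvertWithGateAsync<TResult>(
        SemaphoreSlim gate, TResult[] results, int index,
        string text, string language, Func<string, string, Task<TResult>> convertOne)
    {
        // Wait for a free slot, then write the result into this item's own index.
        await gate.WaitAsync();
        try
        {
            results[index] = await convertOne(text, language);
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Inside the provider, the body of ConvertMultipleTextToSpeech could reduce to `return await BatchRunner.RunAsync<AudioClip>(texts, languages, ConvertTextToSpeechForLanguage);`. Keep the awaits on Unity's main thread (avoid wrapping calls in Task.Run), because Unity APIs such as DownloadHandlerAudioClip.GetContent generally need to run there.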

Tip: Add caching logic using _configuration.StreamingAssetCacheDirectoryName to avoid redundant API calls.
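One way to implement that tip is to derive a cache file name from every input that affects the audio, so changing the text, language, or voice produces a new entry rather than reusing stale audio. This sketch hashes those inputs with SHA-256. The `TtsCachePaths` class, the key format, and the `.mp3` extension are assumptions, and so is how `cacheRoot` relates to `_configuration.StreamingAssetCacheDirectoryName` (for example, a folder of that name under Application.streamingAssetsPath).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: maps (text, language, voice) to a stable cache file path.
public static class TtsCachePaths
{
    public static string GetCachePath(string cacheRoot, string text, string language,
                                      string voiceId, string extension = ".mp3")
    {
        return Path.Combine(cacheRoot, ComputeKey(text, language, voiceId) + extension);
    }

    public static string ComputeKey(string text, string language, string voiceId)
    {
        // Join with a control character unlikely to appear in normal input, so that
        // ("ab", "c") and ("a", "bc") do not hash to the same key.
        string material = string.Join("\u001F", voiceId ?? "", language ?? "", text ?? "");

        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(material));
            // 32 bytes -> 64 lowercase hex characters, safe to use as a file name.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Before calling the API, check whether a file already exists at the returned path and load it if so; after a successful request, write the downloaded audio there.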

4. Add Configuration Settings

If your provider requires custom settings (API keys, voice IDs, endpoint URLs), add them to the shared configuration class.

Define Your Settings

Open Assets/VRseBuilder/_Core/Runtime/Systems/TextToSpeech/Runtime/TextToSpeechConfiguration.cs and add public fields for your settings:

```csharp
// In TextToSpeechConfiguration.cs

[Header("My Custom Provider Settings")]
public string MyCustomTTSAPIEndpoint = "https://api.example.com/tts";
public string MyCustomApiKey = "your_api_key_here";
public string MyCustomVoiceId = "default_voice";
```

Access Settings in Your Provider

Reference these fields through the stored configuration object:

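```csharp
private async Task<AudioClip> FetchAudioFromService(string text, string language)
{
    string apiKey = _configuration.MyCustomApiKey;
    string endpoint = _configuration.MyCustomTTSAPIEndpoint;
    string voiceId = _configuration.MyCustomVoiceId;

    // Use these values to build and send your request (step 5 shows a full example).
    return null; // placeholder so the method compiles until the request is implemented
}
```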
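It can help to check these values before sending any request, so a missing key or malformed endpoint is reported clearly instead of surfacing as an HTTP error. The helper below is a hypothetical sketch in plain C# that takes the three example settings as arguments; it treats the example default `your_api_key_here` as unconfigured.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns a list of human-readable problems; empty means OK.
public static class TtsSettingsValidator
{
    // Default value of the example MyCustomApiKey field; treated as "not configured".
    private const string PlaceholderApiKey = "your_api_key_here";

    public static List<string> Validate(string endpoint, string apiKey, string voiceId)
    {
        var problems = new List<string>();

        // Require an absolute http(s) URL; TryCreate returns false for null or malformed input.
        if (!Uri.TryCreate(endpoint, UriKind.Absolute, out Uri uri) ||
            (uri.Scheme != Uri.UriSchemeHttps && uri.Scheme != Uri.UriSchemeHttp))
        {
            problems.Add($"Endpoint '{endpoint}' is not an absolute http(s) URL.");
        }

        if (string.IsNullOrWhiteSpace(apiKey) || apiKey == PlaceholderApiKey)
            problems.Add("API key is missing or still set to the placeholder value.");

        if (string.IsNullOrWhiteSpace(voiceId))
            problems.Add("Voice ID is empty.");

        return problems;
    }
}
```

In FetchAudioFromService, you could pass `_configuration.MyCustomTTSAPIEndpoint`, `_configuration.MyCustomApiKey`, and `_configuration.MyCustomVoiceId`, log any returned problems, and return `null` early.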

Configure via Project Settings

1. Navigate to Edit → Project Settings → VRseBuilder → Text To Speech.
2. Find your new fields under the header you defined.
3. Enter your API keys and other settings.

Note: You can also edit settings directly on the TextToSpeechConfiguration asset at Assets/VRseBuilder/Resources/TextToSpeechConfiguration.asset, but using Project Settings is the recommended approach.

5. Implement Web API Requests

For providers that use external APIs, use UnityWebRequest with DownloadHandlerAudioClip to fetch audio.

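```csharp
using UnityEngine.Networking; // add at the top of your provider file

private async Task<AudioClip> FetchAudioFromService(string text, string language)
{
    // Send the text and language with the request. The "text" and "lang" query
    // parameters are placeholders; use whatever your TTS API expects (many
    // services take a JSON POST body instead).
    string url = _configuration.MyCustomTTSAPIEndpoint
        + "?text=" + UnityWebRequest.EscapeURL(text)
        + "&lang=" + UnityWebRequest.EscapeURL(language);

    // AudioType.MPEG assumes the service returns MP3 audio; match your service's format.
    using (var request = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.MPEG))
    {
        // Set authorization headers if required
        request.SetRequestHeader("Authorization", "Bearer " + _configuration.MyCustomApiKey);

        var operation = request.SendWebRequest();

        // Yield control back to Unity until the request completes, instead of blocking the main thread
        while (!operation.isDone)
            await Task.Yield();

        if (request.result != UnityWebRequest.Result.Success)
        {
            Debug.LogError($"TTS Request Failed: {request.error}");
            return null;
        }

        // Decode the downloaded bytes into an AudioClip
        return DownloadHandlerAudioClip.GetContent(request);
    }
}
```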
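Network calls can fail transiently (timeouts, rate limiting), and the method above returns `null` on any failure. A small retry wrapper with exponential backoff can absorb short outages while keeping the guide's return-`null` convention. This is a hypothetical sketch, not a VRseBuilder API, and the attempt count and delays are arbitrary defaults. Because FetchAudioFromService returns `null` for every failure, including authentication errors, this wrapper will also repeat requests that can never succeed, so you may want to distinguish error types before retrying.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an async call that signals failure by returning
// null or throwing, doubling the wait between attempts.
public static class TtsRetry
{
    public static async Task<T> WithRetryAsync<T>(
        Func<Task<T>> attempt,
        int maxAttempts = 3,
        int baseDelayMs = 500,
        Action<string> log = null) where T : class
    {
        for (int i = 1; i <= maxAttempts; i++)
        {
            try
            {
                T result = await attempt();
                if (result != null)
                    return result;
                log?.Invoke($"Attempt {i} of {maxAttempts} returned no result.");
            }
            catch (Exception ex)
            {
                log?.Invoke($"Attempt {i} of {maxAttempts} failed: {ex.Message}");
            }

            // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
            if (i < maxAttempts)
                await Task.Delay(baseDelayMs * (1 << (i - 1)));
        }

        // Keep the guide's convention: return null instead of throwing.
        return null;
    }
}
```

For example: `return await TtsRetry.WithRetryAsync(() => FetchAudioFromService(text, language), log: Debug.LogWarning);`.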

6. Select Your Provider

The TextToSpeechProviderFactory automatically discovers all classes implementing ITextToSpeechProvider. No manual registration is required. After compiling your script:

1. Navigate to Edit → Project Settings → VRseBuilder → Text To Speech.
2. Open the Provider dropdown.
3. Select your provider (MyCustomTTSProvider).

Your provider is now active and will be used for VoiceOver generation.
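This guide does not show how TextToSpeechProviderFactory finds providers. Interface-based discovery in .NET usually relies on reflection over the loaded assemblies, roughly as in the sketch below. It is illustrative only, not the factory's actual code; the `ProviderDiscovery` class and its filtering rules are assumptions. If the factory also creates providers through a parameterless constructor, as this sketch requires, a class like MyCustomTTSProvider qualifies because it declares no constructors and so gets an implicit public parameterless one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Illustrative only: NOT the actual TextToSpeechProviderFactory implementation.
public static class ProviderDiscovery
{
    public static List<Type> FindImplementations(Type interfaceType)
    {
        var found = new List<Type>();

        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            Type[] types;
            try
            {
                types = assembly.GetTypes();
            }
            catch (ReflectionTypeLoadException ex)
            {
                // Some assemblies cannot load every type; keep the ones that did load.
                types = ex.Types.Where(t => t != null).ToArray();
            }

            foreach (Type type in types)
            {
                // Keep concrete, non-generic classes that implement the interface and
                // have a public parameterless constructor, so they can be instantiated.
                if (type.IsClass && !type.IsAbstract && !type.ContainsGenericParameters &&
                    interfaceType.IsAssignableFrom(type) &&
                    type.GetConstructor(Type.EmptyTypes) != null)
                {
                    found.Add(type);
                }
            }
        }

        return found;
    }
}
```

In the Unity Editor, `UnityEditor.TypeCache.GetTypesDerivedFrom<T>()` offers a faster, cached alternative to scanning assemblies yourself.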

Limitations

The VO Preview panel (available via StoryEditWindows → VO Preview in version 0.6.1+) currently does not support custom TTS providers. It only supports changing settings for:

- `OpenAITextToSpeechProvider`
- `VRBAPITextToSpeechProvider`

Custom providers must be configured through Project Settings.

Best Practices

| Practice | Recommendation |
| --- | --- |
| Error Handling | Wrap API calls in try-catch blocks and return `null` on failure to prevent crashes |
| Async Pattern | Use `await` for web requests to avoid blocking the main thread |
| Logging | Use `VRseLogger` (recommended) or `Debug.Log` to track request status |
| Reference Implementations | Review `OpenAITextToSpeechProvider.cs` and `VRBAPITextToSpeechProvider.cs` for working examples |
