BAI Live
Translator.
The Overview
BAI Live Translator is a real-time, voice-to-voice translation application that bridges the gap between Tagalog and Cebuano speakers. It captures audio directly from the user's microphone, transcribes the spoken Tagalog into text, translates it into Cebuano using a powerful AI model, and then speaks the translation aloud — creating a seamless, hands-free translation experience.
How It Works
The translation pipeline processes speech through a series of specialized stages:
- Audio CaptureThe application listens to the user's microphone in real time, capturing spoken Tagalog input as a continuous audio stream.
- Speech-to-Text (Vosk STT)The audio is transcribed into text using the Vosk speech recognition engine, which runs locally for low-latency, offline-capable transcription.
- AI TranslationThe transcribed Tagalog text is sent to an AI language model that translates it into natural-sounding Cebuano, preserving context and colloquial nuances.
- Text-to-Speech (Microsoft Edge TTS)The translated Cebuano text is synthesized into natural speech using Microsoft Edge TTS, speaking the translation aloud to complete the voice-to-voice loop.
Architecture & Design
The entire application is built in Python, leveraging a modular pipeline architecture that separates audio capture, transcription, translation, and speech synthesis into independent stages. Vosk provides lightweight, offline speech recognition, while Microsoft Edge TTS delivers high-quality neural voice synthesis without requiring paid API keys. The AI translation layer uses prompt engineering techniques to ensure accurate and culturally appropriate Cebuano output, handling idiomatic expressions and regional dialects gracefully.
Tech Stack
- Python (Core Application)
- Vosk (Speech-to-Text)
- Microsoft Edge TTS
- AI / LLM (Translation)
- NLP (Natural Language Processing)
Key Features
- Real-Time Voice Translation
- Offline Speech Recognition
- Neural Text-to-Speech Output
- Tagalog → Cebuano Pipeline
- Hands-Free Operation