Edgen AI Speech-to-Speech API
Clara-Voice is a state-of-the-art multimodal conversational AI designed for real-time speech-to-speech interactions. It processes speech input, interprets the content dynamically, and responds via text, audio, or actions, enabling natural and fluid dialogue experiences.
With specialized training in Spanish and English, Clara-Voice is ideal for bilingual scenarios. The platform provides a WebSocket-based API for streaming audio, handling interruptions, and managing multimodal input seamlessly in real time.
Key Features
- Bilingual Support: Excels in Spanish and English conversations.
- Dynamic Responses: Generates text, speech, or actionable commands.
- Real-Time Streaming: Robust WebSocket API ensures low-latency interactions.
- Interruption Management: Handles user interruptions effectively.
- Multimodal Integration: Supports speech, text, and custom commands (see the command-handling sketch after this list).
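For example, a custom command might arrive as a JSON message alongside text and audio frames. The sketch below is illustrative only: the `action` field and the `setPlaybackVolume` helper are hypothetical stand-ins, not a documented Clara-Voice schema.

```js
// Hypothetical command handling: the { action, value } shape and
// setPlaybackVolume are illustrative assumptions, not documented fields.
const handleCommand = (event) => {
  if (event.data instanceof Blob) return; // binary frames carry audio
  const message = JSON.parse(event.data);
  if (!message.action) return; // not a command message
  switch (message.action) {
    case "set_volume":
      // e.g. { "action": "set_volume", "value": 0.5 }
      setPlaybackVolume(message.value);
      break;
    default:
      console.warn("Unhandled action:", message.action);
  }
};
```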
Getting Started
Connecting to Clara-Voice
Use the WebSocket API to establish a connection:
```js
import EdgenAI from "edgenai";

const websocket = EdgenAI.speechToSpeech.connect();

// Log the connection status and any incoming messages
websocket.onopen = () => console.log("Connected to Clara-Voice");
websocket.onmessage = (message) => {
  console.log("Received:", JSON.parse(message.data));
};
```

The following example streams user audio to Clara-Voice. It uses the Web Audio API to capture audio from the microphone and send it over the WebSocket connection.
```js
const startCall = async () => {
  try {
    // Request microphone access
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

    // Initialize AudioContext at the 16 kHz sample rate the API expects
    const audioContext = new (window.AudioContext || window.webkitAudioContext)({
      sampleRate: 16000,
    });
    const sourceNode = audioContext.createMediaStreamSource(stream);

    // Set up audio processing
    const processor = audioContext.createScriptProcessor(16384, 1, 1);
    sourceNode.connect(processor);
    processor.connect(audioContext.destination);

    // Connect to the Clara-Voice WebSocket API
    const websocket = new WebSocket("wss://api.edgen.ai/v1/speech-to-speech");
    websocket.onopen = () => console.log("WebSocket connection established");

    websocket.onmessage = (event) => {
      if (event.data instanceof Blob) {
        // Binary frames carry the assistant's audio response
        console.log("Received audio response:", event.data);
      } else {
        const response = JSON.parse(event.data);
        if (response.control === "assistant_interrupted") {
          console.log("Assistant was interrupted");
        } else if (response.status === "response_complete") {
          console.log("Assistant finished speaking");
        } else if (response.error) {
          console.error("Error:", response.error);
        }
      }
    };

    websocket.onerror = (error) => console.error("WebSocket error:", error);
    websocket.onclose = () => console.log("WebSocket connection closed");

    // Convert each captured buffer from 32-bit float to 16-bit PCM and send it
    processor.onaudioprocess = (event) => {
      const inputData = event.inputBuffer.getChannelData(0);
      const buffer = new Int16Array(inputData.length);
      for (let i = 0; i < inputData.length; i++) {
        // Clamp to [-1, 1] before scaling to the 16-bit range
        const sample = Math.max(-1, Math.min(1, inputData[i]));
        buffer[i] = sample * 0x7fff;
      }
      if (websocket.readyState === WebSocket.OPEN) {
        websocket.send(buffer.buffer);
      }
    };
  } catch (error) {
    console.error("Error starting call:", error);
  }
};
```
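To hear the assistant rather than just log its audio, the received Blob needs to be decoded and played. Below is a minimal playback sketch, assuming the service returns raw 16-bit PCM at 16 kHz mirroring the upstream format; that format is an assumption, and if the API instead returns an encoded container (e.g. WAV or Opus), `audioContext.decodeAudioData` would be the appropriate path.

```js
// Assumption: incoming Blobs are raw 16-bit little-endian PCM at 16 kHz.
const playAudioBlob = async (blob, audioContext) => {
  const arrayBuffer = await blob.arrayBuffer();
  const int16 = new Int16Array(arrayBuffer);
  // Convert back to 32-bit floats in [-1, 1) for the Web Audio API
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 0x8000;
  }
  const audioBuffer = audioContext.createBuffer(1, float32.length, 16000);
  audioBuffer.copyToChannel(float32, 0);
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  source.start();
};
```

Inside the `onmessage` handler above, this would replace the log call: `playAudioBlob(event.data, audioContext);`.

When the call ends, the microphone and connection should be released. A minimal teardown sketch, assuming the handles created in `startCall` are returned or stored rather than kept in local scope:

```js
// Stop capture, tear down the audio graph, and close the socket
const stopCall = ({ stream, processor, sourceNode, audioContext, websocket }) => {
  processor.disconnect();
  sourceNode.disconnect();
  stream.getTracks().forEach((track) => track.stop());
  audioContext.close();
  websocket.close();
};
```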