Speech-to-Speech

Edgen AI Speech-to-Speech API

Clara-Voice is a state-of-the-art multimodal conversational AI designed for real-time speech-to-speech interactions. It processes speech input, interprets the content dynamically, and responds via text, audio, or actions, enabling natural and fluid dialogue experiences.

With specialized training in Spanish and English, Clara-Voice is well suited to bilingual scenarios. The platform exposes a WebSocket-based API that streams audio, handles interruptions, and manages multimodal input seamlessly.


Key Features

  • Bilingual Support: Excels in Spanish and English conversations.
  • Dynamic Responses: Generates text, speech, or actionable commands (see the sketch after this list).
  • Real-Time Streaming: Robust WebSocket API ensures low-latency interactions.
  • Interruption Management: Handles user interruptions effectively.
  • Multimodal Integration: Supports speech, text, and custom commands.
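
To make "Dynamic Responses" concrete, here is a minimal sketch of routing incoming messages by modality. The type, text, and command fields and the playAudio helper are illustrative assumptions, not the documented wire format; the confirmed control messages (assistant_interrupted, response_complete) appear in the full example below.

// Sketch: route Clara-Voice responses by modality.
// ASSUMPTION: the "type", "text", and "command" fields are
// illustrative, not the documented wire format.
const handleResponse = (event) => {
  if (event.data instanceof Blob) {
    // Binary frames carry synthesized speech audio
    playAudio(event.data); // hypothetical playback helper (sketched at the end of this guide)
    return;
  }
  const message = JSON.parse(event.data);
  if (message.type === "text") {
    console.log("Assistant text:", message.text);
  } else if (message.type === "command") {
    console.log("Action requested:", message.command);
  } else {
    console.log("Other message:", message);
  }
};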

Getting Started

Connecting to Clara-Voice

Use the WebSocket API to establish a connection:

import EdgenAI from "edgenai";

const websocket = EdgenAI.speechToSpeech.connect();

// The SDK logs the connection status to the console

websocket.onmessage = (event) => {
  console.log("Received:", JSON.parse(event.data));
};
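
Assuming connect() returns a standard browser WebSocket (an assumption worth verifying against the SDK reference), the usual lifecycle handlers apply:

// Assumes `websocket` is a standard WebSocket instance.
websocket.onopen = () => console.log("Connected to Clara-Voice");
websocket.onerror = (err) => console.error("WebSocket error:", err);
websocket.onclose = (e) => console.log("Connection closed, code:", e.code);

// Close the session cleanly when the conversation ends:
// websocket.close(1000, "client done");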

The following example uses the Web Audio API to capture microphone audio and stream it to Clara-Voice as 16-bit PCM. It relies on a ScriptProcessorNode, which is deprecated in modern browsers in favor of AudioWorklet, but keeps the example self-contained.

const startCall = async () => {
  try {
    // Request microphone access
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
 
    // Initialize AudioContext at 16 kHz. Older browsers may ignore the
    // requested sampleRate, so verify it via audioContext.sampleRate.
    const audioContext = new (window.AudioContext || window.webkitAudioContext)(
      { sampleRate: 16000 }
    );
    const sourceNode = audioContext.createMediaStreamSource(stream);
 
    // Set up audio processing
    const processor = audioContext.createScriptProcessor(16384, 1, 1);
    sourceNode.connect(processor);
    processor.connect(audioContext.destination);
 
    // Connect to Clara-Voice WebSocket API
    const websocket = new WebSocket("wss://api.edgen.ai/v1/speech-to-speech");
    websocket.onopen = () => console.log("WebSocket connection established");
 
    websocket.onmessage = (event) => {
      if (event.data instanceof Blob) {
        console.log("Received audio response:", event.data);
      } else {
        const response = JSON.parse(event.data);
        if (response.control === "assistant_interrupted") {
          console.log("Assistant was interrupted");
        } else if (response.status === "response_complete") {
          console.log("Assistant finished speaking");
        } else if (response.error) {
          console.error("Error:", response.error);
        }
      }
    };
 
    websocket.onerror = (error) => console.error("WebSocket error:", error);
    websocket.onclose = () => console.log("WebSocket connection closed");
 
    // Convert each capture buffer to 16-bit PCM and send it to the server
    processor.onaudioprocess = (event) => {
      const inputData = event.inputBuffer.getChannelData(0);
      const buffer = new Int16Array(inputData.length);
      for (let i = 0; i < inputData.length; i++) {
        // Clamp the float sample to [-1, 1], then scale to the 16-bit range
        const sample = Math.max(-1, Math.min(1, inputData[i]));
        buffer[i] = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
      }
      if (websocket.readyState === WebSocket.OPEN) {
        websocket.send(buffer.buffer);
      }
    };
  } catch (error) {
    console.error("Error starting call:", error);
  }
};
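
To round out the example, here is a hedged sketch of playing back the audio Blobs the server returns and tearing the session down when the call ends. The audio container format is an assumption: the snippet treats each Blob as a complete, browser-playable file (e.g. WAV or MP3); if Clara-Voice streams raw PCM instead, decode it with the Web Audio API.

// Play a received audio Blob.
// ASSUMPTION: the Blob is a browser-playable audio container.
const playAudio = (blob) => {
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.onended = () => URL.revokeObjectURL(url); // free the object URL
  audio.play();
};

// Tear the session down: stop capture, disconnect nodes, close sockets.
const stopCall = (websocket, processor, sourceNode, audioContext, stream) => {
  processor.disconnect();
  sourceNode.disconnect();
  stream.getTracks().forEach((track) => track.stop()); // release the mic
  audioContext.close();
  if (websocket.readyState === WebSocket.OPEN) {
    websocket.close(1000, "call ended");
  }
};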