Connect Laravel with Twilio + OpenAI voice assistant

2. How Calls Flow: Architecture at a Glance

A caller dials your Twilio number.
Twilio hits your Laravel voice webhook, which returns TwiML that connects a bidirectional Media Stream.
Twilio streams the call audio (mono 8 kHz μ-law, base64-encoded in JSON messages) over WebSocket to your bridge server.
The bridge relays audio events to OpenAI Realtime and relays synthesized audio back to Twilio.
The caller hears low-latency, natural-sounding responses with barge-in support.

3. Prerequisites & Environment Setup

Laravel 10+, PHP 8.2+, Composer.
A Twilio account with a phone number that supports Voice, and permission to set Voice webhooks & Media Streams.
An OpenAI API key with access to the Realtime API.
Node.js (for a simple WebSocket proxy), or a PHP Ratchet server if you prefer all-PHP.
Public HTTPS URL for webhooks (use ngrok while local).
Store secrets in .env, never in code. Docs for Voice webhooks and Realtime will be your compass.

4. Create the Laravel Project & Essentials

composer create-project laravel/laravel twilio-voice-ai
cd twilio-voice-ai
cp .env.example .env
php artisan key:generate

In .env, add:

TWILIO_ACCOUNT_SID=ACxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_NUMBER=+1XXXXXXXXXX
OPENAI_API_KEY=sk-...
REALTIME_MODEL=gpt-realtime-preview   # or latest recommended
BRIDGE_WS=wss://your-bridge.example.com/media
APP_URL=https://your-ngrok-host.ngrok.app

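Cache config for production deployments (skip this during local development: once config is cached, .env is no longer read at runtime and env() calls outside config files return null):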

php artisan config:cache

5. Provision Twilio: Number, Voice Webhook & TwiML

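In your Twilio console, set the Voice webhook of your number to POST https://YOUR_DOMAIN/voice/incoming. Your Laravel endpoint returns TwiML that connects a Media Stream to your WS bridge. TwiML is Twilio’s XML instruction set for calls: it can say, play, gather, record, or stream audio. A voice assistant needs a two-way stream, which means <Connect><Stream>; <Start><Stream> only forks the call audio to your server and cannot play audio back to the caller.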

Example TwiML your controller might return:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Connecting you to our AI assistant. You can interrupt me anytime.</Say>
  <Connect>
    <Stream url="wss://your-bridge.example.com/media" />
  </Connect>
</Response>

Voice webhook behavior and supported parameters are defined in Twilio’s webhook and TwiML docs.

6. Laravel Routes & Controllers for Voice

Create a dedicated controller to generate TwiML and manage call state.

routes/web.php

use Illuminate\Support\Facades\Route;
use App\Http\Controllers\VoiceController;

Route::post('/voice/incoming', [VoiceController::class, 'incoming']);   // Twilio -> Laravel
Route::post('/voice/status', [VoiceController::class, 'status']);       // optional: call status callbacks

app/Http/Controllers/VoiceController.php

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Response;

class VoiceController extends Controller
{
    public function incoming(Request $request)
    {
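        // (Optional) Validate the X-Twilio-Signature header here.

        // Read the bridge URL from config rather than $_ENV, which is empty once
        // `php artisan config:cache` has run. This assumes config/services.php
        // defines 'bridge_ws' => env('BRIDGE_WS') under the 'twilio' key.
        $bridgeUrl = htmlspecialchars(
            (string) config('services.twilio.bridge_ws'),
            ENT_XML1 | ENT_QUOTES
        );

        // <Say> runs first; <Connect><Stream> then opens the two-way stream
        // and keeps the call attached to it until the stream ends.
        $twiml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Hi! I'm your AI voice assistant. How can I help?</Say>
  <Connect>
    <Stream url="{$bridgeUrl}" />
  </Connect>
</Response>
XML;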

        return Response::make($twiml, 200, ['Content-Type' => 'text/xml']);
    }

    public function status(Request $request)
    {
        Log::info('Call status', $request->all());
        return response('', 204);
    }
}

Twilio will POST to your route with call details; you return TwiML.

7. Streaming Audio with Twilio Media Streams

Enable Media Streams in your TwiML via <Connect><Stream url="..."/></Connect>. Twilio opens a WebSocket and sends JSON messages carrying base64-encoded audio from the live call. You forward those payloads to the OpenAI Realtime session and send Realtime’s audio back over the same WebSocket so the caller hears the AI with minimal delay. (<Start><Stream> also exists, but it is receive-only: your server can listen to the call and cannot send audio back.)

Key details:

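Audio is mono 8 kHz μ-law (audio/x-mulaw), base64-encoded inside each media message.
Handle connected, start, media, mark, and stop events; the start event carries the streamSid you must echo on every message you send back.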
Keep latency low: buffer minimally and stream promptly.
Handle barge-in (interruptions) by pausing or cancelling current TTS from Realtime.
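
The event handling listed above can be sketched as a small dispatcher. This is a pure function, so it can be unit-tested without a live call. The name routeTwilioMessage and the returned action labels are illustrative and not part of any SDK; the message fields (start.streamSid, media.payload, mark.name) follow Twilio's Media Streams message format.

```javascript
// Hypothetical helper: decide what the bridge should do with one Twilio message.
// `state` is a plain object the bridge keeps per call.
function routeTwilioMessage(raw, state) {
  const evt = JSON.parse(raw);
  switch (evt.event) {
    case 'start':
      // Remember the stream ID; Twilio requires it on every message sent back.
      state.streamSid = evt.start.streamSid;
      return { action: 'init', streamSid: state.streamSid };
    case 'media':
      // The payload is base64 audio; pass it through untouched to keep latency low.
      return {
        action: 'forward',
        realtimeEvent: { type: 'input_audio_buffer.append', audio: evt.media.payload },
      };
    case 'mark':
      // Twilio echoes a mark once the audio queued before it has finished playing.
      return { action: 'played', name: evt.mark.name };
    case 'stop':
      return { action: 'close' };
    default:
      // For example the initial "connected" message.
      return { action: 'ignore' };
  }
}
```

In a bridge, the socket's message handler would call this once per message and act on the returned action, which keeps the protocol logic separate from the socket plumbing.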

8. OpenAI Realtime: From Audio to Real-Time Replies

The Realtime API supports duplex audio: you stream audio in and get synthesized speech back, with turn-taking and interruption handling. Create a Realtime session with your model (e.g., gpt-realtime-*), send audio chunks, and subscribe to response audio events. Recent updates targeted production voice agents and added telephony/SIP support, which suits this scenario.

Typical Realtime flow

Create session with voice, temperature, and instructions.
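Send input_audio_buffer.append frames. With server-side voice activity detection enabled, Realtime detects end-of-utterance itself; if you turn it off, send input_audio_buffer.commit to signal end-of-utterance.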
Realtime emits response events with TTS audio frames.
Bridge writes frames to Twilio WS.
On caller interruption, send response.cancel to stop speaking and restart recognition.
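
The interruption step can be sketched as follows, assuming server-side voice activity detection is on, so Realtime emits input_audio_buffer.speech_started when the caller starts talking. Besides cancelling the response, the bridge also sends Twilio a clear message so audio that is already buffered stops playing. The helper name planBargeIn and the call object (with streamSid and an assistantSpeaking flag the bridge would maintain) are illustrative assumptions.

```javascript
// Hypothetical helper: given one parsed Realtime server event, list the
// messages the bridge should send and to which socket.
function planBargeIn(realtimeEvent, call) {
  // Only react when the caller starts speaking.
  if (realtimeEvent.type !== 'input_audio_buffer.speech_started') return [];
  // Nothing to interrupt if the assistant is not currently talking.
  if (!call.assistantSpeaking) return [];
  call.assistantSpeaking = false;
  return [
    // Stop generating the current reply.
    { to: 'openai', message: { type: 'response.cancel' } },
    // Drop audio Twilio has buffered but not yet played to the caller.
    { to: 'twilio', message: { event: 'clear', streamSid: call.streamSid } },
  ];
}
```

The bridge would set call.assistantSpeaking to true when it forwards the first audio chunk of a response and back to false when the response finishes.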

9. The WebSocket Bridge (Node/PHP) that Glues It Together

Although you can write a PHP WS server with Ratchet, most teams use Node.js for this bridge because of streaming ergonomics and existing samples from OpenAI/Twilio. The bridge sits between Twilio Media Streams and OpenAI Realtime: it receives Twilio media events, converts formats if needed, and forwards them to Realtime; then it pipes TTS audio back to Twilio. Reference demos and SDK extensions show the event mapping and audio formatting in detail.

Minimal Node bridge (conceptual):

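import WebSocket, { WebSocketServer } from 'ws';

const server = new WebSocketServer({ port: 8080 });
const model = process.env.REALTIME_MODEL ?? 'gpt-realtime-preview';

server.on('connection', (twilioWs) => {
  let streamSid = null; // Twilio's stream ID, required on every message sent back

  // Open the Realtime session. The OpenAI-Beta header and the session fields
  // below follow the beta event shapes; check them against the current docs.
  const rt = new WebSocket(`wss://api.openai.com/v1/realtime?model=${model}`, {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'OpenAI-Beta': 'realtime=v1',
    },
  });

  rt.on('open', () => {
    // Use 8 kHz mu-law in both directions so no transcoding is needed, and
    // let server-side VAD decide when the caller has finished a turn.
    rt.send(JSON.stringify({
      type: 'session.update',
      session: {
        input_audio_format: 'g711_ulaw',
        output_audio_format: 'g711_ulaw',
        turn_detection: { type: 'server_vad' },
      },
    }));
  });

  // Twilio -> OpenAI
  twilioWs.on('message', (msg) => {
    const evt = JSON.parse(msg.toString());
    if (evt.event === 'start') {
      streamSid = evt.start.streamSid;
    } else if (evt.event === 'media' && rt.readyState === WebSocket.OPEN) {
      // The payload is already base64 mu-law, matching input_audio_format above.
      rt.send(JSON.stringify({ type: 'input_audio_buffer.append', audio: evt.media.payload }));
    } else if (evt.event === 'stop') {
      rt.close();
    }
  });

  // OpenAI -> Twilio (TTS)
  rt.on('message', (data) => {
    const event = JSON.parse(data.toString());
    if (event.type === 'response.audio.delta' && streamSid) {
      // The audio chunk is in `delta`; Twilio needs the streamSid to route it.
      twilioWs.send(JSON.stringify({ event: 'media', streamSid, media: { payload: event.delta } }));
    }
  });

  // Tear down the peer socket when either side closes.
  twilioWs.on('close', () => rt.close());
  rt.on('close', () => twilioWs.close());
  rt.on('error', (err) => console.error('Realtime socket error', err));
});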

(Real code must handle audio encoding, backpressure, reconnects, and barge-in robustly.)
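
As one illustration of the backpressure point, the sketch below sends a frame only when the socket is open and its outgoing buffer is below a limit, and drops the frame otherwise. It relies on the readyState and bufferedAmount properties that WebSocket objects expose. The helper name and the 64 KiB limit are assumptions to tune, not recommendations from Twilio or OpenAI.

```javascript
// WebSocket.OPEN is 1 in both the browser API and the `ws` package.
const SOCKET_OPEN = 1;

// Hypothetical helper: returns true if the message was sent, false if dropped.
function sendIfNotBackedUp(socket, message, maxBufferedBytes = 64 * 1024) {
  if (socket.readyState !== SOCKET_OPEN) return false;
  // bufferedAmount counts bytes queued locally but not yet written to the network.
  if (socket.bufferedAmount > maxBufferedBytes) return false;
  socket.send(JSON.stringify(message));
  return true;
}
```

Dropping late frames is a deliberate choice for live audio: a queued frame that arrives seconds later is worse for the caller than a brief gap.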

10. Putting It Together in Laravel

Inbound call: Twilio hits /voice/incoming.
TwiML start: Laravel replies with TwiML that starts the stream to BRIDGE_WS.
Bridge ↔ Realtime: Your bridge relays audio between Twilio and OpenAI and manages the session.
Agent handoff: If the AI detects “handoff” intent, your bridge can instruct Laravel (via REST) to dial an agent and warm-transfer the caller. Twilio’s warm transfer pattern is well-known in Laravel/PHP.

Optional: Human Handoff Endpoint

Route::post('/voice/transfer', [VoiceController::class, 'transfer']);

public function transfer(Request $request)
{
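    $agentNumber = '+1XXXXXXXXXX';
    // Read from config rather than $_ENV so this still works after config:cache.
    $callerId = config('services.twilio.number');
    $twiml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Dial callerId="{$callerId}">{$agentNumber}</Dial>
</Response>
XML;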
    return response($twiml, 200)->header('Content-Type', 'text/xml');
}
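
For a call that is already connected to the stream, something has to tell Twilio to fetch the transfer TwiML. One option is Twilio's REST endpoint for updating a live call, which accepts a Url parameter pointing at new TwiML. The sketch below only builds that request, so it can be inspected and tested offline. The helper name is hypothetical, and whether the bridge or Laravel makes this request is a design choice this article leaves open. The callSid arrives in the stream's start message.

```javascript
// Hypothetical helper: build the HTTP request that asks Twilio to fetch new
// TwiML for a live call. Send it with fetch(req.url, req.options).
function buildCallRedirect({ accountSid, authToken, callSid, twimlUrl }) {
  // Twilio's REST API uses HTTP Basic auth with the account SID and auth token.
  const basic = Buffer.from(`${accountSid}:${authToken}`).toString('base64');
  return {
    url: `https://api.twilio.com/2010-04-01/Accounts/${accountSid}/Calls/${callSid}.json`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Basic ${basic}`,
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      // Url: where Twilio fetches the next TwiML; Method: how it fetches it.
      body: new URLSearchParams({ Url: twimlUrl, Method: 'POST' }).toString(),
    },
  };
}
```

Keeping the auth token out of the bridge is a reasonable alternative: the bridge can instead call a protected Laravel endpoint that performs the same update with the Twilio PHP SDK.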

11. Security, Privacy & Compliance

Validate Twilio signatures on webhooks to stop spoofing.
Keep API keys in .env, rotate regularly, and restrict bridge access.
Log minimally; avoid storing raw audio unless needed and disclose it if you do.
PII handling: redact or hash sensitive data; comply with local laws.
Network security: WSS only; enforce TLS; set timeouts. (Twilio and OpenAI docs outline best practices for production voice agents.)
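
To make the signature check concrete, here is a sketch of the algorithm behind X-Twilio-Signature for form-encoded POST requests: append each parameter name and value, sorted by name, to the full request URL, compute an HMAC-SHA1 keyed with the auth token, and base64-encode it. It is written in the bridge's language to show the mechanics. In Laravel you would normally call the RequestValidator class from the Twilio PHP SDK instead of re-implementing this. The function names are illustrative.

```javascript
const crypto = require('node:crypto');

// Compute the signature Twilio would send for a form-encoded POST.
function expectedTwilioSignature(authToken, url, params) {
  // Full URL followed by each parameter name and value, sorted by name.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac('sha1', authToken).update(data, 'utf8').digest('base64');
}

// Constant-time comparison against the X-Twilio-Signature header value.
function isValidTwilioRequest(authToken, url, params, headerSignature) {
  const expected = Buffer.from(expectedTwilioSignature(authToken, url, params));
  const received = Buffer.from(String(headerSignature ?? ''));
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

The URL must match what Twilio actually requested, including scheme, host, and query string, so behind ngrok or a load balancer use the public URL and not the internal one. The block uses require so it runs standalone; in an ES module project, switch to import crypto from 'node:crypto'.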

12. Local Testing with ngrok & Useful Tools

Use ngrok to expose https://YOURHOST.ngrok.app for Twilio callbacks.
Verify that /voice/incoming returns valid TwiML; test calls from your Twilio number.
Watch bridge logs to confirm media frames and Realtime responses.
Twilio’s Code Exchange templates and tutorials are great testbeds to compare behavior.

13. Deploying to Production

Host Laravel on a reliable PHP platform with HTTPS certs.
Containerize the bridge and run multiple instances behind a load balancer.
Use health checks, autoscaling, and observability (metrics for call latency, WS reconnects, error rates).
Store structured events (turns, transcripts) securely for QA and analytics.

14. Cost Control & Performance Tuning

Twilio Voice charges per minute; OpenAI Realtime charges by usage (model and audio).
Reduce tokens by constraining system instructions and max output length.
Implement silence detection: don’t send silent frames to Realtime.
Cache short factual answers in memory to avoid extra model calls.
Use lower-cost real-time models for routine calls; upgrade for VIP routes.
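
The silence-detection idea can be sketched for Twilio's 8 kHz μ-law frames as follows: decode each byte to a 16-bit sample with the standard G.711 μ-law expansion, then compare the frame's RMS level to a threshold. The helper names and the threshold of 200 are assumptions and need tuning against real calls.

```javascript
// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function mulawToPcm16(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return (u & 0x80) ? -magnitude : magnitude;
}

// Hypothetical helper: true when a base64 mu-law frame is quieter than the threshold.
function isSilentFrame(base64Payload, rmsThreshold = 200) {
  const bytes = Buffer.from(base64Payload, 'base64');
  if (bytes.length === 0) return true;
  let sumSquares = 0;
  for (const b of bytes) {
    const sample = mulawToPcm16(b);
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / bytes.length) < rmsThreshold;
}
```

If the session relies on server-side voice activity detection, Realtime needs to hear some trailing silence to notice the end of a turn, so keep forwarding frames for a short period after speech stops before you start dropping them.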

15. Troubleshooting Common Errors

Twilio 400 from webhook: make sure your route returns valid XML/TwiML with Content-Type: text/xml.
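WS cannot connect: verify your WSS URL and TLS certificate, and ensure your bridge listens on the exact path used in the TwiML. CORS does not apply here, since Twilio connects as a server-side WebSocket client and not from a browser.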
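No audio returned: confirm the TwiML uses <Connect><Stream> (a <Start> stream is receive-only), that every media message you send to Twilio includes the streamSid, and that the audio format matches on both sides (8 kHz μ-law for Twilio); inspect event types.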
High latency: reduce buffering, send audio frames promptly, enable barge-in/cancel, and keep regions close (Twilio ↔ bridge ↔ OpenAI).
Session drops after minutes: implement heartbeat/ping on both sockets and graceful retry with state replay.
Outbound vs inbound confusion: outbound calls can specify a TwiML URL directly on creation; inbound uses the number’s webhook setting.
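
For the session-drop item above, a heartbeat can be sketched like this: ping on an interval and terminate the socket if the previous ping was never answered. It assumes a socket object with ping(), terminate(), and a 'pong' event, which is what the ws package provides. The helper name and the 15-second default are illustrative.

```javascript
// Hypothetical helper: returns the interval timer so the caller can stop it.
function startHeartbeat(socket, intervalMs = 15000) {
  let alive = true;
  socket.on('pong', () => { alive = true; });

  const timer = setInterval(() => {
    if (!alive) {
      // No pong since the last ping: assume the peer is gone.
      clearInterval(timer);
      socket.terminate();
      return;
    }
    alive = false;
    socket.ping();
  }, intervalMs);

  // Stop pinging once the socket closes for any reason.
  socket.on('close', () => clearInterval(timer));
  return timer;
}
```

Calling it once for the Twilio socket and once for the Realtime socket covers both legs. Retry and state replay after a terminated socket are separate concerns that this sketch does not address.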

16. FAQs — Connect Laravel with Twilio + OpenAI voice assistant

Q1. Can I build this entirely in PHP without Node.js?
Yes. You can run a PHP WebSocket server (e.g., Ratchet) for the bridge. Many teams still choose Node for streaming ergonomics and community examples.

Q2. Do I need the Assistants API?
No. Use OpenAI Realtime for voice. It’s built for live, low-latency audio and interruption handling.

Q3. How do I validate that Twilio’s webhook is authentic?
Verify the X-Twilio-Signature header using your Twilio auth token before trusting the request body.

Q4. Can the assistant talk first when the call connects?
Yes. Start the Media Stream and immediately prompt via Say or send an initial Realtime message to speak. Some outbound examples even have the AI open the conversation.
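
As a sketch of the second option, the bridge could send a response.create client event as soon as the Realtime socket is open and configured. The helper name and the wording of the instruction are illustrative; the instructions given here apply to this one response.

```javascript
// Hypothetical helper: build the client event that asks Realtime to speak first.
function buildGreetingEvent(greeting) {
  return {
    type: 'response.create',
    response: {
      // Instructions scoped to this single response.
      instructions: `Greet the caller by saying: "${greeting}"`,
    },
  };
}
```

The bridge would serialize the result with JSON.stringify and send it on the Realtime socket once, right after the session settings have been applied.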

Q5. What audio format should I expect?
Twilio streams base64-encoded, mono 8 kHz μ-law audio (audio/x-mulaw) inside JSON messages over the WebSocket. Either configure the Realtime session for the same μ-law format or convert before pushing to Realtime.

Q6. How do I add human handoff?
Detect “handoff” intent, then return TwiML with <Dial> to connect an agent (warm transfer pattern).

Q7. Where can I see a working end-to-end sample?
Check the OpenAI × Twilio Realtime demo repositories and Twilio’s Code Exchange templates for reference implementations.

17. Conclusion & Next Steps

You now have a clear plan to connect Laravel with a Twilio + OpenAI voice assistant: Laravel serves Twilio webhooks and TwiML, Twilio streams the call, a small WS bridge talks to OpenAI Realtime, and callers get natural conversations with barge-in and fast responses. From here, harden the bridge for production, then add conversation analytics, CRM lookups, secure action tools (calendars, order systems), and intelligent routing.

External references (highly recommended)

OpenAI Realtime API Guide — architecture, session & event types.
Twilio Voice Webhooks & TwiML — how to structure responses & stream media.
Twilio + OpenAI Realtime tutorials & demos — end-to-end examples and code.

Bonus: Quick Laravel Snippets

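Exclude the Twilio routes from CSRF verification. This is required for POST routes defined in routes/web.php, otherwise Twilio receives a 419 response. The snippet below is the Laravel 10 form; on Laravel 11+ configure the same exclusion with validateCsrfTokens(except: [...]) in bootstrap/app.php: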

// app/Http/Middleware/VerifyCsrfToken.php
protected $except = [
    'voice/*',
];

.env helper config (config/services.php):

'openai' => [
    'key' => env('OPENAI_API_KEY'),
    'realtime_model' => env('REALTIME_MODEL', 'gpt-realtime-preview'),
],
'twilio' => [
    'sid'       => env('TWILIO_ACCOUNT_SID'),
    'token'     => env('TWILIO_AUTH_TOKEN'),
    'number'    => env('TWILIO_NUMBER'),
    'bridge_ws' => env('BRIDGE_WS'),   // read with config('services.twilio.bridge_ws')
],

Testing checklist

✔️ Twilio number points to /voice/incoming
✔️ TwiML returns <Connect><Stream> with your WS URL
✔️ Bridge logs show Twilio events and Realtime responses
✔️ Caller hears low-latency responses and can interrupt mid-speech


Fermin Perdomo