Building a Production AI Dashboard with Next.js, Node, and Streaming LLMs

How I architect a real production AI dashboard with Next.js 15, Node.js, server-sent events, and Retell AI integrations, including the architecture choices nobody warns you about.

A working AI dashboard is mostly not the model. It is the boring plumbing around the model — streaming, auth, webhooks, retries, observability — that decides whether your product feels alive or feels broken. Here is how I have been building these dashboards at Technosmart and on freelance work.

What the dashboard actually does

The version I ship has three jobs, and an architecture should be evaluated on how cleanly it handles all three:

  1. Configure AI workers / agents (prompts, voices, tools, escalation rules).
  2. Monitor live calls and conversations with token-level streaming so the operator sees what the AI is saying as it says it.
  3. Review historical interactions, with search, filters, and quick-replay.

Most "AI dashboard" demos online only do job 1. Jobs 2 and 3 are where production lives.

The stack, and why

  • Next.js 15 (App Router) on the frontend. Server actions for mutations, route handlers for streaming, RSC for the heavy data fetches.
  • Node.js + Fastify backend for everything that has to live outside the Next.js process — webhooks, long-running jobs, queued tasks.
  • PostgreSQL for state. pgvector for transcript embedding search.
  • Server-Sent Events for live streaming. Not WebSockets. We will come back to that choice.
  • Retell AI for the actual voice runtime; our dashboard wraps it.
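
To make the pgvector bullet concrete, here is a minimal sketch of a transcript similarity query. The table and column names (transcript_chunks, workspace_id, call_id, content, embedding) and both helper names are assumptions for illustration, not the schema from this project. The one pgvector-specific piece is `<=>`, its cosine-distance operator.

```typescript
// Hypothetical schema: transcript_chunks(id, workspace_id, call_id, content, embedding vector)
interface SqlQuery {
  text: string;
  values: unknown[];
}

// pgvector accepts a vector as a text literal such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error('embedding must be a non-empty array of finite numbers');
  }
  return `[${embedding.join(',')}]`;
}

// Builds a parameterized nearest-neighbour query scoped to one workspace.
function buildTranscriptSearch(
  workspaceId: string,
  queryEmbedding: number[],
  limit = 10,
): SqlQuery {
  return {
    text: `
      SELECT id, call_id, content, embedding <=> $2::vector AS distance
      FROM transcript_chunks
      WHERE workspace_id = $1
      ORDER BY embedding <=> $2::vector
      LIMIT $3`,
    values: [workspaceId, toVectorLiteral(queryEmbedding), limit],
  };
}
```

With node-postgres the result would be passed along as pool.query(q.text, q.values). Scoping by workspace in the WHERE clause keeps one tenant's transcripts out of another tenant's search results.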

The streaming question: SSE or WebSockets?

The first big choice. Both work. I default to SSE for AI dashboards because:

  • It is one-directional (server → browser), which is exactly the model's behaviour.
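  • Over HTTP/2, many SSE streams multiplex onto one connection, so each extra stream is cheap. (Over HTTP/1.1, browsers cap connections at about six per origin, and open SSE streams count against that cap.)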
  • It survives load balancers and CDNs that hate WebSockets.
  • The browser auto-reconnects. WebSockets need you to write that yourself.

WebSockets are right when the user is sending lots of small messages back. For a dashboard where the user mostly watches and the AI mostly talks, SSE is calmer.
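
For reference, the SSE wire format is small enough to sketch in one helper. formatSseEvent is a hypothetical name, not something from a library. The id field is what makes the auto-reconnect point useful in practice: when the browser reconnects, it sends the last id it saw back in the Last-Event-ID request header, so the server has a chance to resume instead of restarting.

```typescript
interface SseEvent {
  data: unknown;    // JSON-serialized into the data field
  event?: string;   // named event; omitted means the default "message" event
  id?: string;      // sent back by the browser as Last-Event-ID on reconnect
  retry?: number;   // reconnection delay hint, in milliseconds
}

function formatSseEvent({ data, event, id, retry }: SseEvent): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  if (id) lines.push(`id: ${id}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // JSON.stringify (without an indent argument) never emits raw newlines,
  // so a single data line is enough.
  lines.push(`data: ${JSON.stringify(data)}`);
  // A blank line terminates the event.
  return lines.join('\n') + '\n\n';
}
```

For example, formatSseEvent({ data: { t: 'hi' } }) produces the frame `data: {"t":"hi"}` followed by a blank line.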

The streaming endpoint, simplified

A Next.js route handler that proxies model tokens to the browser:

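// app/api/agent/stream/route.ts
import OpenAI from 'openai';
// requireSession, pickModel and buildMessages are app-specific helpers (not shown).

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// GET, not POST: the browser's EventSource can only issue GET requests.
export async function GET(req: Request) {
  const params = new URL(req.url).searchParams;
  const workerId = params.get('id');
  if (!workerId) return new Response('Missing id', { status: 400 });
  const message = params.get('message') ?? '';
  const auth = await requireSession();

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      const upstream = await openai.chat.completions.create({
        model: pickModel(workerId),
        stream: true,
        messages: await buildMessages(workerId, message, auth.workspaceId),
      });

      // Forward each non-empty token as one SSE "message" event.
      for await (const chunk of upstream) {
        const token = chunk.choices[0]?.delta?.content ?? '';
        if (token) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ t: token })}\n\n`)
          );
        }
      }
      // Named event so the client knows to close instead of auto-reconnecting.
      controller.enqueue(encoder.encode('event: done\ndata: {}\n\n'));
      controller.close();
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      // no-transform asks intermediaries not to modify the stream.
      'Cache-Control': 'no-cache, no-transform',
      Connection: 'keep-alive',
    },
  });
}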

Receiving on the client

The browser side is short. The trick is to write tokens to a local buffer, not to React state, and flush via requestAnimationFrame. Otherwise React re-renders on every single token and your UI starts to wobble.

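import { useEffect, useRef, useState } from 'react';

// Inside the component that renders the live transcript:
const [text, setText] = useState('');
const bufferRef = useRef('');

// Flush the buffer into state at most once per animation frame.
useEffect(() => {
  let raf: number;
  const flush = () => {
    // Capture and clear the buffer before calling setText: the updater runs
    // later, during render, and would otherwise read an already-emptied ref.
    const pending = bufferRef.current;
    if (pending) {
      bufferRef.current = '';
      setText((t) => t + pending);
    }
    raf = requestAnimationFrame(flush);
  };
  raf = requestAnimationFrame(flush);
  return () => cancelAnimationFrame(raf);
}, []);

// Append incoming tokens to the buffer without touching React state.
useEffect(() => {
  const es = new EventSource('/api/agent/stream?id=' + encodeURIComponent(workerId));
  es.onmessage = (e) => {
    const { t } = JSON.parse(e.data);
    bufferRef.current += t;
  };
  // Close on the server's "done" event so the browser does not reconnect.
  es.addEventListener('done', () => es.close());
  return () => es.close();
}, [workerId]);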
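
EventSource can only issue GET requests and cannot set custom headers. When a streaming request needs a body, one option is fetch with a streamed response and a small parser. The sketch below is a minimal version under stated assumptions: it handles only the event and data fields, assumes the server sends LF line endings, and uses hypothetical names (createSseParser, streamPost).

```typescript
type SseHandler = (event: string, data: string) => void;

// Incremental parser: network chunks can split an event anywhere,
// so the unfinished tail is kept in a buffer between calls.
function createSseParser(onEvent: SseHandler) {
  let buffer = '';
  return function feed(chunk: string): void {
    buffer += chunk;
    let end: number;
    while ((end = buffer.indexOf('\n\n')) !== -1) {
      const frame = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      let event = 'message';
      const data: string[] = [];
      for (const line of frame.split('\n')) {
        if (line.startsWith('event:')) event = line.slice(6).trim();
        else if (line.startsWith('data:')) data.push(line.slice(5).replace(/^ /, ''));
      }
      if (data.length > 0) onEvent(event, data.join('\n'));
    }
  };
}

// Usage sketch: POST a JSON body and stream the reply through the parser.
async function streamPost(url: string, body: unknown, onEvent: SseHandler): Promise<void> {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok || !res.body) throw new Error(`stream failed: ${res.status}`);
  const feed = createSseParser(onEvent);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    feed(decoder.decode(value, { stream: true }));
  }
}
```

The trade-off is that this path gives up the browser's built-in reconnection, so any retry logic has to be written by hand.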

Webhooks: the thing that always breaks

Retell and most voice platforms call your webhook with call events such as call.started, call.ended, and transcript.partial (exact event names vary by provider). Three rules I learned by making painful mistakes:

  1. Verify the signature. Every provider includes one. Reject the request if it does not match — bots will absolutely send fake events at your webhook URL once it is public.
  2. Respond fast. I budget under 3 seconds; most providers retry on a timeout, and the exact limit varies by provider. If your handler does heavy work, push it to a queue and return 200 immediately.
  3. Idempotency. Events will arrive twice. Store the event ID, ignore duplicates.
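
The three rules can be sketched together in one framework-agnostic function. Everything provider-specific here is an assumption: the HMAC-SHA256 hex signature scheme, the id field on the payload, and the in-memory Set and array standing in for a database table and a real queue. Check your provider's documentation for its actual signing scheme before relying on anything like this.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

interface WebhookEvent {
  id: string;    // assumed to be unique per event
  type: string;
}

const seenEventIds = new Set<string>();  // stand-in for a unique-keyed DB table
const jobQueue: WebhookEvent[] = [];     // stand-in for a real queue

function isValidSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Returns the HTTP status to send. Does no slow work itself.
function handleWebhook(rawBody: string, signatureHex: string, secret: string): number {
  // Rule 1: reject anything with a bad signature.
  if (!isValidSignature(rawBody, signatureHex, secret)) return 401;
  const event = JSON.parse(rawBody) as WebhookEvent;
  // Rule 3: a duplicate is acknowledged but not processed again.
  if (seenEventIds.has(event.id)) return 200;
  seenEventIds.add(event.id);
  // Rule 2: defer heavy work to the queue and return immediately.
  jobQueue.push(event);
  return 200;
}
```

Note that the signature has to be checked against the raw request body as received. Re-serializing a parsed JSON object can change key order or whitespace and break the comparison.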

Auth: do not skip this even for an internal tool

Use Auth.js v5 (formerly NextAuth). Argon2id for password hashing. Session cookies, not JWT in localStorage. Yes, even for an internal dashboard — laptops get lost, and a sloppy auth model becomes the breach story you tell at the next job.
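
The case for cookies over localStorage comes down to a few attributes. Auth.js manages its own session cookie, so you would not normally write this yourself. The sketch below, with a hypothetical buildSessionCookie helper and an assumed 8-hour lifetime, only shows what those attributes are.

```typescript
interface SessionCookieOptions {
  name?: string;
  maxAgeSeconds?: number;
}

// Serializes a Set-Cookie header value for an opaque session id.
function buildSessionCookie(sessionId: string, opts: SessionCookieOptions = {}): string {
  const { name = 'session', maxAgeSeconds = 60 * 60 * 8 } = opts;
  return [
    `${name}=${encodeURIComponent(sessionId)}`,
    'Path=/',
    `Max-Age=${maxAgeSeconds}`,
    'HttpOnly',      // not readable from page JavaScript, unlike localStorage
    'Secure',        // only sent over HTTPS
    'SameSite=Lax',  // not attached to cross-site POST requests
  ].join('; ');
}
```

HttpOnly is the attribute doing most of the work in this comparison: a script injected into the page can read anything in localStorage, but it cannot read an HttpOnly cookie.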

Observability

For a dashboard, I instrument three things from day one:

  • Every model call (model, tokens, latency, feature, cache hit).
  • Every webhook (provider, event type, processing time, retry count).
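  • Every UI error (the streaming bar froze, the SSE reconnected, etc.), collected by a global listener for the window error and unhandledrejection events, with the browser's reportError() used to surface errors you catch yourself.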

None of this needs Datadog. PostgreSQL + a tiny Grafana panel is more than enough until you outgrow it.
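
As a sketch of the first bullet, a thin wrapper can record one row per model call. The field names, the withModelMetrics helper, and the sink callback are all assumptions for illustration. In production the sink would be an INSERT into a PostgreSQL table that Grafana reads from.

```typescript
interface ModelCallMetric {
  model: string;
  feature: string;
  latencyMs: number;
  promptTokens: number;
  completionTokens: number;
  cacheHit: boolean;
  ok: boolean;
}

// What the wrapped call must report back; extra fields pass through untouched.
interface ModelUsage {
  promptTokens: number;
  completionTokens: number;
  cacheHit: boolean;
}

type MetricSink = (row: ModelCallMetric) => void | Promise<void>;

async function withModelMetrics<T extends ModelUsage>(
  meta: { model: string; feature: string },
  sink: MetricSink,
  call: () => Promise<T>,
): Promise<T> {
  const started = Date.now();
  const row: ModelCallMetric = {
    ...meta, latencyMs: 0, promptTokens: 0, completionTokens: 0, cacheHit: false, ok: false,
  };
  try {
    const result = await call();
    row.promptTokens = result.promptTokens;
    row.completionTokens = result.completionTokens;
    row.cacheHit = result.cacheHit;
    row.ok = true;
    return result;
  } finally {
    // Runs on success and on failure, so failed calls are recorded too.
    row.latencyMs = Date.now() - started;
    await sink(row);
  }
}
```

Recording failed calls matters as much as recording successful ones: a latency chart built only from successes hides exactly the calls that timed out.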

What I would not do again

  • Treat the model as the product. The dashboard is the product. The model is one library it uses.
  • Render every streamed token as React state. Use a buffer + RAF.
  • Trust the provider's webhook ordering. Re-sort on event timestamp before applying.
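
The last point can be sketched as a small pure function. The event shape, including a numeric timestamp field in epoch milliseconds, and the orderEvents name are assumptions. A real version would typically run per call over events already stored in the database.

```typescript
interface TimedEvent {
  id: string;
  callId: string;
  timestamp: number;  // provider-reported event time, epoch milliseconds (assumed field)
}

// Returns events in timestamp order, dropping duplicates by id.
// Array.prototype.sort is stable, so events with equal timestamps keep arrival order.
function orderEvents<T extends TimedEvent>(events: T[]): T[] {
  const seen = new Set<string>();
  const unique = events.filter((e) => {
    if (seen.has(e.id)) return false;
    seen.add(e.id);
    return true;
  });
  // filter() returned a new array, so sorting it leaves the input untouched.
  return unique.sort((a, b) => a.timestamp - b.timestamp);
}
```

Sorting on the provider's timestamp rather than on arrival time means a late call.started event cannot overwrite state that a call.ended event already set.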

Final architecture diagram

[Operator browser]
   ↕ SSE for streaming, HTTPS for actions
[Next.js App Router] ── Server Actions for mutations
   │
   ├─→ [Node + Fastify worker pool]
   │     ↑
   │     └── Queue for slow work
   │
   ├─→ [Retell AI runtime] ── webhook → /api/webhook/retell
   │
   └─→ [PostgreSQL + pgvector]

This is genuinely the stack I ship in production. If you want one built for your business — voice, chat, internal ops dashboard — start the conversation in the contact section on the homepage.
