Building a Production AI Dashboard with Next.js, Node, and Streaming LLMs

A working AI dashboard is mostly not the model. It is the boring plumbing around the model — streaming, auth, webhooks, retries, observability — that decides whether your product feels alive or feels broken. Here is how I have been building these dashboards at Technosmart and on freelance work.

What the dashboard actually does

The version I ship has three jobs, and an architecture should be evaluated on how cleanly it handles all three:

1. Configure AI workers / agents (prompts, voices, tools, escalation rules).
2. Monitor live calls and conversations with token-level streaming so the operator sees what the AI is saying as it says it.
3. Review historical interactions, with search, filters, and quick-replay.

Most online "AI dashboard" demos only do the first job. The second and third are where production lives.

The stack, and why

Next.js 15 (App Router) on the frontend. Server actions for mutations, route handlers for streaming, RSC for the heavy data fetches.
Node.js + Fastify backend for everything that has to live outside the Next.js process — webhooks, long-running jobs, queued tasks.
PostgreSQL for state. pgvector for transcript embedding search.
Server-Sent Events for live streaming. Not WebSockets. We will come back to that choice.
Retell AI for the actual voice runtime; our dashboard wraps it.
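
To make the pgvector line concrete, here is a rough sketch of how a transcript search query could be assembled. The table and column names (transcript_chunks, workspace_id, embedding) and both function names are assumptions for illustration, not the real schema. The only pgvector specifics used are the text form of a vector ('[1,2,3]') and the <=> cosine-distance operator.

```typescript
// Hypothetical shape for a parameterized query, compatible with drivers
// that accept { text, values }.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// pgvector accepts vectors written as '[0.1,0.2,...]' and cast to vector.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || !embedding.every((x) => Number.isFinite(x))) {
    throw new Error('embedding must be a non-empty array of finite numbers');
  }
  return `[${embedding.join(',')}]`;
}

// Nearest transcript chunks by cosine distance, scoped to one workspace.
// Table and column names are illustrative assumptions.
function buildTranscriptSearch(
  workspaceId: string,
  embedding: number[],
  limit = 10,
): SqlQuery {
  return {
    text: [
      'SELECT id, call_id, content, embedding <=> $1::vector AS distance',
      'FROM transcript_chunks',
      'WHERE workspace_id = $2',
      'ORDER BY embedding <=> $1::vector',
      'LIMIT $3',
    ].join('\n'),
    values: [toVectorLiteral(embedding), workspaceId, limit],
  };
}
```

Scoping by workspace inside the query, rather than filtering afterwards, keeps one tenant's transcripts out of another tenant's results even when the ranking is fuzzy.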

The streaming question: SSE or WebSockets?

The first big choice. Both work. I default to SSE for AI dashboards because:

It is one-directional (server → browser), which is exactly the model's behaviour.
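HTTP/2 multiplexes every stream over a single connection, so an open SSE stream does not eat into the browser's per-origin connection limit the way it does on HTTP/1.1.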
It survives load balancers and CDNs that hate WebSockets.
The browser auto-reconnects. WebSockets need you to write that yourself.

WebSockets are right when the user is sending lots of small messages back. For a dashboard where the user mostly watches and the AI mostly talks, SSE is calmer.
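
Part of why SSE feels calmer is that the wire format is just lines of text. The sketch below shows one way to serialize a frame; formatSseFrame and the SseFrame type are hypothetical names of mine, not part of any library. The id field is what lets the browser send Last-Event-ID when it reconnects.

```typescript
// Hypothetical helper: serialize one Server-Sent Events frame.
interface SseFrame {
  data: unknown;     // payload, JSON-encoded into the data field
  event?: string;    // optional event name; the browser defaults to "message"
  id?: string;       // optional id; echoed back as Last-Event-ID on reconnect
  retryMs?: number;  // optional reconnect delay hint, in milliseconds
}

function formatSseFrame(frame: SseFrame): string {
  const lines: string[] = [];
  if (frame.retryMs !== undefined) lines.push(`retry: ${frame.retryMs}`);
  if (frame.id !== undefined) lines.push(`id: ${frame.id}`);
  if (frame.event !== undefined) lines.push(`event: ${frame.event}`);
  // JSON.stringify escapes newlines, so the payload fits on one data line.
  lines.push(`data: ${JSON.stringify(frame.data)}`);
  // A blank line terminates the frame.
  return lines.join('\n') + '\n\n';
}
```

For example, formatSseFrame({ data: { t: 'hi' }, id: '7' }) yields the three lines "id: 7", "data: {"t":"hi"}", and an empty line.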

The streaming endpoint, simplified

A Next.js route handler that proxies model tokens to the browser:

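// app/api/agent/stream/route.ts
import OpenAI from 'openai';
// requireSession, pickModel and buildMessages are app-specific helpers (not shown).

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// GET, not POST: the browser's EventSource can only issue GET requests.
export async function GET(req: Request) {
  const auth = await requireSession();

  const { searchParams } = new URL(req.url);
  const workerId = searchParams.get('id');
  if (!workerId) return new Response('Missing id', { status: 400 });
  // Optional: a client that only watches the stream sends just the id.
  const message = searchParams.get('message') ?? '';

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      try {
        const upstream = await openai.chat.completions.create(
          {
            model: pickModel(workerId),
            stream: true,
            messages: await buildMessages(workerId, message, auth.workspaceId),
          },
          { signal: req.signal } // stop generating if the operator disconnects
        );

        for await (const chunk of upstream) {
          const token = chunk.choices[0]?.delta?.content ?? '';
          if (token) {
            controller.enqueue(
              encoder.encode(`data: ${JSON.stringify({ t: token })}\n\n`)
            );
          }
        }
        controller.enqueue(encoder.encode('event: done\ndata: {}\n\n'));
        controller.close();
      } catch (err) {
        // Surface upstream failures instead of leaving the stream hanging.
        controller.error(err);
      }
    },
  });

  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache, no-transform',
      Connection: 'keep-alive',
    },
  });
}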

Receiving on the client

The browser side is short. The trick is to write tokens to a local buffer, not to React state, and flush via requestAnimationFrame — otherwise React re-renders every single token and your UI starts to wobble.

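import { useEffect, useRef, useState } from 'react';

// Inside the component:
const [text, setText] = useState('');
const bufferRef = useRef('');

useEffect(() => {
  let raf: number;
  const flush = () => {
    if (bufferRef.current) {
      // Snapshot before clearing: React may run the updater later,
      // after the ref has already been reset.
      const pending = bufferRef.current;
      bufferRef.current = '';
      setText((t) => t + pending);
    }
    raf = requestAnimationFrame(flush);
  };
  raf = requestAnimationFrame(flush);
  return () => cancelAnimationFrame(raf);
}, []);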

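useEffect(() => {
  // EventSource always issues a GET and reconnects on its own after a drop.
  const es = new EventSource('/api/agent/stream?id=' + encodeURIComponent(workerId));
  es.onmessage = (e) => {
    const { t } = JSON.parse(e.data);
    bufferRef.current += t; // buffered here, flushed to state by the RAF loop
  };
  // Close on "done", otherwise the browser reconnects and restarts the stream.
  es.addEventListener('done', () => es.close());
  return () => es.close();
}, [workerId]);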

Webhooks: the thing that always breaks

Retell and most voice platforms call your webhook with call events such as call.started, call.ended, and transcript.partial (exact names vary by provider). Three rules I learned by painfully ignoring them:

Verify the signature. Every provider includes one. Reject the request if it does not match — bots will absolutely send fake events at your webhook URL once it is public.
Respond in under 3 seconds. Most providers retry on a timeout. If your handler does heavy work, push it to a queue and return 200 immediately.
Idempotency. Events will arrive twice. Store the event ID, ignore duplicates.
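
The three rules above can be sketched in one small handler. This is a minimal sketch under loud assumptions: it pretends the provider signs the raw body with HMAC-SHA256 and sends a hex digest, which is a common scheme but not necessarily what your provider does, so check their docs. verifySignature, handleWebhook, and the in-memory Set are illustrative stand-ins; in production the Set would be a table with a unique constraint on the event ID.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumption: signature = hex(HMAC-SHA256(secret, rawBody)). Providers differ.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

interface WebhookEvent {
  id: string;
  type: string;
}

type WebhookResult = 'rejected' | 'duplicate' | 'queued';

// In-memory stand-in for a database table with UNIQUE(event_id).
const seenEventIds = new Set<string>();

function handleWebhook(
  rawBody: string,
  signatureHex: string,
  secret: string,
  enqueue: (event: WebhookEvent) => void,
): WebhookResult {
  // Rule 1: reject anything that is not signed correctly.
  if (!verifySignature(rawBody, signatureHex, secret)) return 'rejected';

  const event = JSON.parse(rawBody) as WebhookEvent;

  // Rule 3: events arrive twice; ignore IDs we have already stored.
  if (seenEventIds.has(event.id)) return 'duplicate';
  seenEventIds.add(event.id);

  // Rule 2: hand the heavy work to a queue and return right away.
  enqueue(event);
  return 'queued';
}
```

The route handler would map 'rejected' to a 401 and both other results to a 200, since a duplicate is not an error from the provider's point of view. Note that the signature is checked against the raw body string, before any JSON parsing.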

Auth: do not skip this even for an internal tool

Use Auth.js v5 (formerly NextAuth). Argon2id for password hashing. Session cookies, not JWT in localStorage. Yes, even for an internal dashboard — laptops get lost, and a sloppy auth model becomes the breach story you tell at the next job.
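
Auth.js sets its own cookies, so you would not normally write this by hand. Purely to illustrate what "session cookies, not JWT in localStorage" buys you, here is a sketch of an opaque session ID and the cookie attributes that matter. The function names and the cookie name are hypothetical.

```typescript
import { randomBytes } from 'node:crypto';

// Opaque, unguessable identifier; the session data itself stays server-side.
function newSessionId(): string {
  return randomBytes(32).toString('base64url');
}

// Build a Set-Cookie header value for the session.
function sessionCookie(sessionId: string, maxAgeSeconds: number): string {
  return [
    `session=${sessionId}`,
    'Path=/',
    `Max-Age=${maxAgeSeconds}`,
    'HttpOnly',     // not readable from page JavaScript, unlike localStorage
    'Secure',       // only sent over HTTPS
    'SameSite=Lax', // not sent on most cross-site requests
  ].join('; ');
}
```

HttpOnly is the key difference from localStorage: a script injected into the page cannot read the token.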

Observability

For a dashboard, I instrument three things from day one:

Every model call (model, tokens, latency, feature, cache hit).
Every webhook (provider, event type, processing time, retry count).
Every UI error (the streaming bar froze, the SSE reconnected, etc.), collected by a global error listener; handled exceptions can be routed to the same listener with the browser's reportError().

None of this needs Datadog. PostgreSQL + a tiny Grafana panel is more than enough until you outgrow it.
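
As a sketch of what "every model call" can mean in practice, the helper below times a call and hands one flat record to a sink. The record fields mirror the list above; startModelCall, the Sink type, and the injectable clock are my own illustrative names, and in production the sink would be an INSERT into a PostgreSQL table rather than an array push.

```typescript
interface ModelCallRecord {
  model: string;
  feature: string;
  promptTokens: number;
  completionTokens: number;
  cacheHit: boolean;
  latencyMs: number;
}

type Usage = Pick<ModelCallRecord, 'promptTokens' | 'completionTokens' | 'cacheHit'>;
type Sink = (record: ModelCallRecord) => void;

// Call before the model request; invoke the returned function when it ends.
// The clock is injectable so the timing logic can be tested deterministically.
function startModelCall(
  meta: { model: string; feature: string },
  sink: Sink,
  now: () => number = () => performance.now(),
): (usage: Usage) => ModelCallRecord {
  const startedAt = now();
  return (usage) => {
    const record: ModelCallRecord = {
      ...meta,
      ...usage,
      latencyMs: now() - startedAt,
    };
    sink(record);
    return record;
  };
}
```

Usage is two lines around the model call: const finish = startModelCall({ model, feature }, sink) before it, and finish({ promptTokens, completionTokens, cacheHit }) after it.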

What I would not do again

Treat the model as the product. The dashboard is the product. The model is one library it uses.
Render every streamed token as React state. Use a buffer + RAF.
Trust the provider's webhook ordering. Re-sort on event timestamp before applying.
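
The re-sorting in that last point is only a few lines. This sketch assumes each event carries a numeric timestamp (epoch milliseconds) and a unique id; both field names are assumptions, so map them to whatever your provider actually sends.

```typescript
interface CallEvent {
  id: string;
  callId: string;
  type: string;
  timestamp: number; // assumed: provider-side event time, epoch milliseconds
}

// Return a new array ordered by event time, with the id as a stable tie-break
// so two events with the same timestamp always apply in the same order.
function orderEvents(events: CallEvent[]): CallEvent[] {
  return [...events].sort((a, b) => {
    if (a.timestamp !== b.timestamp) return a.timestamp - b.timestamp;
    if (a.id < b.id) return -1;
    if (a.id > b.id) return 1;
    return 0;
  });
}
```

Sorting a copy keeps the arrival order available for debugging, which is useful when you need to show how far out of order a provider really delivered things.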

Final architecture diagram

[Operator browser]
   ↕ SSE for streaming, HTTPS for actions
[Next.js App Router] ── Server Actions for mutations
   │
   ├─→ [Node + Fastify worker pool]
   │     ↑
   │     └── Queue for slow work
   │
   ├─→ [Retell AI runtime] ── webhook → /api/webhook/retell
   │
   └─→ [PostgreSQL + pgvector]

This is genuinely the stack I ship in production. If you want one built for your business — voice, chat, internal ops dashboard — start the conversation in the contact section on the homepage.