Truffles

The Problem

Manual QA doesn't scale

Session recording tools like PostHog generate thousands of replays. Teams record everything but review almost nothing. Bugs hide in plain sight—visible in recordings that nobody watches. The data exists. The human bandwidth doesn't.

What if you could point an LLM at every session recording and have it detect UI bugs, filter out noise, and then hand off the real issues to a coding agent that opens a PR? That's Truffles.

What It Does

From session replay to merged PR

Ingest

Pulls session recordings from PostHog, renders rrweb events into MP4 video via a headless Chromium pipeline, and uploads to S3.

Analyze

Dual-model vision analysis (Kimi K2.5 + Gemini 3 Pro) examines video frames. A separate model reviews console errors and network failures. Results are deduplicated against recent issues and screened through learned suppression rules.

Fix

Claude Code agents receive verified issues, check out isolated worktrees, locate the bug in code, implement fixes, and open PRs on GitHub—or report false alarms.

Architecture

System overview

Turborepo monorepo. Express API handles orchestration, WebSocket streaming, and agent lifecycle. React frontend provides real-time observability. The entire backend (orchestration, analysis, and agents) runs in a single process.

  ┌─────────────────────────────────────────────────────────────────────┐
  │  PostHog Cloud                                                      │
  │  session recordings (rrweb events, metadata, console logs)          │
  └──────────────────────────────┬──────────────────────────────────────┘
                                 │ poll / sync
                                 ▼
  ┌─────────────────────────────────────────────────────────────────────┐
  │  Truffles API  (Express + WebSocket)                                │
  │                                                                     │
  │  ProcessingManager    render rrweb → mp4 via Playwright + ffmpeg    │
  │          │                                                          │
  │          ▼                                                          │
  │  AnalysisManager      dual-model vision + session data analysis     │
  │          │              deduplication + screening                   │
  │          ▼                                                          │
  │  AgentManager         Claude Code SDK → worktree → code → PR        │
  │                                                                     │
  └────────┬──────────────────┬────────────────────┬────────────────────┘
           │                  │                    │
           ▼                  ▼                    ▼
     MongoDB           AWS S3             GitHub
     sessions          videos             PRs on
     issues            frames             target repo
     agents            thumbnails

                                 │
                                 ▼
  ┌─────────────────────────────────────────────────────────────────────┐
  │  Truffles Web  (React + WebSocket)                                  │
  │  sessions · issues · agent lab · PR review · dashboard              │
  └─────────────────────────────────────────────────────────────────────┘

The Pipeline

Step by step

Session ingestion

PostHog sessions are synced into Truffles. The Sessions page lists all available recordings with metadata—duration, user, active time. An admin selects sessions to process.

Video rendering & analysis

rrweb events are replayed inside a headless Chromium browser, captured at 4x speed, and encoded to MP4 via ffmpeg. Two vision models (Kimi K2.5, Gemini 3 Pro) examine extracted frames while a separate text model reviews console errors and network failures. Results are deduplicated against recent issues and screened through learned suppression rules to filter noise.
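The merge-and-deduplicate step can be sketched roughly as follows. This is a minimal illustration, not the actual Truffles code: the `Finding` shape, the `fingerprint` helper, and the idea of matching on a normalized title are all assumptions made for the example.

```typescript
// Hypothetical shape of a model finding; the real Truffles schema is not shown in the source.
interface Finding {
  model: string;        // which model reported it, e.g. "kimi" or "gemini"
  title: string;        // short description of the suspected bug
  timestampSec: number; // position in the rendered video
}

// Crude fingerprint (an assumption): lowercase, strip punctuation, collapse whitespace.
function fingerprint(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Merge findings from several models. The first finding per fingerprint wins,
// and anything matching a recently seen issue is dropped.
function mergeAndDedupe(findings: Finding[], recentFingerprints: Set<string>): Finding[] {
  const seen = new Set<string>(recentFingerprints); // copy so the caller's set is untouched
  const kept: Finding[] = [];
  for (const f of findings) {
    const fp = fingerprint(f.title);
    if (seen.has(fp)) continue;
    seen.add(fp);
    kept.push(f);
  }
  return kept;
}
```

Exact-match fingerprints only catch near-identical wording. A real system would probably need a fuzzier comparison, for example asking a text model whether two findings describe the same bug.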

Issue triage

Verified issues are surfaced with severity levels (red for critical, yellow for minor), LLM reasoning, and links back to the source session. Each issue includes the model's explanation of what it found and why it matters.
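As a rough sketch of what a triaged issue might look like in code, assuming the two severity levels named above. The field names and the `triageOrder` helper are illustrative and not taken from the Truffles codebase.

```typescript
// The two severity levels described above: red for critical, yellow for minor.
type Severity = "red" | "yellow";

// Hypothetical issue record; field names are assumptions for this example.
interface Issue {
  id: string;
  severity: Severity;
  summary: string;    // what the model found
  reasoning: string;  // the model's explanation of why it matters
  sessionUrl: string; // link back to the source session
}

const SEVERITY_RANK: Record<Severity, number> = { red: 0, yellow: 1 };

// Return a new array with critical issues first. Array.prototype.sort is stable,
// so issues of equal severity keep their original relative order.
function triageOrder(issues: Issue[]): Issue[] {
  return [...issues].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}
```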

Agent execution

A Claude Code agent receives the issue, checks out an isolated git worktree, and works through phases: verify the bug exists in code, plan the fix, implement it, and run lint/typecheck. If it can't find related code, it reports a false alarm instead of guessing. Output streams in real time to the Agent Lab.
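The phase progression with its false-alarm exit can be modeled as a small state machine. The sketch below is an assumption about how such a flow might be encoded; the phase names follow the description above, while `StepResult`, `Outcome`, and `advance` are hypothetical names.

```typescript
// Phases as described above: verify, plan, implement, then lint/typecheck.
type Phase = "verify" | "plan" | "implement" | "check";

// Terminal outcomes (names are assumptions for this sketch).
type Outcome = "pr_ready" | "false_alarm" | "failed";

// What a single phase can report back.
type StepResult = "ok" | "no_related_code" | "error";

// Where a successful phase leads next.
const NEXT: Record<Phase, Phase | "pr_ready"> = {
  verify: "plan",
  plan: "implement",
  implement: "check",
  check: "pr_ready",
};

// Decide the next state. The false-alarm exit is always available, so the agent
// never has to invent a fix just to finish the run.
function advance(phase: Phase, result: StepResult): Phase | Outcome {
  if (result === "error") return "failed";
  if (result === "no_related_code") return "false_alarm";
  return NEXT[phase];
}
```

Keeping `false_alarm` as a first-class outcome, distinct from `failed`, lets downstream code treat it as useful signal rather than as an error.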

PR creation & review

Successful agents create a pull request on the target repo with a clear description of the issue and fix. The Truffles dashboard shows all PRs with inline diffs, issue context, and links to GitHub for final human approval.

Tech Stack

What it's built with

Frontend

React + Vite + Tailwind

React Router, WebSocket, dark mode

Backend

Express + TypeScript

REST API, WebSocket streaming, in-process agents

Database

MongoDB + Mongoose

Sessions, issues, agent history, settings

Video Pipeline

Playwright + ffmpeg

rrweb replay in headless Chromium, MP4 encoding

Vision Models

Kimi K2.5 + Gemini 3 Pro

Via OpenRouter, multimodal frame analysis

Reasoning

Claude Opus 4.6

Screening, deduplication, session data analysis

Code Agents

Claude Code SDK

Agentic coding with tools, isolated worktrees

Infrastructure

AWS S3 + GitHub API

Video storage, PR creation, repo management

Observability

PostHog

Session recordings, console errors, network logs

Monorepo

Turborepo

Shared types, parallel builds, unified dev

DevOps Connection

Why this matters for QA & DevOps

Automated QA

Truffles replaces manual session review with LLM-powered analysis. It watches every recording, not just the ones a human happens to check. This is the logical next step beyond traditional automated testing—testing the actual user experience, not just code paths.

CI/CD integration

Detected issues flow directly into the PR workflow. Agents create branches, implement fixes, and open PRs with full context. The human role shifts from "find and fix" to "review and merge"—a fundamentally different feedback loop.

Observability pipeline

PostHog session data (console errors, network failures, DOM events) feeds into a structured analysis pipeline. This is production observability applied to QA—treating user sessions as telemetry rather than debug artifacts.

False alarm management

The suppression rule system learns from mistakes. When an agent reports a false alarm, the pattern is stored and used to filter future detections. This mirrors alert fatigue management in production monitoring—a core DevOps discipline.
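A minimal sketch of this feedback loop, assuming rules are stored as lowercase substrings of issue titles. The `SuppressionStore` class and its method names are hypothetical, and the real system may match patterns in a more sophisticated way.

```typescript
// Hypothetical rule shape: a pattern plus the reason the agent gave for the false alarm.
interface SuppressionRule {
  pattern: string;
  reason: string;
}

class SuppressionStore {
  private rules: SuppressionRule[] = [];

  // Record a pattern after an agent reports a false alarm.
  learn(issueTitle: string, reason: string): void {
    const pattern = issueTitle.toLowerCase().trim();
    if (pattern.length === 0) return; // an empty pattern would suppress everything
    this.rules.push({ pattern, reason });
  }

  // True if any stored pattern appears in the detection title.
  isSuppressed(title: string): boolean {
    const t = title.toLowerCase();
    return this.rules.some((r) => t.includes(r.pattern));
  }

  // Drop detections that match a learned rule.
  filter(titles: string[]): string[] {
    return titles.filter((t) => !this.isSuppressed(t));
  }
}
```

For example, after `learn("Cookie banner overlaps content", "expected third-party banner")`, a later detection titled "Cookie banner overlaps content on /pricing" would be filtered out while unrelated detections pass through.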

What I Learned

Reflections

rrweb video rendering is surprisingly hard

Replaying rrweb events in a headless browser, capturing frames, and encoding to MP4 required handling browser lifecycle, timeline compression, memory limits, and timeout enforcement. The rendering pipeline went through several iterations.
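To make the timeline compression and timeout arithmetic concrete, here is a small sketch. The 4x speed comes from the pipeline description; the 30 fps output rate, the timeout factor of 2, and the `planRender` name are assumptions chosen for illustration.

```typescript
interface RenderPlan {
  captureSeconds: number; // wall-clock time spent replaying in the browser
  frameCount: number;     // frames in the resulting video
  timeoutMs: number;      // hard limit before the render is abandoned
}

// Compute capture time, frame count, and a timeout for one session.
// Only speed = 4 comes from the source; fps and timeoutFactor are illustrative defaults.
function planRender(sessionMs: number, speed = 4, fps = 30, timeoutFactor = 2): RenderPlan {
  if (sessionMs < 0 || speed <= 0 || fps <= 0 || timeoutFactor <= 0) {
    throw new Error("sessionMs must be >= 0 and speed, fps, timeoutFactor must be > 0");
  }
  const captureSeconds = sessionMs / 1000 / speed;
  return {
    captureSeconds,
    frameCount: Math.ceil(captureSeconds * fps),
    timeoutMs: Math.ceil(captureSeconds * 1000 * timeoutFactor),
  };
}
```

Under these assumptions, a 2-minute session (120,000 ms) replays in 30 seconds, yields 900 frames, and gets a 60-second timeout.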

LLM orchestration is a superpower

The individual API calls are simple. Chaining them into a reliable pipeline, with deduplication, screening, false alarm detection, and agent handoffs, is where the real engineering lives. Each stage needs clear contracts and failure modes.
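One way to give each stage a clear contract and failure mode is a tagged result type that records which stage failed. The sketch below is generic and synchronous for brevity; `Stage`, `StageResult`, and `runPipeline` are hypothetical names, and the real pipeline is presumably asynchronous.

```typescript
// Either the pipeline's final value, or the name of the stage that failed and why.
type StageResult<T> =
  | { ok: true; value: T }
  | { ok: false; stage: string; error: string };

// A stage transforms its input and throws on failure.
interface Stage<T> {
  name: string;
  run: (input: T) => T;
}

// Run stages in order, stopping at the first failure and reporting where it happened.
function runPipeline<T>(stages: Stage<T>[], input: T): StageResult<T> {
  let current = input;
  for (const stage of stages) {
    try {
      current = stage.run(current);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, stage: stage.name, error: message };
    }
  }
  return { ok: true, value: current };
}
```

Because the failure case names the stage, a caller can decide per stage whether to retry, skip the session, or surface the error in the dashboard.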

Agents need escape hatches

Without an explicit "false alarm" option, coding agents will make speculative changes to justify their existence. Similarly, give agents a "Dev feedback" field to let them complain if you set them up for failure.

Multi-model analysis adds reliability

"State of the art" is a constantly moving target with LLMs. Kimi K2.5 and Gemini 3 Pro both show excellent vision benchmark results, and I didn't know which would perform better. I honestly never got around to deciding which one I liked more, so they're both still fighting it out on every video analyzed.

WebSocket streaming changes the UX

Watching an agent work in real time (seeing it read files, reason about the bug, and write a fix) is fundamentally different from waiting for a result. Real-time observability made debugging the agents themselves much faster. It also gives me hope that I can turn an agent off before it goes too far off the rails. Realistically, nobody will be watching when something goes wrong, and I'll have to live with the YOLO permissions I gave it.

Worktree isolation is essential

Running multiple coding agents concurrently requires complete filesystem isolation. Git worktrees solved this elegantly: each agent gets its own checkout of the repo on its own branch, with automatic cleanup of orphaned worktrees. Ports might still collide if agents try to run their code; I haven't solved that problem yet.
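A sketch of how per-agent worktrees might be planned, using the standard `git worktree add -b`, `git worktree remove --force`, and `git worktree prune` subcommands. The branch naming scheme, the `.worktrees` directory, and the `planWorktree` helper are assumptions for illustration, not the project's actual layout.

```typescript
interface WorktreePlan {
  branch: string;
  path: string;
  addArgs: string[];    // arguments for `git` to create the worktree on a new branch
  removeArgs: string[]; // arguments for `git` to delete the worktree afterwards
}

// Arguments for `git` to clean up bookkeeping for worktrees whose directories are gone.
const PRUNE_ARGS: string[] = ["worktree", "prune"];

// Build the git arguments for one agent. Naming and directory layout are hypothetical.
function planWorktree(agentId: string, baseRef = "main", root = ".worktrees"): WorktreePlan {
  // Reject ids that could escape the worktree root or break the branch name.
  if (!/^[A-Za-z0-9_-]+$/.test(agentId)) {
    throw new Error("unsafe agent id: " + agentId);
  }
  const branch = "truffles/agent-" + agentId;
  const path = root + "/agent-" + agentId;
  return {
    branch,
    path,
    addArgs: ["worktree", "add", "-b", branch, path, baseRef],
    removeArgs: ["worktree", "remove", "--force", path],
  };
}
```

The argument arrays are meant to be passed to something like Node's `child_process.execFile("git", plan.addArgs)`, which avoids shell interpolation of the agent id.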