VIEW

Players, understood. In seconds.

Role

Sole designer
Idea → live product

Timeline

2025 → present
in flight

Stage

Closed pilot
12 studios live

Surface

4 agents shipped
1 held back

Demos

US · Japan · Korea
Free Fire–class title

— What I owned

Design end-to-end — whiteboard sketch to live pilot. 4 agents shipped, 1 deliberately held back (the Guardian decision).

— Worked with

2 PMs + full dev + QA team. 12 studios across US, Japan, and Korea in active pilot — paying users, not engagement clicks.

FIG. 01 / OVERVIEW
6labs.ai · ai platform · b2b
2025 — NOW
● PILOT · 12 STUDIOS
§00 / Where it stands, today · pilot · 2026 Q2

12

Game studios signed in 30 days

Commercial conviction, not engagement — devs paying to use the product, not just clicking around. More joining each cohort.

4

Agents shipped, one held back

Barista, Radiologist, Oracle, Forecaster live. Guardian (ad-fraud detection) held back — sensitive territory dealing with real money, needs more accuracy testing first.

3

Iterations of the product I led

Filtered video POC → natural-language + 2 agents → Barista as the orchestrator. Each move driven by what we learned from the previous one.

§01 / The goldmine sitting in plain sight · setup

BlueStacks had been capturing real gameplay video at scale through AI Highlights inside the App Player. A constantly growing corpus of unprompted, real-player sessions across hundreds of titles.

Game devs' status quo for understanding their own players had not changed: schedule playtests, watch hours of recordings, write notes. Slow. Biased. Small-N.

Mid-2024 onwards, AI productivity tools were reshaping every adjacent industry. Every dev tooling roadmap had an AI line item.

Internal read at BlueStacks: we're sitting on a goldmine devs would pay for. Leadership assigned the work. I was the sole designer.

aiHighlightsCorpus
§02 / Making the bet a real product · framing

Leadership had the idea: an AI platform trained on our gameplay corpus, giving devs superpowers in player understanding. My job was to translate that into a product. Three sub-questions had to be answered before any UI got drawn.

Q1 · CAPABILITY CEILING

What insights can this data actually surface?

What can the model reliably extract from raw gameplay video — and what's a stretch?

Q2 · VALUE FLOOR

Which of those would change a dev's work?

An impressive insight that doesn't change a roadmap is just a parlor trick.

Q3 · ACCESS

How does a dev reach those insights?

Without being a data scientist. Without writing SQL. Without learning a new craft.

Two constraints, set early to keep the work meaningful rather than just impressive, ended up shaping every iteration that followed.

§03 / Three iterations, one held back · the design moves

The product evolved across three iterations, each driven by what we learned from the previous one. A fifth agent exists but didn't ship — and that decision matters as much as the ones that did.

ITERATION 01 · POC · FILTERED VIDEO

Cut the watching, not the proof.

iter1FilteredVideo
The dev problem

Studios were burning real money to understand their own players. They'd commission playtests, then assign people to watch the recordings to find the moments that mattered. The cost was paid twice — once to run the playtest, again to extract value from it manually.

What we built

An AI model auto-tagged every video in our corpus with player-behaviour signatures. Devs filtered by those tags to surface exactly the sessions they wanted, instead of trawling through every one. Dev resources were limited, so the scope stayed narrow.
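The tag-then-filter loop above can be sketched in a few lines. This is illustrative only — the tag names, fields, and function are hypothetical, not the product's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """One recorded gameplay session with its auto-generated behaviour tags."""
    video_id: str
    tags: set[str] = field(default_factory=set)

def filter_sessions(corpus: list[Session], required: set[str]) -> list[Session]:
    """Keep only sessions whose tags cover every requested behaviour signature."""
    return [s for s in corpus if required <= s.tags]

# Hypothetical corpus — tag vocabulary invented for illustration.
corpus = [
    Session("a1", {"rage-quit", "boss-fight"}),
    Session("b2", {"tutorial-skip"}),
    Session("c3", {"boss-fight", "near-miss"}),
]

# A dev picks tags instead of watching every recording:
hits = filter_sessions(corpus, {"boss-fight"})
print([s.video_id for s in hits])  # → ['a1', 'c3']
```

The point of the POC in one line: selection by tag replaces hours of manual review, while the underlying video stays available as proof.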

What we learned

Devs liked it — already faster than the playtest-then-watch loop. Feedback was consistent: “go further. We don’t want to watch better videos. We want the answer, with the videos as proof.”

Decision

Expand scope. Move from filtered playback to generated insights, with video as evidence underneath.

ITERATION 02 · AI-NATIVE · TWO AGENTS

Ask in plain language, get the answer back.

iter2RadiologistOracle
The dev problem

Filters surfaced sessions; they didn’t answer questions. Devs across roles asked very different things, and a static filter UI couldn’t cover that surface area.

What we built

A natural-language input field with two agents behind it. Radiologist builds on the filtering work, adding AI session summaries, a timeline of key events, and session metadata (region, platform, device, length). Oracle, still experimental, answers focused dev queries across many sessions: “where did players lose the most HP and what caused it?”, “summarise rotation, drop spot, key moves, final-zone path.”

What we learned

Both agents worked technically. But devs hit a learning curve we hadn’t fully predicted — even with suggested queries and hints, devs didn’t know what to ask a system this powerful. The capability was there; the access wasn’t.

Decision

The next move had to solve query authoring, not agent quality.

ITERATION 03 · ON-RAMP · ORCHESTRATOR

Barista — the assistant that proposes the question.

iter3Barista
The dev problem

Devs didn’t need more agents. They needed an entry point that knew which question to ask, for which role, on which signal — without making the dev the query writer.

What we built

Barista — a personal assistant that proactively suggests analyses based on the dev’s role on the team (PM, game designer, LiveOps, marketer) and the data flowing in. Sits above Radiologist and Oracle: Barista decides which agent to invoke and frames the result in language the role understands.
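The orchestrator pattern described above — route a role and an incoming signal to the right agent, then frame the suggestion in that role's vocabulary — can be sketched as follows. Agent names come from the case study; the signal types, role framings, and `suggest` function are invented for illustration:

```python
# Which agent handles which kind of signal (hypothetical signal taxonomy).
AGENT_FOR_SIGNAL = {
    "session_anomaly": "Radiologist",    # deep single-session inspection
    "cross_session_question": "Oracle",  # behavioural query across sessions
}

# How the suggestion is worded for each role (invented examples).
ROLE_FRAMING = {
    "PM": "Impact on retention",
    "game designer": "Possible difficulty-curve issue",
    "LiveOps": "Candidate for this week's event tuning",
    "marketer": "Creative angle worth testing",
}

def suggest(role: str, signal: str) -> str:
    """Pick the agent for a signal and frame the proposal for the dev's role."""
    agent = AGENT_FOR_SIGNAL.get(signal, "Barista")
    frame = ROLE_FRAMING.get(role, "Worth a look")
    return f"{frame}: run {agent} on this signal?"

print(suggest("LiveOps", "cross_session_question"))
# → Candidate for this week's event tuning: run Oracle on this signal?
```

The design choice this illustrates: the dev never authors the query. The system proposes it, which is what dissolved the learning curve from Iteration 2.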

What we learned

Most users now land in Barista first. Radiologist and Oracle stay accessible as the depth — but the on-ramp is what unlocked them.

Decision

Barista is the default surface; the other agents are reachable through it or directly.

HELD BACK · GUARDIAN · AD FRAUD

Guardian. The agent we chose not to ship.

guardianHeldBack
What it does

Detects ad fraud across player sessions and gathers video evidence to recover wasted UA spend.

Why it didn't ship with the others

Fraud detection is sensitive territory — it deals with real ad spend and real accusations against real partners. We chose to push it back, raise the accuracy bar, and run more testing before making it available. Shipping discipline mattered more than shipping count.

§04 / Built in an AI-native workflow · process

6labs.ai was designed in an AI-native workflow — Figma↔code round-trips through Claude Code, custom Claude Code skills built for the team, and internal AI workshops to bring the broader org along.

aiNativeRoundTrip
§05 / What shipped, and what we're hearing · signal

Four agents (Barista, Radiologist, Oracle, Forecaster) powered by the SixthSense™ Engine — the platform's name for the gameplay-trained model layer underneath. Barista is the default landing experience; Radiologist and Oracle are reached through it or directly.

agentSurface4up

The 4-agent surface

Barista (assistant), Radiologist (deep session inspector), Oracle (behavioural intelligence), Forecaster (predictive personas) — the full surface powered by the SixthSense™ Engine.

oracleQueryFlow

Oracle — focused query flow

A dev asks a focused question across many sessions — “where did players lose the most HP and what caused it?” — and Oracle answers with grounded video evidence underneath. The most novel capability and the easiest to anonymise; this is the flow that demos best in pilot meetings.

baristaSuggestionCard

Barista — proactive suggestion card

Barista decides which agent to invoke and frames the result in language that fits the dev’s role on the team (PM, game designer, LiveOps, marketer). The on-ramp that solved query authoring — devs land here first.

Pilot tour

Demos in US, Japan, and Korea, including a Free Fire–class title. Responses positive across regions.

Commercial signal

12 studios signed in 30 days. Devs paying to use the product, not engagement clicks.

How we read it

The pilot tour functioned as our usability research. Every studio meeting was a study we couldn't have run earlier with a static product.

§06 / What I'd do differently · reflection
01 · AI-native, day one

Integrate the Figma↔code round-trip from day one.

I brought my round-trip workflow in partway. Doing it from the start would have kept the design system tighter in code, and I'd have used dev branches as a design exploration surface, not a downstream artifact.

02 · Prototype against real test cases

Iterate on real test cases with working prototypes before any infra.

Before Iteration 1's POC, we should have run working prototypes against the real questions devs were trying to ask — the “what would I even ask this?” gap that Barista eventually solved. We caught it only after Iteration 2 launched, six months late.

03 · Eval rubrics, week one

Co-author agent-eval rubrics with PMs from week one.

With agents, “looks right” isn't a quality bar — you need pre-defined eval cases, co-authored with PMs at concept time. We're catching up on this now.
