A research lab for production AI.

Fine-tuning, multi-agent systems, and workflow automation — by faculty who publish at CVPR, NeurIPS, KDD, and EMNLP. We measure before we ship.

Start a conversation · See work

Co-founders

3 PhDs + 1 Head of Engineering

Engineers

Active ML team, hiring to 50.

Publications

CVPR, NeurIPS, ECCV, KDD, EMNLP

Industry partners

Paid engagements across US, EU, AU, HK

Publishing at CVPR, NeurIPS, ICCV, ECCV, KDD, EMNLP, ACL, AAAI, TMLR, WACV, ICDM, ACCV, IEEE TIP, BioSIG

01 — The wedge

Where in-house AI usually stalls.

Five failure modes we see across teams trying to build AI in-house — Fortune 500s through growth-stage startups. None of them are budget.

Strategy gap.

Enterprises rarely have someone who can tell which problems are AI-tractable. The first 90 days are usually spent picking the wrong problem.

Engineering ≠ AI.

Strong engineering teams without ML depth pick the wrong tool — fine-tuning what should have been retrieved, retrieving what should have been classified.

Wrong abstraction level.

Custom-training a model when a fine-tune would do, or fine-tuning when a prompt would do. Compute spent on the wrong layer of the stack.

Wrong direction.

The reverse: reaching for an API when the problem needs training data, supervision, and evaluation. If no one is measuring, no one is shipping.

Workflows go un-AI'd.

The most interesting deployments usually sit inside the company, not on its product surface. Teams underestimate which internal processes are addressable.

$200K+

ML engineer comp

6–12 mo

Ramp time

~40%

Utilization gap between initiatives

02 — Where we sit

Between a research lab and an AI devshop.

Research labs publish but rarely deploy. Devshops deploy but rarely produce methods of their own. We work in the seam — and the methods we publish are the methods we deploy.

03 — Capabilities

Four engagement shapes.

Each engagement runs on a different cadence. The right shape depends on whether you have a research question, a product specification, a team to upskill, or an architecture you cannot fully staff.

A · R&D Partnership

B · Build (Agentic + Custom Models)

C · Training

D · Technical Direction

A · R&D Partnership

Your research arm.

You hand us the domain. We run model, paper, and product tracks in parallel.

For teams that need to publish and produce against the same deadline — labs, startups with academic ambitions, or product teams whose roadmap depends on a research result. Tracks are not sequential; models inform papers, papers inform product, product surfaces inform the next model.

domain → models · papers · product

Usually for: Startups and product teams with research dependencies.

B · Build

End-to-end builds.

From a clear ask, a general direction, or "we want AI somewhere."

For enterprises and growth-stage teams that need a working system. Scope is iterative — we do not quote off a templated SOW because templated SOWs do not survive the first measurement. Agentic systems, custom-tuned models, retrieval and orchestration, with the evaluation surface designed before the build starts.

agent + model + workflow

Usually for: Enterprises with workflows to AI-ify.

C · Training

Training your team.

What AI can do, what it cannot, in your domain — with the concepts that matter for the work your team actually does.

Half-day leadership briefings to multi-week engineering deep-dives. Often the first engagement before a build — once a team can see the boundary of what is possible, the build scope sharpens.

leadership · engineering · applied

Usually for: Teams new to ML, or teams scaling up.

D · Technical Direction

Senior AI judgment, fractional.

PhD-led architecture review, model validation, and unblocking — without a full-time hire.

For engineering teams that need senior AI judgment in the room weekly. We sit on architecture reviews, validate model choices, and pressure-test evaluation plans. Useful when the team has the engineering depth but needs the ML signal that does not show up on a job posting.

weekly · architecture · validation

Usually for: Engineering orgs without an in-house AI lead.

04 — Engagement model

Inside a build.

Six steps on the spine. Six operating beats underneath. The sequence is iterative — never deliver-and-hand-off — because production AI rarely settles in one pass.

Intro

30-minute call. We learn the domain; you learn whether the shape fits.

Scoping

One or two sessions. Light or deep depending on how settled the problem is.

Proposal

We write the scope. Artifacts pulled, timeline drafted, evaluation surface sketched.

Lean

Heavy prototyping in weeks 1–2. The first version ships rough and live.

Ongoing

2–3 syncs a week, async between. Live UAT instead of staged demos.

Iterate

Shaped to the work. Handover or continuing engagement.

PhD in the room, weekly.

At least one weekly sync is led by a PhD founder. Model and architecture decisions get senior signal every week.

Live before polished.

Initial scope set, then a rough version into production fast. Refinement happens against real data, not assumed data.

Lean by default.

Small team to start. We scale only when the shape demands it and we can match each new role to a specific blocker.

Iterative on purpose.

Continuous feedback loops between us and you. No deliver-and-hand-off — production AI rarely settles in one pass.

Discovery before big scope.

For large engagements, a paid 1–3 month discovery precedes the build. We refuse to scope a 9-month engagement off a 1-hour call.

Two service lines.

Research partnerships for startups and labs. Agentic and enterprise AI for businesses with workflows to AI-ify. Different cadences, different teams.

05 — Research footprint

200+ peer-reviewed papers across the four founders' careers.

The number matters less than where the work lands. We publish at the same venues as the frontier labs and big-tech research groups — same conference floors, same peer review, different papers.

VENUES — and other labs publishing at each

CVPR · ICCV · ECCV: Google · Meta · Microsoft · NVIDIA · Stanford · MIT

NeurIPS: DeepMind · OpenAI · Anthropic · Berkeley · Stanford

KDD: LinkedIn · Pinterest · Amazon · Airbnb · Microsoft

EMNLP · ACL: Google · Anthropic · Meta · AI2 · Stanford NLP

AAAI: IBM · Microsoft Research · DeepMind

TMLR: DeepMind · Anthropic · MIT · Stanford

WACV: Amazon · Apple · Tesla

ICDM: IBM · Microsoft Research

ACCV: Adobe Research · Tsinghua · Tencent

IEEE TIP: Microsoft · Samsung · Sony

06 — Work in production

Eight production deployments across six verticals.

All anonymized. Full case documentation — model architectures, evaluation surfaces, per-vertical performance breakdowns — is available on request. 19+ deployments total.

Live feeds · on-prem

All channels online

Line efficiency

94.2%

01 · Manufacturing

Vision-based quality inspection across an entire production line.

600+ live camera feeds, processed entirely on-premise: no cloud dependency, full data sovereignty. The deployment replaced a third-party vendor that could not meet the customer's privacy constraints. The challenge was not the model; it was the throughput: 24/7 streaming inference across 600+ channels at the per-frame latency the line operator required. The model is small by design, distilled from a larger family, and runs on local GPU pods.

612 feeds

Concurrent streams

24/7

Continuous inference

0 frames

Sent to cloud

live ops

Real-time line insights

02 · Healthcare

Clinical document extraction live in 30+ US states.

An LLM plus a medical knowledge graph for extracting ICD-10 and SNOMED procedure codes from unstructured clinical records. The graph is where the precision lives: an LLM alone hallucinates codes; an LLM constrained by a clinically grounded ontology rarely does. 99.0% precision on the in-scope code subset, at 5,010 records per minute streaming. The customer's compliance team chose this architecture over an in-house build because the eval surface (which codes we are confident about, which we abstain on) is the deliverable they could defend to auditors.
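The constrain-or-abstain pattern described above can be pictured with a small sketch. It is illustrative only and not the deployed system: the two-entry `ONTOLOGY` dict stands in for a full knowledge graph, and `Candidate`, `constrain`, and the 0.9 threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-in for a clinical ontology; a real graph would be far larger.
ONTOLOGY: Dict[str, str] = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
}


@dataclass
class Candidate:
    code: str          # code proposed by the language model
    confidence: float  # model-reported score in [0, 1] (assumed to be available)


def constrain(
    candidates: List[Candidate],
    ontology: Dict[str, str] = ONTOLOGY,
    threshold: float = 0.9,  # assumed cut-off, tuned per deployment
) -> Tuple[List[str], List[str]]:
    """Accept a code only if it exists in the ontology and clears the threshold.

    Everything else is returned as an abstention rather than silently dropped,
    so the abstained set can be audited.
    """
    accepted: List[str] = []
    abstained: List[str] = []
    for cand in candidates:
        if cand.code in ontology and cand.confidence >= threshold:
            accepted.append(cand.code)
        else:
            abstained.append(cand.code)
    return accepted, abstained


proposed = [
    Candidate("E11.9", 0.97),    # valid code, confident: accepted
    Candidate("Z99.99X", 0.99),  # not in the ontology: abstained
    Candidate("I10", 0.60),      # valid code, low confidence: abstained
]
accepted, abstained = constrain(proposed)
```

Returning the abstentions alongside the accepted codes is the design choice that matters here: it makes the "which codes we abstain on" half of the eval surface an explicit output.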

30+ states

US coverage

ICD-10 + SNOMED

Both standards

cross-state

Compliant data sharing

KG-backed

Medical knowledge graph

Throughput

5,010/min

Streaming

Precision

99.0%

vs API stitch

1/10 cost

Live customer

Channels

Web · Phone · App

03 · Finance — consumer

Voice and video advisory agents for consumer finance.

Three constraints drove the architecture. (a) The advisory had to feel like a person — speech-to-speech, not chained ASR/TTS with a perceptible pause. (b) The avatar had to lip-sync at real-time latency, not pre-rendered. (c) The unit economics had to come in at a fraction of stitched-API cost, because the customer's product margin was the eval surface. We built a custom three-layer stack: an agentic logic layer for intent and tool-calling, a custom speech-to-speech model fine-tuned on consumer-finance dialogue, and a diffusion-based avatar trained for lip-sync alignment. Bespoke, not stitched.

3 layers

Logic · voice · avatar

fraction of API cost

Custom speech-to-speech

real-time

Lip-synced avatar

omnichannel

Web · phone · app

05 · Enterprise / visual content

Image understanding at scale

Domain-tuned image captioning across 12+ client domains.

A single large captioning model paired with small per-domain refiners — a proxy-tuning architecture. The alternative would have been training 12 separate fine-tunes, each requiring its own data, compute, and evaluation surface. Proxy-tuning lets the base model do the heavy lifting and asks each small refiner only to rewrite the caption in the conventions of its domain. The result is a fraction of the per-domain fine-tuning cost, and a fraction of the per-domain failure surface. New refiners are added without retraining the base.
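The base-plus-refiner split can be sketched as a small registry. This is a toy illustration under stated assumptions, not the production architecture: `CaptionPipeline`, `register`, and `caption` are hypothetical names, and plain Python functions stand in for the base captioning model and the per-domain refiners.

```python
from typing import Callable, Dict

BaseModel = Callable[[bytes], str]  # image bytes -> generic caption
Refiner = Callable[[str], str]      # generic caption -> domain-style caption


class CaptionPipeline:
    """One shared base model plus independently registered domain refiners."""

    def __init__(self, base: BaseModel) -> None:
        self.base = base
        self.refiners: Dict[str, Refiner] = {}

    def register(self, domain: str, refiner: Refiner) -> None:
        # Adding a domain touches only this mapping; the base is left as-is.
        self.refiners[domain] = refiner

    def caption(self, image: bytes, domain: str) -> str:
        draft = self.base(image)
        refiner = self.refiners.get(domain)
        # Fall back to the unrefined draft for domains without a refiner.
        return refiner(draft) if refiner is not None else draft


# Stand-in callables for illustration; real components would be models.
pipeline = CaptionPipeline(base=lambda image: "a red chair in a room")
pipeline.register("furniture", lambda text: text.replace("chair", "armchair"))

refined = pipeline.caption(b"", "furniture")
unrefined = pipeline.caption(b"", "automotive")
```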

1 base

Foundation model

12+ domains

Image captioning

fraction of cost

Training + inference

domain-resilient

No per-domain SFT

Languages live

6 dialects

Streaming

End-to-end

740ms

06 · Healthcare / accessibility

Voice health, native dialects

Native-language voice health consultation for underserved populations.

The customer's users speak six dialects of an underserved language; the comorbidity rate among them is high; and most are not literate enough in the regional lingua franca to use a typed interface. The system is chained ASR → LLM + medical comorbidity KG → TTS, with the LLM optimized for regional dialect comprehension and the knowledge graph supplying clinical grounding. End-to-end latency: 720 ms. The hardest engineering was not the LLM; it was the ASR — accurate transcription of dialects with minimal labeled data, which is itself a methods contribution we published before deploying.

ASR · LLM · TTS

Custom-trained chain

regional dialects

Low-resource languages

KG-aware

Medical comorbidity graph

S2S next

Speech-to-speech in dev

08 · Finance — institutional

Predictive asset trading

Predictive asset trading for institutional desks.

An ensemble of classical financial models with transformer networks, optimized through reinforcement learning on roughly a decade of historical market data. The reinforcement signal is portfolio P&L under realistic transaction-cost assumptions — not next-tick prediction accuracy. The standard failure mode in academic trading papers is precisely the use of unrealistic eval surfaces; this system is built against that failure. The model decides position sizing; the operator retains kill-switch and override. Measurable ROI lift over the classical baseline; live performance disclosed under NDA.
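To make the distinction between a P&L-based signal and prediction accuracy concrete, here is a minimal sketch of a per-step reward net of transaction costs. It is an assumption-laden illustration, not the system's actual reward: `step_reward`, `episode_reward`, and the proportional `cost_rate` of 0.001 are hypothetical, and real cost models typically include spread, slippage, and market impact.

```python
from typing import Sequence


def step_reward(
    prev_position: float,
    new_position: float,
    price_return: float,
    cost_rate: float = 0.001,  # assumed proportional cost per unit of turnover
) -> float:
    """P&L for one step, net of a proportional transaction cost.

    Positions are fractions of capital in [-1, 1]; price_return is the
    asset's return over the step.
    """
    pnl = new_position * price_return
    cost = cost_rate * abs(new_position - prev_position)
    return pnl - cost


def episode_reward(
    positions: Sequence[float],
    returns: Sequence[float],
    cost_rate: float = 0.001,
) -> float:
    """Sum of step rewards, starting from a flat (zero) position."""
    total = 0.0
    prev = 0.0
    for position, price_return in zip(positions, returns):
        total += step_reward(prev, position, price_return, cost_rate)
        prev = position
    return total


# Entering a full long position costs turnover; holding it does not.
enter = step_reward(0.0, 1.0, 0.01)
hold = step_reward(1.0, 1.0, 0.01)
```

Under this reward, a policy that flips position on every tick pays for each flip, which a next-tick accuracy metric would not penalize.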

ensemble + transformer

Hybrid stack

RL-optimized

Decision policy

accuracy ↑ · ROI ↑

Measurable lift

~10 years of data

Training horizon

Current

202.4 USD

Live ticker

Predicted Δ · 5m

+1.8%

As of

2024

Temporal lens

07 · Legal — tax + regulatory

Compliance reasoning

A graph that remembers when the law changed.

A knowledge graph over the tax and regulatory corpus, temporally aligned so that it answers using the provisions in force for the year you are asking about. Off-the-shelf RAG does not survive contact with legal text: citations drift, prior-year provisions get conflated with current ones, and the model cannot tell which is which. This one can. Section IDs accompany every answer. Multilingual; currently deployed as v0.
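Year-aware lookup can be sketched with validity intervals on each provision. The example below is a simplified illustration, not the deployed graph: `Provision`, `in_force`, the section ID `S-12`, and the rates are all hypothetical, and a flat list stands in for the knowledge graph.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Provision:
    section_id: str
    text: str
    effective_from: int          # first year the provision is in force
    effective_to: Optional[int]  # last year in force; None means still in force


def in_force(
    provisions: List[Provision], section_id: str, year: int
) -> Optional[Provision]:
    """Return the version of a section that applied in the given year, if any."""
    for prov in provisions:
        if prov.section_id != section_id:
            continue
        started = prov.effective_from <= year
        not_ended = prov.effective_to is None or year <= prov.effective_to
        if started and not_ended:
            return prov
    return None


# Hypothetical corpus: one section amended once.
CORPUS = [
    Provision("S-12", "Rate is 10%.", 2015, 2019),
    Provision("S-12", "Rate is 12%.", 2020, None),
]

answer_2018 = in_force(CORPUS, "S-12", 2018)
answer_2021 = in_force(CORPUS, "S-12", 2021)
answer_2010 = in_force(CORPUS, "S-12", 2010)
```

Because the lookup returns the whole `Provision`, the section ID travels with the text, and a year with no applicable version yields `None` instead of a neighboring year's rule.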

temporal

Year-aware retrieval

cited

Section IDs in every answer

multilingual

Working-language corpus

prod

v0 deployed

Type 2 DM → E11.9

Bilateral knee… → M17.0

Hypertensive… → I10

02 · Healthcare

ICD-10 / SNOMED across 30+ US states.

LLM plus medical knowledge graph for procedure-code extraction from unstructured records.

Agent · logic

Custom speech-to-speech

Diffusion-based avatar

03 · Finance — consumer

Omnichannel advisory at a fraction of API cost.

Three-layer stack: agentic logic, custom speech-to-speech, diffusion avatars. Built bespoke rather than stitched together from APIs.

15–20 h saved per week

04 · Retail / supply chain

15–20 hours a week back on supplier specs.

Multimodal change detection across spec versions, standardized supplier costing. Procurement gets time back.

05 · CV / enterprise R&D

Proxy tuning across 12+ domains.

One large captioning model plus small domain refiners. Cuts training and inference cost to a fraction of per-domain fine-tuning.

06 · Accessibility / healthcare

Native-language voice consultation.

Chained ASR + LLM + TTS pipeline optimized for regional dialects, integrated with a medical comorbidity knowledge graph.

07 · Engineering / devtools

MCP-native AI integration.

AI natively uses enterprise dev tools and frontend standards. Frictionless adoption inside existing engineering workflows.

08 · Finance — institutional

Predictive asset trading.

Ensemble of traditional financial models with transformer networks, optimized via reinforcement learning on historical market data.

19+ cases across 6 verticals. Full documentation on request.

Request the full pack

intelligentlab.ai

hello@intelligentlab.ai

Singapore

152 Beach Road, #23-03
Gateway East