Intelligent Lab AI Contact

A research lab for production AI.

Fine-tuning, multi-agent systems, and workflow automation — by faculty who publish at CVPR, NeurIPS, KDD, and EMNLP. We measure before we ship.

Co-founders
3 PhDs + 1 Head of Engineering
Engineers
Active ML team, hiring to 50.
Publications
CVPR, NeurIPS, ECCV, KDD, EMNLP
Industry partners
Paid engagements across US, EU, AU, HK
Publishing at CVPR, NeurIPS, ICCV, ECCV, KDD, EMNLP, ACL, AAAI, TMLR, WACV, ICDM, ACCV, IEEE TIP, BioSIG
01 — The wedge

Where in-house AI usually stalls.

Five failure modes we see across teams trying to build AI in-house — Fortune 500s through growth-stage startups. None of them are budget.

01
Strategy gap.

Enterprises rarely have someone who can tell which problems are AI-tractable. The first 90 days are usually spent picking the wrong problem.

02
Engineering ≠ AI.

Strong engineering teams without ML depth pick the wrong tool — fine-tuning what should have been retrieved, retrieving what should have been classified.

03
Wrong abstraction level.

Custom-training a model when a fine-tune would do, or fine-tuning when a prompt would do. Compute spent on the wrong layer of the stack.

04
Wrong direction.

The reverse: reaching for an API when the problem needs training data, supervision, and evaluation. If no one is measuring, no one is shipping.

05
Workflows go un-AI'd.

The most interesting deployments usually sit inside the company, not on its product surface. Teams underestimate which internal processes are addressable.

$200K+
ML engineer comp
6–12 months
Ramp time
~40%
Utilization gap between initiatives
02 — Where we sit

Between a research lab and an AI devshop.

Research labs publish but rarely deploy. Devshops deploy but rarely produce methods of their own. We work in the seam — and the methods we publish are the methods we deploy.

03 — Capabilities

Four engagement shapes.

Each engagement runs on a different cadence. The right shape depends on whether you have a research question, a product specification, a team to upskill, or an architecture you cannot fully staff.

A · R&D Partnership
Your research arm.
You hand us the domain. We run model, paper, and product tracks in parallel.

For teams that need to publish and produce against the same deadline — labs, startups with academic ambitions, or product teams whose roadmap depends on a research result. Tracks are not sequential; models inform papers, papers inform product, product surfaces inform the next model.

domain → models · papers · product
Usually for: Startups and product teams with research dependencies.
B · Build
End-to-end builds.
From a clear ask, a general direction, or "we want AI somewhere."

For enterprises and growth-stage teams that need a working system. Scope is iterative — we do not quote off a templated SOW because templated SOWs do not survive the first measurement. Agentic systems, custom-tuned models, retrieval and orchestration, with the evaluation surface designed before the build starts.

agent + model + workflow
Usually for: Enterprises with workflows to AI-ify.
C · Training
Training your team.
What AI can do, what it cannot, in your domain — with the concepts that matter for the work your team actually does.

Half-day leadership briefings to multi-week engineering deep-dives. Often the first engagement before a build — once a team can see the boundary of what is possible, the build scope sharpens.

leadership · engineering · applied
Usually for: Teams new to ML, or teams scaling up.
D · Technical Direction
Senior AI judgment, fractional.
PhD-led architecture review, model validation, and unblocking — without a full-time hire.

For engineering teams that need senior AI judgment in the room weekly. We sit on architecture reviews, validate model choices, and pressure-test evaluation plans. Useful when the team has the engineering depth but needs the ML signal that does not show up on a job posting.

weekly · architecture · validation
Usually for: Engineering orgs without an in-house AI lead.
04 — Engagement model

Inside a build.

Six steps on the spine. Six operating beats underneath. The sequence is iterative — never deliver-and-hand-off — because production AI rarely settles in one pass.

01
Intro

30-minute call. We learn the domain; you learn whether the shape fits.

02
Scoping

One or two sessions. Light or deep depending on how settled the problem is.

03
Proposal

We write the scope. Artifacts pulled, timeline drafted, evaluation surface sketched.

04
Lean

Heavy prototyping in weeks 1–2. The first version ships rough and live.

05
Ongoing

2–3 syncs a week, async between. Live UAT instead of staged demos.

06
Iterate

Shaped to the work. Handover or continuing engagement.

01

PhD in the room, weekly.

At least one weekly sync is led by a PhD founder. Model and architecture decisions get senior signal every week.

02

Live before polished.

Initial scope set, then a rough version into production fast. Refinement happens against real data, not assumed data.

03

Lean by default.

Small team to start. We scale only when the shape demands it and we can match each new role to a specific blocker.

04

Iterative on purpose.

Continuous feedback loops between us and you. No deliver-and-hand-off — production AI rarely settles in one pass.

05

Discovery before big scope.

For large engagements, a paid 1–3 month discovery precedes the build. We refuse to scope a 9-month engagement off a 1-hour call.

06

Two service lines.

Research partnerships for startups and labs. Agentic and enterprise AI for businesses with workflows to AI-ify. Different cadences, different teams.

05 — Research footprint

200+ peer-reviewed papers across the four founders' careers.

The number matters less than where the work lands. We publish at the same venues as the frontier labs and big-tech research groups — same conference floors, same peer review, different papers.

VENUES (and other labs publishing at each)
CVPR · ICCV · ECCV: Google · Meta · Microsoft · NVIDIA · Stanford · MIT
KDD: LinkedIn · Pinterest · Amazon · Airbnb · Microsoft
EMNLP · ACL: Google · Anthropic · Meta · AI2 · Stanford NLP
AAAI: IBM · Microsoft Research · DeepMind
TMLR: DeepMind · Anthropic · MIT · Stanford
WACV: Amazon · Apple · Tesla
ICDM: IBM · Microsoft Research
ACCV: Adobe Research · Tsinghua · Tencent
IEEE TIP: Microsoft · Samsung · Sony
06 — Work in production

Eight production deployments across six verticals.

All anonymized. Full case documentation — model architectures, evaluation surfaces, per-vertical performance breakdowns — is available on request. 19+ deployments total.

[Figure: production zone, lines L1–L3; live on-prem feeds, all channels online; line efficiency 94.2%]
01 · Manufacturing

Vision-based quality inspection across an entire production line.

600+ live camera feeds, processed entirely on-premise — no cloud dependency, full data sovereignty. The deployment replaced a third-party vendor that could not meet the customer's privacy constraints. The challenge was not the model; it was the throughput — 24/7 streaming inference across 600 channels at the per-frame latency the line operator required. The model is small by design, distilled from a larger family, and runs on local GPU pods.

612 feeds
Concurrent streams
24/7
Continuous inference
0 frames
Sent to cloud
live ops
Real-time line insights
02 · Healthcare

Clinical document extraction live in 30+ US states.

LLM plus a medical knowledge graph for extracting ICD-10 and SNOMED procedure codes from unstructured clinical records. The graph is where the precision lives: an LLM alone hallucinates codes; an LLM constrained by a clinically-grounded ontology does not. 99.0% precision on the in-scope code subset, 5,010 records per minute streaming. The customer's compliance team chose this architecture over an in-house build because the eval surface — which codes we are confident about, which we abstain on — is the deliverable they could defend to auditors.
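
The constrain-then-abstain idea above can be sketched as follows. This is a minimal illustration, not the deployed system: the `IN_SCOPE_CODES` set stands in for the knowledge graph, and the `Candidate` type, the `constrain` function, the confidence scores, and the 0.9 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical miniature ontology: the only codes the system may emit.
# A real system would derive this from the medical knowledge graph.
IN_SCOPE_CODES = {"E11.9", "I10", "J45.9", "M17.0", "N18.3"}


@dataclass(frozen=True)
class Candidate:
    code: str          # code proposed by the LLM
    confidence: float  # score in [0, 1]; assumed to be available


def constrain(candidates, ontology=IN_SCOPE_CODES, threshold=0.9):
    """Split LLM candidates into accepted codes and abstentions."""
    accepted, abstained = [], []
    for cand in candidates:
        if cand.code in ontology and cand.confidence >= threshold:
            accepted.append(cand.code)
        else:
            # Either unknown to the ontology (a possible hallucination)
            # or known but below the confidence threshold.
            abstained.append(cand.code)
    return accepted, abstained


if __name__ == "__main__":
    proposed = [
        Candidate("E11.9", 0.97),    # in ontology, confident
        Candidate("I10", 0.62),      # in ontology, below threshold
        Candidate("Z99.99X", 0.95),  # not in the ontology
    ]
    print(constrain(proposed))
```

Returning the abstentions alongside the accepted codes is what makes the output auditable: the second list is the record of what the system declined to assert.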

30+ states
US coverage
ICD-10 + SNOMED
Both standards
cross-state
Compliant data sharing
KG-backed
Medical knowledge graph
[Figure: patient note with example codes E11.9, M17.0, I10, J45.9, N18.3, linking patient records to the medical knowledge graph; streaming throughput; precision 99.0%]
[Figure: agent that plans and decides across web, phone, and app channels; 1/10 the cost of an API stitch; live customer]
03 · Finance — consumer

Voice and video advisory agents for consumer finance.

Three constraints drove the architecture. (a) The advisory had to feel like a person — speech-to-speech, not chained ASR/TTS with a perceptible pause. (b) The avatar had to lip-sync at real-time latency, not pre-rendered. (c) The unit economics had to come in at a fraction of stitched-API cost, because the customer's product margin was the eval surface. We built a custom three-layer stack: an agentic logic layer for intent and tool-calling, a custom speech-to-speech model fine-tuned on consumer-finance dialogue, and a diffusion-based avatar trained for lip-sync alignment. Bespoke, not stitched.

3 layers
Logic · voice · avatar
fraction of API cost
Custom speech-to-speech
real-time
Lip-synced avatar
omnichannel
Web · phone · app
05 · Enterprise / visual content
Image understanding at scale

Domain-tuned image captioning across 12+ client domains.

A single large captioning model paired with small per-domain refiners — a proxy-tuning architecture. The alternative would have been training 12 separate fine-tunes, each requiring its own data, compute, and evaluation surface. Proxy-tuning lets the base model do the heavy lifting and asks each small refiner only to rewrite the caption in the conventions of its domain. The result is a fraction of the per-domain fine-tuning cost, and a fraction of the per-domain failure surface. New refiners are added without retraining the base.
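
The routing structure described above (one shared base, small per-domain refiners added independently) can be sketched as below. It shows only the structure, not how the base or the refiners are trained. `base_caption`, `CaptionPipeline`, and the example domain are hypothetical stand-ins, and the refiner here is a plain string rewrite rather than a model.

```python
from typing import Callable, Dict

Refiner = Callable[[str], str]


def base_caption(image_id: str) -> str:
    """Stub standing in for the large captioning model (hypothetical)."""
    return f"a photo of item {image_id}"


class CaptionPipeline:
    """One shared base captioner plus optional per-domain refiners."""

    def __init__(self, base: Callable[[str], str]):
        self.base = base
        self.refiners: Dict[str, Refiner] = {}

    def register(self, domain: str, refiner: Refiner) -> None:
        # Adding a domain touches only this registry; the base is unchanged.
        self.refiners[domain] = refiner

    def caption(self, image_id: str, domain: str) -> str:
        draft = self.base(image_id)
        refiner = self.refiners.get(domain)
        # Domains without a refiner fall back to the base caption.
        return refiner(draft) if refiner else draft


if __name__ == "__main__":
    pipeline = CaptionPipeline(base_caption)
    pipeline.register("retail", lambda c: c.replace("a photo of", "Product shot:"))
    print(pipeline.caption("42", "retail"))
    print(pipeline.caption("42", "unknown-domain"))
```

Because each refiner only rewrites the draft caption, a failure in one domain stays confined to that domain's refiner.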

1 base
Foundation model
12+ domains
Image captioning
fraction of cost
Training + inference
domain-resilient
No per-domain SFT
[Figure: ASR → LLM + KG → TTS chain; a patient speaking a native dialect ("I have a headache.") and the KG-aware AI system replying in the same dialect ("For how long?"); streaming, end-to-end 740 ms]
06 · Healthcare / accessibility
Voice health, native dialects

Native-language voice health consultation for underserved populations.

The customer's users speak six dialects of an underserved language; the comorbidity rate among them is high; and most are not literate enough in the regional lingua franca to use a typed interface. The system is chained ASR → LLM + medical comorbidity KG → TTS, with the LLM optimized for regional dialect comprehension and the knowledge graph supplying clinical grounding. End-to-end latency: 720 ms. The hardest engineering was not the LLM; it was the ASR — accurate transcription of dialects with minimal labeled data, which is itself a methods contribution we published before deploying.

ASR · LLM · TTS
Custom-trained chain
regional dialects
Low-resource languages
KG-aware
Medical comorbidity graph
S2S next
Speech-to-speech in dev
08 · Finance — institutional
Predictive asset trading

Predictive asset trading for institutional desks.

An ensemble of classical financial models with transformer networks, optimized through reinforcement learning on roughly a decade of historical market data. The reinforcement signal is portfolio P&L under realistic transaction-cost assumptions — not next-tick prediction accuracy. The standard failure mode in academic trading papers is precisely the use of unrealistic eval surfaces; this system is built against that failure. The model decides position sizing; the operator retains kill-switch and override. Measurable ROI lift over the classical baseline; live performance disclosed under NDA.
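
To make the difference between the two reward signals concrete, here is a minimal sketch of a per-step reward defined as P&L net of transaction costs. The function names, the proportional cost model, and the 0.1% `cost_rate` are assumptions for illustration only, not the system's actual reward.

```python
def step_reward(prev_position, position, price, next_price, cost_rate=0.001):
    """P&L of holding `position` over one step, net of trading costs.

    Costs are modeled as a fixed fraction of the notional traded when
    moving from `prev_position` to `position` (an assumed cost model).
    """
    pnl = position * (next_price - price)
    traded_notional = abs(position - prev_position) * price
    return pnl - cost_rate * traded_notional


def episode_reward(positions, prices, cost_rate=0.001):
    """Sum of step rewards; positions[t] is held from prices[t] to prices[t+1]."""
    total, prev = 0.0, 0.0
    for t, position in enumerate(positions):
        total += step_reward(prev, position, prices[t], prices[t + 1], cost_rate)
        prev = position
    return total


if __name__ == "__main__":
    # The direction call is right (price rises), but flipping from short
    # to long costs more than the move earns.
    print(step_reward(prev_position=-10, position=10, price=100.0, next_price=100.05))
    # The same position with no turnover keeps the full gain.
    print(step_reward(prev_position=10, position=10, price=100.0, next_price=100.05))
```

The first call illustrates why prediction accuracy alone is a misleading target: a correct direction call can still produce a negative reward once turnover is priced in.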

ensemble + transformer
Hybrid stack
RL-optimized
Decision policy
accuracy ↑ ROI ↑
Measurable lift
~10 years of data
Training horizon
[Figure: live ticker chart, current versus predicted price; current 202.4 USD; predicted 5-minute change +1.8%]

19+ cases across 6 verticals. Full documentation on request.

Request the full pack
intelligent
lab.ai
Singapore
152 Beach Road, #23-03
Gateway East