RL Environments for training & benchmarking AI agents.

We build simulated environments. Your agent practices on them, gets scored, and improves over time.

Why thetalab

Built for real-world agent training.

Scores what actually matters

We measure real task completion, not surface metrics. Did the refund go through? Was the ticket resolved correctly? Your agent gets credit for outcomes, not guesses.
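
One way to picture outcome-based scoring is a check against the final state of the simulated system rather than against what the agent said it did. The sketch below is illustrative only: `score_refund_task` and the state fields (`refund`, `status`, `amount_cents`) are hypothetical names, not thetalab's scoring code.

```python
def score_refund_task(final_state, expected_cents):
    """Return 1.0 only if the refund actually landed in the simulated system.

    `final_state` is a hypothetical snapshot of the environment after the
    episode ends; the field names are assumptions made for this example.
    """
    refund = final_state.get("refund")
    if refund is None:
        # The agent may have claimed a refund, but nothing changed.
        return 0.0
    if refund.get("status") != "completed":
        return 0.0
    # Compare integer cents to avoid float rounding issues.
    if refund.get("amount_cents") != expected_cents:
        return 0.0
    return 1.0
```

An agent that only writes "refund issued" in its reply would score 0.0 here, because the check never looks at the transcript.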

Trains on the edge cases

Edge cases break agents in production. Our environments inject failures, timeouts, and messy data your agent will actually face — so it learns to handle them.
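
To make the idea concrete, here is a minimal sketch of fault injection written as an environment wrapper. Everything in it is hypothetical: `FaultInjector`, `EchoEnv`, the fault rates, and the `fault` info key are illustrative names, not thetalab's implementation.

```python
import random


class EchoEnv:
    """Hypothetical toy environment: every step returns a clean record."""

    def reset(self, seed=None):
        return {"order_id": "1001", "amount": "25.00"}, {}

    def step(self, action):
        obs = {"order_id": "1001", "amount": "25.00"}
        return obs, 0.0, False, False, {}


class FaultInjector:
    """Wraps an env and randomly simulates timeouts or messy observations."""

    def __init__(self, env, timeout_rate=0.1, messy_rate=0.1, seed=0):
        self.env = env
        self.timeout_rate = timeout_rate
        self.messy_rate = messy_rate
        self.rng = random.Random(seed)

    def reset(self, seed=None):
        return self.env.reset(seed=seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        roll = self.rng.random()
        if roll < self.timeout_rate:
            # Simulated timeout: the action was applied, but the response is lost.
            return None, reward, terminated, truncated, {**info, "fault": "timeout"}
        if roll < self.timeout_rate + self.messy_rate:
            # Messy data: drop the amount field and pad the rest with whitespace.
            messy = {k: f" {v} " for k, v in obs.items() if k != "amount"}
            return messy, reward, terminated, truncated, {**info, "fault": "messy"}
        return obs, reward, terminated, truncated, info
```

Tagging each injected fault in `info` keeps the corruption visible to the scorer while the agent only sees the degraded observation.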

Built for your workflow

Not a generic sandbox. We replicate the exact software your agent operates in — same forms, same states, same quirks. Train on what you'll deploy to.

obsrv.tech

Live executions, replayable forever.

obsrv is the flight data recorder for AI agents. It captures every run, clusters failures, and gives your team replayable evidence for debugging, evals, and retraining.

Execution traces

Open obsrv

NAME	STATUS	METADATA	LATENCY	TOKENS	STARTED
agent:tool-error-recheck tr_01KR4JR8ETH9QKJP09VD31NHVQ	PASS	E2E	6.98s	2.8k	13h ago
agent:tool-error tr_01KR4JQFREZM1ZW24GX78H5S97	FAIL	E2E	9.30s	2.8k	13h ago
agent:loop-prone tr_01KR4JPRVJZKXZQWSSKTVRQ9C0	PASS	E2E	23.45s	10.9k	13h ago
long-text-probe tr_51989550cae9451daf25387ecf091937	PASS	API	34.71s	8.8k	27/04/2026

Custom SLMs

A private model for the workflow your team repeats.

thetalab trains a compact model in your company's RL environment, so the agent learns the exact job before it reaches production.

owned behavior, lower run cost, controlled rollout

Custom SLM, enterprise controlled

Model behavior is trained, evaluated, and monitored against your real workflow.

Learns your operation

Tools, approvals, policies, exceptions, and handoff rules.

Runs inside your boundary

Private data, controlled deployment, and auditable behavior.

Improves from evidence

obsrv turns real runs into evals and better training sets.

Best for high-volume workflows where generic agents are too expensive, too variable, or too hard to audit.

SDK

Start training in minutes.

train.py

import thetabench

env = thetabench.make("shopify-admin", task_id="prod-001")
obs, info = env.reset()

# gymnasium-compatible

Works with Stable Baselines, RLlib, and anything that speaks Gymnasium.
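
Since the environment is described as Gymnasium-compatible, a rollout loop would look roughly like the sketch below. `StubEnv` is a hypothetical stand-in so the example runs without the SDK installed; it is not part of thetabench, and its action names and reward rule are made up. The loop assumes the standard Gymnasium contract: `reset()` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`.

```python
class StubEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step contract."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None):
        self.steps = 0
        return {"page": "orders"}, {}

    def step(self, action):
        self.steps += 1
        # Made-up rule: the episode succeeds when the agent issues the refund.
        terminated = action == "issue_refund"
        # Cut the episode off once the step budget is used up.
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return {"page": "orders"}, reward, terminated, truncated, {}


def run_episode(env, policy, seed=0):
    """Roll out one episode and return (total_reward, steps_taken)."""
    obs, info = env.reset(seed=seed)
    total, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        steps += 1
    return total, steps
```

With the real SDK, `StubEnv()` would presumably be replaced by the `thetabench.make(...)` call shown in train.py, assuming it returns an object with this interface.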

Blog (6)

Why RL Environments Matter More Than You Think

Published on: 20.03.2026

Agent Companies Should Stop Building Training Infrastructure

Published on: 18.03.2026

The Gap Between Demo and Production for AI Agents

Published on: 24.03.2026

Why Your AI Agent Fails on Edge Cases — And Why More Prompting Won't Fix It

Published on: 22.03.2026

All Posts

Ready to train your agent?

Get in touch and we'll help you get started with the right environment for your use case.

Book a call