RL Environments for training & benchmarking AI agents.

We build simulated environments. Your agent practices on them, gets scored, and improves over time.

Book a call

Why thetalab

Built for real-world agent training.

Scores what actually matters

We measure real task completion, not surface metrics. Did the refund go through? Was the ticket resolved correctly? Your agent gets credit for outcomes, not guesses.
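
As a rough illustration of outcome-based scoring, the sketch below checks the final state of a simulated order system and ignores what the agent says it did. Everything in it (`Order`, `score_refund_task`, the field names) is hypothetical and assumed for illustration; it is not thetalab's actual scoring API.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical record in a simulated store backend."""
    order_id: str
    total: float
    refunded: float = 0.0


def score_refund_task(order: Order, expected_refund: float, agent_reply: str) -> float:
    """Return 1.0 only if the refund actually landed in the simulated backend.

    The agent's reply is ignored on purpose: saying "refund issued"
    earns nothing unless the order state changed.
    """
    refund_applied = abs(order.refunded - expected_refund) < 0.01
    return 1.0 if refund_applied else 0.0


# An agent that only claims success: the backend state is unchanged.
claimed = Order(order_id="A-1", total=40.0)
print(score_refund_task(claimed, 40.0, "Your refund has been issued!"))

# An agent that actually changed the backend state.
done = Order(order_id="A-2", total=40.0, refunded=40.0)
print(score_refund_task(done, 40.0, "Done."))
```

The design choice worth noting is that the score depends only on environment state, so a confident but wrong reply cannot earn credit.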

Trains on the edge cases

Edge cases break agents in production. Our environments inject failures, timeouts, and messy data your agent will actually face — so it learns to handle them.
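
One way to picture failure injection is a wrapper that makes a tool time out at a configurable rate. This is a minimal sketch under assumed names (`FlakyTool`, `call_with_retry`, `lookup_order` are all hypothetical), not a description of how thetalab environments are implemented.

```python
import random


class FlakyTool:
    """Wraps a tool function and makes it time out at a configurable rate.

    Hypothetical helper for illustration; not part of any thetalab SDK.
    """

    def __init__(self, tool, timeout_rate=0.2, seed=0):
        self.tool = tool
        self.timeout_rate = timeout_rate
        self.rng = random.Random(seed)  # seeded so runs are reproducible

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.timeout_rate:
            raise TimeoutError("injected timeout")
        return self.tool(*args, **kwargs)


def call_with_retry(tool, *args, retries=3):
    """The kind of behavior an agent should learn: retry on timeout, then give up."""
    for _ in range(retries):
        try:
            return tool(*args)
        except TimeoutError:
            continue
    return None  # all attempts timed out


def lookup_order(order_id):
    """Stand-in for a real tool call."""
    return {"order_id": order_id, "status": "shipped"}


flaky_lookup = FlakyTool(lookup_order, timeout_rate=0.5, seed=42)
print(call_with_retry(flaky_lookup, "A-1"))
```

Seeding the random generator keeps injected failures reproducible, which matters when the same episode needs to be replayed for debugging.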

Built for your workflow

Not a generic sandbox. We replicate the exact software your agent operates in — same forms, same states, same quirks. Train on what you'll deploy to.

obsrv.tech

Live executions, replayable forever.

obsrv is the flight data recorder for AI agents. It captures every run, clusters failures, and gives your team replayable evidence for debugging, evals, and retraining.

FDR - TRACES
OPS NORMAL

02 - TRACE FEED

Execution traces

Search traces by run, user, model, or semantic question...
Status · Model · Run type · User ID · Metadata

NAME                      STATUS  METADATA  LATENCY  TOKENS  STARTED
agent:tool-error-recheck  PASS    E2E       6.98s    2.8k    13h ago
  tr_01KR4JR8ETH9QKJP09VD31NHVQ
agent:tool-error          FAIL    E2E       9.30s    2.8k    13h ago
  tr_01KR4JQFREZM1ZW24GX78H5S97
agent:loop-prone          PASS    E2E       23.45s   10.9k   13h ago
  tr_01KR4JPRVJZKXZQWSSKTVRQ9C0
long-text-probe           PASS    API       34.71s   8.8k    27/04/2026
  tr_51989550cae9451daf25387ecf091937

Custom SLMs

A private model for the workflow your team repeats.

thetalab trains a compact model in your company's RL environment, so the agent learns the exact job before it reaches production.

owned behavior · lower run cost · controlled rollout

Custom SLM, enterprise controlled

Model behavior is trained, evaluated, and monitored against your real workflow.

Learns your operation

Tools, approvals, policies, exceptions, and handoff rules.

Runs inside your boundary

Private data, controlled deployment, and auditable behavior.

Improves from evidence

obsrv turns real runs into evals and better training sets.

Best for high-volume workflows where generic agents are too expensive, too variable, or too hard to audit.

SDK

Start training in minutes.

train.py
import thetabench

env = thetabench.make("shopify-admin", task_id="prod-001")
obs, info = env.reset()

# gymnasium-compatible

Works with Stable Baselines, RLlib, and anything that speaks Gymnasium.
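
For readers new to Gymnasium, the interaction loop looks roughly like the sketch below. To keep it runnable without installing anything, it uses a hypothetical `StubEnv` that only mimics the Gymnasium method signatures (`reset` returning `(obs, info)`, `step` returning `(obs, reward, terminated, truncated, info)`). The observations, action names, and reward values are made up, and it is an assumption that a thetabench environment behaves the same way beyond the `reset()` call shown in train.py.

```python
import random


class StubEnv:
    """Stand-in with Gymnasium-style method signatures, so this runs with no install.

    A real environment would come from thetabench.make(...) as in train.py.
    """

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None, options=None):
        self.steps = 0
        return {"page": "start"}, {}

    def step(self, action):
        self.steps += 1
        terminated = action == "submit"           # task finished
        truncated = self.steps >= self.max_steps  # step budget exhausted
        reward = 1.0 if terminated else 0.0
        return {"page": "form"}, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Gymnasium-style loop: reset once, then step until terminated or truncated."""
    obs, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        done = terminated or truncated
    return total_reward


# A random policy over made-up action names, seeded for reproducibility.
rng = random.Random(0)
print(run_episode(StubEnv(), lambda obs: rng.choice(["click", "type", "submit"])))
```

Libraries such as Stable Baselines and RLlib run this same loop internally, which is why an environment that follows the Gymnasium interface can be passed to them directly.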

Ready to train your agent?

Get in touch and we'll help you get started with the right environment for your use case.

Book a call