RL Environments for training & benchmarking AI agents.
We build simulated environments. Your agent practices on them, gets scored, and improves over time.
Why thetalab
Built for real-world agent training.
Scores what actually matters
We measure real task completion, not surface metrics. Did the refund go through? Was the ticket resolved correctly? Your agent gets credit for outcomes, not guesses.
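One way to read "credit for outcomes" is as a reward computed from the environment's final state rather than from what the agent says it did. The sketch below is illustrative only: `RefundState`, its fields, and `outcome_reward` are hypothetical names chosen for this example, not part of any thetalab API.

```python
from dataclasses import dataclass


@dataclass
class RefundState:
    """Hypothetical final state of a simulated refund task."""
    refund_issued: bool
    refunded_amount: float
    expected_amount: float
    ticket_status: str  # e.g. "open" or "resolved"


def outcome_reward(state: RefundState) -> float:
    """Return 1.0 only if the refund went through for the right amount
    and the ticket was resolved; otherwise return 0.0."""
    amount_ok = abs(state.refunded_amount - state.expected_amount) < 0.01
    resolved = state.ticket_status == "resolved"
    return 1.0 if (state.refund_issued and amount_ok and resolved) else 0.0


# Only the final state is scored, not the agent's own claim of success.
done = RefundState(True, 49.99, 49.99, "resolved")
claimed_only = RefundState(False, 0.0, 49.99, "resolved")
print(outcome_reward(done), outcome_reward(claimed_only))
```

Here the second case closes the ticket without issuing the refund, so it earns no credit even though the ticket looks resolved.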
Trains on the edge cases
Edge cases break agents in production. Our environments inject failures, timeouts, and messy data your agent will actually face — so it learns to handle them.
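As a rough illustration of failure injection, the sketch below wraps a tool so that a configurable fraction of calls time out, and shows a simple retry on the agent side. Every name here (`with_faults`, `ToolTimeout`, `lookup_order`, `call_with_retry`) is an assumption made for the example; it does not describe how thetalab environments are implemented.

```python
import random
from typing import Callable, Optional


class ToolTimeout(Exception):
    """Raised when the simulated tool call times out."""


def with_faults(tool: Callable[[str], dict], timeout_rate: float,
                rng: random.Random) -> Callable[[str], dict]:
    """Wrap a tool so that roughly `timeout_rate` of calls raise ToolTimeout."""
    def wrapped(arg: str) -> dict:
        if rng.random() < timeout_rate:
            raise ToolTimeout(f"simulated timeout for {arg!r}")
        return tool(arg)
    return wrapped


def lookup_order(order_id: str) -> dict:
    """Hypothetical tool that returns a canned order record."""
    return {"order_id": order_id, "status": "shipped"}


def call_with_retry(tool: Callable[[str], dict], arg: str,
                    attempts: int = 3) -> Optional[dict]:
    """Retry on timeout; return None if every attempt fails."""
    for _ in range(attempts):
        try:
            return tool(arg)
        except ToolTimeout:
            continue
    return None


# A seeded RNG keeps the injected failures reproducible across runs.
flaky = with_faults(lookup_order, timeout_rate=0.5, rng=random.Random(0))
result = call_with_retry(flaky, "A-100")
```

Seeding the random generator is a deliberate choice: the same failure pattern can be replayed when comparing two versions of an agent.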
Built for your workflow
Not a generic sandbox. We replicate the exact software your agent operates in — same forms, same states, same quirks. Train on what you'll deploy to.
obsrv.tech
Live executions, replayable forever.
obsrv is the flight data recorder for AI agents. It captures every run, clusters failures, and gives your team replayable evidence for debugging, evals, and retraining.
02 - TRACE FEED
Execution traces
| NAME | STATUS | METADATA | LATENCY | TOKENS | STARTED |
|---|---|---|---|---|---|
| agent:tool-error-recheck tr_01KR4JR8ETH9QKJP09VD31NHVQ | PASS | E2E | 6.98s | 2.8k | 13h ago |
| agent:tool-error tr_01KR4JQFREZM1ZW24GX78H5S97 | FAIL | E2E | 9.30s | 2.8k | 13h ago |
| agent:loop-prone tr_01KR4JPRVJZKXZQWSSKTVRQ9C0 | PASS | E2E | 23.45s | 10.9k | 13h ago |
| long-text-probe tr_51989550cae9451daf25387ecf091937 | PASS | API | 34.71s | 8.8k | 27/04/2026 |
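To show the kind of summary a trace feed like this supports, here is a small sketch that computes a pass rate, lists failing runs, and finds the slowest run. The values are copied from the rows above; the dictionary keys are hypothetical and are not obsrv's export schema.

```python
from collections import Counter

# Rows copied from the trace feed above; key names are assumptions.
traces = [
    {"name": "agent:tool-error-recheck", "status": "PASS", "kind": "E2E", "latency_s": 6.98},
    {"name": "agent:tool-error", "status": "FAIL", "kind": "E2E", "latency_s": 9.30},
    {"name": "agent:loop-prone", "status": "PASS", "kind": "E2E", "latency_s": 23.45},
    {"name": "long-text-probe", "status": "PASS", "kind": "API", "latency_s": 34.71},
]

# Count runs per status, then derive the headline numbers.
status_counts = Counter(t["status"] for t in traces)
pass_rate = status_counts["PASS"] / len(traces)
failures = [t["name"] for t in traces if t["status"] == "FAIL"]
slowest = max(traces, key=lambda t: t["latency_s"])["name"]

print(f"pass rate: {pass_rate:.0%}, failures: {failures}, slowest: {slowest}")
```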
Custom SLMs
A private model for the workflow your team repeats.
thetalab trains a compact model in your company's RL environment, so the agent learns the exact job before it reaches production.
Custom SLM, enterprise controlled
Model behavior is trained, evaluated, and monitored against your real workflow.
Learns your operation
Tools, approvals, policies, exceptions, and handoff rules.
Runs inside your boundary
Private data, controlled deployment, and auditable behavior.
Improves from evidence
obsrv turns real runs into evals and better training sets.
Best for high-volume workflows where generic agents are too expensive, too variable, or too hard to audit.
SDK
Start training in minutes.
import thetabench
env = thetabench.make("shopify-admin", task_id="prod-001")
obs, info = env.reset()
# gymnasium-compatible
Works with Stable Baselines3, RLlib, and anything that speaks Gymnasium.
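For readers new to the Gymnasium interface, the sketch below shows what a single episode loop looks like: `reset()` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`. Because only `thetabench.make` and `env.reset` appear in the snippet above, the loop runs against `StandInEnv`, a hypothetical stand-in written for this example. Its observations, actions, and rewards are invented and say nothing about what a real thetabench environment returns.

```python
class StandInEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures.
    It is NOT thetabench; swap in a real environment where one is available."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return {"step": 0}, {}

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "submit" else 0.0
        terminated = action == "submit"        # task finished
        truncated = self.t >= self.max_steps   # step budget exhausted
        return {"step": self.t}, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs, info = env.reset()
    total = 0.0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            return total


def toy_policy(obs):
    """Toy policy: click for two steps, then submit."""
    return "submit" if obs["step"] >= 2 else "click"


episode_return = run_episode(StandInEnv(), toy_policy)
```

Checking both `terminated` and `truncated` matters: the first signals that the task ended, the second that the step budget ran out, and a training loop usually needs to stop on either.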
Ready to train your agent?
Get in touch and we'll help you get started with the right environment for your use case.



