Startups Race to Build Complex RL Environments Amid Reward-Hacking Warnings
Startups and data-labeling giants are racing to build reinforcement-learning (RL) environments: simulated workspaces where AI agents practice multi-step tasks such as using a browser or enterprise software. Firms ranging from Surge and Mercor to newcomers Mechanize Work and Prime Intellect aim to supply AI labs hungry for these costly, complex simulations. Investors are enthusiastic, but researchers warn that reward hacking and scaling challenges could limit how much environments contribute to progress on next-generation agents.