
The Self-Rewriting Agent: Deploying Models That Learn Their Own Rules

DAP Explained: Joint Scene–Action Prediction with Discrete Tokens

Bi-Level Contextual Bandits: Fair Resource Allocation When Feedback Is Delayed

Beyond Self-Play: Training Robust Agents with Rational Policy Gradient

AsyncThink: Teaching LLMs to Organize Their Own Thinking
