- formatting
- images
- links
- math
- code
- blockquotes
- external-services
•
•
•
•
•
•
-
Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations
-
Towards Unbiased Calibration using Meta-Regularization
-
Towards Flash Thinking via Decoupled Advantage Policy Optimization
-
Towards a Unified View of Large Language Model Post-Training
-
Towards a Science of Scaling Agent Systems