- formatting
- images
- links
- math
- code
- blockquotes
- external-services
- Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics
- Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling
- Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
- Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
- Artificial Hippocampus Networks for Efficient Long-Context Modeling