- How Does RL Post-training Induce Skill Composition? A Case Study on Countdown
- Higher-order Linear Attention
- Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task
- HEAL: A Hypothesis-Based Preference-Aware Analysis Framework
- Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents