- formatting
- images
- links
- math
- code
- blockquotes
- external-services
- Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- BaseReward: A Strong Baseline for Multimodal Reward Model
- Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data