- formatting
- images
- links
- math
- code
- blockquotes
- external-services
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
- CreativityPrism: A Holistic Benchmark for Large Language Model Creativity
- Cost-Aware Retrieval-Augmentation Reasoning Models with Adaptive Retrieval Depth
- ConvergeWriter: Data-Driven Bottom-Up Article Construction
- Controlling changes to attention logits
- Continual Learning via Sparse Memory Finetuning
- Connecting Jensen-Shannon and Kullback-Leibler Divergences: A New Bound for Representation Learning
- Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples
- Collaboration and Conflict between Humans and Language Models through the Lens of Game Theory
- Cognitive Foundations for Reasoning and Their Manifestation in LLMs
- CogGuide: Human-Like Guidance for Zero-Shot Omni-Modal Reasoning
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
- Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models
- Capabilities of GPT-4 on Medical Challenge Problems
- CAMformer: Associative Memory is All You Need
- CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling
- Budget-Aware Tool-Use Enables Effective Agent Scaling
- BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents
- BroRL: Scaling Reinforcement Learning via Broadened Exploration
- BridgeData V2: A Dataset for Robot Learning at Scale
- BloombergGPT: A Large Language Model for Finance
- Bias and Fairness in Large Language Models: A Survey
- Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models
- BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
- Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning
- Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
- Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI
- Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval
- Better World Models Can Lead to Better Post-Training Performance
- Behind RoPE: How Does Causal Mask Encode Positional Information?
- BEFT: Bias-Efficient Fine-Tuning of Language Models
- BEAM: Brainwave Empathy Assessment Model for Early Childhood
- Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- BaseReward: A Strong Baseline for Multimodal Reward Model
- Balanced Actor Initialization: Stable RLHF Training of Distillation-Based Reasoning Models
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
- Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics
- Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling
- Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
- Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
- Artificial Hippocampus Networks for Efficient Long-Context Modeling
- ARM-FM: Automated Reward Machines via Foundation Models for Compositional Reinforcement Learning
- Are Large Language Models Sensitive to the Motives Behind Communication?
- Are Agents Just Automata? On the Formal Equivalence Between Agentic AI and the Chomsky Hierarchy
- Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
- An Augmentation Overlap Theory of Contrastive Learning
- AlphaResearch: Accelerating New Algorithm Discovery with Language Models
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback