- s1: Simple test-time scaling
- RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
- Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
- RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods
- RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training
- RewardDance: Reward Scaling in Visual Generation
- Reusing Pre-Training Data at Test Time is a Compute Multiplier
- Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation
- Retrieval-Augmented Generation for Large Language Models: A Survey
- Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
- Rethinking Cross-lingual Gaps from a Statistical Viewpoint
- Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting
- ReST-RL: Achieving Accurate Code Reasoning of LLMs with Optimized Self-Training and Decoding
- Repurposing Synthetic Data for Fine-grained Search Agent Supervision
- Remote Labor Index: Measuring AI Automation of Remote Work
- Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution
- Relative Scaling Laws for LLMs
- Relative-Based Scaling Law for Neural Language Models
- Reinforcement learning
- Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle
- Reinforcement Learning Improves Traversal of Hierarchical Knowledge in LLMs
- Reinforcement Learning for Machine Learning Engineering Agents
- Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Reflect before Act: Proactive Error Correction in Language Models
- Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers
- Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision
- RAGs to Riches: RAG-like Few-shot Learning for Large Language Model Role-playing
- Qwen3-VL Technical Report
- Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
- Qwen2 Technical Report
- Quantitative Bounds for Length Generalization in Transformers
- Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
- QLoRA: Efficient Finetuning of Quantized LLMs
- QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
- QAgent: A modular Search Agent with Interactive Query Understanding
- Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
- Prune4Web: DOM Tree Pruning Programming for Web Agent
- Prompts Generalize with Low Data: Non-vacuous Generalization Bounds for Optimizing Prompts with More Informative Priors
- Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning
- Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents
- Predicting Task Performance with Context-aware Scaling Laws
- Pre-training under infinite compute
- PPTArena: A Benchmark for Agentic PowerPoint Editing
- PLUM: Adapting Pre-trained Language Models for Industrial-scale Generative Recommendations
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony
- Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey