- Inpainting-Guided Policy Optimization for Diffusion Large Language Models
- Inefficiencies of Meta Agents for Agent Design
- Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement
- In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs
- Improving Online Algorithms via ML Predictions
- Improving Context Fidelity via Native Retrieval-Augmented Reasoning
- Improved Baselines with Visual Instruction Tuning
- Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
- ImageBind: One Embedding Space To Bind Them All
- Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
- HunyuanVideo 1.5 Technical Report
- HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models
- How Far Are We from Genuinely Useful Deep Research Agents?
- How Does RL Post-training Induce Skill Composition? A Case Study on Countdown
- Higher-order Linear Attention
- Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task
- HEAL: A Hypothesis-Based Preference-Aware Analysis Framework
- Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
- HaluMem: Evaluating Hallucinations in Memory Systems of Agents
- HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy
- GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
- Group Representational Position Encoding
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models
- GPT-4o System Card
- GPT-4 Technical Report
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark
- Generative Models for Synthetic Data: Transforming Data Mining in the GenAI Era
- Generative Early Stage Ranking
- Generative Data Refinement: Just Ask for Better Data
- Generative AI
- General Agentic Memory Via Deep Research
- Gemma 2: Improving Open Language Models at a Practical Size
- GEM: A Gym for Agentic LLMs
- GatePro: Parameter-Free Expert Selection Optimization for Mixture-of-Experts Models
- GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters
- From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
- Fourier Neural Operators Explained: A Practical Perspective
- ForTIFAI: Fending Off Recursive Training Induced Failure for AI Models
- FlowRL: Matching Reward Distributions for LLM Reasoning
- FLEX: Continuous Agent Evolution via Forward Learning from Experience
- First Try Matters: Revisiting the Role of Reflection in Reasoning Models
- Fast attention mechanisms: a tale of parallelism
- FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
- Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions
- FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
- F -- A Model of Events based on the Foundational Ontology DOLCE+DnS Ultralite
- Extracting alignment data in open models
- Explaining the Success of Nearest Neighbor Methods in Prediction