- formatting
- images
- links
- math
- code
- blockquotes
- external-services
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
- HaluMem: Evaluating Hallucinations in Memory Systems of Agents
- HAD: HAllucination Detection Language Models Based on a Comprehensive Hallucination Taxonomy
- GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents