Primers • Reasoning in LLMs
References
- SimpleQA: Benchmarking Calibration and Factuality in Short-Form Question Answering
- LongFact: Long-Form Factuality Evaluation with Search-Augmented Generators (SAFE)
- TruthfulQA: Measuring How Models Mimic Human Falsehoods
- HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
- FreshQA: Temporal Reasoning and Factuality in QA Systems
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Constitutional AI: Harmlessness from AI Feedback
- FactCC: Factual Consistency Checking for Abstractive Summarization
- ROME: Locating and Editing Factual Associations in GPT
- MEMIT: Mass Editing Memory in a Transformer
- RAGAS: Automated Evaluation for Retrieval-Augmented Generation
- Faithful RAG: Ensuring Factual Consistency in Retrieval-Augmented Generation
- ReFact: Updating Factual Knowledge in Large Language Models
- Tulu 2: A Framework for Truthful Instruction Fine-Tuning
- HELM: Holistic Evaluation of Language Models
- FActScore: Fine-Grained Factuality Evaluation for LLMs
- QAFactEval: Improved Factual Consistency Evaluation for Summarization
- TabFact: A Large-Scale Dataset for Table-Based Fact Verification
- VisualNews: Benchmarking Factual Consistency in Vision-Language Models
- MMRAG: Multimodal Retrieval-Augmented Generation
- Measuring Attribution in Natural Language Generation
- Retrieval-Augmented Continual Learning for Dynamic Knowledge Update
- Factuality of Large Language Models: A Survey
Citation
@article{Chadha2020DistilledReasoningLLMs,
  title   = {Reasoning in LLMs},
  author  = {Chadha, Aman and Jain, Vinija},
  journal = {Distilled AI},
  year    = {2020},
  note    = {\url{https://aman.ai}}
}