ActiveUltraFeedback & RewardUQ

This page presents two recent papers on reinforcement learning from human feedback (RLHF) for large language models: ActiveUltraFeedback, on efficient preference data generation, and RewardUQ, on uncertainty-aware reward modeling. Use the cards below to navigate to the full project pages.

Active Learning for Preference Data Generation

ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

ActiveUltraFeedback is a modular pipeline for collecting high-quality preference data more efficiently. By using uncertainty-aware response selection, it identifies informative response pairs for annotation and can achieve strong downstream performance with substantially fewer labels than static baselines.
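As a rough illustration of what uncertainty-aware response selection can look like, the sketch below scores every candidate response pair by how much an ensemble of reward models disagrees about which response is better, and picks the most contested pair for annotation. This is not the ActiveUltraFeedback implementation: the ensemble, the Bradley-Terry style preference probability, the variance criterion, and all names (`select_pair`, `ensemble_rewards`) are assumptions made for the example.

```python
import math
from itertools import combinations


def sigmoid(x):
    """Logistic function, used here as a Bradley-Terry preference probability."""
    return 1.0 / (1.0 + math.exp(-x))


def select_pair(ensemble_rewards):
    """Pick the response pair with the highest ensemble disagreement.

    ensemble_rewards[m][i] is the (hypothetical) reward that ensemble
    member m assigns to candidate response i for a single prompt.
    Returns ((i, j), variance) for the most uncertain pair.
    """
    n_responses = len(ensemble_rewards[0])
    best_pair, best_var = None, -1.0
    for i, j in combinations(range(n_responses), 2):
        # Each member's probability that response i beats response j.
        probs = [sigmoid(r[i] - r[j]) for r in ensemble_rewards]
        mean = sum(probs) / len(probs)
        # Variance across members serves as the uncertainty score (an assumption).
        var = sum((p - mean) ** 2 for p in probs) / len(probs)
        if var > best_var:
            best_pair, best_var = (i, j), var
    return best_pair, best_var


# Toy data: three ensemble members, three candidate responses.
# The members agree on responses 0 and 2 but disagree on response 1.
rewards = [
    [2.0, 1.9, -1.0],
    [2.0, 0.5, -1.0],
    [2.0, 3.0, -1.0],
]
pair, score = select_pair(rewards)
print(pair, round(score, 4))
```

In this toy example the members disagree on whether response 0 or response 1 is better, so that pair is the one sent for annotation, while pairs the ensemble already agrees on are skipped.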

Links: Project Page, Paper (arXiv), GitHub Code, BibTeX

Uncertainty Quantification for Reward Models

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

RewardUQ provides a unified framework for evaluating uncertainty quantification in reward models. It compares common methods in terms of accuracy and calibration, helping clarify which design choices matter most for reliable reward modeling in LLM alignment.
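To make the two evaluation axes concrete, the sketch below computes preference accuracy and a binned expected calibration error (ECE) from a reward model's predicted preference probabilities. It is a generic illustration, not the RewardUQ evaluation code: the function name `accuracy_and_ece`, the equal-width binning, and the toy data are assumptions.

```python
def accuracy_and_ece(probs, labels, n_bins=10):
    """Accuracy and expected calibration error for pairwise preferences.

    probs[k]  : predicted probability that response A is preferred over B
    labels[k] : 1 if annotators preferred A, 0 if they preferred B
    """
    n = len(probs)
    preds = [1 if p >= 0.5 else 0 for p in probs]
    # Confidence is the probability assigned to the predicted side.
    confs = [max(p, 1.0 - p) for p in probs]
    correct = [int(pred == y) for pred, y in zip(preds, labels)]
    accuracy = sum(correct) / n

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confs, correct):
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))

    # ECE: weighted average gap between confidence and accuracy per bin.
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        avg_acc = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - avg_acc)
    return accuracy, ece


# Toy data: four confident predictions (one wrong), two hesitant ones (one wrong).
probs = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
labels = [1, 1, 1, 0, 1, 0]
acc, ece = accuracy_and_ece(probs, labels)
print(f"accuracy={acc:.3f}  ece={ece:.3f}")
```

A model can score well on accuracy while being poorly calibrated, or the reverse, which is why the two quantities are reported separately.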

Links: Project Page, Paper (arXiv), GitHub Code, BibTeX