Rubrics as Rewards: A New Paradigm for Training Reliable AI

AI models face significant challenges when applied to nuanced, high-stakes fields like medicine and science. Standard training techniques, such as Reinforcement Learning from Human Feedback (RLHF), of...

Tags: AI safety, AI training, expert guidance, language models, model evaluation, RLHF, rubrics