Big Data News Hubb

Scaling laws for reward model overoptimization

by admin
March 18, 2023
in Big Data


In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart’s law. This effect has been frequently observed, but not carefully measured due to the expense of collecting human preference data. In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model. We study how the gold reward model score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-n sampling. We find that this relationship follows a different functional form depending on the method of optimization, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also study the effect on this relationship of the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. We explore the implications of these empirical results for theoretical considerations in AI alignment.
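
To make the best-of-n part of this setup concrete, here is a minimal toy sketch, not the authors' code. It assumes each sample has a scalar "gold" score drawn from a standard normal and that the proxy score is the gold score plus independent Gaussian noise; both are illustrative assumptions, as are the names `simulate_best_of_n` and `best_of_n_kl`. The expression log n − (n − 1)/n is a commonly used analytic value for the KL divergence between the best-of-n policy and the base policy and is not stated in the text above.

```python
import math
import random
import statistics


def best_of_n_kl(n: int) -> float:
    """Commonly used analytic KL (in nats) between best-of-n and the base policy."""
    return math.log(n) - (n - 1) / n


def simulate_best_of_n(n: int, trials: int = 20000,
                       noise_std: float = 1.0, seed: int = 0):
    """Return (mean proxy score, mean gold score) of the sample the proxy picks.

    Toy assumption: gold ~ N(0, 1) and proxy = gold + N(0, noise_std^2).
    """
    rng = random.Random(seed)
    proxy_scores, gold_scores = [], []
    for _ in range(trials):
        best_proxy, best_gold = -math.inf, 0.0
        for _ in range(n):
            gold = rng.gauss(0.0, 1.0)                # hypothetical gold reward
            proxy = gold + rng.gauss(0.0, noise_std)  # imperfect proxy of it
            if proxy > best_proxy:                    # best-of-n selects by proxy only
                best_proxy, best_gold = proxy, gold
        proxy_scores.append(best_proxy)
        gold_scores.append(best_gold)
    return statistics.fmean(proxy_scores), statistics.fmean(gold_scores)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        proxy, gold = simulate_best_of_n(n)
        print(f"n={n:3d}  KL={best_of_n_kl(n):.3f}  "
              f"proxy={proxy:.3f}  gold={gold:.3f}  gap={proxy - gold:.3f}")
```

In this toy the gap between the proxy score and the gold score of the selected sample widens as n grows, which is the Goodhart-style effect the abstract describes. With Gaussian noise the gold score itself still rises with n, so the sketch illustrates the growing proxy-gold gap only, not the functional forms or coefficients reported in the work.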







© 2022 Big Data News Hubb All rights reserved.

Use of these names, logos, and brands does not imply endorsement unless specified. By using this site, you agree to the Privacy Policy and Terms & Conditions.
