Humans underweight delayed rewards in dispersed feedback

June 29, 2026 | By LDS Team

Relevance Score: 5.8

A peer-reviewed study published June 29, 2026 in PLOS Computational Biology by Miruna Cotet, David Poensgen, and Ian Krajbich found that people put roughly 1.4 to 2.4 times as much weight on immediate feedback as on delayed feedback when learning from outcomes split across time, even though both pieces of information were equally informative and all rewards were paid out at the end of the study. In two experiments with a combined 226 subjects, which collected behavioral and eye-tracking data, the bias grew stronger as the task progressed and persisted even when participants learned only by observing others' choices. For teams building human-in-the-loop reward or evaluation pipelines, the finding is a caution: human-provided feedback that arrives on a delay may be systematically under-credited relative to immediate feedback, independent of its actual informativeness.

For teams that collect human feedback on a delay, whether for RLHF-style reward modeling, offline evaluation, or preference labeling, the study's core message is that delay alone can reduce how much weight feedback effectively carries for a human learner, even when there is no rational reason to discount it.

What happened

Cotet, Poensgen, and Krajbich published "Delayed reward information is underweighted in reinforcement learning with dispersed feedback" in PLOS Computational Biology on June 29, 2026 (volume 22, issue 6, e1014459), after the paper circulated as a PsyArXiv preprint. Across two studies (226 subjects in total, with 87 analyzed in Study 1 and 90 in Study 2 after exclusions), participants chose between options where each choice produced two pieces of feedback: one shown immediately, and one shown after the following trial. Both pieces counted equally toward the subject's final earnings, and the total payout was revealed only at the end of the study, so there was no rational reason to treat immediate and delayed feedback differently. The authors report that subjects nonetheless placed roughly 2.4 times more weight on immediate than on delayed feedback in Study 1 and about 1.4 times more in Study 2, and that this immediacy bias grew stronger, not weaker, as subjects gained experience over the course of each experiment.
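To make the reported ratios concrete, here is a minimal Python sketch of a learner that blends immediate and delayed feedback with unequal weights. It is an illustration, not the authors' model: the function names, the delta-rule update, and the learning rate are all assumptions, and the sketch ignores the fact that the delayed feedback in the task only appeared after the following trial.

```python
def normalized_weights(ratio):
    """Split a total weight of 1 between immediate and delayed feedback.

    `ratio` is how many times more weight immediate feedback receives
    than delayed feedback (hypothetical parameterization).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    w_immediate = ratio / (1.0 + ratio)
    return w_immediate, 1.0 - w_immediate


def update_value(value, immediate, delayed, ratio, learning_rate=0.1):
    """One delta-rule step toward a weighted blend of the two signals.

    With ratio == 1.0 the target is the plain average, which is what an
    unbiased learner would use when both signals count equally.
    """
    w_immediate, w_delayed = normalized_weights(ratio)
    target = w_immediate * immediate + w_delayed * delayed
    return value + learning_rate * (target - value)


if __name__ == "__main__":
    for ratio in (1.0, 1.4, 2.4):
        w_immediate, w_delayed = normalized_weights(ratio)
        print(f"ratio {ratio}: immediate {w_immediate:.3f}, delayed {w_delayed:.3f}")
```

Under this parameterization, a ratio of 2.4 gives immediate feedback about 71% of the total weight and a ratio of 1.4 about 58%, compared with the 50% an unbiased learner would assign.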

Technical context

Eye-tracking data showed subjects looked somewhat more at immediate than delayed feedback in aggregate, but individual differences in gaze time did not predict the size of a given subject's behavioral bias, meaning attention alone does not fully explain the effect. The bias also persisted in a passive condition where subjects only observed other people's choices rather than making their own, suggesting it is not purely a byproduct of the agency or motor act of choosing. The authors distinguish this "immediacy bias" for information from classic temporal discounting of rewards themselves, noting it produces objectively worse decisions rather than reflecting a coherent preference for sooner-is-better outcomes.

For practitioners

Systems that blend feedback signals arriving on different timescales, for example combining an immediate automated score with a delayed human review, should not assume that human raters, or models trained to imitate human judgment patterns, integrate both with equal effective weight. Where feasible, the authors' framing suggests that practitioners consider timestamp-aware weighting, or design choices that reduce the temporal spread of diagnostically important feedback, since the underweighting measured here grew over time rather than being a fixed, correctable offset.
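As one possible reading of "timestamp-aware weighting", the Python sketch below combines feedback items with an explicit weight that depends on how late each item arrived. Everything here is hypothetical: the `Feedback` type, the `delayed_boost` and `delay_threshold_seconds` parameters, and the choice of a simple threshold are assumptions for illustration, not a method from the paper. Because the study found that the bias grew over time, a single fixed boost like this one should be treated as a starting point to be validated, not as a correction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    score: float          # feedback value, e.g. a rating in [0, 1]
    delay_seconds: float  # time between the action and the feedback


def combine(feedback, delayed_boost, delay_threshold_seconds):
    """Weighted mean that makes the weight on delayed feedback explicit.

    Items at or above the delay threshold get weight `delayed_boost`;
    all other items get weight 1.0. A boost of 1.0 is a plain mean.
    """
    if delayed_boost <= 0:
        raise ValueError("delayed_boost must be positive")
    if not feedback:
        raise ValueError("feedback must not be empty")
    total = 0.0
    weight_sum = 0.0
    for item in feedback:
        is_delayed = item.delay_seconds >= delay_threshold_seconds
        weight = delayed_boost if is_delayed else 1.0
        total += weight * item.score
        weight_sum += weight
    return total / weight_sum


if __name__ == "__main__":
    items = [Feedback(score=1.0, delay_seconds=0.0),
             Feedback(score=0.0, delay_seconds=3600.0)]
    print(combine(items, delayed_boost=1.0, delay_threshold_seconds=60.0))
    print(combine(items, delayed_boost=2.4, delay_threshold_seconds=60.0))
```

Keeping the weight as a named parameter, rather than leaving it implicit in how raters or imitating models happen to integrate signals, at least makes the assumption visible and testable.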

What to watch

Watch whether independent replications find similar or larger effect sizes outside controlled lab conditions, and whether follow-up work from the Krajbich Lab or others links the effect more precisely to specific neural or attentional mechanisms. The paper itself notes that gaze data only partly explain the behavioral pattern, so the underlying mechanism remains unresolved.

Key Points

1. Subjects weighted immediate feedback roughly 1.4 to 2.4 times more heavily than equally informative delayed feedback across two experiments.
2. The bias was not explained by gaze attention alone and persisted even when subjects only observed others' choices rather than choosing themselves.
3. Human-in-the-loop reward and evaluation pipelines that mix immediate and delayed feedback risk systematically under-crediting the delayed signal.

Scoring Rationale

This is now a peer-reviewed, published finding (PLOS Computational Biology, not just a preprint) with a reasonably large combined sample and a concrete, quantified effect size, which supports a solid rather than marginal score. It is scored below the notable band because it is a behavioral-economics/psychology result with indirect, not yet demonstrated, implications for RL or ML systems specifically, rather than a direct AI capability or infrastructure development.
