Research · LLM · Summarization · Information Retrieval

Summarization Alters LLM Relevance Judgment Reliability

December 8, 2025 | By LDS Team

Relevance Score: 7.0


In a preprint submitted on December 5, 2025, Samaneh Mohtadi investigates how text summarization affects LLM-based relevance judgments for information retrieval (IR). Using state-of-the-art LLMs on multiple TREC collections, the study compares judgments made from full documents with judgments made from LLM-generated summaries of varying lengths, measuring both agreement with human labels and the downstream effect on retrieval evaluation. It finds that summary-based judgments preserve the stability of system rankings but introduce systematic shifts in label distributions, along with biases that depend on the model and the dataset.
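The report does not say which agreement statistic the study uses. Cohen's kappa is a common choice for comparing LLM relevance labels with human assessors, so the minimal sketch below assumes it, alongside a simple per-grade label distribution that makes label shifts visible. The 0-2 grade scale, the label lists and the function names are all hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length lists of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same grade by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each judge labelled independently at its own base rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    grades = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[g] * counts_b[g] for g in grades) / n**2
    if expected == 1.0:
        # Both judges used one identical grade throughout; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)

def label_distribution(labels):
    """Share of judgments at each relevance grade, to expose label shifts."""
    n = len(labels)
    return {grade: count / n for grade, count in sorted(Counter(labels).items())}

# Hypothetical graded labels (0 = not relevant, 1 = partially, 2 = highly relevant)
human       = [2, 1, 0, 0, 1, 2, 0, 1]
llm_full    = [2, 1, 0, 1, 1, 2, 0, 1]  # LLM judging the full document
llm_summary = [1, 1, 0, 1, 1, 1, 0, 1]  # same LLM judging a generated summary

print(round(cohens_kappa(human, llm_full), 3))     # 0.81
print(round(cohens_kappa(human, llm_summary), 3))  # 0.4
print(label_distribution(human))        # {0: 0.375, 1: 0.375, 2: 0.25}
print(label_distribution(llm_summary))  # {0: 0.25, 1: 0.75}
```

In this toy data the summary-based labels agree less with the human labels and fold grade 2 into grade 1. The numbers are invented for illustration and say nothing about the direction of the shifts the study actually reports.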

Key Points

1. Demonstrates that LLM judgments made from summaries yield system rankings as stable as those produced by full-document judgments
2. Identifies systematic label-distribution shifts and model- and dataset-dependent biases introduced by summarization
3. Warns practitioners to validate summary length and model choice to avoid misleading IR evaluation results
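System-ranking stability is usually checked by scoring every retrieval system once with full-document judgments and once with summary-based judgments, then correlating the two rankings with Kendall's tau. The minimal sketch below assumes that setup. The system names, scores, the (model, summary length) configuration keys and the 0.9 cut-off (a common rule of thumb in IR evaluation, not a value taken from the study) are all hypothetical.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two system rankings.

    scores_a, scores_b: {system_name: effectiveness score} over the same systems,
    e.g. a metric computed once with full-document judgments and once with
    summary-based judgments. Tied pairs count as neither concordant nor discordant.
    """
    systems = sorted(scores_a)
    if sorted(scores_b) != systems or len(systems) < 2:
        raise ValueError("need the same set of at least two systems in both maps")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs

def unstable_configs(full_scores, per_config_scores, threshold=0.9):
    """Configs (hypothetically (model, summary_length)) whose system ranking
    correlates with the full-document ranking at tau below `threshold`."""
    return sorted(
        cfg for cfg, scores in per_config_scores.items()
        if kendall_tau(full_scores, scores) < threshold
    )

# Hypothetical per-system scores under full-document judgments
full = {"bm25": 0.42, "dense": 0.55, "hybrid": 0.61, "rerank": 0.58}
configs = {
    # Every score shifts upward, but the ordering of systems is unchanged.
    ("model-a", 100): {"bm25": 0.47, "dense": 0.59, "hybrid": 0.66, "rerank": 0.60},
    # "hybrid" and "rerank" swap places.
    ("model-a", 50): {"bm25": 0.47, "dense": 0.59, "hybrid": 0.66, "rerank": 0.67},
}

print(kendall_tau(full, configs[("model-a", 100)]))  # 1.0
print(unstable_configs(full, configs))               # [('model-a', 50)]
```

The first configuration shows how absolute scores can shift while the ranking stays intact, which is why a high tau alone does not rule out label shifts. If SciPy is available, `scipy.stats.kendalltau` computes the tie-adjusted tau-b variant. The hand-rolled tau-a here keeps the example dependency-free.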

Scoring Rationale

Offers methodological insight with direct implications for IR evaluation; limited novelty and reliance on a single preprint constrain its broader impact.

Sources

Public references used for this report.

1 source

1. arxiv.org: [2512.05334] The Effect of Document Summarization on LLM-Based Relevance Judgments


More AI & Data Science News

Overland AI Secures $19.7M Marine Corps Contract

AI Industry Creates New Age of Imperial Extraction

Preity Zinta Seeks Court Orders to Remove AI Deepfakes

AI-driven rotation reshapes stock market leadership
