Suno Seeks to Seal Training-data Size in Copyright Case

Suno has asked a federal court to keep the exact total number of audio files it used to train its generative-music model sealed, arguing disclosure would cause competitive harm (MusicBusinessWorldWide; Digital Music News). The request seeks to impound a single figure from filings in the copyright suit brought by UMG and Sony Music (MusicBusinessWorldWide). Suno's CTO, Georg Kucsko, filed a declaration saying the figure "is not publicly available and has not been disclosed outside of this litigation" and warned public disclosure "would risk significant competitive harm to Suno" (MusicBusinessWorldWide). Suno's lawyers say the company is not trying to hide the identity of recordings or the 61,026 additional copyrighted works alleged by the labels to be in its training set (MusicBusinessWorldWide). New York nonprofit Inner City Press has opposed the sealing effort and urged the court to unseal the documents (Digital Music News).
What happened
Suno filed a motion in the US District Court for the District of Massachusetts seeking to impound a single data point in the labels' filings: the total number of audio files allegedly used to train its generative-music model, according to MusicBusinessWorldWide. The filing responds to an unsealing request by New York nonprofit Inner City Press, which urged the court to make the training-data evidence public (Digital Music News). Per the court filing cited by MusicBusinessWorldWide, Suno said it is not seeking to seal the identity of specific recordings or the 61,026 additional copyrighted works that UMG and Sony Music allege were in its dataset.
Technical details
The filing included a declaration from Suno CTO Georg Kucsko, who wrote that the numerical figure "is not publicly available and has not been disclosed outside of this litigation," and that disclosure could reveal "technical development decisions and strategic judgments" about model design and performance, which Suno says would create competitive harm (MusicBusinessWorldWide). The company framed the motion as limited to a single scalar value, the total volume of audio files, rather than the underlying recordings or contracts (Digital Music News; MusicBusinessWorldWide).
Industry context
Editorial analysis: Companies and courts have treated quantitative details about training corpora as commercially sensitive in multiple AI-related lawsuits and filings. In comparable disputes, defendants often argue that dataset size, composition, or labeling practices could let competitors infer model architecture or training regimen, while transparency advocates argue such details matter to rights holders, creators, and researchers.
Context and significance
Editorial analysis: The Suno dispute sits at the intersection of copyright litigation, data-subpoena practice, and the competitive economics of generative models. For practitioners, the outcome could influence how much numerical metadata about training sets is treated as trade secret versus public evidence in future cases. The case also highlights a recurring tension: plaintiffs and the public seek disclosure to assess alleged misuse of copyrighted works, while defendants assert narrow sealing requests on competitive-harm grounds (Digital Music News; MusicBusinessWorldWide).
What to watch
Watch for the court's ruling on the impoundment request and any related redactions, whether the judge accepts sealing of a single figure versus broader discovery, and filings from Inner City Press and from the labels that further argue public interest versus trade-secret protection (Digital Music News; MusicBusinessWorldWide). Observers should also monitor whether the court's approach aligns with prior decisions referenced in public docket summaries related to AI-training-data secrecy (PacerMonitor, CourtListener docket listings).
Scoring Rationale
This is a notable legal development affecting transparency and trade-secret arguments around AI training corpora. The ruling could set precedent for disclosure standards in future AI copyright disputes, making it relevant to practitioners and researchers.


