Google scientist warns EU data sharing risks privacy

Reuters reports that Sergei Vassilvitskii, a distinguished scientist at Google, has warned EU antitrust regulators in writing that the European Commission's proposed requirement to share search data with rivals risks exposing users' private information. According to Reuters, Vassilvitskii said Google's internal red team was able to re-identify users from anonymised search data in less than two hours. Reuters and ITNews report that the Commission has proposed giving rivals access to ranking, query, click and view data, with measures due to be finalised by July 27. Reuters also reports that Google offered to collaborate with the Commission on stronger privacy safeguards, while media coverage framed Google's public response as calling the proposal regulatory overreach.
What happened
Reuters reports that Sergei Vassilvitskii, a distinguished scientist at Google, sent a warning to EU antitrust regulators about the European Commission's proposal to require Google to share search data with rivals such as OpenAI. In written comments, per Reuters, Vassilvitskii argued that the Commission's proposed anonymisation method is insufficient: "We are concerned because the EC's approach to anonymization fails to protect Europeans' privacy: our red team managed to re-identify users in less than two hours." Reuters and ITNews report that the Commission is considering access to data types including ranking, query, click and view records, and that measures are due to be finalised by July 27. Reuters also describes Google as offering to collaborate with the Commission on stronger privacy safeguards, and frames parts of the company's response as calling the proposal regulatory overreach.
Technical details
Editorial analysis - technical context: The reported re-identification claim points to known classes of privacy attacks, including record linkage and model-assisted inference, where auxiliary signals and behavioural uniqueness make anonymised behavioural logs vulnerable. Industry testing groups and red teams commonly use linkage methods and public or leaked auxiliary datasets to attempt re-identification; rapid successes in controlled tests are possible when datasets contain granular, time-stamped user behaviour. For practitioners, the distinction between removal of direct identifiers and true statistical de-identification matters: techniques like differential privacy, strong aggregation, limited sampling, and strict access controls are the usual mitigations discussed in academic and applied settings.
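To make the linkage-attack class concrete, here is a minimal sketch using entirely made-up toy data (not Google's logs, methods, or any real dataset): "anonymised" search records that retain quasi-identifiers such as coarse location and time of day can be joined against an auxiliary dataset that shares those fields, re-attaching names to pseudonymous users.

```python
# Toy record-linkage sketch: all data below is hypothetical, invented
# purely to illustrate why removing direct identifiers is not enough.
anonymised_logs = [
    {"user": "u_84f2", "city": "Lyon",  "hour": 9,  "query": "flight to Oslo"},
    {"user": "u_c771", "city": "Porto", "hour": 22, "query": "chess openings"},
]

# Auxiliary data with names, e.g. public check-ins or a leaked dataset,
# sharing the same quasi-identifiers (city, hour).
auxiliary = [
    {"name": "Alice", "city": "Lyon",  "hour": 9},
    {"name": "Bob",   "city": "Porto", "hour": 22},
]

def link(logs, aux):
    """Match pseudonymous log rows to named rows on shared quasi-identifiers."""
    matches = []
    for row in logs:
        candidates = [a for a in aux
                      if a["city"] == row["city"] and a["hour"] == row["hour"]]
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((candidates[0]["name"], row["user"], row["query"]))
    return matches

print(link(anonymised_logs, auxiliary))
# Both pseudonymous users are re-identified by name, along with their queries.
```

Real attacks use the same idea at scale, with richer behavioural signals (timing patterns, query phrasing, click sequences) standing in for the toy city/hour fields here.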
Context and significance
Industry context
The dispute sits at the intersection of competition policy and data protection. The European Commission is using proposed interoperability and data access rules to widen market entry for rivals, which regulatory coverage links to the Digital Markets Act enforcement toolkit. At the same time, privacy assessments cited in reporting illustrate the tension regulators face balancing competitive access to behavioural telemetry and the risk that modern AI and linkage attacks can reverse naive anonymisation. For data teams and ML engineers, this debate affects what kinds of logs and behavioural datasets could become available to third parties, how access will be governed, and what technical safeguards regulators and companies may require.
What to watch
Editorial analysis: Observers should track the Commission's final text due by July 27, the specific anonymisation standards it accepts, and whether it mandates technical measures such as differential privacy parameters or bounded query interfaces. Watch for follow-up filings or technical appendices from Google, the Commission, or independent auditors that quantify re-identification methods and rates. Industry watchers should also monitor whether regulators require third-party audits, ephemeral access tokens, or legal/contractual constraints on downstream use of shared logs.
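If the Commission does mandate differential privacy parameters, the core mechanism at stake is small. As a hedged sketch (standard Laplace mechanism, not any specific proposal from Google or the Commission), a count query can be released with noise scaled to 1/epsilon, where epsilon is the privacy budget a regulator might specify:

```python
import math
import random

def dp_count(true_count, epsilon):
    """Release a count under the Laplace mechanism.

    A counting query has sensitivity 1 (one user changes the count by at
    most 1), so Laplace noise with scale 1/epsilon gives epsilon-DP.
    """
    # Inverse-CDF sampling of Laplace(0, 1/epsilon) using only the stdlib.
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

random.seed(0)
# Smaller epsilon -> stronger privacy -> noisier released counts.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: released count = {dp_count(1000, eps):.1f}")
```

The policy-relevant point is that epsilon is a tunable dial: a final text that fixes concrete epsilon values, or requires bounded query interfaces that track cumulative budget, would be far more testable than a bare instruction to "anonymise".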
Bottom line
Reporting attributes a fast re-identification result to Google's red team and places that finding at the center of an EU regulatory debate over search data sharing. The story highlights practical limits of conventional anonymisation against AI-enabled linkage attacks and frames an active policy decision point where technical specifications will determine how much behavioural data can be shared and under what safeguards.
Scoring Rationale
This story matters because it sits at the intersection of data governance, competition policy, and applied privacy for practitioners. The reported rapid re-identification increases the technical stakes of any mandated data sharing, making the Commission's final anonymisation standards consequential for ML training data and telemetry access.