Dataiku releases Kiji Privacy Proxy to mask PII
Per Dataiku's blog, the company has released Kiji Privacy Proxy™, an open-source local gateway that detects and masks personally identifiable information (PII) before requests leave the network. Reporting by ITSecurityNews and Dataiku's own post describe the proxy as sitting between local applications and external AI APIs such as OpenAI's and Anthropic's, running ML-powered PII detection, replacing emails, phone numbers, credit card numbers, SSNs, and other identifiers with realistic dummy values, and restoring the originals when responses return. Per the blog, the tool covers 16+ PII types and is intended to let enterprises use external generative AI without exposing raw PII. The blog also ties privacy risk to regulations including GDPR, HIPAA, and CCPA, and cites a survey of 600 CIOs in which 85% reported AI projects delayed by traceability or explainability gaps.
What happened
Per Dataiku's blog, Dataiku released Kiji Privacy Proxy™, an open-source local gateway that detects and masks personally identifiable information (PII) before requests leave an enterprise network; ITSecurityNews indexed and summarized the release. The proxy is described as sitting between local applications and external AI APIs such as OpenAI's and Anthropic's: it intercepts outbound requests, runs an ML-powered PII detection model, replaces identified values with realistic dummy data, and restores the originals on the inbound response. Per the blog, Kiji handles 16+ PII types, and the company frames the feature as a way to avoid sending raw PII to third-party LLMs. The post also cites regulatory risk, naming GDPR, HIPAA, and CCPA, and references a survey of 600 CIOs in which 85% reported AI projects delayed or blocked entirely by gaps in traceability or explainability.
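The intercept-mask-restore round trip described above can be sketched in a few lines. This is an illustrative simplification, not Kiji's implementation: Kiji reportedly uses an ML detection model across 16+ PII types, whereas the regexes, dummy templates, and function names below are stand-ins chosen for the example.

```python
import re

# Stand-in detectors; a real proxy would use an ML entity detector.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Templates for realistic-looking dummy values.
DUMMIES = {"EMAIL": "user{}@example.com", "SSN": "900-00-{:04d}"}

def mask(text):
    """Replace detected PII with dummies; return (masked_text, mapping)."""
    mapping = {}  # dummy -> original, kept locally to restore responses
    seen = {}     # original -> dummy, so repeats get one stable dummy
    for kind, pattern in DETECTORS.items():
        count = 0
        def repl(m, kind=kind):
            nonlocal count
            original = m.group(0)
            if original not in seen:
                count += 1
                seen[original] = DUMMIES[kind].format(count)
                mapping[seen[original]] = original
            return seen[original]
        text = pattern.sub(repl, text)
    return text, mapping

def restore(text, mapping):
    """Swap dummy values in the API response back to the originals."""
    for dummy, original in mapping.items():
        text = text.replace(dummy, original)
    return text
```

The masked text is what crosses the network boundary; the mapping never leaves the proxy, so the upstream LLM only ever sees dummy values.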
Editorial analysis - technical context
Companies building proxy-based PII masking typically combine named-entity recognition, pattern detectors (regex), and deterministic pseudonymization to preserve conversational context while removing direct identifiers. Observed patterns in similar systems include tradeoffs between masking fidelity and model utility, the need for reversible pseudonym mapping to reinsert originals, and latency introduced by in-line inspection. For practitioners, common operational concerns are model drift in entity detectors, consistency of dummy-value substitution across sessions, key management for mappings, and maintaining audit logs without reintroducing sensitive data.
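One way to get the session-to-session consistency mentioned above is keyed deterministic pseudonymization: derive the dummy value from the original under a secret key, so the same identifier always masks to the same pseudonym without shared mutable state. This is a generic technique sketch, not Kiji's documented design; the key name and email format are assumptions for illustration.

```python
import hashlib
import hmac

# In practice this key would live in a KMS/HSM; hard-coded here for the sketch.
SECRET_KEY = b"rotate-me-via-your-kms"

def pseudonym_email(original: str) -> str:
    """Derive a stable, realistic-looking dummy email from the original.

    Same input + same key -> same pseudonym, across processes and sessions.
    """
    digest = hmac.new(SECRET_KEY, original.encode(), hashlib.sha256).hexdigest()
    return f"user-{digest[:8]}@example.com"
```

Note that an HMAC is one-way, so a system that must reinsert originals still needs a reversible mapping store (e.g., an encrypted pseudonym-to-original table), which is where the key-management and audit-logging concerns above come in.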
Industry context
Industry observers note enterprises face regulatory and contractual constraints when prompts contain customer data, making in-network sanitization one of several approaches alongside on-prem or private-cloud model hosting and strict data-minimization policies. Open-source privacy proxies lower integration friction compared with bespoke solutions, but they do not eliminate the broader compliance and security workstreams around data governance, access controls, and logging.
What to watch
Adoption signals to follow include the Kiji repository activity and contributions, interoperability with popular LLM SDKs and API patterns, third-party security audits of the masking logic, and benchmarks showing the impact on model output quality and latency. Observers should also watch for vendor integrations or forks that extend detection coverage or add enterprise management features.
Scoring rationale
This is a notable engineering and privacy tooling release relevant to enterprise ML deployments. It eases a common integration gap but is not a frontier-model or regulatory leap. Practitioners will evaluate detection fidelity, performance, and compliance fit.
