Grab Builds Palana Secure Agentic Workload Platform

Relevance Score: 6.3

According to Grab's engineering blog, Grab's CyberSecurity team built Palana, a Kubernetes-native platform for running autonomous AI agents in isolated, auditable environments. Each agent receives a dedicated namespace with restrictive RBAC, custom network policies, and persistent storage. Palana applies a proxy-mediated credentials model: agents never receive raw secrets directly. Instead, they hold placeholder tokens, which an Envoy-based egress proxy backed by Vault swaps for the real credentials on outbound requests, keeping those credentials out of the agent runtime. The platform includes kill switches and idle shutdown enforced from outside the agent process. Grab reports Palana currently runs hundreds of agents, including OpenClaw workers, Hermes agents, and Slack automations, serving as an internal self-service substrate for agentic workloads.

What happened

Grab's CyberSecurity team published a detailed design overview of Palana, the company's Kubernetes-native platform for running autonomous AI agents safely. Per Grab's engineering blog, the need originated in security research: the team wanted a contained environment to run OpenClaw and related agent frameworks without exposing the internal network or placing raw credentials inside the agent runtime. Palana now runs hundreds of agents at Grab, including OpenClaw workers, Hermes agents, Matlock and Butler task agents, Slack automations, and cloud development environments.

Technical architecture

Palana's core design principle is that isolation is the unit of trust. Per the Grab blog, each agent receives a dedicated Kubernetes namespace with role-based access control, resource quotas, custom network policies, isolated service accounts, and a persistent /data volume for long-running state. The platform adds pcli (a command-line interface) and a web portal for creating, running, stopping, and inspecting agents.
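To make the per-agent isolation concrete, here is a minimal sketch of the kind of Kubernetes objects such a namespace could contain. It is not Grab's implementation: the function name `agent_namespace_manifests`, the `agent-<id>` naming scheme, the label values, and the quota and storage sizes are all assumptions chosen for illustration. Only the Kubernetes resource kinds and field names are standard.

```python
from typing import Any


def agent_namespace_manifests(
    agent_id: str, cpu: str = "2", memory: str = "4Gi", data_size: str = "10Gi"
) -> list[dict[str, Any]]:
    """Build illustrative manifests for one isolated agent namespace.

    All names and sizes here are hypothetical; the source does not
    publish Palana's actual manifests.
    """
    ns = f"agent-{agent_id}"  # assumed naming scheme
    meta = {"namespace": ns}

    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns, "labels": {"agent-id": agent_id}},
    }
    # Dedicated identity so secrets and RBAC can be scoped to this agent only.
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": "agent", **meta},
    }
    # Cap what a single agent can consume.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "agent-quota", **meta},
        "spec": {"hard": {"limits.cpu": cpu, "limits.memory": memory}},
    }
    # Selecting every pod and listing both policy types with no rules
    # denies all ingress and egress until other policies allow traffic.
    default_deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", **meta},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    # Claim backing the persistent /data volume for long-running state.
    data_claim = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "data", **meta},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": data_size}},
        },
    }
    return [namespace, service_account, quota, default_deny, data_claim]
```

Starting from a default-deny policy and adding narrow allowances (for example, egress only to the proxy) is one common way to express "isolation is the unit of trust"; whether Palana structures its policies this way is not stated in the source.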

Critically, Palana separates two classes of secrets via Vault integration: agent-readable secrets (available only to that agent's service account) and proxy-only secrets, where the agent sees a placeholder token such as TOKEN_GITHUB_PAT or TOKEN_GRABGPT_API_KEY. Outbound HTTP/HTTPS traffic flows through an Envoy egress proxy with ext-authz and Open Policy Agent (OPA) checks - the proxy replaces placeholder headers with real Vault credentials before forwarding requests, so the agent process never holds live tokens. LLM access is routed through a LiteLLM wrapper that injects per-agent GrabGPT credentials from Vault.
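The placeholder-substitution step can be sketched in a few lines. This is a simplified model, not Envoy, OPA, or Vault code: the function `inject_credentials`, the `vault` dictionary standing in for a Vault lookup, the `allowed_hosts` mapping standing in for the policy check, and the example secret value are all hypothetical. Only the placeholder name `TOKEN_GITHUB_PAT` comes from the source.

```python
PLACEHOLDER_PREFIX = "TOKEN_"


def inject_credentials(
    headers: dict[str, str],
    host: str,
    vault: dict[str, str],
    allowed_hosts: dict[str, set[str]],
) -> dict[str, str]:
    """Return a copy of headers with placeholder tokens swapped for secrets.

    `vault` maps placeholder -> real secret (a stand-in for a Vault read).
    `allowed_hosts` maps placeholder -> hosts it may be sent to (a stand-in
    for the policy decision). Both are assumptions for illustration.
    """
    out: dict[str, str] = {}
    for name, value in headers.items():
        # Split "Bearer TOKEN_X" into scheme and token; a bare token has no scheme.
        scheme, _, token = value.rpartition(" ")
        if not token.startswith(PLACEHOLDER_PREFIX):
            out[name] = value  # ordinary header, forwarded unchanged
            continue
        # Refuse to attach a real credential to a destination the policy
        # does not list, so a hijacked agent cannot redirect the secret.
        if host not in allowed_hosts.get(token, set()):
            raise PermissionError(f"{token} may not be sent to {host}")
        secret = vault[token]
        out[name] = f"{scheme} {secret}" if scheme else secret
    return out


# Usage: the agent only ever sees the placeholder string.
vault = {"TOKEN_GITHUB_PAT": "example-secret-value"}  # fake value
allowed = {"TOKEN_GITHUB_PAT": {"api.github.com"}}
forwarded = inject_credentials(
    {"Authorization": "Bearer TOKEN_GITHUB_PAT", "Accept": "application/json"},
    "api.github.com",
    vault,
    allowed,
)
```

Because the substitution happens in the proxy, a prompt-injected agent that prints its environment or headers can leak only the placeholder, which is useless outside the proxy path.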

Operational controls deliberately live outside the agent: a kill switch removes the agent's network path rather than asking the agent to cooperate, and a reaper CronJob handles idle shutdown. This design assumes agents may become confused, compromised, or unresponsive.
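A rough sketch of both external controls follows. The source does not say how Palana's kill switch or reaper are implemented, so everything here is an assumption: `find_idle_agents` models the selection a reaper job might make from last-activity timestamps, and `kill_switch_policy` shows one plausible way to cut a namespace's network path, using a deny-all Kubernetes NetworkPolicy.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def find_idle_agents(
    last_activity: dict[str, datetime], now: datetime, max_idle: timedelta
) -> list[str]:
    """Return namespaces whose last recorded activity is older than max_idle.

    How activity is measured is not described in the source; the
    timestamps here are hypothetical inputs.
    """
    return sorted(ns for ns, seen in last_activity.items() if now - seen > max_idle)


def kill_switch_policy(namespace: str) -> dict[str, Any]:
    """Build a NetworkPolicy that blocks all traffic for every pod in a namespace.

    The agent is never asked to stop: the control plane removes its
    network path, so a confused or compromised agent cannot refuse.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "kill-switch", "namespace": namespace},
        # Empty podSelector matches all pods; no ingress/egress rules
        # under both policy types means nothing is allowed.
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


# Usage with made-up timestamps.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
seen = {
    "agent-a": now - timedelta(hours=3),
    "agent-b": now - timedelta(minutes=5),
}
idle = find_idle_agents(seen, now, max_idle=timedelta(hours=1))
```

Both functions run entirely outside the agent process and need nothing from it, which is the property the design relies on.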

Industry context

Palana represents a detailed, production-proven implementation of patterns increasingly discussed for agentic AI: per-workload namespace isolation, proxy-mediated credential injection, and external kill switches. The proxy-only secrets pattern addresses a class of risk specific to agentic workloads where prompt injection or dependency compromise could expose long-lived tokens if they were present in the agent environment. InfoQ also covers the platform and frames it as part of a broader shift toward specialized infrastructure for agentic AI.

What to watch

Grab notes this is Part 1 of a series; Part 2 will cover LLM routing, audit/logging details, and agent lifecycle management in depth. Practitioners should watch for open-source releases or operator guides from Grab or similar teams adopting comparable patterns.

Scoring Rationale

This is a detailed, production-proven engineering design post from a major Southeast Asian tech platform, valuable for practitioners building or securing agentic workloads. The proxy-only credential pattern and external kill-switch design are practically useful reference points. Score reflects solid practitioner value from a single-company engineering blog post - notable but not paradigm-shifting for the broader AI field.
