AWS Demonstrates Trajectory-Aware Post-Training for Security Agents
At [un]prompted 2026, Aaron Brown (AWS) presented an open-source pipeline for trajectory-aware post-training of open-weight SLMs to build security agents. The talk delivers a practitioner-focused workflow: instrumenting environments to capture agent trajectories, curating trajectory datasets, and applying post-training objectives that condition models on multi-step agent behavior rather than single-turn prompts. The pipeline includes tooling for environment setup, dataset construction, fine-tuning, and evaluation harnesses tailored to red-team and defensive cybersecurity tasks. The contribution matters because it operationalizes agent-centric training with publicly available weights, lowering the barrier for reproducible security research while raising dual-use considerations for defensive and offensive tooling.
What happened
At [un]prompted 2026 Aaron Brown, Agentic AI Builder at AWS, presented a talk titled Trajectory-Aware Post-Training of Open-Weight Models for Security Agents and published an open-source pipeline that operationalizes post-training on model trajectories for cybersecurity tasks. The presentation emphasizes practical steps: environment instrumentation, trajectory capture, dataset curation, post-training (fine-tuning) on multi-step behavior, and specialized evaluation for agentic security scenarios.
Technical details
The core idea is trajectory-aware post-training, which moves beyond single-turn fine-tuning by conditioning model updates on sequences of agent-environment interactions (trajectories). Brown framed this as a post-training stage applied to open-weight SLMs (small language models), i.e., models with accessible parameters that practitioners can adapt. The pipeline components described and demonstrated include:
- Environment instrumentation to log state, actions, observations, and metadata for agent runs
- Dataset curation to convert raw interaction logs into trajectory examples and context windows suitable for sequence modeling
- Post-training procedures that reframe the objective to predict next actions or next observations within trajectory contexts rather than isolated completions
- Evaluation harnesses and red-team scenarios to measure robustness, correctness, and safety of agentic behaviors
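To make the first two components concrete, here is a minimal sketch of what trajectory instrumentation might look like. All names and fields (`Step`, `Trajectory`, `run_id`, etc.) are illustrative assumptions, not identifiers from the released pipeline:

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List

@dataclass
class Step:
    # One agent-environment interaction: what the agent saw,
    # what it did, and what the environment returned.
    state: str
    action: str
    observation: str
    metadata: dict = field(default_factory=dict)

@dataclass
class Trajectory:
    run_id: str
    steps: List[Step] = field(default_factory=list)

    def log(self, state: str, action: str, observation: str, **metadata):
        """Append one interaction step to the trajectory."""
        self.steps.append(Step(state, action, observation, metadata))

    def to_jsonl(self) -> str:
        # Deterministic serialization (sorted keys) supports the
        # reproducibility concerns raised in the talk.
        return "\n".join(
            json.dumps(asdict(s), sort_keys=True) for s in self.steps
        )

# Example: logging one step of a hypothetical recon agent.
traj = Trajectory(run_id="run-001")
traj.log("open ports: 22,80", "nmap -sV target",
         "ssh 8.2, httpd 2.4", tool="nmap")
print(traj.to_jsonl())
```

Structured, append-only records like this are what later stages (curation, evaluation) consume; the exact schema in practice would depend on the environment being instrumented.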
The implementation is explicitly open-source and includes reproducible environment setup, sample datasets, and evaluation scripts. Brown highlighted practical engineering considerations: deterministic logging for reproducibility, bounding context windows to avoid catastrophic forgetting during post-training, and integrating runtime monitors to intercept unsafe or undesirable agent actions.
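As a rough illustration of the curation step, the sketch below turns an ordered list of logged steps into next-action prediction examples with a bounded context window (the window bound echoes the context-bounding consideration above). The function name, record format, and window size are assumptions for illustration, not the pipeline's actual API:

```python
def to_training_examples(steps, max_window=4):
    """Convert an ordered trajectory into (prompt, completion) pairs
    where the completion is the next action given a bounded window
    of prior interactions."""
    examples = []
    for i, step in enumerate(steps):
        # Keep at most max_window preceding steps as context.
        window = steps[max(0, i - max_window):i]
        context = "\n".join(
            f"STATE: {s['state']}\nACTION: {s['action']}\nOBS: {s['observation']}"
            for s in window
        )
        context += f"\nSTATE: {step['state']}\nACTION:"
        examples.append({"prompt": context, "completion": " " + step["action"]})
    return examples

# Each logged step becomes one training example conditioned on its history.
steps = [{"state": f"s{i}", "action": f"a{i}", "observation": f"o{i}"}
         for i in range(6)]
examples = to_training_examples(steps, max_window=2)
```

The key design point is that the target is an action *within* a multi-step context, not an isolated completion, which is what distinguishes trajectory-aware objectives from single-turn fine-tuning.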
Context and significance
This talk intersects two growing trends in applied AI security. First, the move from model-centric prompts to agentic, stateful systems makes sequential behavior the primary evaluation surface. Second, the availability of open-weight models democratizes experimentation but increases both defensive capability and dual-use risk. By providing a documented pipeline, the work closes a gap between conceptual agent research and operational security tooling. For practitioners, the important shift is methodological: treat agent behavior as first-class training data and adopt trajectory-centric objectives instead of relying solely on prompt engineering or single-turn fine-tuning such as classic supervised fine-tuning and RLHF.
Why it matters for practitioners
Implementing trajectory-aware post-training changes how you collect data, how you instrument environments, and how you evaluate models under adversarial conditions. The pipeline lowers friction for reproducible research and for integrating agentic models into security workflows, such as automated vulnerability hunting, simulated attack/defense exercises, and blue-team automation. It also forces teams to reckon with monitoring, access controls, and governance because the same techniques that improve defensive agents can amplify offensive capabilities when misused.
What to watch
Expect community forks that extend the pipeline to additional simulators and to structured environment APIs (e.g., CTF-style envs, network simulators). Look for benchmarking suites that compare trajectory-aware post-training against offline RL, behavioral cloning, and sequence-level RLHF on security tasks. Operational teams should evaluate access policies for open-weight models and integrate runtime safeguards into any production agent pipeline.
Scoring Rationale
This is a notable practitioner contribution: an open-source, reproducible pipeline that codifies trajectory-aware post-training for security agents. It meaningfully affects how teams collect data and evaluate agentic models, but it is not a frontier model release or paradigm-shifting paper.