Best practices for implementing AI agents: protect them like crown jewels

Published by

WINMAG Pro Editorial Team

Fri, 08 May 2026, 18:00

On March 9, security company CodeWall.ai demonstrated how it had hacked Lilli, McKinsey & Company's AI platform. Lilli is a purpose-built system that lets more than 43,000 employees analyze documents, chat, and access decades of proprietary research. The CodeWall.ai researchers deployed an AI agent that quickly scanned 200 endpoints, identified 22 that required no authentication, and singled out one endpoint that wrote user queries to a database and concatenated unparameterized JSON keys directly into the SQL statement.

By Martin Kraemer, CISO Advisor at KnowBe4

This is a classic SQL injection vulnerability, yet according to the researchers many standard tools would not have detected it. The researchers' agent then gained access to millions of chat messages, hundreds of thousands of files, thousands of user accounts, and over 300,000 AI agents within the database. The agent was also able to compromise AI model configurations, including system prompts, and thereby bypass security measures, because these prompts were stored alongside the data the agent had access to.

If attackers had exploited this SQL injection, they could have easily rewritten these prompts with an UPDATE statement, wrapped in a single HTTP call. The consequences could have been devastating for the organization, as consultants might have relied on output that was subtly altered. Other risks included data theft, removal of guardrails, and silent persistence. All of this was avoided because the researchers responsibly shared their findings with McKinsey, allowing the organization to patch all vulnerabilities.
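The flaw described above, JSON keys concatenated into SQL, can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module, not a reconstruction of Lilli's code: the table `queries`, the column names, and the functions `save_query_unsafe` and `save_query_safe` are all hypothetical. Parameter placeholders protect values only, so identifiers taken from user input have to be checked against an allow-list.

```python
import sqlite3

# Hypothetical allow-list of JSON keys that may be used as column names.
ALLOWED_COLUMNS = {"user_id", "query_text"}


def make_db():
    """Create an in-memory database with a hypothetical 'queries' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queries (user_id TEXT, query_text TEXT)")
    return conn


def save_query_unsafe(conn, payload):
    # Vulnerable pattern: the values are bound as parameters, but the JSON
    # keys are concatenated into the SQL text as column names.
    columns = ", ".join(payload.keys())
    placeholders = ", ".join("?" for _ in payload)
    sql = f"INSERT INTO queries ({columns}) VALUES ({placeholders})"
    conn.execute(sql, list(payload.values()))


def save_query_safe(conn, payload):
    # Reject any key that is not a known column before building the statement.
    unknown = set(payload) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    columns = sorted(payload)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO queries ({', '.join(columns)}) VALUES ({placeholders})"
    conn.execute(sql, [payload[c] for c in columns])
```

With `save_query_unsafe`, a caller who controls the key names controls part of the SQL text itself. `save_query_safe` refuses such a payload before any statement is built.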

Why governance is difficult

Gartner predicts that by 2026, 40% of enterprise applications will contain task-oriented AI agents. A PwC survey shows that 79% of surveyed executives already use AI agents within their organization. In another survey, 62% of AI practitioners cited security as one of their top concerns, and 28% of senior executives placed lack of trust in the top three challenges. AI governance is urgently needed to secure agents and restore trust in AI systems.

Governance is challenging because LLMs are inherently opaque. Former OpenAI safety researcher Steven Adler puts it aptly: "You can pull and push to move it in a certain direction, but you can never (at least not yet) say: 'This is why it went wrong.'" These characteristics of large language models make securing AI agents particularly complex. Each agent uses an LLM as a 'brain' to reason, plan, and orchestrate. This means that many of the challenges faced by LLMs also apply to AI agents.

Moreover, intruders do not always need to find a technical vulnerability to exploit AI. Over the course of a long conversation, LLMs can go off the rails and sidestep their guardrails. For instance, a researcher was able to manipulate an AI chatbot into making a legally binding offer to sell a car for 1 dollar. LLMs can also be influenced through prompts hidden in recommendation or summary buttons on websites. In practice, it is difficult to keep track of all the prompt injection variants that continue to emerge, and AI systems remain vulnerable through multiple attack vectors that require no classical exploitation. Furthermore, agentic systems do not operate without human interaction and intervention; through prompt injection or social engineering, attackers can manipulate the modern workforce at scale and at machine speed.

The workforce is changing, and the security model must adapt

The Human-AI-Agent workforce is evolving: more autonomy for agents, less human control, insufficient security, and limited oversight. As AI agents such as AI assistants, LLM crawlers, and automated browsers take over more of the work, humans increasingly become 'resources', whether you build the agents yourself or only use them within your organization.

Agents do not get tired, do not lose interest, do not give up, and are not bound by norms or moral considerations. They communicate at machine speed and try every possible way to achieve their goals. The traditional security principle of least privilege, limiting access as much as possible, is insufficient to provide adequate guardrails for agents. Organizations must determine not only which systems an agent has access to but also what that agent can do there, which resources it uses, and how it reasons.

Two principles that should guide every agent deployment

Two principles from the Open Worldwide Application Security Project (OWASP) for agentic applications help address these challenges. The first, 'least agency', states that agents should not receive more autonomy than the business problem justifies. Agents must be able to perform their tasks without freely navigating irrelevant systems and data or using up access to other systems. Where 'least privilege' focuses on access control, least agency concerns the degree of autonomy: what an agent is allowed to decide and do within a system. The second principle, 'strong observability', emphasizes the need to see and direct how agents behave within your environment: what they do, why they do it, and which identities and tools they use.
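As a rough illustration of least agency, the sketch below routes every tool call through a gateway that enforces a per-agent allow-list and a call budget per task. The names `AgentPolicy` and `ToolGateway` and both limits are assumptions made for this example; they are not part of any OWASP specification or agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical policy: which tools an agent may call, and how often."""
    allowed_tools: frozenset
    max_calls_per_task: int


@dataclass
class ToolGateway:
    """Single choke point through which an agent reaches its tools."""
    policy: AgentPolicy
    tools: dict
    calls_made: int = 0

    def invoke(self, tool_name, *args):
        # Autonomy limit 1: only tools the business task actually needs.
        if tool_name not in self.policy.allowed_tools:
            raise PermissionError(f"tool not permitted: {tool_name}")
        # Autonomy limit 2: a bounded number of actions per task.
        if self.calls_made >= self.policy.max_calls_per_task:
            raise PermissionError("call budget exhausted for this task")
        self.calls_made += 1
        return self.tools[tool_name](*args)
```

The point of the design is that the agent never holds a direct reference to a tool, so a tool that exists in the environment but is irrelevant to the task stays out of reach.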

Organizations must act decisively: control the orchestration layer and create visibility at the boundaries of systems. They must be able to distinguish between humans, simple scripts, and AI agents so that behavior can be tracked as sequences of actions rather than isolated requests. Governance must be linked to observable behavior: when an agent crosses a boundary, you must be able to slow it down, intercept it, or at least challenge it. Developers must ensure that agents are robust enough to handle such interventions and maintain the integrity of legitimate processes.
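One way to picture behavior tracked as a sequence, with a graduated response, is the sketch below. It keeps a sliding window of endpoints per session and escalates from throttling to challenging to blocking as the number of distinct endpoints grows. The class `SessionMonitor`, the actor types, and every threshold are hypothetical values chosen for illustration; real limits would have to be derived from observed traffic.

```python
from collections import defaultdict, deque
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    THROTTLE = "throttle"    # slow the session down
    CHALLENGE = "challenge"  # e.g. require re-authentication or approval
    BLOCK = "block"          # intercept the request


# Hypothetical thresholds per actor type: number of distinct endpoints in the
# recent window at which each response starts (throttle, challenge, block).
THRESHOLDS = {
    "human": (20, 40, 80),
    "script": (10, 20, 40),
    "agent": (3, 5, 8),
}


class SessionMonitor:
    def __init__(self, window=50):
        # One bounded history of endpoints per session.
        self._history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, session_id, actor_type, endpoint):
        """Record one request and decide based on the whole recent sequence."""
        history = self._history[session_id]
        history.append(endpoint)
        breadth = len(set(history))  # how widely the session is exploring
        throttle_at, challenge_at, block_at = THRESHOLDS[actor_type]
        if breadth >= block_at:
            return Decision.BLOCK
        if breadth >= challenge_at:
            return Decision.CHALLENGE
        if breadth >= throttle_at:
            return Decision.THROTTLE
        return Decision.ALLOW
```

A session that repeats one endpoint stays at `ALLOW`, while one that sweeps across many endpoints escalates step by step.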

Auditability must also be built in. Telemetry around agent sessions is essential for investigating incidents: which endpoints were used, which data was accessed, how decisions were escalated, and how this behavior deviated from that of other agents or humans.
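A telemetry record covering these questions could look roughly like the following sketch. The field names and the `AgentAuditEvent` type are assumptions for illustration, not an established schema. Comparing behavior against other agents or humans would be done downstream, over many such records.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now():
    """Timestamp in ISO 8601 format, always in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AgentAuditEvent:
    """Hypothetical audit record for a single agent action."""
    session_id: str
    agent_id: str
    identity: str                       # account or credential the agent acted under
    tool: str                           # tool the agent used
    endpoint: str                       # which endpoint was used
    data_accessed: list                 # which data was accessed
    escalated_to: Optional[str] = None  # who a decision was escalated to, if anyone
    timestamp: str = field(default_factory=_utc_now)


def to_log_line(event):
    """Serialize the event as one JSON line for an append-only audit log."""
    return json.dumps(asdict(event), sort_keys=True)
```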

In practice, strong observability without least agency is like a watchdog without teeth. Least agency without strong observability means you are trying to mitigate risks without fully understanding them. You need both: insight into how agents behave and the ability to intervene when necessary.

From principles to practice

These principles must be translated into a layered defense: governance frameworks, policies, and oversight at the management level; training, awareness, and compliance at the human level; and a strong technical layer with countermeasures integrated into both the LLM itself and the broader agent ecosystem, before and during use.

These are essential questions to ask when getting started with AI agents:

Organizations are already using agents, whether they have built them themselves or not. Do you know which agents you are using?
Securing AI agents revolves around least agency and strong observability. What security measures have you put in place for this?
AI governance will be the key metric for trust. How do you monitor, manage, and secure AI agents?

Every organization that implements or uses AI agents must take these questions seriously and protect its AI prompts like crown jewels.