2 April 2026

Autonomous Penetration Testing Agents, AWS Security Agent, and the Compliance Question

Why agent-led testing is getting much better, but compliance and assurance still depend on context

On 31 March 2026, AWS made its Security Agent generally available for on-demand penetration testing. It is not another vulnerability scanner. AWS is making a much bigger claim: that an autonomous agent can understand your application, reason about it, test it in context, and report verified findings. If you work in security, you should be paying attention to what that means.

I have spent more than a decade in penetration testing, sit on the CREST UK Council, and have closely followed the development and application of AI since 2022. This piece is my honest take on where agentic penetration testing sits today, what it can realistically do, and what your compliance obligations actually say about it.

Before going further: this is an opinion piece. It is not legal advice, audit guidance, or a substitute for speaking to your own assessor, auditor, regulator, QSA, or compliance lead. Every business is different. You hold different data, operate under different contractual obligations, and carry different risk profiles. What is appropriate for one organisation may be wholly inadequate for another. The ICO is clear that what counts as “appropriate” depends on your own circumstances and the risks your processing presents.

First, what are these tools?

There is now a growing category of platforms that sit between traditional scanners and traditional human-led penetration tests. The current field includes AWS Security Agent, Horizon3.ai NodeZero, XBOW, Pentera, Escape, and Penligent, among others. They are not all identical. Some lean toward infrastructure validation, some toward attack-path simulation, and some toward web application and API testing. But the broad theme is the same: automation is moving up the value chain from “find obvious issues” to “reason about attack paths and validate exploitability.”

The catalyst for this piece is AWS Security Agent, which reached general availability on 31 March 2026 across six AWS Regions, with multicloud support covering Azure, GCP, and on-premises environments. AWS positions it as a system of specialised AI agents that develop application context from source code, documentation and architecture diagrams, then execute multi-step attack scenarios to identify and validate security vulnerabilities. It combines static analysis, dynamic analysis and penetration testing into a single context-aware agent, and AWS is pricing it at $50 per task-hour with a two-month free trial for new customers. When a hyperscaler of AWS’s scale enters the penetration testing space with that kind of pricing and positioning, the market should take notice.

Why this is different from what came before

Automated infrastructure scanning has been standard practice for at least a decade. Frankly, I would question any penetration tester who thinks they can consistently outperform a mature scanner like Nessus at identifying large volumes of unambiguous, well-understood infrastructure vulnerabilities: package versions, missing patches, weak ciphers, exposed services. External infrastructure is made up of many repeated technical components. While every environment is unique in aggregate, a scanner is extremely effective at recognising “another Apache version” or “another missing patch.” That is exactly the job scanners were built for, and they do it with a speed and precision that no human can match at scale.

Web applications have always been a different story. They are far more bespoke. Even when two businesses use the same framework, the logic, permissions model, workflows, and abuse cases are usually highly specific to that application. Legacy web tools remain excellent, and fuzzing is still powerful, but historically the weak point in automation has been context. The tool could hammer parameters, mutate requests, and uncover classes of issue, but it did not understand what the application was actually for, or what a user should and should not be able to do.

That is where generative AI changes the equation.

When I say context, I mean the business meaning behind the application. Imagine a farming application where each cow has a unique ID and the user browses objects at /cow/1, /cow/2, /cow/3. A traditional scanner sees a parameter to fuzz. It does not know whether the user should be able to access all of those objects or only some of them. A more capable agent can crawl the application, read the pages, infer the object model, understand roles and workflows, and start asking the right question: should this user be able to access this object, modify it, or enumerate all of them?
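To make the distinction concrete, here is a minimal sketch of the kind of check a context-aware agent performs once it has inferred an object model like /cow/1, /cow/2. Everything here is hypothetical: the object store, the endpoint, and the probe are stand-ins I have invented for illustration, not any vendor's actual implementation.

```python
# Hypothetical sketch: an agent probing for IDOR / broken object-level
# authorisation after inferring an object model such as /cow/<id>.
# The "endpoint" below is a toy stand-in for the application under test.

OWNERS = {1: "alice", 2: "alice", 3: "bob"}  # object id -> owning user


def fetch_object(object_id, session_user):
    """Toy endpoint with a deliberate IDOR flaw: it ignores session_user
    entirely and serves any object that exists."""
    if object_id in OWNERS:
        return {"status": 200, "owner": OWNERS[object_id]}
    return {"status": 404}


def probe_idor(candidate_ids, session_user):
    """Enumerate candidate ids as one user and flag every object the
    application serves even though it belongs to someone else."""
    findings = []
    for oid in candidate_ids:
        resp = fetch_object(oid, session_user)
        if resp["status"] == 200 and resp["owner"] != session_user:
            findings.append(oid)  # served to a non-owner: access control gap
    return findings


print(probe_idor(range(1, 5), session_user="bob"))  # → [1, 2]
```

The point of the sketch is the question being asked, not the mechanics: a scanner fuzzes the parameter, whereas an agent that understands the object model and the user's role can assert what *should* have been forbidden and report only the violations.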

That is much closer to how a human tester thinks about broken access control, IDOR, BOLA, workflow abuse, and business logic flaws. AWS is explicit about this: Security Agent develops application context from source code, documentation and credentials, then dynamically adapts its attack plan based on what it discovers during testing, including endpoints, status codes, and credentials.

This is why the quality gap between human-led penetration testing and automated testing will continue to narrow. At some point, for some classes of testing, automated systems will likely surpass the average human tester. Are we fully there today? I would not say that with confidence. But the direction is clear. Agents do not sleep, do not get bored, can run continuously, and can revisit your applications every time your code changes. That persistence matters. AWS is openly positioning Security Agent as a way to move testing from a periodic event to an on-demand, continuous capability.

Claranet’s own research: autonomous agents in practice

Claranet is not only observing this technology from the outside — we are actively researching it. Tom Kinnaird, our Head of Security Product Engineering, has been building and testing an autonomous hacking agent over a series of iterations, inspired in part by the achievements of XBOW and similar tools in this space.

Tom’s initial motivation was straightforward: to test what was genuinely possible when using an agent for offensive cybersecurity purposes, and whether it would make an experienced practitioner more effective. He built a framework designed to break down the complex, multi-step tasks involved in identifying and validating vulnerabilities, automate parts of the investigation and iteration process, and enable the agent to work autonomously toward a defined goal. For more detail, read his blog post on the project.

So what does compliance say about automated penetration testing?

This is where the answer becomes less exciting and much more nuanced. I have reviewed the key frameworks and regulatory positions, and the picture is not straightforward.

ISO 27001

ISO/IEC 27001 does not simply say “you must buy an annual penetration test.” The standard is structured around risk management and appropriate controls, not a blanket pentest mandate. Annex A 8.8 concerns the management of technical vulnerabilities, and ISO material is explicit that auditors should focus on whether an organisation meets the requirement, not impose a particular method where the standard does not require one.

My view: low-risk organisations may be able to make a reasonable case, with their auditor, for a heavier reliance on automated and agentic testing as part of a broader vulnerability management programme. But if you process sensitive personal data, operate critical services, or have a complex bespoke application estate, it becomes much harder to argue that autonomous testing alone is sufficient assurance.

UK GDPR and the ICO

Under UK GDPR, the core wording is in Article 32(1)(d), which requires “a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures for ensuring the security of the processing.” The ICO’s own guidance says that what those tests look like, and how often you do them, depends on your circumstances.

That is important. UK GDPR does not expressly mandate human-led penetration testing or formal independence in the way some people assume. However, for higher-risk processing, sensitive data, and regulated environments, independent assurance is often still the prudent position. UK GDPR may not force it in black-and-white wording, but your risk posture, contracts, customers, and post-incident defensibility may still point you there.

It is also worth noting that the ICO has taken enforcement action where organisations could not evidence regular penetration testing or vulnerability scanning. That tells you something about where the regulator’s attention falls in practice.

PCI DSS

PCI DSS is much clearer. Requirement 11.4 requires internal and external penetration testing at least every 12 months and after significant changes, and it explicitly says testing must be performed by a qualified internal resource or qualified external third party, with organisational independence of the tester. The PCI SSC penetration testing guidance expands on this by saying the tester must be organisationally separate from the management of the target systems.

That does not automatically mean a machine cannot be part of the process. It does mean that if you are trying to satisfy PCI, you need qualified, independent human accountability around the test.

UK Government, PSN, CHECK, and ITHC

For UK public sector work, the direction is clear. NCSC guidance states that third-party penetration tests should be performed by qualified and experienced staff only, and specifically notes that penetration testing cannot be entirely procedural because the quality of a test is closely linked to the abilities of the testers involved. The NCSC recommends HMG organisations use testers and companies in the CHECK scheme. PSN guidance also requires regular IT Health Checks and says these are commissioned from suppliers under recognised schemes including CREST, CHECK and Tiger.

For these environments, nobody should be suggesting that “the AI did it” will replace accredited human-led assurance any time soon.

Financial services and CBEST

In financial services, the strongest example is CBEST, the Bank of England’s intelligence-led penetration testing framework. It is designed as a targeted assessment framework grounded in realistic threat scenarios and close liaison between firms, regulators and specialist providers. That is not the language of unattended autonomous scanning. It is the language of high-assurance, intelligence-led, controlled human exercise.

Healthcare

Healthcare is mixed. At the broader governance level, NHS DSPT guidance says to consider penetration testing at least annually and vulnerability scanning more often. In more specific onboarding and service contexts, the requirement can become much firmer: for example, NHS guidance for the MESH API requires a penetration test to be completed by a third-party CHECK/CREST accredited organisation before go-live and repeated annually or after security-profile-changing changes.

The summary

| Framework / Scheme | Explicitly requires pentesting? | Clearly leans human-led / independent? | My view on agent-led testing |
| --- | --- | --- | --- |
| ISO 27001 | Not explicitly as a blanket rule | Not explicitly | Could support a risk-based programme, but unlikely to replace human-led assurance for higher-risk estates |
| UK GDPR / ICO | Requires regular testing of security measures | Not explicitly | Agent-led testing can help, but independence is a risk and assurance question, not a strict GDPR wording point |
| PCI DSS | Yes | Yes, explicitly qualified and organisationally independent | Agents may assist, but not sufficient on their own for compliance sign-off |
| NCSC CHECK / HMG | Yes in relevant contexts | Yes, strongly | Human-led accredited testing remains central |
| PSN / ITHC | Yes in practice for compliance | Yes, through recognised schemes | Agent-led testing may complement, not replace |
| CBEST | Yes in applicable firms | Yes, strongly | Not a realistic replacement for the framework's assurance intent |
| NHS DSPT / NHS onboarding | Often expected; sometimes explicitly required | Often yes, especially where CHECK/CREST is specified | Useful adjunct, not a universal substitute |

Where agentic penetration testing adds real value

In my view, three areas stand out.

Continuous assurance, in theory. If your application changes several times a day, testing once a year is obviously not enough. As these tools mature, there is a compelling argument that agentic testing could play a meaningful role in continuous security assurance, re-running tests on every material change and closing the gap between releases and security validation. I should be clear: I have not personally assessed or benchmarked these specific tools, so I am speaking to the direction of the technology rather than endorsing any particular product’s current capability. It is also worth noting that continuous security testing as a discipline is broader than any single tool. Effective continuous assurance typically combines automated tooling with human oversight, triage, and contextual judgement, particularly for organisations with complex or regulated estates. The tooling is an enabler, but the programme around it matters just as much.

CI/CD and engineering feedback loops. If agentic testing catches exploitable issues earlier in the development lifecycle, that is good security and good economics. Finding a broken access control issue in staging is dramatically cheaper than finding it in production during your annual pentest.
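As a concrete illustration of that feedback loop, here is a minimal sketch of a CI gate that fails a build when an automated test run reports validated, exploitable findings above a chosen severity. The findings shape and threshold logic are my own assumptions for illustration, not the output schema of AWS Security Agent or any other product.

```python
# Hypothetical sketch of a CI/CD gate: fail the pipeline when an automated
# security test run reports validated findings at or above a severity floor.
# The findings format below is an assumption, not any vendor's schema.

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def should_fail_build(findings, threshold="high"):
    """Return True if any validated finding meets or exceeds the threshold.

    Unvalidated findings are reported but do not block the release, which
    keeps the gate focused on issues the tooling actually exploited."""
    floor = SEVERITY_RANK[threshold]
    return any(
        f.get("validated") and SEVERITY_RANK[f["severity"]] >= floor
        for f in findings
    )


findings = [
    {"id": "F-1", "severity": "medium", "validated": True},
    {"id": "F-2", "severity": "high", "validated": True},  # exploited in staging
]
print(should_fail_build(findings, threshold="high"))  # → True
```

Gating only on validated findings is a deliberate choice in this sketch: it is what makes agent-style exploit validation more useful in a pipeline than a raw scanner feed, which would otherwise block releases on unconfirmed noise.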

Broad coverage across portfolios. Most organisations simply do not buy enough manual penetration testing to cover all of their estate with sufficient frequency. Agentic tooling changes that equation. AWS makes this point directly: most organisations limit manual testing to their most critical applications, leaving the majority of their portfolio exposed between tests.

Where human-led testing still matters

I do not think human-led penetration testing is going away soon, especially where any of the following apply: regulated environments; sensitive personal or financial data; public sector or critical national infrastructure; contractual requirements for accredited or independent testing; major releases or significant architectural change; or board-level and audit-level assurance expectations.

My position

My current view is straightforward. Agentic penetration testing is a force multiplier, not yet a universal substitute. In some organisations it may be enough for parts of the programme. In others it will be an excellent complement to human-led testing. In the most heavily regulated sectors, it will improve the programme without replacing the need for an accredited human opinion.

That position may change over time. The technology is moving quickly. But compliance language moves more slowly than technology, and assurance still depends on who is willing to stand behind the result.

If you want to discuss this further, I would genuinely welcome it. Reach out to me at jed.kafetz@claranet.com. I am particularly interested in views from other penetration testing firms, regulators, auditors, QSAs, government bodies, and security leaders.

References

| # | Source | Publisher |
| --- | --- | --- |
| 1 | AWS Security Agent on-demand penetration testing now generally available | AWS |
| 2 | AWS Security Agent on-demand penetration testing now generally available (blog) | AWS Security Blog |
| 3 | New AWS Security Agent secures applications proactively from design to deployment (preview) | AWS News Blog |
| 4 | Inside AWS Security Agent: A multi-agent architecture for automated penetration testing | AWS Security Blog |
| 5 | AWS Security Agent product page | AWS |
| 6 | AWS Security Agent User Guide: Create a penetration test | AWS Documentation |
| 7 | AWS penetration testing policy | AWS |
| 8 | UK GDPR Article 32 | legislation.gov.uk |
| 9 | A guide to data security | ICO |
| 10 | BNT reprimand (ICO enforcement) | ICO |
| 11 | Advanced Computer Software Group Limited penalty notice | ICO |
| 12 | Payment Card Industry Data Security Standard v4.0.1 | PCI SSC |
| 13 | PCI SSC Penetration Testing Guidance v1.1 | PCI SSC |
| 14 | Penetration testing guidance | NCSC |
| 15 | CHECK penetration testing | NCSC |
| 16 | Assured CHECK Scheme Standard v1.1 | NCSC |
| 17 | Apply for a PSN compliance certificate | GOV.UK |
| 18 | IT Health Check (ITHC): supporting guidance | GOV.UK |
| 19 | CBEST Threat Intelligence-Led Assessments | Bank of England |
| 20 | CBEST Implementation Guide | Bank of England / PRA |
| 21 | 2024 CBEST thematic | Bank of England |
| 22 | NHS DSPT penetration testing guidance | NHS Digital |
| 23 | NHS cloud security good practice guide, Appendix A | NHS Digital |
| 24 | Security best practices for AWS Security Agent | AWS Documentation |
| 25 | What I learnt hacking with AI over a weekend | Claranet |