“We Did a Pentest” No Longer Satisfies Medical Device Regulators

Summary

Penetration testing has long been a cornerstone of medical device cybersecurity, but regulatory expectations have outgrown the legacy, report-centric model. Drawing on more than a decade of exclusive focus on medical device security and hundreds of FDA-reviewed submissions, this post explains why single, point-in-time penetration tests are no longer sufficient to demonstrate cybersecurity diligence. Regulators are shifting their focus from isolated findings to completeness, traceability, and defensibility of cybersecurity testing across the full product lifecycle. We outline how traditional, human-driven penetration testing—while still essential—must now be repositioned as a targeted validation activity within a broader, test-case-driven framework that integrates architecture modeling, vulnerability data, and continuous evidence generation. Finally, we describe why this approach cannot scale manually and how platforms like ELTON enable manufacturers to meet modern regulatory expectations with predictable, defensible, and continuous cybersecurity assurance.

Introduction

For more than a decade, we have had a singular focus on cybersecurity testing of medical device targets. During this time, our teams have executed thousands of medical device security assessments, including more than 600 penetration tests that directly supported global regulatory submissions for devices ultimately approved for market release. Our work has been extensively reviewed by regulators, including the FDA, and our testing methodology has evolved in response to their feedback and requests.

Our exclusive focus on medical device security over the past decade has resulted in what industry peers have described as the largest volume of medical device penetration testing performed by a single provider. These engagements have ranged from global manufacturers with large and diverse product portfolios to early-stage startups bringing their first device to market. Through this work, we have evaluated nearly every class of device architecture, deployment model, and operational environment in use today, from the world’s largest systems, such as proton therapy platforms, to the smallest implantable devices.

Our reputation has been built on what many would describe as traditional or “old-school” penetration testing: deep-dive, human-driven assessments focused on uncovering real vulnerabilities, including zero-day issues that cannot be identified through automated tooling alone. This form of testing remains critical and valuable, yet in today’s market it is increasingly difficult to distinguish from lower-cost offerings that lack equivalent depth or expertise.

At the same time, within the medical device industry, we have observed clear and accelerating shifts in both regulatory expectations and customer needs. While penetration testing remains necessary, the historical model is no longer sufficient on its own. Over the last several years, it has become increasingly evident where penetration testing delivers its greatest value, where it falls short, and what additional forms of cybersecurity testing are now required to achieve consistent regulatory success.

This post reflects those observations. We aim to describe transparently what we have seen across the penetration testing industry and the medical device ecosystem, why the long-standing practice of relying on a single penetration test is beginning to fail under regulatory scrutiny, and how manufacturers can best prepare for what lies ahead. We no longer believe these challenges are solvable manually or with spreadsheets, and the history that follows explains why.

Overview

Penetration testing has been a foundational cybersecurity practice for more than two decades. It originated as an external, network-focused adversarial exercise intended to demonstrate real-world exploitability of enterprise systems exposed to the internet. Over time, regulators, customers, and security teams increasingly treated penetration testing as a proxy for overall cybersecurity posture.

That model no longer holds for modern, regulated products, particularly medical devices. Regulatory expectations are shifting away from isolated demonstrations of compromise and toward completeness, traceability, and defensibility of cybersecurity testing. Regulators now expect manufacturers to demonstrate what was tested, why it was tested, how testing maps to product architecture and security controls, and why vulnerabilities were or were not identified. Penetration testing is still required, but it represents only a relatively small component of the broader cybersecurity testing evidence now expected to convey assurance. A short list of vulnerabilities from a penetration test is no longer reassuring. Without evidence of comprehensive testing, it is increasingly interpreted as incomplete.

This post explains why legacy penetration testing is increasingly failing to meet modern regulatory expectations. It traces the historical origins of penetration testing, examines its structural limitations when applied to products, and outlines a comprehensive cybersecurity testing methodology aligned with current regulatory expectations. It concludes by describing how organizations can operationalize this approach in a scalable and economically viable manner without relying on manual, report-centric processes.

Origins of Penetration Testing

Penetration testing emerged more than twenty years ago to address enterprise network security at a time when cybersecurity concerns were almost entirely network-centric. This history explains why many of the most capable penetration testers originated from strong networking backgrounds. The original objective was narrow and explicit: attempt to penetrate an organization’s external network perimeter, demonstrate impact, and stop. The goal was never to enumerate all vulnerabilities or to provide a complete security assessment. Success was defined by finding a way in, not by identifying every weakness that existed.

Early penetration testing focused on externally exposed network services such as open ports, weak authentication mechanisms, misconfigured servers, anonymous file sharing, and insecure protocols. As firewalls became widespread and perimeter defenses improved, testing shifted inward. With the rise of phishing and other social engineering techniques, internal penetration testing emerged, emphasizing privilege escalation, lateral movement, and unauthorized access to systems or data from an assumed internal foothold.

Although tactics, techniques, and procedures evolved over time, the fundamental output of penetration testing did not. Penetration tests remained large-scope, time-boxed exercises designed to demonstrate plausible attacker impact. Because the scope often included entire enterprise networks, it was reasonable to assume that testing would not cover every system, configuration, or pathway. An implicit assumption underpinned the practice: it is impossible to find all vulnerabilities, and therefore penetration testing focuses on the most severe and exploitable ones.

That assumption is reasonable for enterprise environments. It becomes a fundamental limitation when penetration testing is applied to smaller, well-defined targets that could reasonably be evaluated more comprehensively.

From Networks to Applications to Cloud to Products

As technology evolved, the objectives of penetration testing remained largely the same.

The rise of web applications significantly reduced exposed enterprise network surfaces. Internet-facing attack surfaces collapsed into a limited number of web applications and VPN endpoints. Web application penetration testing represented the first meaningful departure from traditional network testing because completeness became more achievable. Scope was smaller, more deterministic, and more repeatable. Even so, success continued to be defined by identifying high-impact vulnerabilities rather than by demonstrating full coverage of the application’s functionality and attack surface.

Mobile applications and bring-your-own-device environments further constrained scope. Operating system security controls on platforms such as iOS and Android significantly reduced accessible attack surfaces, pushing testing toward APIs, authentication flows, and backend services. Testing became narrower and more repeatable, but explicit completeness criteria were still largely absent. Customer expectations often remained focused on identifying high or critical findings rather than understanding overall coverage.

Cloud infrastructure shifted penetration testing even further toward configuration assessment. While cloud misconfigurations can be severe, the range of possible failure modes is finite and well understood. In theory, completeness became measurable and testable.

More recently, enterprise security programs have emphasized red teaming. Red teaming further narrows objectives by prioritizing proof of access or impact at any cost, often incorporating social engineering and physical testing. While valuable for enterprise resilience, red teaming produces even smaller and less complete outputs.

This evolution matters because products, especially medical devices, have smaller and more deterministic scopes than enterprises. Yet penetration testing objectives have not evolved to match this reality and, in many cases, have continued to reduce coverage at the exact moment regulators are demanding more.

Variability and the Limits of Human-Driven Testing

Penetration testing is inherently human-driven. Its effectiveness depends on the skill, experience, creativity, and intuition of individual testers. This introduces unavoidable variability.

Five firms testing the same target can produce five different sets of vulnerabilities, sometimes with little overlap. Decisions about what to test and how deeply to test are largely human and rarely standardized. Pricing directly influences duration, which in turn constrains coverage and depth. There is a clear correlation between cost and quality driven by the scarcity and expense of senior expertise.

Senior penetration testers are expensive, and for good reason. Expanding penetration testing to achieve the level of completeness required for medical devices would make it economically infeasible, particularly for postmarket products where testing is expected to occur on a recurring basis. In practice, many findings exist simply because a tester decided a particular area was worth exploring. This approach can scale when vendors apply disciplined scoping and deep expertise, but it is inconsistent across the market.

Regulators and customers have attempted to compensate for this variability by requesting tester certifications, qualifications, and declared test durations. While these signals provide limited assurance, they attempt to quantify human skill, which is inherently difficult. Some of the most capable penetration testers operate outside formal certification structures and produce results that do not align with standardized expectations.

This variability is not a defect of penetration testing. It is intrinsic to the practice and, in many contexts, the reason it is valuable. However, it makes penetration testing unsuitable as a primary mechanism for demonstrating cybersecurity posture in regulated product environments.

Why Medical Devices Break the Legacy Penetration Testing Model

Medical devices are bespoke cyber-physical products with defined architectures, limited interfaces, and long commercial lifecycles. Unlike enterprises, manufacturers can and should be expected to understand and test the full product attack surface. It’s possible, so why are we not doing it? The prevailing industry assumption is that penetration testing is sufficient. While the FDA does reference penetration testing in its guidance, it also calls for additional forms of testing and evidence, and recent regulatory interactions show those requirements are now being actively scrutinized.

Traditional penetration testing works against this goal. Skilled testers optimize time by focusing on areas most likely to yield severe findings. Medium- and low-severity issues are often deprioritized due to time constraints and limited customer appetite for remediation unless issues are high or critical. Entire portions of scope may be intentionally left unexplored if they are unlikely to produce demonstrable impact. This behavior aligns with the original purpose of penetration testing but conflicts with modern regulatory expectations.

For products entering the market and expected to remain in use for many years, particularly during premarket evaluation, this approach is insufficient. The resulting report reflects tester intuition and time allocation rather than completeness. Two equally competent testers may produce materially different outputs, neither of which can credibly claim to represent all known vulnerabilities in the product.

This is not a failure of testers or methodology. It is a mismatch between industry practice and regulatory need.

Regulatory Shift From Findings to Completeness

Historically, manufacturers conducted cybersecurity testing during development with the goal of reaching a final release that contained few remaining vulnerabilities. A low vulnerability count was interpreted as a successful outcome and aligned with expectations under security development frameworks that emphasize early identification and remediation.

Today, that interpretation has changed. A report with few or no vulnerabilities now prompts a different question: was the testing complete? Regulators increasingly recognize that low findings may indicate limited coverage, shallow testing, or undocumented exclusions rather than strong security.

Traditional penetration test reports are negative-evidence artifacts. Only failed tests are documented. Successful tests disappear. This creates ambiguity. A report with many findings may indicate thorough testing or poor security. A report with few findings may indicate strong security or weak testing. Without test-case-level evidence, these outcomes are indistinguishable.
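The ambiguity can be made concrete with a minimal sketch. The interface names, test IDs, and the `ExecutedTest` structure below are hypothetical and chosen only for illustration. Two engagements with very different coverage produce the same findings-only report, and only a record that keeps passing tests tells them apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutedTest:
    """One executed test case (hypothetical structure for illustration)."""
    test_id: str
    interface: str
    passed: bool


def legacy_report(results):
    """Negative-evidence view: only failed tests survive into the report."""
    return [r.test_id for r in results if not r.passed]


def coverage_by_interface(results, interfaces):
    """Positive-evidence view: number of executed tests per in-scope interface."""
    counts = {name: 0 for name in interfaces}
    for r in results:
        counts[r.interface] += 1
    return counts


# Assumed in-scope interfaces for an imaginary device.
INTERFACES = ["bluetooth", "usb", "cloud_api"]

# Engagement A exercised every interface; engagement B touched only one.
thorough = [
    ExecutedTest("TC-001", "bluetooth", True),
    ExecutedTest("TC-002", "bluetooth", False),
    ExecutedTest("TC-003", "usb", True),
    ExecutedTest("TC-004", "cloud_api", True),
]
shallow = [ExecutedTest("TC-002", "bluetooth", False)]

# The findings-only reports are identical...
print(legacy_report(thorough) == legacy_report(shallow))  # True
# ...but the per-interface execution counts are not.
print(coverage_by_interface(thorough, INTERFACES))
print(coverage_by_interface(shallow, INTERFACES))
```

In this sketch the shallow engagement shows zero executed tests for `usb` and `cloud_api`, which is exactly the information a findings-only report discards.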

Regulatory feedback increasingly reflects this concern. Manufacturers are being asked to justify the absence of findings on specific interfaces or components. When no vulnerabilities are reported, reviewers expect evidence explaining how those areas were evaluated.

Recent regulatory guidance emphasizes the need for evidence of effective risk control, adequacy of cybersecurity controls under realistic conditions, and testing beyond standard software verification and validation activities. For example, the FDA’s 2025 premarket cybersecurity guidance states:

Manufacturers should provide details and evidence of testing that demonstrates effective risk control.

Manufacturers should ensure the adequacy of each cybersecurity risk control (e.g., security effectiveness in enforcing the specified security policy, performance for maximum traffic conditions, stability, and reliability, as appropriate)

Cybersecurity controls require testing beyond standard software verification and validation activities to demonstrate the effectiveness of the controls in a proper security context.

The assumption that a penetration test alone satisfies these expectations no longer holds.

To demonstrate diligence, manufacturers are increasingly expected to provide a complete inventory of cybersecurity test cases, traceability to architecture and security controls, evidence of execution with pass or fail criteria, and a full history of vulnerabilities identified and addressed across development. Penetration testing becomes a final, focused activity used to evaluate exploitability and interaction effects, not the primary mechanism for coverage. This is not V&V. It is cybersecurity testing: testing the efficacy of your security controls.
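One way to picture such an inventory is as a traceability check over structured records. The sketch below is illustrative only: the component names, control names, status values, and evidence paths are assumptions, not a regulatory schema. It flags component/control pairs that have no executed test case, and executed test cases that lack an evidence reference.

```python
from dataclasses import dataclass


@dataclass
class SecurityTestCase:
    """A test case traced to one component and one security control (hypothetical)."""
    case_id: str
    component: str
    control: str
    status: str    # assumed values: "pass", "fail", or "not_run"
    evidence: str  # reference to a stored artifact; empty if none was captured


def traceability_gaps(required, cases):
    """Return required pairs with no executed test, and executed tests without evidence."""
    executed = [c for c in cases if c.status in ("pass", "fail")]
    covered = {(c.component, c.control) for c in executed}
    untested = sorted(set(required) - covered)
    missing_evidence = sorted(c.case_id for c in executed if not c.evidence)
    return {"untested": untested, "missing_evidence": missing_evidence}


# Component/control pairs taken from an imaginary architecture model.
required = [
    ("ble_radio", "pairing_authentication"),
    ("ble_radio", "link_encryption"),
    ("firmware_updater", "signature_verification"),
]

cases = [
    SecurityTestCase("TC-101", "ble_radio", "pairing_authentication", "pass", "logs/tc-101.txt"),
    SecurityTestCase("TC-102", "ble_radio", "link_encryption", "fail", ""),
    SecurityTestCase("TC-103", "firmware_updater", "signature_verification", "not_run", ""),
]

gaps = traceability_gaps(required, cases)
print(gaps["untested"])          # [('firmware_updater', 'signature_verification')]
print(gaps["missing_evidence"])  # ['TC-102']
```

Both lists would need to be empty, or explicitly justified, before the inventory could support a claim of completeness.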

Point-in-time reports are no longer sufficient. Regulators now expect a living view of cybersecurity posture tied to a specific product release, with the ability to reconstruct how conclusions were reached over time.

The Modern Purpose of Penetration Testing for Medical Devices

Human-driven penetration testing remains essential, but its role has changed. The highest value of expert human testers lies in evaluating complex interactions, logic flaws, configuration weaknesses, and exploit chains that emerge only when multiple conditions are considered together.

Medical device vulnerability testing involves multiple sources of vulnerability data, including SBOM analysis, static analysis, dynamic testing, and configuration assessment. These sources generate large volumes of potential issues that require contextualization. Penetration testing is best applied to verify exploitability, prioritize risk, and assess real-world impact across this broader dataset. We recommend performing penetration testing at the end of the testing lifecycle, but not treating it as the primary or sole deliverable. Instead, penetration testing is most effective when applied in small, targeted increments (micro-penetration testing) focused on verifying exploitability for specific test cases and conditions.

This does not invalidate vulnerabilities that are not exploited during penetration testing. Instead, it recognizes that exploitability is one dimension of risk prioritization, while completeness and traceability remain regulatory requirements.
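A rough sketch of that workflow follows, under stated assumptions: the `Finding` fields, the 1 to 4 severity scale, and the scoring in `priority` are hypothetical and not a recommended risk model. Findings from several sources are pooled, a small queue is selected for human exploitability checks, and the outcome adjusts priority without removing anything from the record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """A potential issue from an automated source (hypothetical structure)."""
    finding_id: str
    source: str      # e.g. "sbom", "sast", "dast", "config"
    component: str
    severity: int    # assumed scale: 1 (low) to 4 (critical)
    exploitable: Optional[bool] = None  # None means not yet checked by a tester


def micro_pentest_queue(findings, reachable_components, limit):
    """Select unverified findings on reachable components, highest severity first."""
    candidates = [
        f for f in findings
        if f.exploitable is None and f.component in reachable_components
    ]
    candidates.sort(key=lambda f: (-f.severity, f.finding_id))
    return candidates[:limit]


def priority(finding):
    """Exploitability shifts priority; it never deletes a finding (assumed scoring)."""
    bonus = {True: 2, None: 1, False: 0}[finding.exploitable]
    return finding.severity + bonus


findings = [
    Finding("F-1", "sbom", "cloud_api", 4),
    Finding("F-2", "sast", "firmware_updater", 3),
    Finding("F-3", "dast", "cloud_api", 2),
    Finding("F-4", "config", "ble_radio", 3),
]

queue = micro_pentest_queue(findings, {"cloud_api", "ble_radio"}, limit=2)
print([f.finding_id for f in queue])  # ['F-1', 'F-4']

# Record the (imaginary) outcome of the two targeted tests.
queue[0].exploitable = True
queue[1].exploitable = False

ranked = sorted(findings, key=lambda f: (-priority(f), f.finding_id))
print([f.finding_id for f in ranked])  # ['F-1', 'F-2', 'F-3', 'F-4']
```

Note that `F-4`, which was not exploited, drops in rank but stays in the dataset alongside the unverified findings.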

The Economics of Modern Cybersecurity Testing

Manually executing, tracking, and reporting this level of evidence is economically infeasible. Without automation and structured data, the cost of comprehensive cybersecurity testing would increase by an order of magnitude.

Scalability requires an approach that models product architecture, derives test cases from that model, associates execution evidence with specific components and controls, and tracks vulnerabilities throughout their lifecycle. Rather than producing isolated, disposable reports, cybersecurity testing must function as a system of record capable of generating defensible evidence at any point in time.
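The first three steps can be sketched in a few lines. Everything named here (the `CHECK_CATALOG` mapping, the component names, the evidence paths, and the `SystemOfRecord` class) is a hypothetical illustration, not a description of any particular platform, and vulnerability lifecycle tracking is omitted for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical catalog: which checks apply to which interface type.
CHECK_CATALOG = {
    "bluetooth": ["pairing_authentication", "link_encryption"],
    "usb": ["debug_port_lockout"],
    "https_api": ["tls_configuration", "session_management", "authorization"],
}


@dataclass(frozen=True)
class Component:
    """One element of the product architecture model."""
    name: str
    interface_type: str


def derive_test_cases(components):
    """Derive one test case ID per applicable check for each modeled component."""
    return [
        f"{comp.name}:{check}"
        for comp in components
        for check in CHECK_CATALOG.get(comp.interface_type, [])
    ]


@dataclass
class SystemOfRecord:
    """Holds execution evidence for a single product release."""
    release: str
    results: dict = field(default_factory=dict)  # case ID -> (status, evidence)

    def record(self, case_id, status, evidence):
        self.results[case_id] = (status, evidence)

    def snapshot(self, planned):
        """Summarize evidence against the planned test cases at this moment."""
        return {
            "release": self.release,
            "planned": len(planned),
            "executed": sum(1 for c in planned if c in self.results),
            "failed": [c for c in planned if self.results.get(c, ("",))[0] == "fail"],
            "not_run": [c for c in planned if c not in self.results],
        }


components = [
    Component("handheld_programmer", "bluetooth"),
    Component("patient_portal", "https_api"),
]
planned = derive_test_cases(components)

record = SystemOfRecord(release="2.4.0")
record.record("handheld_programmer:pairing_authentication", "pass", "evidence/ble-pairing.pcap")
record.record("patient_portal:tls_configuration", "fail", "evidence/tls-scan.txt")

print(record.snapshot(planned))
```

Because the planned test cases come from the architecture model rather than from a tester's time budget, the `not_run` list makes remaining gaps explicit for a given release instead of leaving them implicit.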

This approach transforms cybersecurity testing from an episodic service into a continuous engineering discipline: continuous in both the frequency of testing over the duration of an engagement and the ongoing aggregation of results and updates.

Conclusion

Legacy penetration testing was never designed to serve as a comprehensive measure of product cybersecurity posture, especially for a product as critical as a medical device. Its objectives, economics, and outputs are misaligned with modern regulatory expectations for demonstrating overall resilience and vulnerability posture.

Medical devices require test-case-driven cybersecurity testing that demonstrates completeness, traceability, and defensibility across the full product lifecycle. Penetration testing remains essential but must be applied as intended: as an expert, human-driven assessment of exploitability and interaction effects within a broader testing framework. Manufacturers must recognize that exploitability is one dimension of risk prioritization, while completeness and traceability of cybersecurity testing remain regulatory requirements. Executing micro-penetration tests where necessary to validate assumptions or verify exploitability is the future.

A coordinated cybersecurity testing approach maps each key element of a medical device architecture to multiple test cases, augments those test cases with vulnerability data from automated sources such as SAST, DAST, and SBOM CVE analysis, and follows with targeted penetration testing to validate exploitability and prioritize risk. Sustaining this manually is technically possible but impractical. Scaling this approach across multiple product lines and releases and throughout postmarket lifecycles requires automation and purpose-built platforms designed to manage this complexity. This is no longer a theoretical future state. It is the direction regulatory expectations are already moving, and it defines what credible cybersecurity assurance for medical devices will require going forward.

How Can We Help?

We created ELTON out of necessity. At ELTON, we offer annual and subscription-based engagement models to reduce the overhead, cost, and operational complexity of continuous cybersecurity testing. These models are supported by a platform designed to ensure testing traceability, completeness, and defensibility in response to increasingly complex reporting requirements during premarket review and across year-over-year postmarket activities. We believe this approach aligns more closely with current regulatory expectations, and it is now our primary method for delivering cybersecurity testing, in combination with the ELTON platform.
