How HackerOne Uses AI to Triage, Prioritize, and Validate Code Scanner Findings at Scale
AI coding tools are enabling developers everywhere to perform at 10x (100x?) the capabilities of a normal old human being. While exciting and fantastic, this increase in productivity also comes with a notable increase in security risk. Application Security Testing (AST) tools are the reigning solution for detecting security issues in code at scale, but engineering and security teams now struggle more than ever to keep up with the volume of scanner findings. With more code comes more findings, more noise, and more frustration.
We know that automated tools are great at discovering potential issues in code, but developers are more often presented with noise than with valid, actionable feedback. Humans are great at validation, but they are not fast enough to do it alone, especially at unprecedented volume. So how do we tackle the volume of issues and scale a development team’s ability to determine what is useful and what is not?
Bridging the Gap Between Discovery and Validation
HackerOne Code inserts AI immediately after the discovery of potential vulnerabilities to evaluate the validity of those findings. The “simple” question we ask the AI during this intermediary step is: “Are these findings valid and do they have potential security impact?” The most crucial part of answering this question is gaining an understanding of the target code through the available context.
Once detection completes, we use the clues within the code to evaluate the validity of every automated finding and produce an assessment that can then serve as context for final validation and remediation.
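To make this flow concrete, here is a minimal sketch of what that intermediary validation step could look like in Python. The `assess_finding` helper, the finding fields, and the model choice are illustrative assumptions rather than HackerOne’s actual implementation; only the Anthropic Messages API call reflects a real client library.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def assess_finding(finding: dict, code_context: str) -> str:
    """Ask the model the core triage question for a single scanner finding.

    `finding` is a hypothetical dict with rule_id/file/line/snippet keys;
    `code_context` is the surrounding source gathered ahead of time.
    """
    prompt = (
        "Are these findings valid and do they have potential security impact?\n\n"
        f"Finding: {finding['rule_id']} at {finding['file']}:{finding['line']}\n"
        f"Snippet:\n{finding['snippet']}\n\n"
        f"Surrounding code context:\n{code_context}\n\n"
        "Reply with a validity decision and a short justification."
    )
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model choice for illustration
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```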
Powered by Anthropic foundation models, we utilize the following types of data as context (a sketch of how these sources might be bundled follows the list):
- All automated findings detected through HackerOne Code’s AI security engine and other automated security tools.
- Unique and user-defined memories logged by development teams. With memories in the mix, we can adapt based on direct feedback from code authors.
- Code navigation tools give us the ability to go deeper than the surrounding code diff, providing additional insight into the structure and contents of the repository. This additional repository context enables more complex reasoning during validation and better-informed decisions on the issues raised.
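As a rough illustration, the three context sources above might be bundled and flattened for the model along these lines. The `TriageContext` structure and `build_prompt_context` helper are hypothetical names introduced here, not part of any HackerOne API.

```python
from dataclasses import dataclass, field

@dataclass
class TriageContext:
    """Hypothetical bundle of the three context sources described above."""
    findings: list[dict]  # automated findings from the AI security engine and other scanners
    memories: list[str] = field(default_factory=list)  # user-defined notes from code authors
    repo_context: dict[str, str] = field(default_factory=dict)  # path -> source via code navigation

def build_prompt_context(ctx: TriageContext) -> str:
    """Flatten the bundle into text the model can reason over."""
    parts = [f"Finding: {f['rule_id']} ({f['file']}:{f['line']})" for f in ctx.findings]
    parts += [f"Team memory: {m}" for m in ctx.memories]
    parts += [f"--- {path} ---\n{src}" for path, src in ctx.repo_context.items()]
    return "\n\n".join(parts)
```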
What Data Is Supplied to End Users for Final Validation
All available context is searched to determine what, if anything, is relevant to the issues raised by automated security tools, and relevant data is then provided in an accessible format. High-confidence issues are those that present a real security risk with tangible impact, are likely exploitable, or could cause operational problems.
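Expressed as code, the high-confidence criteria above amount to a simple disjunction. The structured `assessment` dict and its keys are assumptions made for this sketch.

```python
def is_high_confidence(assessment: dict) -> bool:
    """True when any stated criterion holds: tangible security impact,
    likely exploitability, or potential operational problems."""
    return any([
        assessment.get("tangible_impact", False),
        assessment.get("likely_exploitable", False),
        assessment.get("operational_risk", False),
    ])
```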
Once a risk is determined to be valid, the output of our work is provided for consideration during final validation and remediation. The following details are provided (a sketch of the resulting output structure follows the list):
- Severity Assignment: A basic severity ranking of Low, Medium, or High is provided to help developers get a sense of the real-world implications, as well as the priority of the finding weighed against others raised. Severity is based on the impact of the vulnerability if it were to be successfully exploited, with exploitability as a second factor. Findings that require multiple steps to exploit, have limited impact, or affect non-critical components are assigned lower severities.
- Common Weakness Enumeration (CWE) Assignment: A commonly defined weakness class is selected based on the issue described, adding higher-level, real-world context. While CWEs are not always interesting to code authors, they tend to be very helpful for the security teams collaborating with development.
- Remediation Advice: Findings marked as likely valid come with concise, actionable remediation advice; for higher-severity issues, more comprehensive advice is supplied. Partial solutions are also acknowledged when full remediation would be a much more complex undertaking, and guidance is provided on the risks that come with making changes in more sensitive and complex areas of code.
- Validity Assessment: In addition to a clear decision on whether an issue is likely valid, a short explanation is provided to help code authors understand the logic and other contextual factors behind the decision to mark the issue valid.
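Put together, the details above might map to a result shape roughly like the following. The `TriageResult` type and its field names are assumptions for illustration, not HackerOne’s published schema.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"

@dataclass
class TriageResult:
    """Hypothetical shape of the per-finding output described above."""
    severity: Severity        # impact if exploited, weighed against exploitability
    cwe_id: str               # e.g. "CWE-89" for SQL injection
    remediation_advice: str   # concise and actionable; fuller for higher severities
    is_likely_valid: bool     # the clear validity decision
    validity_rationale: str   # short explanation of the reasoning behind the decision
```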
Why This Matters to Development Teams
Most developers on a team are not expected to understand every idiosyncrasy of a codebase. Although familiarity with the code is a definite advantage, manual validation of automated security findings is still generally an onerous and fraught process. With AI handling initial triage, prioritization, and validation, the job is done in a matter of minutes, connecting the dots for engineers more quickly and making necessary security remediations more accessible before vulnerabilities reach production.