Identifying Code Vulnerabilities With Azure OpenAI Service

A look at the davinci003 model’s ability to detect security vulnerabilities

Jorge G

Published in Better Programming

5 min read · Feb 15, 2023

This article looks at how to use Azure OpenAI Service for code vulnerability assessment and how effective it is at the task.

Code vulnerability assessment is the process of finding security weaknesses in software code before attackers can exploit them. It matters because it improves overall system security and reduces the risk of security incidents. Regular assessments also help with compliance with industry standards and are a core part of a comprehensive software security strategy.

Azure OpenAI Service is a cloud-based platform that provides access to OpenAI’s artificial intelligence models. These models can be used for various applications, including code and text generation. Code generation models are tuned to produce source code, while text generation models produce natural language. The service includes several types of models; for this scenario, we will use the davinci003 model (text-davinci-003), which at the time of writing is the most capable of the text generation models.

Some of the most common types of code vulnerabilities are:

Buffer overflow: This occurs when a program writes more data to a buffer than it can hold, causing data to overflow into adjacent memory locations and potentially compromising the system.
SQL injection: This occurs when an attacker injects malicious SQL code into an application’s SQL statement, allowing them to access or manipulate sensitive data in a database.
Cross-Site Scripting (XSS): This occurs when an attacker injects malicious code into a web page viewed by other users, allowing them to steal sensitive information or launch attacks against the users.
Remote code execution: This occurs when an attacker can execute code on a remote system, often by exploiting a vulnerability in a web application.
Directory traversal: This occurs when an attacker can access restricted directories or files by using dot-dot-slash (../) sequences in a file path.
Improper input validation: This occurs when an application does not properly validate user-supplied input, allowing an attacker to inject malicious data into the application.
Cross-Site Request Forgery (CSRF): This occurs when an attacker can trick a user into executing actions on a web application that they did not intend to perform.
Broken authentication and session management: This occurs when an application’s authentication and session management mechanisms are insecure, allowing an attacker to gain unauthorized access to sensitive data or resources.

Organizations should be aware of these vulnerabilities and take steps to prevent and mitigate them. Conducting regular code vulnerability assessments and implementing secure coding practices helps reduce the risk of these vulnerabilities in software systems.
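To make one of these categories concrete, here is a minimal sketch of SQL injection and its usual fix, written in Python with the standard sqlite3 module. The table, the user names, and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input is pasted directly into the SQL statement,
    # so quotes in the input can change the meaning of the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the value is bound as a parameter and never parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(find_user_vulnerable(conn, payload))  # returns every row
    print(find_user_safe(conn, payload))        # returns no rows
```

With the payload above, the vulnerable query becomes a condition that is always true, so it returns all users, while the parameterized version treats the payload as a literal name and matches nothing.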

Conventional methods of code vulnerability assessment typically involve manual code review, static analysis, and dynamic analysis. Manual code review involves a human analyst reviewing the source code of a software application to identify potential vulnerabilities, which can be time-consuming and labor-intensive.

Static analysis, on the other hand, involves analyzing the source code of an application without executing it. It can be automated, which makes it faster and more efficient than manual code review, but it may not detect all potential vulnerabilities, particularly those that only become apparent when the code is executed. Dynamic analysis involves executing the code of an application and monitoring its behavior to identify potential vulnerabilities. It can be more effective than static analysis, but it can also be more time-consuming and resource-intensive, and it is often not possible to test every execution scenario thoroughly.
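As a rough illustration of what static analysis means in practice, the sketch below inspects Python source code without running it, using the standard ast module, and reports calls to eval and exec. The rule set and the sample snippet are assumptions made for this example; real static analyzers apply far richer rules.

```python
import ast

# Hypothetical rule set: built-in calls that often lead to code injection.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each risky call in source."""
    findings = []
    # ast.parse builds a syntax tree; the code under review is never executed.
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Illustrative snippet to scan (line 1 of the string is empty).
SAMPLE = '''
user_input = input()
result = eval(user_input)
print(result)
'''

if __name__ == "__main__":
    print(find_risky_calls(SAMPLE))
```

A check like this is fast and easy to automate, but it only sees direct calls by name. An indirect call such as getattr(builtins, "eval")(data) would slip through, which is the kind of blind spot described above.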

The limitations of conventional methods of code vulnerability assessment include their time and resource intensity, the potential for missed vulnerabilities, and their dependence on human expertise. Additionally, these methods may struggle to keep pace with the increasing complexity and size of modern software systems, making it difficult to assess the security of these systems thoroughly.

To address these limitations, organizations increasingly turn to AI-powered code vulnerability assessment tools that use machine learning algorithms to automate and enhance the code review process. These tools can provide more comprehensive and efficient code vulnerability assessments, helping organizations to secure their software systems.

Vulnerability Analysis With davinci003

For this article, we will be working with Azure OpenAI Studio, but you could also integrate the Azure OpenAI API with your IDE or your DevOps pipelines.
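For readers who want to try the API route, here is a minimal sketch of what such an integration could look like, using only the Python standard library to call the service's completions REST endpoint. The resource name, deployment name, API version, prompt wording, and request parameters are all assumptions for illustration and are not taken from the experiments in this article.

```python
import json
import urllib.request

# Assumption: an API version that supports the completions endpoint.
API_VERSION = "2022-12-01"


def build_prompt(code: str) -> str:
    """Wrap a code snippet in a hypothetical review instruction."""
    return (
        "Review the following code and list any security "
        "vulnerabilities you find, with a short explanation:\n\n"
        f"{code}\n\nVulnerabilities:"
    )


def build_request(
    resource: str, deployment: str, api_key: str, code: str
) -> urllib.request.Request:
    """Build the POST request (not sent yet) for a completions call."""
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/completions?api-version={API_VERSION}"
    )
    body = {"prompt": build_prompt(code), "max_tokens": 256, "temperature": 0}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def scan(resource: str, deployment: str, api_key: str, code: str) -> str:
    """Send the request and return the text of the first completion."""
    request = build_request(resource, deployment, api_key, code)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["text"].strip()
```

A pipeline step could call scan("my-resource", "my-davinci003-deployment", key, source_code) for each changed file and attach the answer to the build report. Both names in that call are placeholders for your own Azure resource and model deployment.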

For this first example, we take a piece of C code that contains a potential buffer overflow vulnerability.

https://github.com/snoopysecurity/Vulnerable-Code-Snippets/blob/master/Buffer%20Overflow/bof1.c

SQL injection

Vulnerable-Code-Snippets/example.java at master · snoopysecurity/Vulnerable-Code-Snippets · GitHub

Code injection and XSS vulnerabilities

Vulnerable-Code-Snippets/eval2.php at master · snoopysecurity/Vulnerable-Code-Snippets · GitHub

Remote code execution

Directory traversal

Vulnerable-Code-Snippets/gq.js at master · snoopysecurity/Vulnerable-Code-Snippets · GitHub

Improper input validation

https://cwe.mitre.org/data/definitions/20.html#:~:text=/*%20board%20dimensions%20*/,n%20*%20sizeof(board_square_t))%3B

Cross-Site Request Forgery (CSRF)

https://learn.snyk.io/lessons/csrf-attack/javascript/

Broken authentication and session management

Vulnerable-Code-Snippets/CVE-2019-1937 at master · snoopysecurity/Vulnerable-Code-Snippets · GitHub

As the code examples show, davinci003 has promising potential for code vulnerability scanning. It found each type of vulnerability we tested, across snippets written in different programming languages. Testing it on more complex code samples and more recent vulnerabilities would give a better picture of its full potential.

Although using OpenAI models for code vulnerability scanning looks promising, there are challenges you must be aware of before relying on this technology. One is data bias: the quality and quantity of training data affect the accuracy of machine learning models. Although the davinci003 model understands code, it was trained for generic text generation rather than for code scanning, so it can produce false positives and false negatives, which reduce the accuracy of code vulnerability assessments. Organizations could fine-tune the model on code vulnerability training sets to mitigate some of these challenges.
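One way to quantify false positives and false negatives is to run the scanner on snippets whose status is already known and compute precision and recall. The sketch below shows the arithmetic. The labels in it are invented purely for illustration and are not measurements of davinci003.

```python
def confusion_counts(expected: list[bool], reported: list[bool]) -> dict[str, int]:
    """Count true/false positives and negatives for a scanner.

    expected[i] is True if snippet i really is vulnerable;
    reported[i] is True if the scanner flagged snippet i.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, flagged in zip(expected, reported):
        if truth and flagged:
            counts["tp"] += 1   # real vulnerability, correctly flagged
        elif flagged:
            counts["fp"] += 1   # false positive: safe code flagged
        elif truth:
            counts["fn"] += 1   # false negative: vulnerability missed
        else:
            counts["tn"] += 1   # safe code, correctly left alone
    return counts


def precision_recall(counts: dict[str, int]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels for five hypothetical snippets.
    expected = [True, True, True, False, False]
    reported = [True, True, False, True, False]
    counts = confusion_counts(expected, reported)
    print(counts, precision_recall(counts))
```

Low precision means developers waste time on findings that are not real, while low recall means real vulnerabilities go unreported, so both numbers matter when comparing a model against an existing scanner.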

Please note that this article does not encourage replacing your current vulnerability scanner with Azure OpenAI Service. Rather, it shows the potential that LLMs such as OpenAI’s models could have in vulnerability scanning, and how integrating the service with your IDE could improve the security of your code base.

