An Introduction to Buffer Overflow Vulnerability

The art of memory exploitation

Ashwin Goel

Published in Better Programming · 7 min read · Dec 2, 2019

Buffer

A buffer is a temporary storage area, usually located in physical memory (RAM), that is used to hold data.

Consider the program shown in the left image where a character buffer of length 5 is defined. In a big cluster of memory, a small memory of 5 bytes would be assigned to the buffer which looks like the image on the right.

Buffer Overflow

A buffer overflow occurs when more data is written to a buffer than it can hold, so that the excess overwrites adjacent memory addresses.

Demo (controlling local variables)

Let’s take an example of a basic authentication app that asks for a password and returns Authenticated! if the password is correct.

Without really knowing how the app works, let’s enter a random password.

It says Authentication declined! since the password wasn’t correct. To test for an overflow, we enter a long string of random data.

You must be wondering why it got authenticated and why there is a Segmentation fault. Let’s see a more detailed version of the app.

As you can see, there are three variables: auth, sys_pass, and usr_pass.

The auth variable determines whether the user is authenticated, depending on its value (initially 0). The usr_pass variable stores the password that the user enters, and the sys_pass variable holds the correct password.

The app works as follows: if usr_pass is equal to sys_pass, auth is set to 1. If auth is not 0, the user is authenticated.

You may also see how the variables are stored in memory. The addresses are in hexadecimal and differ by 0x10 (a difference of 1 in the second-to-last digit), which is 16 in decimal. Therefore, usr_pass and sys_pass are buffers of length 16.

To test for buffer overflow, a long password is entered as shown.

As you can see, the password entered in the usr_pass variable overflows into the sys_pass variable and then into the auth variable.

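Note: C functions like strcpy(), strcat(), gets(), and scanf() with a plain %s do not check the size of the destination buffer and can overwrite later memory addresses, which is precisely what a buffer overflow is. (strcmp() only reads memory; it never writes, so it cannot overflow a buffer by itself.)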

Refer to the code below for better understanding.

#include <stdio.h>
#include <string.h>   /* strcmp */

int main(void) {
    int auth = 0;                   /* non-zero means authenticated */
    char sys_pass[16] = "Secret";   /* the correct password */
    char usr_pass[16];              /* the password the user enters */

    printf("Enter password: ");
    scanf("%s", usr_pass);          /* deliberately unbounded: this is the bug */

    if (strcmp(sys_pass, usr_pass) == 0) {
        auth = 1;
    }

    printf("usr_pass: %s\n", usr_pass);
    printf("sys_pass: %s\n", sys_pass);
    printf("auth: %d\n", auth);
    printf("sys_pass   addr: %p\n", (void *)sys_pass);
    printf("auth       addr: %p\n", (void *)&auth);

    if (auth) {
        printf("Authenticated!\n");
    } else {
        printf("Authentication declined!\n");
    }
    return 0;
}

Note: This might be an unrealistic example and is only meant for understanding purposes. You may not see such situations in real life.

Let’s dive a little deeper into the concepts now.

Division of Memory for a Running Process

Source: Techno Trick.

This is what the memory assigned to a process looks like. It is divided into sections such as the stack, the heap, and uninitialized data, each used for a different purpose.

You may read more about the memory layout here: Memory layout of a process.

This post focuses on stack-based buffer overflows, so let’s look at the stack.

Stack: A LIFO (last in, first out) data structure that programs use extensively, for example to manage function calls and local variables.
The CPU has a number of registers, but we will only concern ourselves with EIP, EBP, and ESP (their names on 32-bit x86).
EBP: The base pointer, which points to the base of the current function's portion of the stack.
ESP: The stack pointer, which points to the top of the stack.
EIP: The instruction pointer, which contains the address of the next instruction to be executed.

Stack Layout

The above image shows what a stack looks like. It might look intimidating, but trust me, it isn’t.

Let’s see some important points related to the stack:

A stack is filled from higher memory addresses to lower ones.
Within a function, parameters and local variables are all accessed relative to the EBP.
In a program, every function call gets its own region of the stack, called a stack frame.

Source: IT & Security Stuff.

Above the EBP, function parameters are stored.

For example:

void foo(int a, int b, int c) {
    // Function body
}

Here, a, b, and c are the function parameters stored above the EBP.

All the local variables of a function are stored below the EBP.
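The Old %ebp is the EBP value of the previous function (the caller). Because a function has to return to its caller once it finishes, both the old EBP and the old EIP (the return address) are saved on the stack.
The ESP register stores the address of the top of the stack. Since the stack grows toward lower addresses, this is the lowest address in use, which appears at the bottom of the diagram.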

For example:

void foo(int a, int b, int c) {
    int x;
    int y;
    int z;
}

Here, x, y, z are local variables to the function and are stored below the EBP.

Exploiting Buffer Overflow

It’s time to get into buffer overflow exploitation using the stack.

Before that, let’s try to understand how a stack is built for any function.

Let’s look at an example, below:

The stack on the right is of the function foo as seen in the left image.

Since a, b, and c are parameters passed to the function, they are stored above the EBP. Also, because the stack is filled from higher to lower memory and parameters are pushed from right to left, c is written first in the memory, followed by b and a.
x, y, and z are the local variables stored below the EBP.
It is also required to store the Old EIP and Old EBP of the function main in the stack to know where to return to after the function executes.

The previous demo showed how a buffer overflow can overwrite neighboring local variables.

Source: Security Sift.

Imagine a situation where a local buffer, stored in the same region as x, y, and z, is overflowed in such a way that the old EIP is overwritten with the address of the memory where malicious code has been placed.

Refer to the image below for a better understanding.

Assume a buffer with a length of 500 defined in a function. Now it is overflowed in such a way that it has some random data, followed by the shellcode (malicious code) and then the return address which points to the shellcode.

So, when the function returns, execution jumps to the address stored in the overwritten return-address slot, and this is how the shellcode gets executed.

This is pretty much how a stack-based buffer overflow is exploited.

To get a more realistic idea of buffer overflow, watch this video: Buffer Overflow Attack - Computerphile. The code used in the video is on GitHub.

Security Measures

Use memory-safe languages like Python, Java, or Ruby, in which the runtime manages memory for you and checks array bounds.
In languages like C and C++, validate the input and check its length against the buffer size before writing data to a buffer.
Before using any external libraries, check them for known security vulnerabilities.
Use source code analysis tools to statically scan the code for vulnerabilities.
Use a non-executable stack: even if machine code is injected into the stack, it cannot be executed because that region of memory is marked non-executable. This is done by setting the NX (no-execute) bit.

Note: Even after these measures are taken, it might be possible to exploit buffer overflow. Therefore, these are just layers of security that can help to prevent the exploitation of buffer overflow.