Lesson Notes
Buffer Overflows Intro
Module 4: Common Vulnerabilities. Basics.
Module 4: Buffer Overflows — Comprehensive Theory Guide
A buffer overflow occurs when a program writes more data into a buffer (a contiguous block of memory, such as a fixed-size array or a stack-allocated region) than the buffer was designed to hold. The extra bytes overwrite adjacent memory. Depending on the layout, that memory may hold other local variables, the saved frame pointer, or the return address, which is the address the function jumps to when it returns. If an attacker controls the overflowing data (for example, input copied into the buffer without a length check), they can overwrite the return address with a pointer to their own code (shellcode) or to existing code (for example, ROP gadgets). When the function returns, the CPU transfers control to the attacker's chosen location, which results in code execution. This lesson explains the mechanics in detail and shows how modern defenses (ASLR, DEP, safe APIs) mitigate the risk. A harmless stack-smash demo (e.g. a simple C program that overflows and crashes) illustrates the concept without full exploitation.
Stack Layout and the Return Address
When a function is called, the return address (where execution resumes once the function exits) is saved on the stack, and the compiler typically reserves stack space for the function's local variables. In a simple layout, a local buffer sits at a lower address than the saved return address. A copy into the buffer proceeds toward higher addresses, so bytes written past the end of the buffer move toward the return address. If the code copies user input with an unsafe function (e.g. strcpy, gets, or sprintf, none of which take a destination size) and the input is longer than the buffer, the extra bytes overwrite whatever lies above it, eventually including the return address. When the function executes its return instruction, the CPU pops the overwritten value into the instruction pointer, so execution jumps to whatever address the attacker placed there. If that address points to attacker-supplied shellcode (injected in the same buffer or elsewhere), the attacker's code runs with the process's privileges. A crash (segmentation fault on Unix-like systems, access violation on Windows) often follows when the overwritten address is invalid or when the overflow corrupts other critical data; such a crash is a sign of memory corruption.
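The adjacency described above can be imitated safely without touching a real stack frame. The following is a minimal sketch, not a real exploit target: struct sim_frame and unchecked_copy are hypothetical names introduced for illustration, and the struct only stands in for the layout a compiler might produce. The copy is clamped to the size of the struct so that the demonstration itself stays inside one object and remains well-defined C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a small buffer followed by the
   slot that would hold the saved return address. Real frames are laid out
   by the compiler; this struct only imitates the adjacency. */
struct sim_frame {
    char      buf[8];
    uintptr_t saved_ret;
};

/* Copies len bytes of input to the start of the frame without checking
   len against sizeof(buf), as strcpy or gets would. The copy is clamped
   to the size of the whole struct so the simulation never writes outside
   the object. Returns how many bytes landed beyond buf. */
size_t unchecked_copy(struct sim_frame *f, const void *input, size_t len)
{
    assert(f != NULL && input != NULL);
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy(f, input, len);
    return len > sizeof f->buf ? len - sizeof f->buf : 0;
}
```

Calling unchecked_copy with an input that fits in buf leaves saved_ret untouched and returns 0. Calling it with an input as large as the struct returns a non-zero count, and saved_ret then holds bytes taken from the input, which is the situation the paragraph above describes.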
Why Buffer Overflows Lead to Code Execution
The attacker's goal is to control the instruction pointer. By overflowing the buffer, they control the contents of the return address slot. They can point it to: (1) shellcode they placed in the buffer (if they know or can guess where the buffer is in memory); (2) shellcode in an environment variable or other writable region; or (3) existing code in the process or its libraries (Return-Oriented Programming, ROP). Defenses make each of these harder. DEP/NX prevents execution from data regions, so shellcode in the buffer or heap does not run. ROP does not depend on executing injected data, because it reuses code that is already executable, so DEP alone does not stop it. ASLR randomizes addresses, so the attacker cannot reliably know where the shellcode or the gadgets are. Even with both defenses enabled, information leaks or partial overwrites can sometimes be chained into full exploitation, so preventing the overflow in the first place is essential.
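The idea that controlling the return slot means controlling where execution goes can be sketched with a function pointer standing in for the saved return address. Everything named here (dispatch_frame, normal_path, attacker_path, build_payload, run_with_input) is hypothetical and exists only for this illustration; no real stack frame is modified, and the pointer is called only if its bytes match one of the two known functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(void);

int normal_path(void)   { return 0; }  /* what the program meant to run */
int attacker_path(void) { return 1; }  /* stands in for attacker-chosen code */

/* Simulated frame: on_return plays the role of the saved return address. */
struct dispatch_frame {
    char       buf[8];
    handler_fn on_return;
};

/* Builds an input that fills buf with padding and then places the bytes of
   target exactly where on_return lives. Returns the payload length, or 0
   if out is too small. */
size_t build_payload(unsigned char *out, size_t cap, handler_fn target)
{
    size_t off  = offsetof(struct dispatch_frame, on_return);
    size_t need = off + sizeof target;
    if (cap < need)
        return 0;
    memset(out, 'A', off);
    memcpy(out + off, &target, sizeof target);
    return need;
}

/* Copies input into the simulated frame without checking it against buf,
   then "returns" through on_return. Returns -1 where a real process would
   most likely crash on an invalid address. */
int run_with_input(const void *input, size_t len)
{
    struct dispatch_frame f;
    handler_fn ok = normal_path, hijacked = attacker_path;

    assert(input != NULL);
    memset(&f, 0, sizeof f);
    f.on_return = normal_path;
    if (len > sizeof f)
        len = sizeof f;              /* keep the simulation inside the struct */
    memcpy(&f, input, len);

    if (memcmp(&f.on_return, &ok, sizeof ok) == 0 ||
        memcmp(&f.on_return, &hijacked, sizeof hijacked) == 0)
        return f.on_return();
    return -1;
}
```

A short input leaves on_return alone and run_with_input returns 0. A payload from build_payload(..., attacker_path) makes it return 1, and a payload of arbitrary padding makes it return -1. Note that build_payload needs the address of attacker_path; that knowledge is what ASLR is meant to deny an attacker.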
Defenses: ASLR, DEP/NX, and Safe Coding
ASLR (Address Space Layout Randomization): the OS places the stack, the heap, and shared libraries at randomized addresses each time the program runs; the main executable is randomized as well when it is built as position-independent code (PIE on Linux). The attacker cannot assume the address of their shellcode or of useful gadgets and would typically need an information leak to defeat ASLR.
DEP (Data Execution Prevention), also called NX (No-eXecute) or W^X (write XOR execute): a memory page may be writable or executable, but not both at the same time. The stack and heap are typically writable but not executable, so code placed there cannot run. The CPU faults if the instruction pointer lands on a non-executable page.
Safe functions and languages: use bounds-checked APIs (snprintf instead of sprintf, strlcpy where available, strncpy only with a correct length and explicit termination, since strncpy does not NUL-terminate the destination when the source is at least as long as the limit, or C++ std::string) and avoid gets (removed from the C standard in C11), sprintf, and unchecked array indexing. Memory-safe languages (safe Rust, managed runtimes) check bounds at the language level, so an out-of-bounds write is rejected or stopped with a panic or exception rather than silently corrupting memory. Use these in development and keep ASLR and DEP enabled in production.
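As a small illustration of the bounds-checked approach, the following sketch wraps snprintf so that the destination is always NUL-terminated and truncation is reported to the caller. The name copy_bounded and its return convention are assumptions made for this example, not a standard API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies src into dst, which has room for cap bytes including the
   terminating NUL. The result is always NUL-terminated when cap > 0.
   Returns 0 if src fit completely, -1 if it was truncated or cap is 0. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (cap == 0)
        return -1;
    /* snprintf writes at most cap - 1 characters plus the NUL and returns
       the length the full result would have had. */
    int n = snprintf(dst, cap, "%s", src);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

With an 8-byte destination, copy_bounded(dst, sizeof dst, "short") returns 0, while a longer source returns -1 and leaves the first 7 characters in dst. Returning an error on truncation lets the caller decide whether a shortened string is acceptable instead of continuing silently.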
Key Takeaway for Lesson 16
Buffer overflows are memory corruption bugs: writing past the end of a buffer overwrites adjacent memory, potentially including the return address, and can lead to code execution. Defenses: ASLR (randomized addresses), DEP/NX (no execution from data pages), and safe coding (bounds checking, memory-safe languages). Understanding how the attack works motivates both secure development and runtime hardening. Next: firewalls and UFW/iptables.