Cookie Crab © 2023 James Leonardo. All Rights Reserved. Generated with DALL-E and Night Cafe

I asked this question on LinkedIn: “What kind of medium severity bug would you rather have in your system?” The options that I gave were 1) a feature bug and 2) a security bug.

While I only ran the poll for a week and didn’t heavily promote it, the handful of respondents were unanimous: they would rather have a feature bug. I am not at all surprised by that result. This article will explore that a little bit and dive into a common cause of security bugs.

I didn’t ask “why did you choose your answer”, but I can guess at what most were thinking. It will go something like “The impact of a feature bug is more likely to be limited. It may only be an annoyance or something where we can tell users what to do if they see it. The impact of a security bug? You just don’t know how large it will be. Security bugs will damage our reputation and we don’t know what else an attacker could do if they find that one bug.”

Data backs up this feeling. IBM claims the average cost of dealing with a cyberattack in the US is 9.44 million USD and it can take months to clean up the damage. Much digital ink is spilled on dealing with attacks from an IT infrastructure and operations perspective, but I feel we generally take it for granted in the software development world. It’s not wholly unjustified: most breaches still seem to start with social engineering and credential stealing. However, from SQL Slammer (2003) to Log4Shell (2021), we have years of evidence that when there is a security bug in software, it can become a mess very quickly.

SQL Slammer is an instance of one of the most common types of security bugs: memory errors. In 2020, Google’s Chromium team reported that seventy percent of its serious security bugs were caused by memory safety problems. Microsoft engineers have anecdotally echoed the same findings. Seventy percent of all severe bugs falling into the same class of issues surely means we should have a standard way to stop these from happening, right?

Fortunately, there is.

If you have a team building software for your business and that software was written in the last 20 years, there’s a very good chance you are largely protected. That’s because you’re probably using a language like C# or Java that isolates you from most cases where these bugs come up. These are somewhat high-level programming languages that limit how much the developer needs to worry about how the computer is using its memory. They are both what we refer to as garbage collected languages: the developer can put things in computer memory, and the computer figures out when those things aren’t being used anymore and cleans up after itself. If the memory wasn’t cleaned up, then eventually we’d run out of memory and the program, or computer, would crash. These higher level languages also build in other memory safety features such as checking you are not trying to access memory that doesn’t exist.

The downside is that these checks are extra work for the computer and limit flexibility. While C# and Java can run pretty fast, they are not really fast, and the garbage collector means there can be a small slowdown every so often. You will probably only notice that slowdown when the system is performing computationally intensive tasks, especially for an extended period. If you’ve been playing a game for a while and notice a slowdown every few minutes, that could be because it was written in a garbage collected language.

This general theme of performance is one of the main reasons why teams like the Chromium team (Chromium is the core framework for browsers such as Chrome and Edge) still choose to use lower level programming languages like C++ and C that do not build in memory safety. They leave it to the developer to write good code and remember to use tools to check for issues.

How can the lack of memory safety be exploited? Computer memory is like our own memory: it’s where the computer puts information so it can get it later. Think of computer memory as a table composed of two columns: one column is just the row number (referred to as an address) and the other is the data associated with that row. The amount of data in each location is small: a single character or numeric value. Let’s store the phrase “No Hello” in memory, starting at address 12.

address data
12 ‘N’
13 ‘o’
14 ‘ ’
15 ‘H’
16 ‘e’
17 ‘l’
18 ‘l’
19 ‘o’
20 \0

In programming, we generally refer to a phrase like this as a string, short for “a string of characters.” At location 20, we have marked the end of the string with what is known as a null terminator (notice I didn’t put it in single quotes). Anyone reading that string should interpret that to mean “this is the end of the string, stop here.”
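
If it helps to see the table as something you can run, here is a minimal sketch in Python (a real language, standing in for the hypothetical one used in the rest of this article). It models memory as 32 one-byte cells. The memory size and the helper names store_string and read_string are my own assumptions for illustration, not part of any real system.

```python
# A toy model of computer memory: 32 one-byte cells, addressed 0..31.
# The size and the helper names are assumptions made for illustration.
memory = bytearray(32)


def store_string(memory, address, text):
    """Copy text into memory starting at address, then add a null terminator."""
    data = text.encode("ascii")
    memory[address:address + len(data)] = data
    memory[address + len(data)] = 0  # the \0 that marks the end of the string


def read_string(memory, address):
    """Read characters starting at address until the null terminator is reached."""
    characters = []
    while memory[address] != 0:
        characters.append(chr(memory[address]))
        address += 1
    return "".join(characters)


store_string(memory, 12, "No Hello")
phrase = read_string(memory, 12)
print(phrase)      # No Hello
print(memory[20])  # 0, the null terminator sitting at address 20
```

Notice that read_string has no idea how long the string is. It simply trusts that a terminator will show up eventually.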

I’m going to use a hypothetical programming language “Jim’s Expressive Runtime Kernel” (JERK) for some examples. JERK can run in memory safe or unsafe mode. In order to work with our phrase, we’ll assign it to a variable:

let my_phrase = "No Hello"

Once I’ve assigned it, I can use my_phrase to refer to that string in the future. So, let’s show that to the user by “printing” it to the screen (we still refer to this process as “print” even though very few of us remember the time when the primary output for the computer was a printer instead of a screen):

print(my_phrase)

Like most modern programming languages, JERK has the notion of an array. An array is a consecutive group of memory locations that is used for storing lists of things like a string of characters. It gives us a handy starting point because in JERK we don’t know that the computer decided to put the string at address 12; it could have put it somewhere else.

Let’s convert my_phrase to an array and then print the fourth character:

let my_array = my_phrase.as_array
print(my_array[3])

In JERK, [3] means “add 3 to the starting address of the array and give me the data in that location.” We usually refer to this as the array index, but it is better to refer to it as the offset because the start of the array is [0], not [1]. That means [3] is the fourth character, ‘H’. It sounds confusing to start at 0, but it actually makes most programming tasks easier.

actual address data array offset
12 ‘N’ 0
13 ‘o’ 1
14 ‘ ’ 2
15 ‘H’ 3
16 ‘e’ 4
17 ‘l’ 5
18 ‘l’ 6
19 ‘o’ 7
20 \0 8
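
The “starting address plus offset” arithmetic can be sketched in a few lines of Python. This is only an illustration: the 32-cell memory, the starting address of 12, and the helper name element_at are assumptions carried over from the table, not anything a real runtime exposes this way.

```python
# Toy memory with the phrase stored at address 12 (an assumed location).
memory = bytearray(32)
base_address = 12
memory[base_address:base_address + 8] = b"No Hello"


def element_at(memory, base_address, offset):
    """Return the character found at base_address + offset."""
    return chr(memory[base_address + offset])


first = element_at(memory, base_address, 0)
fourth = element_at(memory, base_address, 3)
print(first)   # N
print(fourth)  # H
```

Offset 0 adds nothing to the starting address, which is why the first character lives at [0].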

With the array, we can start to see differences between memory safe and not safe. When JERK tracks the array in memory safe mode, it also tracks the end of the array and checks that we are only trying to access memory within that array. In memory safe mode, trying to access that “end of string” character with print(my_array[8]) will throw an error, because the conversion to array knows it really isn’t part of the string. Errors will also be thrown when we try to use any number greater than 7 or less than 0 for the offset. It’s simply not allowed in memory safe mode.

In memory unsafe mode, we assume the developer knows what they’re doing. my_array is a convenient pointer to memory address 12 and nothing more. Now print(my_array[8]) will print the null terminator instead of erroring.
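
Here is a rough Python simulation of the two modes. Python itself is memory safe, so the “unsafe” side is only imitated on top of a toy memory; the class names CheckedArray and UncheckedArray, and the extra “secret” data placed nearby, are assumptions made up for this sketch.

```python
class CheckedArray:
    """Stand-in for memory safe mode: remembers where the array starts and ends."""

    def __init__(self, memory, start, length):
        self.memory = memory
        self.start = start
        self.length = length

    def get(self, offset):
        # Refuse any offset that falls outside the array.
        if offset < 0 or offset >= self.length:
            raise IndexError(f"offset {offset} is outside 0..{self.length - 1}")
        return self.memory[self.start + offset]


class UncheckedArray:
    """Stand-in for unsafe mode: nothing more than a starting address."""

    def __init__(self, memory, start):
        self.memory = memory
        self.start = start

    def get(self, offset):
        # No questions asked: read whatever is at start + offset.
        return self.memory[self.start + offset]


memory = bytearray(32)
memory[12:20] = b"No Hello"  # the string, followed by a 0 at address 20
memory[21:27] = b"secret"    # unrelated data that happens to sit nearby

safe = CheckedArray(memory, 12, 8)
unsafe = UncheckedArray(memory, 12)

try:
    safe.get(8)
    safe_result = "no error"
except IndexError as error:
    safe_result = f"error: {error}"

unsafe_terminator = unsafe.get(8)     # reads the null terminator at address 20
unsafe_neighbor = chr(unsafe.get(9))  # reads past the string into the neighbor

print(safe_result)        # error: offset 8 is outside 0..7
print(unsafe_terminator)  # 0
print(unsafe_neighbor)    # s
```

The only difference between the two classes is one if statement, which is the whole point: the check is cheap to write and easy to forget.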

Let’s say I want to print all the characters in the array. I could do something like this to print each character of the string separately:

let array_position = 0
loop {
    let current_character = my_array[array_position]    
    print(current_character)
    array_position = array_position + 1
    if current_character == \0
        exit_loop
}

Oops.

I goofed.

My intent is to start at the beginning of the string (position 0), print the current character, then move to the next character. The value of array_position tells me where I am at. Because \0 is only the way we say “here is the end of the string”, I shouldn’t print it to the screen. However, this code prints the current character and then checks to see if it is \0. So, our output looks like this:

N
o

H
e
l
l
o
\0

instead of

N
o

H
e
l
l
o

In reality, the \0 would be even harder to detect because it is a special character that may not even have a visual representation on your computer, not even empty space. Its size on the screen will be 0 pixels by 0 pixels. As you can probably imagine, that gives rise to all kinds of subtle bugs because when you print it, you are printing nothing and not even moving the cursor position.

My code (in this fictional language) really should look more like this:

let array_position = 0
loop {
    let current_character = my_array[array_position]    
    if current_character == \0
        exit_loop
    print(current_character)
    array_position = array_position + 1
} 
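
For readers who want to run the comparison, here is the same pair of loops sketched in Python. Instead of printing, each function collects what it would have printed so the difference is easy to inspect. The function names are mine, and the list with a trailing "\0" is just an assumed stand-in for the array.

```python
# The phrase as an array of characters with a null terminator on the end.
my_array = list("No Hello") + ["\0"]


def print_then_check(array):
    """The goofed version: 'prints' the character before testing for the terminator."""
    printed = []
    position = 0
    while True:
        current = array[position]
        printed.append(current)
        position += 1
        if current == "\0":
            break
    return printed


def check_then_print(array):
    """The corrected version: tests for the terminator first, then 'prints'."""
    printed = []
    position = 0
    while True:
        current = array[position]
        if current == "\0":
            break
        printed.append(current)
        position += 1
    return printed


buggy = print_then_check(my_array)
fixed = check_then_print(my_array)
print(len(buggy))  # 9, the terminator was included
print(len(fixed))  # 8, only the visible characters
```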

“Off by one” errors of this nature are fairly common and a frequent source of boneheadedness (Rule 9). They’re bad enough when they mess up your logic, but when they give access to the rest of computer memory, they can open up all manner of security problems.

What happens if I create a way to get only a portion of a string? JERK gives us a handy way to reuse code by declaring that code to be a method. A method can have parameters that represent the data it should work on. In this case, string is the string we want to get characters from, start is the offset position to start from, and number_of_characters is the number of characters that should be returned counting from that position.

meth substring(string, start, number_of_characters) {
    let character_array = string.as_array
    let array_position = start
    let output_string = ""

    loop {
        if array_position == start + number_of_characters
            exit_loop
        let current_character = character_array[array_position]
        output_string.add(current_character)
        array_position = array_position + 1
    }
    return output_string
}

There are many problems in that code because it does nothing to validate that start and number_of_characters actually fit inside the string you pass in. If you passed in "Ok", 12, and 34, what would happen?

In memory safe JERK, it would throw errors at you because you’re trying to operate outside the boundaries of the array. In unsafe JERK, it would happily return data for the next 34 memory addresses starting 12 spots past the beginning of character_array. Remember that in unsafe mode, JERK tracks the array as the address of the first memory position and relies on the programmer to know when to stop. We could even pass in a negative start value, thereby giving access to any memory that the program has access to.
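
To make the "Ok", 12, 34 case concrete, here is a Python sketch that imitates unsafe mode on a toy memory. Everything about the layout is an assumption for illustration: the 64-cell memory, the string sitting at address 4, and a made-up “password” stored elsewhere in the same memory.

```python
def unsafe_substring(memory, base_address, start, number_of_characters):
    """Imitate unsafe mode: trust the caller and never look at the string's length."""
    output = bytearray()
    position = start
    while position < start + number_of_characters:
        output.append(memory[base_address + position])
        position += 1
    return bytes(output)


memory = bytearray(64)
memory[4:6] = b"Ok"                   # the string we pass in, at address 4
memory[20:36] = b"password=hunter2"   # unrelated data in the same memory

# Ask for 34 characters starting 12 spots past the beginning of "Ok".
leaked = unsafe_substring(memory, 4, 12, 34)
print(len(leaked))                    # 34
print(b"password=hunter2" in leaked)  # True
```

The caller only ever handed over a two-character string, yet the result contains data that was never part of it.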

Can you see the security problems? If I can read all memory, I can copy any data I want out of the application. If a similar bug lets me write outside the bounds of an array, then, since code is also stored in memory, I can inject code of my choosing into memory. Now that I can fool your computer into running my code, what evil things can I get up to? Lots.

In these unsafe languages, it’s up to the developer to do all the checking. The means of doing the checks is often simple and straightforward. You don’t even have to write them yourself in most cases. It is just too easy to miss these extra steps in the crunch of trying to get an application completed and shipped. We’re rarely sitting down to write an algorithm to return a couple of characters from a larger string. It’s usually several layers of algorithms and methods and multiple programmers all assuming the other person did the right thing. Avoiding these simple-to-make, but hard-to-find, bugs is why higher level languages are so popular.
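
As a rough idea of how small those checks can be, here is one possible validated substring in Python. The exact rules (rejecting negative values and ranges that run past the end, and raising ValueError) are my own choices for this sketch; a real library may well behave differently. Python’s built-in slicing, for instance, quietly returns a shorter result instead of raising an error.

```python
def substring(text, start, number_of_characters):
    """Return number_of_characters characters of text beginning at offset start.

    Raises ValueError when the requested range does not fit inside text.
    """
    if start < 0 or number_of_characters < 0:
        raise ValueError("start and number_of_characters must not be negative")
    if start + number_of_characters > len(text):
        raise ValueError("requested range extends past the end of the string")
    return text[start:start + number_of_characters]


hello = substring("No Hello", 3, 5)
print(hello)  # Hello

try:
    substring("Ok", 12, 34)
    outcome = "returned"
except ValueError as error:
    outcome = f"rejected: {error}"
print(outcome)  # rejected: requested range extends past the end of the string
```

Two if statements are all it takes here. The hard part is remembering them in every layer where they matter.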

Built-in memory safety isn’t only a feature for higher level languages anymore; Rust provides memory safety without garbage collection by implementing a system of memory ownership that is enforced by the compiler (the program that turns your code into code the computer understands). Rust can compete with C and C++ in terms of performance, so more and more teams are turning to it when a garbage collected language is not suitable. I like Rust, but higher level languages are still easier for most people to understand. The top four higher level languages (Python, JavaScript, Java, C#) also have massive developer communities and a huge number of tools to help get the job done, so I believe we’ll continue to see them used for most apps that can tolerate the performance cost.

I’ve simplified the mechanisms of memory safety, but I hope this has helped you understand more about how memory unsafe languages can give rise to security bugs if you’re not careful. With the number of cybercriminals out there, choosing development tools that help us be safe and secure by default helps keep our systems safe. Next time, I’ll dive into a bit of my own experience with Rust and lessons that I’ve learned in the process.