=====================
== Rase's basement ==
=====================
Simple and functional is beautiful

Automating static analysis with the Binary Ninja API

Explanation

Static binary analysis can be a daunting task: it is time-consuming and hard, so you should automate as much of the work as possible.

In my experience, the Binary Ninja API is the best tool for the job, because the whole binary analysis tool was developed with the API in mind.

Binary Ninja provides an API in Python, Rust, and C++. I believe Binary Ninja itself is built on the native C++ API. In theory, this means that everything you can do in the UI should also be possible through one of the APIs Binary Ninja provides.

In this blog post, I will show you how I automated a simple task that would have taken me a long time to do manually. I will do it in Python.

The automation task

The binary I’m analyzing has a function that seems to initialize some kind of “table” of global variables.

During startup, the binary has a bunch of uninitialized global variables. Shortly after, all of them get assigned constant strings that reside in the .data section of the binary. In other words, the global variables are assigned strings that are known at compile time.

I’ve decided to name this function initialize_string_table. It takes three arguments. The decompilation of the function looks like this:

void* initialize_string_table(int64_t* dst_string, int128_t* src_string, int64_t arg3)
{
		void* rbx_1 = (arg3 - src_string);
		sub_7ff61cccc990(dst_string, ((char*)rbx_1 + 1));
		int128_t* rdi = *(int64_t*)dst_string;
		memcpy(rdi, src_string, rbx_1);
		void* rax = (((char*)rdi - src_string) + arg3);
		dst_string[1] = rax;
		*(int8_t*)rax = 0;
		return rax;
}

So basically it just copies the contents of the second argument into the memory address provided by the first argument. The stuff done with arg3 is irrelevant to this blog post.
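
To make the copy concrete, here is a minimal Python model of the decompiled routine. It rests on my reading of the decompilation rather than on anything confirmed: I assume the third argument is a pointer one past the end of the source string (so arg3 - src_string is the length) and that sub_7ff61cccc990 reserves length + 1 bytes for the destination. The function name and the flat bytearray standing in for process memory are made up for illustration.

```python
def initialize_string_table_model(memory: bytearray, src: int, end: int) -> bytes:
    """Model of the decompiled routine: copy memory[src:end] and NUL-terminate it.

    Assumptions: `end` is an end pointer (arg3), and the helper call is treated
    as a plain allocation of length + 1 bytes.
    """
    length = end - src                  # rbx_1 = arg3 - src_string
    buffer = bytearray(length + 1)      # sub_7ff61cccc990(dst_string, rbx_1 + 1)
    buffer[:length] = memory[src:end]   # memcpy(rdi, src_string, rbx_1)
    buffer[length] = 0                  # *(int8_t*)rax = 0
    return bytes(buffer)


# Usage: a fake memory image with a 10-byte string starting at offset 2.
memory = bytearray(b"xxPlayerName\x00yy")
copied = initialize_string_table_model(memory, 2, 12)
```

Under these assumptions the destination ends up holding the source bytes followed by a single terminating zero, which is all the renaming script further down relies on.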

The problem

There are around 1300 callers to this function, and the contents of the first argument are used in random places throughout the binary. Finding out what a given variable contains takes time: the memory location is assigned at run time by this function, so you have to trace back to the matching call to see which string was assigned.

Before running the script, a call to this function looked something like this:

[Image: a call to initialize_string_table before renaming]

The first argument, &data_xxxx, is also referenced from other parts of the binary, for example:

[Image: &data_xxxx referenced elsewhere in the binary]

In the context of initialize_string_table(&data_xxxx, string, &data_xxxx), the result of the operation is fairly obvious, but what about this:

[Image: code using &data_xxxx with no visible context]

Not so obvious, right? What the hell is &data_xxxx anyway?

The solution

The solution is to rename each &data_xxxx variable after the string assigned to it in the initialize_string_table function. I’m going to use the Binary Ninja API for this.

I’m first going to show you the script I made, and then I’m going to go in-depth into the contents of the script.

from binaryninja import *

def replace_non_alphanumeric_characters(input_string):
    # Keep only letters and digits so the result can be used as a symbol name.
    return ''.join(filter(str.isalnum, input_string))

# bv is the current BinaryView, available in Binary Ninja's Python console.
initialize_string_table_callers = bv.get_functions_by_name("initialize_string_table")[0].callers

for caller in initialize_string_table_callers:
    for h in caller.medium_level_il.instructions:
        if isinstance(h, MediumLevelILCall):
            if hex(h.dest.value.value) == "0x7ff71634c9e0":
                arg1 = bv.get_data_var_at(h.params[0].value.value)
                arg2 = bv.get_string_at(h.params[1].value.value)
                if h.params[0].operation == MediumLevelILOperation.MLIL_CONST_PTR and h.params[1].operation == MediumLevelILOperation.MLIL_CONST_PTR:
                    if arg1 is not None and arg2 is not None:
                        arg2 = replace_non_alphanumeric_characters(arg2.value)
                        # Compare by value; "is not" would compare object identity.
                        if arg2 != "Error":
                            print("call to initialize_string_table")
                            print(f"arg1: {arg1}, arg2: {arg2}")
                            print(f"{str(h)} is a call!")
                            arg1.name = f"str_{arg2}"
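
A side note on the naming helper: dropping every non-alphanumeric character means two different strings can collapse into the same name, and a string made only of punctuation collapses into an empty one. The sketch below is one possible alternative, not what the script above does. The function make_symbol_name, its used set, and the max_len limit are hypothetical names I introduce for illustration, and it only keeps ASCII letters and digits, unlike str.isalnum.

```python
import re


def make_symbol_name(text: str, used: set, prefix: str = "str_", max_len: int = 48) -> str:
    """Turn an arbitrary string into a unique, identifier-like name (hypothetical helper)."""
    # Collapse each run of non-alphanumeric characters into one underscore.
    body = re.sub(r"[^0-9A-Za-z]+", "_", text).strip("_")
    # Truncate long strings and fall back to a placeholder if nothing is left.
    body = body[:max_len] or "empty"
    name = prefix + body
    # Append a numeric suffix until the name has not been handed out before.
    candidate, counter = name, 1
    while candidate in used:
        counter += 1
        candidate = f"{name}_{counter}"
    used.add(candidate)
    return candidate


# Usage: share one set across all renames so duplicates get a suffix.
used_names = set()
first = make_symbol_name("Hello, world!", used_names)
second = make_symbol_name("Hello world", used_names)
```

Keeping underscores between words makes long strings easier to read in the decompilation, and the suffix avoids giving two data variables the same name.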
# First, import everything from the Binary Ninja API:
from binaryninja import *

# Get all callers of the initialize_string_table function.
# Binary Ninja is smart and can find the specific function from the symbol list and return its callers.
initialize_string_table_callers = bv.get_functions_by_name("initialize_string_table")[0].callers

# Iterate over all callers. For each caller, get the
# [medium-level IL](https://docs.binary.ninja/dev/bnil-mlil.html) instructions
# of the function calling initialize_string_table.
for caller in initialize_string_table_callers:
    for h in caller.medium_level_il.instructions:
        # Check whether the instruction is a function call. If it is,
        # check whether the called function's address matches
        # the address of initialize_string_table (0x7ff71634c9e0).
        if isinstance(h, MediumLevelILCall):
            if hex(h.dest.value.value) == "0x7ff71634c9e0":
                # arg1 captures the data variable at the address passed as
                # the first argument.
                # arg2 captures the string at the address passed as the second argument.
                arg1 = bv.get_data_var_at(h.params[0].value.value)
                arg2 = bv.get_string_at(h.params[1].value.value)
                # Check that the first and second arguments passed into the
                # function are constant pointers.
                if (h.params[0].operation == MediumLevelILOperation.MLIL_CONST_PTR
                        and h.params[1].operation == MediumLevelILOperation.MLIL_CONST_PTR):
                    # Check that both addresses actually resolve to something.
                    # Most likely useless, but I just wanted to make sure.
                    if arg1 is not None and arg2 is not None:
                        # Strip the weird characters from the second argument's
                        # string so that it can be used as the first argument's name.
                        arg2 = replace_non_alphanumeric_characters(arg2.value)
                        # If arg2 is not "Error", log some stuff and set the
                        # first argument's name to the string contents of the
                        # second argument. I added a str_ prefix so my future
                        # self knows it is just a string.
                        if arg2 != "Error":
                            print("call to initialize_string_table")
                            print(f"arg1: {arg1}, arg2: {arg2}")
                            print(f"{str(h)} is a call!")
                            arg1.name = f"str_{arg2}"
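
One thing I would probably change next time is the hard-coded address, which breaks as soon as the binary is rebased. The sketch below is a variant under a few assumptions: it compares the call target against the start address of the function found by name, reads constant pointers through their constant attribute, and compares operation names as strings so the decision logic can be exercised outside Binary Ninja with stand-in objects. The function name collect_renames is my own, and it only collects the pairs instead of renaming anything.

```python
def collect_renames(bv, target_name="initialize_string_table"):
    """Return {data variable address: string} for calls passing two constant pointers."""
    target = bv.get_functions_by_name(target_name)[0]
    renames = {}
    for caller in target.callers:
        for insn in caller.medium_level_il.instructions:
            # Only MLIL call instructions are interesting.
            if insn.operation.name != "MLIL_CALL":
                continue
            dest, params = insn.dest, insn.params
            # Keep direct calls whose destination is the target function itself.
            if dest.operation.name != "MLIL_CONST_PTR" or dest.constant != target.start:
                continue
            # Both the destination variable and the source string must be constant pointers.
            if len(params) < 2 or any(p.operation.name != "MLIL_CONST_PTR" for p in params[:2]):
                continue
            var = bv.get_data_var_at(params[0].constant)
            string = bv.get_string_at(params[1].constant)
            if var is not None and string is not None:
                renames[var.address] = string.value
    return renames
```

In the Python console this would be called as collect_renames(bv), after which each address can be looked up again with bv.get_data_var_at and renamed with the same str_ prefix as before. Separating the collection from the renaming also makes it easy to review the list before touching the database.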

The result is a nice .bndb file in which the variables are renamed at every call site that passes constant pointers to the function. This helps a lot in the analysis process because it gives context to the code using the magic &data_xxxx variables.

[Image: call sites after renaming]

For example, here we can see a renamed variable used somewhere else in the binary.

[Image: a renamed variable used elsewhere in the binary]

Thank you for reading this short hurried blog post.