Intel 8086 - Simulating Conditional Jump

20.06.2023 15:30

Assignment

Casey Muratori told me to simulate conditional jumps and loops in this assignment.

At the machine level, they’re the same thing.

About this blog post

This blog post is a bit different because I had already completed the assignment before writing it. I was being lazy at the time, but I’ve now decided I want to blog about this one too.

I’m going to present the finished solution, but know that it took me hours of debugging, and it wasn’t easy.

What is a conditional jump?

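In assembly terms, a conditional jump is an instruction that checks one or more flags in the FLAGS register. If its condition holds, it adds the signed 8-bit displacement stored in the instruction’s second byte to the instruction pointer (IP), which already points at the next instruction. If the condition doesn’t hold, execution simply continues with the next instruction. For example:

jnz 0x9000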
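
Here’s a minimal Rust sketch of that behaviour for a 2-byte short conditional jump. It assumes the condition has already been evaluated from the flags and that IP is a 16-bit value that wraps around; next_ip_after_jcc is a hypothetical name, not a function from the simulator.

```rust
/// Returns the IP after executing a 2-byte short conditional jump located
/// at `ip` whose second byte is `disp8`. (Hypothetical helper.)
fn next_ip_after_jcc(ip: u16, disp8: u8, condition_holds: bool) -> u16 {
    // By the time the displacement is applied, IP already points past
    // the 2-byte jump instruction.
    let next = ip.wrapping_add(2);
    if condition_holds {
        // The second byte is a signed displacement relative to `next`.
        next.wrapping_add_signed(disp8 as i8 as i16)
    } else {
        next
    }
}
```

A taken jump with displacement 0xF8 (-8) at address 12 lands on 6; a jump that isn’t taken just falls through to 14.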

What is a loop?

At the machine level, a loop is just an if statement and a goto.

High-level languages abstract this away from the programmer behind constructs like for and while loops, because goto statements are very bug-prone.

For example in C:

#include <stdio.h>

int main(void) {
    int counter = 10;
loop_start:
    if (counter > 0) {
        --counter;
        goto loop_start;
    }
    printf("counter = %d\n", counter);
    return 0;
}

In assembly, it looks like this:

mov cx, 3
mov bx, 1000
loop_start:
add bx, 10
sub cx, 1
jnz loop_start

So we just subtract and then check whether the subtraction set a flag in the FLAGS register. In this case we’re checking the Zero Flag (ZF), which sub sets when its result is 0. If the condition is met, meaning the zero flag is not set (0), we jump back to the address of the loop_start label.
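
To sanity-check the logic, here is the same loop mirrored in plain Rust, with ZF modelled as a boolean. It’s only a sketch: it ignores every flag except ZF, and run_countdown_loop is a hypothetical name.

```rust
/// Mirrors the assembly loop above: returns the final (cx, bx) and how
/// many times the loop body ran.
fn run_countdown_loop() -> (u16, u16, u32) {
    let mut cx: u16 = 3; // mov cx, 3
    let mut bx: u16 = 1000; // mov bx, 1000
    let mut iterations = 0;
    loop {
        // loop_start:
        bx = bx.wrapping_add(10); // add bx, 10
        cx = cx.wrapping_sub(1); // sub cx, 1
        let zf = cx == 0; // sub sets ZF when its result is zero
        iterations += 1;
        if zf {
            break; // jnz falls through when ZF = 1
        }
        // otherwise jnz jumps back to loop_start
    }
    (cx, bx, iterations)
}
```

After three passes cx is 0 and bx is 1030, which matches the simulator output at the end of this post.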

Loops are just conditional jumps.

Implementation

I’m now going to simulate conditional jumps by getting the simulator to run a loop written in assembly.

I already made some changes:

The update_register_value function now takes a signed number instead of an unsigned one, because subtractions were causing overflows (an unsigned value can’t go below zero).
I renamed the main loop counter from i to instruction_pointer, because that’s what it is.
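
The post doesn’t show update_register_value’s signature, so this is only a generic illustration of that overflow, assuming the signed type is i64 (the type get_highest_bit takes later on). Both function names are hypothetical.

```rust
/// Unsigned: 3 - 4 has no representation. checked_sub reports it as None;
/// a plain `-` would panic in a debug build.
fn unsigned_sub(a: u16, b: u16) -> Option<u16> {
    a.checked_sub(b)
}

/// Signed (and wider): the same subtraction simply goes negative.
fn signed_sub(a: i64, b: i64) -> i64 {
    a - b
}
```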

The assembly to simulate in this blog post:

mov cx, 3
mov bx, 1000
loop_start:
add bx, 10
sub cx, 1
jnz loop_start

The conditional jump encodes its jump target in its second byte. I spent tons of time figuring out how to use this second byte correctly.

My first idea was to cast the current instruction_pointer into a raw pointer and calculate the memory address offset between it and the loop_start address. I would then divide this offset by the size of an element in the binary_contents array, which held the contents of the binary file.

The formula would have been something like this:

if instruction_pointer_address > loop_start_address {
  // this branch means that we would be jumping backwards.
  let offset = instruction_pointer_address - loop_start_address;
  let index_offset = offset / std::mem::size_of::<u8>();
  instruction_pointer -= index_offset;
} else {
  // this branch means that we would be jumping forward.
  let offset = loop_start_address - instruction_pointer_address;
  let index_offset = offset / std::mem::size_of::<u8>();
  instruction_pointer += index_offset;
}

Then I realised this could not work. We aren’t compiling a high-level language down to a lower-level one like assembly; we’re going the other way around.

The addresses inside the binary belong to the 8086 program’s own “address space”, not to the memory of our high-level simulator. So an address taken from the second byte would be meaningless to the pointers in our simulating code.

This might seem like the most obvious thing ever, and now that I explain it to you, the reader, it is. For some reason my brain just wasn’t thinking about this at all.

After over-engineering the whole thing, I realised that the second byte of the conditional jump is actually a signed, relative byte offset, measured from the start of the next instruction. I was so grateful for this: I didn’t have to calculate anything myself.
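
Here is what that looks like for the listing above. The bytes are a hand-assembly of the listing (an assumption, but consistent with the 3-byte and 2-byte instruction sizes and the 248 displacement in the output at the end of this post); jnz_displacement is a hypothetical helper.

```rust
/// Hand-assembled bytes of the loop listing (assumes the assembler chose
/// the sign-extended imm8 forms of add/sub).
const PROGRAM: [u8; 14] = [
    0xB9, 0x03, 0x00, // 0:  mov cx, 3
    0xBB, 0xE8, 0x03, // 3:  mov bx, 1000
    0x83, 0xC3, 0x0A, // 6:  add bx, 10   <- loop_start
    0x83, 0xE9, 0x01, // 9:  sub cx, 1
    0x75, 0xF8,       // 12: jnz loop_start
];

/// Returns the signed displacement of a short JNZ (opcode 0x75) at `at`,
/// or None if there is no JNZ there.
fn jnz_displacement(code: &[u8], at: usize) -> Option<i8> {
    match code.get(at..at + 2) {
        Some([0x75, disp]) => Some(*disp as i8),
        _ => None,
    }
}
```

The jnz at offset 12 carries -8, and the next instruction starts at 14, so the jump lands on 14 - 8 = 6, which is exactly where loop_start’s add bx, 10 begins.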

I started off by adding a function that checks whether a given flag is set. Conditional jumps decide whether to modify the instruction pointer by looking at their corresponding flags in the FLAGS register, so they need this check.

pub fn flag_register_is_set(flag: &'static str, flag_registers: &[FlagRegister]) -> bool {
    for flag_register in flag_registers.iter() {
        if flag_register.register == flag {
            return flag_register.is_set;
        }
    }
    panic!("Flag {} not found", flag);
}
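
The same lookup can also be written with an iterator adapter. This is a sketch with a hypothetical FlagRegister stand-in that only has the two fields the function above reads; the simulator’s real struct may have more.

```rust
/// Hypothetical stand-in for the simulator's FlagRegister type.
struct FlagRegister {
    register: &'static str,
    is_set: bool,
}

/// Same behaviour as flag_register_is_set: find the flag by name and
/// return whether it is set, panicking if it doesn't exist.
fn flag_register_is_set(flag: &str, flag_registers: &[FlagRegister]) -> bool {
    flag_registers
        .iter()
        .find(|f| f.register == flag)
        .map(|f| f.is_set)
        .unwrap_or_else(|| panic!("Flag {} not found", flag))
}
```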

This function will be used by our conditional jump logic to see if the conditional jump fires or not.

At the end of the main loop I added a section that checks to see if the current instruction is a conditional jump.

if instruction_is_conditional_jump(instruction) {
  ...
}

Inside this branch I added a mutable boolean, jump_happens, which records whether the flags satisfy this conditional jump’s condition, i.e. whether the instruction pointer should be modified.

if instruction_is_conditional_jump(instruction) {
  let mut jump_happens = false;
  ...
}

Next I added the conditional jumps into a match statement.

match instruction {
  JE_JUMP | JLE_JUMP | JBE_JUMP => {
    ...
  },
  JS_JUMP => {
    ...
  },
  JNE_JUMP => {
    ...
  },
  JNS => {
    ...
  },
  _ => (),
}

Different conditional jumps care about different flags.

JE_JUMP | JLE_JUMP | JBE_JUMP => {
  // JLE also has SF<>OF as a condition but we don't handle OF currently.
  // JBE also has CF=1 as a condition but we don't handle CF currently.
  if flag_register_is_set("ZF", &flag_registers) {
      jump_happens = true;
  }
},

JS_JUMP => {
    if flag_register_is_set("SF", &flag_registers) {
        jump_happens = true;
    }
},

JNE_JUMP => {
    if !flag_register_is_set("ZF", &flag_registers) {
        jump_happens = true;
    }
},

JNS => {
    if !flag_register_is_set("SF", &flag_registers) {
        jump_happens = true;
    }
},

If none of those branches get hit, we do nothing and jump_happens stays as false.
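
The comments above note that JLE and JBE are only partly handled because OF and CF aren’t tracked yet. For reference, here is a sketch of the full 8086 conditions for the jumps in that match, assuming a hypothetical Flags struct of plain booleans. The opcodes are the standard one-byte short-jump opcodes: JE/JZ = 0x74, JNE/JNZ = 0x75, JBE = 0x76, JS = 0x78, JNS = 0x79, JLE = 0x7E.

```rust
/// Hypothetical flag state; the simulator stores its flags differently.
#[derive(Default, Clone, Copy)]
struct Flags {
    zf: bool,
    sf: bool,
    cf: bool,
    of: bool,
}

/// Returns Some(taken) for the short conditional jumps handled above,
/// or None if `opcode` isn't one of them.
fn jump_taken(opcode: u8, f: Flags) -> Option<bool> {
    let taken = match opcode {
        0x74 => f.zf,                   // JE / JZ:   ZF = 1
        0x75 => !f.zf,                  // JNE / JNZ: ZF = 0
        0x76 => f.cf || f.zf,           // JBE / JNA: CF = 1 or ZF = 1
        0x78 => f.sf,                   // JS:        SF = 1
        0x79 => !f.sf,                  // JNS:       SF = 0
        0x7E => f.zf || (f.sf != f.of), // JLE / JNG: ZF = 1 or SF <> OF
        _ => return None,
    };
    Some(taken)
}
```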

After all of this, we just check if jump_happens is true and we modify the instruction pointer accordingly.

if jump_happens {
    let offset = twos_complement(second_byte) as usize;
    // We might need to add logic in case the jump is forwards but
    // that was not in the assignment so I'm not going to worry about that yet.
    instruction_pointer -= offset;
}
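
The comment in that block mentions forward jumps. One way to handle both directions is to reinterpret the byte as an i8 and let usize::checked_add_signed do the arithmetic. This is a sketch assuming instruction_pointer has already been advanced past the 2-byte jump (the output at the end of this post shows IP 12 -> 14 for the jnz); apply_jump is a hypothetical name.

```rust
/// Applies a taken short jump to an instruction pointer that already
/// points just past the 2-byte jump instruction.
fn apply_jump(instruction_pointer: usize, second_byte: u8) -> usize {
    // Reinterpret the byte as signed: 0xF8 -> -8, 0x05 -> 5.
    let displacement = second_byte as i8 as isize;
    instruction_pointer
        .checked_add_signed(displacement)
        .expect("jump target lies before the start of the program")
}
```

Because the i8 reinterpretation handles both signs, the same call covers backward and forward jumps, and a jump that would land before the start of the program panics with a clear message instead of wrapping around.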

twos_complement negates a byte using two’s complement. For the jump byte 0xF8 (248), which is -8 as a signed byte, it returns 8, the number of bytes to jump back.

It first flips all the bits in the byte using the ! operator.

It then adds one to the result, wrapping around on overflow instead of panicking.

pub fn twos_complement(num: u8) -> i8 {
    (!num).wrapping_add(1) as i8
}
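
A quick check of twos_complement as written above, next to an equivalent standard-library spelling. The second function is only an alternative, not what the simulator uses.

```rust
/// The post's helper: flip all bits, then add one (wrapping).
pub fn twos_complement(num: u8) -> i8 {
    (!num).wrapping_add(1) as i8
}

/// Equivalent result: reinterpret the byte as i8, then negate.
/// wrapping_neg keeps 0x80 (-128) as -128 instead of panicking.
pub fn twos_complement_std(num: u8) -> i8 {
    (num as i8).wrapping_neg()
}
```

For 0xF8 both return 8, which is why subtracting the result works for the backward jump. For a forward displacement like 0x05 they return -5, and -5i8 as usize sign-extends to usize::MAX - 4, which is why the forward case can’t reuse the same cast.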

If we run the program now, the instruction pointer gets modified correctly, but the program ends up in an infinite loop: the jump keeps firing, so instruction_pointer keeps getting moved back, and our main loop only exits once instruction_pointer runs past the end of binary_contents.

In the branch where we set flags in the FLAGS register, we now check whether the value we’re passing to set_flags is greater than or equal to 0.

if !instruction_is_conditional_jump(instruction) {
    if mnemonic != "mov" {
        if reg_is_dest && instruction != ImmediateToRegisterMemory {
            ...
        } else {
            let rm = get_register_state(&rm_register, &registers);
            if rm.updated_value >= 0 {
                set_flags(rm.updated_value, &mut flag_registers, is_word_size);
            } else {
                // I'm not positive if this is the correct move here
                // but it's easier to modify in the future rather than think about
                // all the possible places where it's not the right thing.
                return;
            }
        }
    } else {
        ...
    }
}

We do this because set_flags eventually calls number_is_signed, which in turn calls get_highest_bit.

get_highest_bit requires a non-negative value because it casts the shifted value into a usize, and a negative i64 would turn into a huge number instead of 0 or 1.

fn get_highest_bit(value: i64, is_word_size: bool) -> usize {
    assert!(value >= 0, "get_highest_bit() - Value {} is negative, we thought we didn't have to handle this but now we do.", value);

    if is_word_size {
        return (value >> 15) as usize;
    } else {
        return (value >> 7) as usize;
    }
}
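
Another way around the negative-value problem is to truncate the value to the operand width before looking at the top bit: casting an i64 to u16 or u8 keeps the low bits of its two’s complement representation, so -1 becomes 0xFFFF and its sign bit is 1. Here is a sketch of computing ZF and SF that way; sign_and_zero_flags is a hypothetical name, not the post’s set_flags.

```rust
/// Computes (ZF, SF) for a result treated as an 8- or 16-bit value.
/// Negative inputs are fine: `as u16` / `as u8` keep the low bits of the
/// two's complement representation.
fn sign_and_zero_flags(value: i64, is_word_size: bool) -> (bool, bool) {
    if is_word_size {
        let v = value as u16;
        (v == 0, (v >> 15) == 1)
    } else {
        let v = value as u8;
        (v == 0, (v >> 7) == 1)
    }
}
```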

Now, in the else branch, the one taken for mov, we actually clear the FLAGS register.

if !instruction_is_conditional_jump(instruction) {
  if mnemonic != "mov" {
      if reg_is_dest && instruction != ImmediateToRegisterMemory {
          ...
      } else {
          ...
      }
  } else {
      // We don't clear if it's a conditional jump because the jnz conditional jump for example relies on the flags to know when to stop jumping.
      clear_flags_registers(&mut flag_registers);
  }
}

The output now becomes:

mov cx, 3 | 0 -> 3 | flags: [], IP: 0 -> 3
mov bx, 1000 | 0 -> 1000 | flags: [], IP: 3 -> 6
add bx, 10 | 1000 -> 1010 | flags: [], IP: 6 -> 9
sub cx, 1 | 3 -> 2 | flags: [], IP: 9 -> 12
jnz 248 | 0 -> 0 | flags: [], IP: 12 -> 14
add bx, 10 | 1010 -> 1020 | flags: [], IP: 6 -> 9
sub cx, 1 | 2 -> 1 | flags: [], IP: 9 -> 12
jnz 248 | 0 -> 0 | flags: [], IP: 12 -> 14
add bx, 10 | 1020 -> 1030 | flags: [], IP: 6 -> 9
sub cx, 1 | 1 -> 0 | flags: ["ZF"], IP: 9 -> 12
jnz 248 | 0 -> 0 | flags: [], IP: 12 -> 14

As you can see, on the third pass the result of sub cx, 1 becomes 0, so the Zero Flag gets set. The condition for Jump if Not Zero (jnz) is no longer met, so the jump doesn’t happen and execution runs off the end of the program with bx at 1030.

The jnz line prints its displacement as 248, which is the unsigned reading of the byte 0xF8. Read as a two’s complement signed byte, the same bits are -8: from IP 14, that lands back on 6, the address of loop_start. I won’t bother printing it the “correct” way.
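
To tie it all together, here is a compact, self-contained sketch that runs the whole listing. It is not the simulator from this post: it decodes only the four instruction forms the listing uses, tracks only ZF, and runs on hand-assembled bytes (an assumption, consistent with the IP values in the output above). run_listing is a hypothetical name.

```rust
/// Hand-assembled bytes of the loop listing.
const PROGRAM: [u8; 14] = [
    0xB9, 0x03, 0x00, 0xBB, 0xE8, 0x03, 0x83, 0xC3, 0x0A, 0x83, 0xE9, 0x01, 0x75, 0xF8,
];

/// Runs `code` and returns the registers (indexed by the 8086 reg field:
/// 0 = AX, 1 = CX, 2 = DX, 3 = BX, ...) and the number of instructions executed.
fn run_listing(code: &[u8]) -> ([u16; 8], usize) {
    let mut regs = [0u16; 8];
    let mut zf = false;
    let mut ip = 0usize;
    let mut executed = 0;
    while ip < code.len() {
        let op = code[ip];
        executed += 1;
        match op {
            // mov r16, imm16 (B8+reg, little-endian immediate); mov leaves flags alone
            0xB8..=0xBF => {
                let imm = u16::from_le_bytes([code[ip + 1], code[ip + 2]]);
                regs[(op - 0xB8) as usize] = imm;
                ip += 3;
            }
            // 83 /0 = add, 83 /5 = sub, r16 with a sign-extended imm8
            0x83 => {
                let modrm = code[ip + 1];
                assert_eq!(modrm >> 6, 0b11, "only register operands supported");
                let rm = (modrm & 0b111) as usize;
                let imm = code[ip + 2] as i8 as i16 as u16; // sign-extend the imm8
                let result = match (modrm >> 3) & 0b111 {
                    0 => regs[rm].wrapping_add(imm),
                    5 => regs[rm].wrapping_sub(imm),
                    other => panic!("unsupported 0x83 extension /{}", other),
                };
                regs[rm] = result;
                zf = result == 0;
                ip += 3;
            }
            // jnz rel8: the displacement is relative to the next instruction
            0x75 => {
                let disp = code[ip + 1] as i8 as isize;
                ip += 2;
                if !zf {
                    ip = ip.checked_add_signed(disp).expect("jump before start");
                }
            }
            other => panic!("unsupported opcode {:#04x}", other),
        }
    }
    (regs, executed)
}
```

It executes 11 instructions, matching the 11 lines of output above, and ends with CX = 0 and BX = 1030.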

Thank you for reading.