Categories
advpi

advpi – Week 6 – Code Execution

This week was hyper-focused on getting normal, proper code execution.

Code layout

The first thing to get right was how the code is laid out in memory.

I found this wonderful project kvm-arm32-on-arm64 that provided a style for doing this.

So, it turns out that the ARM converter prints the code in the format it’s in memory. To write it correctly, I used the big-endian format and then stored it in code as uint32_t values. This way, the compiler will store it in little-endian format as required, and I don’t need to think about endianness for a while.
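To make the endianness point concrete, here is a minimal sketch. The helper name wordFromMemoryBytes is made up for illustration and is not part of advpi. It takes four bytes in the order they sit in little-endian memory and builds the uint32_t literal that goes into the array.

```cpp
#include <cstdint>

// Hypothetical helper (not from the project): combine four bytes, given in
// the order they appear in little-endian memory, into one instruction word.
constexpr std::uint32_t wordFromMemoryBytes(std::uint8_t b0, std::uint8_t b1,
                                            std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0) |
           (static_cast<std::uint32_t>(b1) << 8) |
           (static_cast<std::uint32_t>(b2) << 16) |
           (static_cast<std::uint32_t>(b3) << 24);
}

// An ARM nop sits in memory as 00 F0 20 E3, which is the word 0xE320F000.
static_assert(wordFromMemoryBytes(0x00, 0xF0, 0x20, 0xE3) == 0xE320F000u,
              "memory-order bytes and the uint32_t literal must agree");
```

Writing the literal as 0xE320F000 and letting the compiler store it gives the same bytes in memory on a little-endian host, which is why the array can be typed straight from the big-endian listing.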

So if I take this bit of ARM code, that loops,

nop
nop
nop
mov r0,0x0
mov r1,0x1000
mov r2,0x1
add r3,r1,r2
str r3,[r1]
mov r15,0x2000000

We convert it to the following array.

uint32_t CODE[] = {
    0xE320F000,
    0xE320F000,
    0xE320F000,
    0xE3A00000,
    0xE3A01A01,
    0xE3A02001,
    0xE0813002,
    0xE5813000,
    0xE3A0F402,
};
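As a sanity check on the array, the immediates can be decoded back out of the words. This is a small sketch with made-up helper names, and it only applies to the immediate form of data-processing instructions such as the mov lines here. It follows the usual ARM rule: an 8-bit value rotated right by twice the 4-bit rotate field.

```cpp
#include <cstdint>

// Decode the 12-bit "modified immediate" of an ARM data-processing
// instruction: imm8 rotated right by 2 * rotate (bits 11..8).
constexpr std::uint32_t decodeArmImmediate(std::uint32_t instruction) {
    const std::uint32_t imm8 = instruction & 0xFFu;
    const std::uint32_t rotate = ((instruction >> 8) & 0xFu) * 2u;
    return rotate == 0 ? imm8
                       : (imm8 >> rotate) | (imm8 << (32u - rotate));
}

// The destination register (Rd) lives in bits 15..12.
constexpr unsigned destinationRegister(std::uint32_t instruction) {
    return (instruction >> 12) & 0xFu;
}

// mov r1,0x1000 and mov r15,0x2000000 from the listing above.
static_assert(decodeArmImmediate(0xE3A01A01u) == 0x1000u, "mov r1");
static_assert(decodeArmImmediate(0xE3A0F402u) == 0x2000000u, "mov r15");
```

For example, 0xE3A0F402 has rotate field 4 and imm8 0x02, so the value is 0x02 rotated right by 8 bits, which is 0x02000000, and the destination is register 15.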

I would expect it to exit with an MMIO at 0x1000 (as it attempts to write to a read-only page), after which the program exits. If it loops or does anything else, something is wrong.

Thankfully, we get this wonderful(ly confusing) output:

Hello, from advpi!
Opened bios
Attempted mmio
Attempted write=yes of value=4097 and at address=4096
Register(0)=0
Register(1)=0
Register(2)=0
Register(3)=0
Register(4)=0
Register(5)=0
Register(6)=0
Register(7)=0
Register(8)=0
Register(9)=0
Register(10)=0
Register(11)=0
Register(12)=0
Register(13)=0
Register(14)=0
Register(15)=0
Closing the Virtual Machine

So, the MMIO output was correct (0x1000 + 1 = 4097, written to address 4096), but we’re not able to see the registers: they all read as 0. I suspect it is related to the exception levels and register banks that are available, but I shall check more on it as required.

So although technically we can’t see the registers, we can get some rudimentary output through MMIO, or shared memory.
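For reference, here is a rough sketch of how an MMIO exit can be turned into the log line above. MmioExit is a stand-in I am assuming for the mmio member of struct kvm_run (same field names as the kernel header), so that the snippet compiles without KVM headers. The function names are hypothetical and this is not the actual advpi code.

```cpp
#include <cstdint>
#include <string>

// Stand-in for the `mmio` member of struct kvm_run (assumed layout).
struct MmioExit {
    std::uint64_t phys_addr;  // guest physical address that was accessed
    std::uint8_t data[8];     // bytes written by the guest (or to be read)
    std::uint32_t len;        // access width in bytes
    std::uint8_t is_write;    // non-zero for a write
};

// Assemble the little-endian bytes of the access into one integer.
std::uint64_t mmioValue(const MmioExit& mmio) {
    std::uint64_t value = 0;
    for (std::uint32_t i = 0; i < mmio.len && i < 8; ++i) {
        value |= static_cast<std::uint64_t>(mmio.data[i]) << (8 * i);
    }
    return value;
}

// Format the access the same way as the log output shown earlier.
std::string describeMmio(const MmioExit& mmio) {
    return std::string("Attempted write=") + (mmio.is_write ? "yes" : "no") +
           " of value=" + std::to_string(mmioValue(mmio)) +
           " and at address=" + std::to_string(mmio.phys_addr);
}
```

With a 4-byte write of the bytes 01 10 00 00 to address 0x1000, this gives back the same line as in the output.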

If I remove the MMIO-causing statement, we get an endless loop – showing that we’re stopping as required.

What’s next

This week was short as well. Following this, I’ll be setting up a GBA cart, and the video device. I’m thinking of using SDL2 to show a video device and handle device I/O.

advpi – Week 3,4,5 – Mapping the BIOS and shifting to C++

This week was a non-amusing transfer from C to C++ and adding the GBA BIOS.

BIOS Map

The Nintendo BIOS contains the basic code needed to initialize and use the GBA. After that, the BIOS hands over control to the game that’s plugged in. My initial thought was to load the file into memory, then I remembered the golden word which haunted me last week – mmap. It was used to create the memory which is mapped to the guest VM.

So, I mmap-ed the BIOS file into the memory, then added it in.

void* biosRom =
        mmap(0, BIOS_SIZE, PROT_READ | PROT_EXEC, MAP_SHARED, biosFd, 0);

Now it’s up to the kernel to manage it, and not my headache (for now).
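A slightly more defensive version of that mapping could look like the sketch below. The function name mapFileReadOnly is hypothetical, and it deliberately differs from my call above: it maps with PROT_READ and MAP_PRIVATE only, and it checks for MAP_FAILED, which is easy to forget.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Map the first `size` bytes of a file read-only.
// Throws std::runtime_error instead of handing back MAP_FAILED.
void* mapFileReadOnly(const std::string& path, std::size_t size) {
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) {
        throw std::runtime_error("could not open " + path);
    }
    void* mapping = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mapping == MAP_FAILED) {
        throw std::runtime_error("could not mmap " + path);
    }
    return mapping;
}
```

The caller is still responsible for calling munmap with the same size once the VM is torn down.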

Then I tried mapping the BIOS into the memory of the VM, and it segfaulted instantly. I changed some of the parameters to allow writes, and then it didn’t segfault, but the mmap didn’t go through.

After looking into the documentation, here’s the mistake I was making.

    struct kvm_userspace_memory_region memory_region = {
        .slot = 0,
        .userspace_addr = (unsigned long long)gbaMemory->onboardMemory,
        .guest_phys_addr = 0x02000000,
        .memory_size = ONBOARD_MEM_SIZE};

I was setting the slot to 0 for both the onboard memory (or where I was putting my code), and the BIOS page. They are actually different slots of memory, and need to be initialized as separate slots.
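A minimal sketch of the fix, with one slot per region, might look like this. The helper names makeRegion and registerRegion are made up for illustration. The use of KVM_MEM_READONLY for the BIOS in the usage note is my assumption, not something confirmed from the project.

```cpp
#include <cstdint>
#include <linux/kvm.h>
#include <sys/ioctl.h>

// Build one region descriptor. Every region needs its own slot number:
// reusing a slot modifies that slot rather than adding a second mapping.
kvm_userspace_memory_region makeRegion(std::uint32_t slot,
                                       std::uint64_t guestPhysAddr,
                                       std::uint64_t size, void* hostMemory,
                                       std::uint32_t flags) {
    kvm_userspace_memory_region region = {};
    region.slot = slot;
    region.flags = flags;
    region.guest_phys_addr = guestPhysAddr;
    region.memory_size = size;
    region.userspace_addr = reinterpret_cast<std::uintptr_t>(hostMemory);
    return region;
}

// Hand the region to KVM; returns the ioctl result (0 on success, -1 on error).
int registerRegion(int vmFd, kvm_userspace_memory_region region) {
    return ioctl(vmFd, KVM_SET_USER_MEMORY_REGION, &region);
}
```

Usage would be along the lines of makeRegion(0, biosGuestAddress, BIOS_SIZE, biosRom, KVM_MEM_READONLY) for the BIOS and makeRegion(1, 0x02000000, ONBOARD_MEM_SIZE, gbaMemory->onboardMemory, 0) for the onboard memory, where biosGuestAddress is a placeholder for wherever the BIOS should appear in the guest.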

C++

I’m not used to C (and KVM and any of this) and it shows, so it’s probably a good idea to shift to C++ while I still can.

I wanted exceptions to handle unexpected failures, but I also didn’t want to write the resource-release logic by hand, a job better left to the compiler. C++ seems to be the better fit. I removed all the gotos, moved the code over to C++, and finally added an exception that helped me keep my sanity.

class InitializationError : public std::exception {
    private:
    std::string message;
    public:
    InitializationError(std::string);
};

Being able to use exceptions along with constructors is useful. I’m aware that there’s a performance penalty, but it should be fine as long as I spend minimal time processing in my code (the kernel handles running the guest VM, not my code).
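To show how the exception and constructors fit together, here is a small sketch. InitializationError is filled in with a what() override, and FileDescriptor is a hypothetical RAII wrapper that is not part of the project. If the constructor throws, nothing needs cleaning up. If it succeeds, the destructor closes the descriptor on every exit path.

```cpp
#include <exception>
#include <fcntl.h>
#include <string>
#include <unistd.h>
#include <utility>

class InitializationError : public std::exception {
    std::string message;

  public:
    explicit InitializationError(std::string text) : message(std::move(text)) {}
    const char* what() const noexcept override { return message.c_str(); }
};

// Hypothetical RAII wrapper around a file descriptor.
class FileDescriptor {
    int fd;

  public:
    FileDescriptor(const std::string& path, int flags)
        : fd(open(path.c_str(), flags)) {
        if (fd < 0) {
            throw InitializationError("Could not open " + path);
        }
    }
    ~FileDescriptor() { close(fd); }

    // Copying would close the same descriptor twice, so forbid it.
    FileDescriptor(const FileDescriptor&) = delete;
    FileDescriptor& operator=(const FileDescriptor&) = delete;

    int get() const { return fd; }
};
```

This is the pattern that replaces the goto-based cleanup: each resource gets an owner object, and the compiler runs the destructors in reverse order when a later step throws.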

What’s next

The registers are still seemingly useful and garbage at the same time, but we shall see. Next week will mostly be travel and continued refactoring while I try to learn more, so next week will be week 5 essentially.


advpi – Week 2 – Initialization and Code

This week covers three things:

  1. Checking out KVM and working with it on x64
  2. Setting up the Raspberry PI workspace
  3. Hello world in RPI.

KVM – x64

While searching the interwebs, I followed this (amazing?) tutorial that explains how the KVM API for Linux works.

I created a C project and followed it up, and this is the basic breakdown:

  1. Open the KVM API.
  2. Create a VM
  3. Query and assert that your device supports all the APIs you will call.
  4. Create a vCPU (The thing that will run your code)
  5. Create memory mappings
  6. Put some code in, set the vCPU registers
  7. Run the vCPU
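The first three steps can be sketched roughly as below. The function name probeKvm and its status strings are made up for illustration. The ioctls themselves (KVM_GET_API_VERSION, KVM_CREATE_VM, KVM_CHECK_EXTENSION) are the standard ones from linux/kvm.h. It only reports what it finds, so it also runs on a machine without /dev/kvm.

```cpp
#include <fcntl.h>
#include <linux/kvm.h>
#include <string>
#include <sys/ioctl.h>
#include <unistd.h>

// Walk steps 1 to 3 of the list above and return a status line.
std::string probeKvm() {
    // Step 1: open the KVM API.
    const int kvmFd = open("/dev/kvm", O_RDWR);
    if (kvmFd < 0) {
        return "KVM unavailable: could not open /dev/kvm";
    }

    std::string status = "KVM ready";
    if (ioctl(kvmFd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION) {
        status = "unexpected KVM API version";
    } else {
        // Step 2: create a VM.
        const int vmFd = ioctl(kvmFd, KVM_CREATE_VM, 0);
        if (vmFd < 0) {
            status = "could not create a VM";
        } else {
            // Step 3: check a capability we rely on before using it.
            if (ioctl(kvmFd, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) <= 0) {
                status = "KVM_CAP_USER_MEMORY is not supported";
            }
            close(vmFd);
        }
    }
    close(kvmFd);
    return status;
}
```

Steps 4 to 7 build on the VM descriptor in the same way: KVM_CREATE_VCPU on the VM, memory regions, register setup, and finally KVM_RUN on the vCPU.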

What did I make it run? 2+2=4 of course. Here’s the program.

const uint8_t code[] = {
    0xba, 0xf8, 0x03, /* mov $0x3f8, %dx */
    0x00, 0xd8,       /* add %bl, %al */
    0x04, '0',        /* add $'0', %al */
    0xee,             /* out %al, (%dx) */
    0xb0, '\n',       /* mov $'\n', %al */
    0xee,             /* out %al, (%dx) */
    0xf4,             /* hlt */
};

Here, the sum of registers al and bl is converted to an ASCII digit and output to the serial port at 0x3f8.

The fun part is that in KVM, output is a special exit, and KVM yields control back to us.

switch (vcpuKvmRun->exit_reason) {
    case KVM_EXIT_IO:
        if (vcpuKvmRun->io.direction == KVM_EXIT_IO_OUT &&
            vcpuKvmRun->io.size == 1 && vcpuKvmRun->io.port == 0x3f8 &&
            vcpuKvmRun->io.count == 1)
            putchar(
                *(((char*)vcpuKvmRun) + vcpuKvmRun->io.data_offset));
...

Summarizing what happens above, we “emulate” a serial output device by printing whatever the VM wants printed onto the console.

Cool, it works. We’re ready to start porting this onto the RPI.

Wait, how do I work with a Raspberry Pi Zero 2 W?

Device setup

I had just bought this device. So let’s see what we need.

  1. Some way to work with the project.
  2. Running and testing on the RPI.

To run the project, I thought the easiest way would be cross-compiling. So let’s get cross-compiling set up in Fedora.

And checking on Google, there’s a gcc arm64 cross compiler package.

Well, that’s a no then. With that package, it’s not possible to cross compile actual userspace executables for ARM64 on x64 machines.

Then, let’s run it on the PI itself.

So, I decided that using some kind of remote development server would be good. I tried VSCode, and the poor PI Zero 2 W collapsed under the weight of the remote server.

Well, then zed comes to the rescue. With a bit of tweaking, I connect to the PI over Tailscale, and open it using zed. Perfect.

I’m missing out on debugging, which will make it a pain, but that’s okay. Been through worse. Maybe it’ll finally push me to learn gdb.

Getting to run the Thing

Once that’s in and all things are done, we get to the first bit – compile errors all over the place. Some things are wrong.

Since the KVM API is different on ARM, the register names and the structures are different. Now it won’t compile.

KVM_GET_REGS and the corresponding KVM_SET_REGS don’t work here. For some reason, it’s required to use KVM_SET_ONE_REG (and the corresponding KVM_GET_ONE_REG).
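Here is a rough sketch of reading a single register through the one-register interface. The constants are copied by hand from the arm64 KVM headers so that the snippet also compiles on a non-ARM host. That copying and the helper names are my assumptions. As far as I can tell from the arm64 header, the program counter is a separate field (regs.pc) with its own ID rather than general register 15, so the sketch lists it separately.

```cpp
#include <cstdint>
#include <linux/kvm.h>
#include <sys/ioctl.h>

// Values mirroring the arm64 KVM headers (assumed, spelled out by hand).
constexpr std::uint64_t kRegArm64 = 0x6000000000000000ULL;    // KVM_REG_ARM64
constexpr std::uint64_t kRegSizeU64 = 0x0030000000000000ULL;  // KVM_REG_SIZE_U64
constexpr std::uint64_t kRegArmCore = 0x0010ULL << 16;        // KVM_REG_ARM_CORE

// ID of general register n: the low bits hold the register's offset inside
// struct kvm_regs counted in 32-bit words, and each register is 64 bits wide.
constexpr std::uint64_t coreRegisterId(unsigned n) {
    return kRegArm64 | kRegSizeU64 | kRegArmCore | (2ULL * n);
}

// ID of regs.pc: 31 registers plus sp come first, so the word offset is 0x40.
constexpr std::uint64_t kPcRegisterId =
    kRegArm64 | kRegSizeU64 | kRegArmCore | 0x40ULL;

// Read one register from a vCPU; returns false if the ioctl fails.
bool readRegister(int vcpuFd, std::uint64_t id, std::uint64_t* value) {
    kvm_one_reg request = {};
    request.id = id;
    request.addr = reinterpret_cast<std::uintptr_t>(value);
    return ioctl(vcpuFd, KVM_GET_ONE_REG, &request) == 0;
}
```

Writing a register works the same way with KVM_SET_ONE_REG, with addr pointing at the value to store.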

Okay, all fixed.

There are still a few things to fix. Namely, starting the CPU in 32 bit mode, and setting the initial PC and code.

Starting in 32 bit

The Cortex A53 starts in 64 bit as it is… a 64 bit processor. How do I instruct it to start in 32 bit mode? Rummaging through the KVM docs (Ctrl+F for “32-bit”), it turns out that before starting the CPU, I also need to initialize it with a feature set.

Specifically, we need to do this bit:

struct kvm_vcpu_init cpuInit = {};
int preferredTarget = ioctl(vmFd, KVM_ARM_PREFERRED_TARGET, &cpuInit);
cpuInit.features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
int vcpuInitResult = ioctl(this->fd, KVM_ARM_VCPU_INIT, &cpuInit);

Why is it features[0]? I’m not sure I want to know why it’s addressed that way after spending way too much time trying to find out. The initialization also offers a bunch of other features that can be enabled based on what’s required – I don’t want them as of now.
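My best reading of the header is that features is a bitmap stored as an array of 32-bit words, so feature number N lives in word N / 32 at bit N % 32. Treat that as an assumption rather than something the docs spell out. A small sketch of the idea, with made-up helper names and the array size of 7 taken from my reading of struct kvm_vcpu_init:

```cpp
#include <cstdint>

constexpr unsigned kFeatureWords = 7;  // assumed size of the features array

// Set feature number `feature` in a bitmap stored as 32-bit words.
void setFeature(std::uint32_t (&features)[kFeatureWords], unsigned feature) {
    features[feature / 32] |= 1u << (feature % 32);
}

// Check whether feature number `feature` is set.
bool hasFeature(const std::uint32_t (&features)[kFeatureWords],
                unsigned feature) {
    return (features[feature / 32] >> (feature % 32)) & 1u;
}
```

Under that reading, any feature number below 32 lands in features[0], which would explain the index in the snippet above.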

Now that the CPU is ready, something needs to be there to execute code.

Executable code

The days of writing code in assembly are gone (for the regular Software Engineer), but now, we need it again. Regular assemblers are too complex (for me as of now) to generate only the tiny snippets I need. This awesome online tool – “Online ARM to HEX Converter” does it all online. I put in a small piece of code, and let’s see what it does:

Wonderful – something. So I put this in an array and hope it works.

After all this, it still didn’t work (hint: endianness).

I got an MMIO, but was confused about what it meant, so I added this hideous snippet:

for (int r = 0; r < 16; r++){
  printf("Got Registers: %d(%ld)\n", r,
    getRegisterValue(gameboyKvmVM.vcpuFd, r));
}

We get this sad output and the program hangs; and I have no idea what’s going on:

Hello, from advpi!
Attempted mmio
Got Registers: 0(0)
Got Registers: 1(5000)
Got Registers: 2(6000)
Got Registers: 3(0)
Got Registers: 4(25)
Got Registers: 5(0)
Got Registers: 6(4108)
Got Registers: 7(0)
Got Registers: 8(0)
Got Registers: 9(0)
Got Registers: 10(0)
Got Registers: 11(0)
Got Registers: 12(0)
Got Registers: 13(0)
Got Registers: 14(0)
Got Registers: 15(0)

So the registers were loaded correctly – and the code here was executed! But then R15 is 0, and the next LDR also didn’t generate an MMIO. Strange.

What’s next

As the week comes to a close, I’m getting a bit annoyed at managing resources, and would like (at least a bit of) help from the language. I’ve been coding in a “write the code then pray it works” style, and have been ignoring cleanup, refactoring and reading the code again. Let’s take the time to do that and fix any issues. Hopefully the cleanup fixes the faults in the program, or at least I learn enough along the way that I can.