This week covers three things:
- Checking out KVM and working with it on x64
- Setting up the Raspberry Pi workspace
- Hello world on the RPI
KVM – x64
While searching the interwebs, I found this (amazing?) tutorial that explains how the KVM API for Linux works.
I created a C project and followed along. Here’s the basic breakdown:
- Open the KVM device (/dev/kvm).
- Create a VM.
- Query KVM and assert that it supports all the APIs you will call.
- Create a vCPU (the thing that will run your code).
- Create memory mappings.
- Put some code in and set the vCPU registers.
- Run the vCPU.
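For reference, here is a minimal sketch of how the first five steps map onto ioctl calls, assuming a Linux host with /dev/kvm and the kernel’s <linux/kvm.h> header. The function name kvm_setup_sketch, the one-page memory size and the guest address 0x1000 are illustrative choices of mine, not values from the tutorial. Putting code in, setting registers and running are architecture specific, so the sketch stops before them and releases everything again.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define GUEST_MEM_SIZE 0x1000 /* one page of guest RAM (illustrative) */
#define GUEST_MEM_ADDR 0x1000 /* guest-physical address of that page (illustrative) */

/* Runs the first five setup steps, then tears everything down again.
 * Returns 0 if every step succeeded, -1 otherwise (e.g. no /dev/kvm). */
int kvm_setup_sketch(void)
{
    int ret = -1, vm = -1, vcpu = -1, run_size = -1;
    void *run = MAP_FAILED, *mem = MAP_FAILED;
    struct kvm_userspace_memory_region region = {0};

    /* Open the KVM device and check the stable API version (12). */
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0)
        return -1;
    if (ioctl(kvm, KVM_GET_API_VERSION, 0) != KVM_API_VERSION)
        goto out;

    /* Create a VM (type 0 = default machine type). */
    vm = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vm < 0)
        goto out;

    /* Assert the capability the memory setup below relies on. */
    if (ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) <= 0)
        goto out;

    /* Create vCPU 0 and map its shared struct kvm_run area. */
    vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
    if (vcpu < 0)
        goto out;
    run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    if (run_size <= 0)
        goto out;
    run = mmap(NULL, (size_t)run_size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0);
    if (run == MAP_FAILED)
        goto out;

    /* Back guest-physical memory with an anonymous host mapping. */
    mem = mmap(NULL, GUEST_MEM_SIZE, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        goto out;
    region.slot = 0;
    region.guest_phys_addr = GUEST_MEM_ADDR;
    region.memory_size = GUEST_MEM_SIZE;
    region.userspace_addr = (uint64_t)(uintptr_t)mem;
    if (ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) < 0)
        goto out;

    /* Loading code, setting registers and KVM_RUN would follow here. */
    ret = 0;
out:
    if (mem != MAP_FAILED)
        munmap(mem, GUEST_MEM_SIZE);
    if (run != MAP_FAILED)
        munmap(run, (size_t)run_size);
    if (vcpu >= 0)
        close(vcpu);
    if (vm >= 0)
        close(vm);
    close(kvm);
    return ret;
}
```

The goto-based cleanup keeps a single exit path, so every file descriptor and mapping is released no matter which step fails.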
What did I make it run? 2+2=4 of course. Here’s the program.
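```c
#include <stdint.h>

/* 16-bit real-mode x86 guest code; the host preloads AL = 2 and BL = 2. */
const uint8_t code[] = {
    0xba, 0xf8, 0x03, /* mov $0x3f8, %dx   (serial port) */
    0x00, 0xd8,       /* add %bl, %al      (AL = 2 + 2) */
    0x04, '0',        /* add $'0', %al     (number to ASCII digit) */
    0xee,             /* out %al, (%dx)    (prints '4') */
    0xb0, '\n',       /* mov $'\n', %al */
    0xee,             /* out %al, (%dx)    (prints a newline) */
    0xf4,             /* hlt */
};
```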
Here, the sum of registers AL and BL is turned into an ASCII digit and written to the serial port at 0x3f8, followed by a newline.
The fun part is that in KVM, output is a special exit (KVM_EXIT_IO), so KVM hands control back to us:
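```c
switch (vcpuKvmRun->exit_reason) {
case KVM_EXIT_IO:
    /* The guest executed a one-byte OUT to the serial port at 0x3f8 */
    if (vcpuKvmRun->io.direction == KVM_EXIT_IO_OUT &&
        vcpuKvmRun->io.size == 1 && vcpuKvmRun->io.port == 0x3f8 &&
        vcpuKvmRun->io.count == 1)
        /* The byte sits data_offset bytes into the mmap'ed kvm_run area */
        putchar(*(((char *)vcpuKvmRun) + vcpuKvmRun->io.data_offset));
    ...
```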
In short, we “emulate” a serial output device by printing whatever the VM writes to it onto the console.
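The check above only handles a single one-byte OUT. KVM can also report string output (count > 1), with the items packed back to back starting at data_offset. A slightly more general handler might look like the sketch below; serial_out_bytes and SERIAL_PORT are hypothetical names, and the only assumption is the struct kvm_run layout from <linux/kvm.h>.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>

#define SERIAL_PORT 0x3f8 /* the port the guest code above writes to */

/* Copies what the guest wrote to the serial port into dst (up to cap bytes)
 * and returns how many bytes were copied; returns 0 for any other exit.
 * KVM packs the data as io.count items of io.size bytes each, starting
 * io.data_offset bytes into the mmap'ed kvm_run area. Only 1-byte items
 * are accepted here, as in the check above. */
size_t serial_out_bytes(const struct kvm_run *run, char *dst, size_t cap)
{
    if (run->exit_reason != KVM_EXIT_IO ||
        run->io.direction != KVM_EXIT_IO_OUT ||
        run->io.port != SERIAL_PORT || run->io.size != 1)
        return 0;

    const char *data = (const char *)run + run->io.data_offset;
    size_t n = run->io.count < cap ? (size_t)run->io.count : cap;
    for (size_t i = 0; i < n; i++)
        dst[i] = data[i];
    return n;
}
```

The caller can then hand the copied bytes to fwrite on stdout instead of calling putchar once per exit.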
Cool, it works. We’re ready to start porting this onto the RPI.
Wait, how do I even work with a Raspberry Pi Zero 2 W?
Device setup
I had just bought this device. So let’s see what we need.
- Some way to work with the project.
- Running and testing on the RPI.
To build the project, I thought the easiest way would be to cross-compile. So let’s get a cross-compiling setup going on Fedora.
Checking on Google, there’s a gcc arm64 cross compiler package.
Well, that’s a no then. The package only supports building kernels, so it can’t cross compile actual userspace executables for ARM64 on x64 machines.
Then let’s build and run it on the Pi itself.
So I decided to use some kind of remote development setup. I tried VS Code, and the poor Pi Zero 2 W collapsed under the weight of the remote server.
Then Zed comes to the rescue. With a bit of tweaking, I connect to the Pi over Tailscale and open the project remotely with Zed. Perfect.
I’m missing out on debugging, which will make things painful, but that’s okay. Been through worse. Maybe it’ll finally push me to learn gdb.
Getting to run the Thing
Once that’s in and everything is set up, we get to the first hurdle: compile errors all over the place.
The KVM API on ARM differs from the x86 one: the register names and structures are different, so the x64 code won’t compile.
KVM_GET_REGS and KVM_SET_REGS don’t work on arm64. Instead, registers have to be accessed one at a time with KVM_SET_ONE_REG (and the corresponding KVM_GET_ONE_REG).
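As a sketch of what the ONE_REG interface looks like on arm64: each core register is named by a 64-bit id built from KVM_REG_ARM64, KVM_REG_SIZE_U64, KVM_REG_ARM_CORE and the register’s offset inside struct kvm_regs, counted in 32-bit units. The helper names below are hypothetical. The KVM_REG_ARM_CORE fallback mirrors the arm64 <asm/kvm.h> value so the snippet also builds on an x64 host, and the offsets (X registers 8 bytes apart, PC at byte 256) assume the arm64 struct user_pt_regs layout.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

#ifndef KVM_REG_ARM_CORE
/* Same value as arm64's <asm/kvm.h>; only needed when building elsewhere. */
#define KVM_REG_ARM_CORE (0x0010 << 16)
#endif

/* Common prefix of every 64-bit arm64 core register id. */
#define CORE_REG_BASE (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE)

/* X0..X30 are regs.regs[0..30] at the start of struct kvm_regs.
 * Each is 8 bytes, i.e. 2 units of 32 bits apart. */
uint64_t core_x_reg_id(unsigned n)
{
    return CORE_REG_BASE | (2u * n);
}

/* regs.regs[31] is followed by sp (byte 248) and pc (byte 256 = 64 units). */
uint64_t core_pc_reg_id(void)
{
    return CORE_REG_BASE | 64u;
}

/* Reads one register through KVM_GET_ONE_REG; returns the ioctl's result
 * (0 on success, -1 with errno set on failure). */
int get_one_reg(int vcpu_fd, uint64_t id, uint64_t *value)
{
    struct kvm_one_reg reg = {
        .id = id,
        .addr = (uint64_t)(uintptr_t)value,
    };
    return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}
```

One thing worth keeping in mind for 32-bit guests: KVM still exposes the AArch64 view of the registers. Under the architectural AArch32-to-AArch64 mapping, R0 to R12 and the User-mode SP and LR correspond to X0 to X14, while the AArch32 PC (R15) lives in the separate pc field. A helper that reads X15 when asked for R15 would get something other than the PC (X15 maps to SP_hyp).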
Okay, all fixed.
There are still a few things to fix: namely, starting the CPU in 32-bit mode, and setting the initial PC and code.
Starting in 32 bit
The Cortex-A53 starts in 64-bit mode as it is… a 64-bit processor. How do I instruct it to start in 32-bit mode? Rummaging through the KVM docs (Ctrl+F for “32-bit”), it turns out that before starting the vCPU, I also need to initialize it with a set of features.
Specifically, we need to do this bit:
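```c
struct kvm_vcpu_init cpuInit = {0};
/* Ask the VM which CPU target KVM prefers on this host; fills cpuInit.target */
int preferredTarget = ioctl(vmFd, KVM_ARM_PREFERRED_TARGET, &cpuInit);
/* Request AArch32 at EL1 instead of the default AArch64 */
cpuInit.features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
/* Initialize the vCPU (its own fd, not the VM fd) with target and features */
int vcpuInitResult = ioctl(this->fd, KVM_ARM_VCPU_INIT, &cpuInit);
```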
Why is it features[0]? After spending way too much time trying to find out, I’m not sure I want to know why it’s addressed that way. The initialization also offers a bunch of other useful features we could enable depending on what we need, but I don’t want them for now.
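For what it’s worth, features is declared in the arm64 uapi header as an array of seven 32-bit words that together form one bitmap: feature number n lives in word n / 32 at bit n % 32. KVM_ARM_VCPU_EL1_32BIT is feature 1, so it lands in features[0]. Here is a small stand-alone sketch of that indexing, using a hypothetical struct that mirrors the target-plus-features shape of struct kvm_vcpu_init.

```c
#include <stdint.h>

/* Hypothetical stand-in with the same shape as arm64's struct kvm_vcpu_init:
 * a target id followed by a 7-word (224-bit) feature bitmap. */
struct vcpu_init_sketch {
    uint32_t target;
    uint32_t features[7];
};

/* Same bit number as KVM_ARM_VCPU_EL1_32BIT in arm64's <asm/kvm.h>. */
#define SKETCH_EL1_32BIT 1u

/* Feature n is bit n % 32 of word n / 32. */
void set_feature(struct vcpu_init_sketch *init, unsigned n)
{
    init->features[n / 32] |= UINT32_C(1) << (n % 32);
}

int has_feature(const struct vcpu_init_sketch *init, unsigned n)
{
    return (int)((init->features[n / 32] >> (n % 32)) & 1u);
}
```

With the real struct, `cpuInit.features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT` is exactly the n = 1 case; the word index only becomes nonzero for features numbered 32 and up.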
Now that the CPU is ready, it needs some code to execute.
Executable code
The days of writing code in assembly are gone (for the regular software engineer), but now we need it again. Regular assemblers are too complex (for me, as of now) for generating just the tiny snippets I need. This awesome online tool, “Online ARM to HEX Converter”, does it all in the browser. I put in a small piece of code to see what it produces.
Wonderful, something. So I put this in an array and hope it works.
After all this, it still didn’t work (hint: endianness).
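A likely trap with hand-copied instruction words, assuming the guest runs little-endian (the default): a 32-bit ARM instruction such as 0xE3A00001 (mov r0, #1) has to sit in memory as the bytes 01 00 A0 E3, so copying the hex digits in reading order gives the bytes reversed. Below is a minimal sketch of emitting instruction words in little-endian order; emit_le32 is a hypothetical helper, and the example instruction is my own, not the snippet from the converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores 32-bit instruction words least significant byte first, which is
 * the order a little-endian vCPU expects to fetch them from memory. */
void emit_le32(uint8_t *dst, const uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[4 * i + 0] = (uint8_t)(words[i] & 0xffu);
        dst[4 * i + 1] = (uint8_t)((words[i] >> 8) & 0xffu);
        dst[4 * i + 2] = (uint8_t)((words[i] >> 16) & 0xffu);
        dst[4 * i + 3] = (uint8_t)((words[i] >> 24) & 0xffu);
    }
}
```

Doing the byte shuffling explicitly, rather than a memcpy of a uint32_t array, keeps the result the same regardless of the host’s byte order.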
I got an MMIO exit but was confused about what it meant, so I added this hideous snippet:
```c
/* Dump what getRegisterValue reports for register numbers 0 to 15 */
for (int r = 0; r < 16; r++) {
    printf("Got Registers: %d(%ld)\n", r,
           (long)getRegisterValue(gameboyKvmVM.vcpuFd, r));
}
```
We get this sad output, and then the program hangs. I have no idea what’s going on:
```text
Hello, from advpi!
Attempted mmio
Got Registers: 0(0)
Got Registers: 1(5000)
Got Registers: 2(6000)
Got Registers: 3(0)
Got Registers: 4(25)
Got Registers: 5(0)
Got Registers: 6(4108)
Got Registers: 7(0)
Got Registers: 8(0)
Got Registers: 9(0)
Got Registers: 10(0)
Got Registers: 11(0)
Got Registers: 12(0)
Got Registers: 13(0)
Got Registers: 14(0)
Got Registers: 15(0)
```
So the registers were loaded correctly, and the code here was executed! But then why is R15 (the PC) 0? And the next LDR didn’t generate an MMIO exit either. Strange.
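On ARM there is no separate I/O port space, so there is no KVM_EXIT_IO: a guest load or store to an address that isn’t backed by a memory slot exits with KVM_EXIT_MMIO instead, which is presumably where the “Attempted mmio” line comes from. Here is a minimal sketch of decoding such an exit with the struct kvm_run fields from <linux/kvm.h>; struct mmio_access and decode_mmio are hypothetical names, and the byte order assumes a little-endian host like the Pi.

```c
#include <linux/kvm.h>
#include <stdint.h>

/* Hypothetical summary of one guest MMIO access. */
struct mmio_access {
    uint64_t addr;  /* guest-physical address with no memory slot behind it */
    uint32_t len;   /* access width in bytes */
    int is_write;   /* 1 for a store (e.g. STR), 0 for a load (e.g. LDR) */
    uint64_t value; /* stored value; only meaningful for writes */
};

/* Returns 1 and fills *out for a KVM_EXIT_MMIO exit, 0 for anything else.
 * For a write, run->mmio.data holds the stored bytes; they are assembled
 * least significant byte first, matching a little-endian host.
 * For a read, the VMM must put the result into run->mmio.data before the
 * next KVM_RUN, which then completes the guest's load. */
int decode_mmio(const struct kvm_run *run, struct mmio_access *out)
{
    if (run->exit_reason != KVM_EXIT_MMIO)
        return 0;
    out->addr = run->mmio.phys_addr;
    out->len = run->mmio.len;
    out->is_write = run->mmio.is_write;
    out->value = 0;
    if (out->is_write) {
        for (uint32_t i = 0; i < out->len && i < sizeof run->mmio.data; i++)
            out->value |= (uint64_t)run->mmio.data[i] << (8 * i);
    }
    return 1;
}
```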
What’s next
As the week comes to a close, I’m getting a bit annoyed at managing resources by hand, and would like (at least a bit of) help from the language. I’ve been coding in a “write the code, then pray it works” style, ignoring cleanup, refactoring and rereading the code. Let’s take the time to do that and fix any issues. Hopefully, cleaning up fixes the faults in the program, or at least teaches me enough to fix them.