Categories
Website

Vitals Recorder – For the sick days

Since I joined uni in London, I’ve fallen sick a few times.
I have this habit of logging my temperature (something from my family) so it’s easy to report to a GP, and easy to remember when I last had medicine.

I had been using Google Keep as a scrongled way of logging my temperature, but that was a bit inconvenient, so I looked around the Google Play Store and found nothing much (in 2019).

Then I remembered this isn’t my first rodeo with fevers, and I had been working on vitals-recorder to log temperatures.

Here it is – https://bd.kekvrose.me

It was incomplete, with issues that made it quite unusable, so I went on to complete it. At least the smallest bit of it. Try it out maybe!

What’s the thing?

Well, it’s to track your temperature, as and when you need. It can also log BP, sugar levels, SpO2 (in case you have that kind of fever) and a small description. I usually just log any pills I’ve taken at that time.

Very simple to add and edit. Plus, if you don’t like Fahrenheit, you can seamlessly use Celsius instead.

As visible, you can also export this data as JSON, if you decide you want to use something else. The format is not too complicated to wrangle, and it’s documented here – Items.ts. The JSON is an array of items.

Completely offline

Well, the best part of this “app” is that it’s a PWA. This means it will work offline from the time you first open the website, and you can “install” it on your phone/desktop and it will function similar to a “native” application.

Not the best name but hey it works.

The data’s all stored locally (IndexedDB) and never leaves your device. In fact, since it’s completely local, there’s no server side processing and it’s hosted completely on GitHub Pages.

Summing it up

Well, here’s a nifty tool in case you want to log any vitals. Feedback is always great to have, so do try it out and let me know!

(Psst: Link is at the beginning of the post)

Categories
advpi

advpi – Week 8 – MMIO

After a(nother) long break, I have decided to work in small features. This week was mostly adding MMIO handlers, ARM test code compiling and changing out logging.

MMIO

Well, it was already there, but I didn’t have any easy way of using it. I added a modular system to register and track MMIO request handlers, and each handler can then implement its own logic for handling any MMIO request!

So what does this mean?

We can now delegate parts of MMIO to other bits of code to make this happen.
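To make the idea concrete, here is a minimal sketch of what such a handler registry could look like in C++. The names (MmioRequest, MmioBus, registerHandler, dispatch) are made up for illustration and are not the ones used in advpi; the only thing taken from the post is the idea that handlers register for a range and get the request delegated to them.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// One MMIO access as reported by a VM exit (hypothetical shape).
struct MmioRequest {
    uint64_t address;
    uint32_t value;  // data being written; ignored for reads
    bool isWrite;
};

// A handler returns the value to hand back to the guest on a read.
using MmioHandler = std::function<uint32_t(const MmioRequest&)>;

class MmioBus {
    struct Entry {
        uint64_t start;
        uint64_t length;
        MmioHandler handler;
    };
    std::vector<Entry> entries;

public:
    // Claim the range [start, start + length) for a handler.
    void registerHandler(uint64_t start, uint64_t length, MmioHandler handler) {
        entries.push_back({start, length, std::move(handler)});
    }

    // Route a request to the first handler whose range contains the address.
    // Returns false when nobody claimed it.
    bool dispatch(const MmioRequest& request, uint32_t& result) const {
        for (const Entry& entry : entries) {
            if (request.address >= entry.start &&
                request.address - entry.start < entry.length) {
                result = entry.handler(request);
                return true;
            }
        }
        return false;
    }
};
```

A logging handler is then just a lambda registered for whatever range should be logged, and the main loop only has to call dispatch when the VM exits for MMIO.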

Code building

Getting tired of writing in ones and zeros (what a long forgotten issue) — I turned to assembling code.

A quirk of the process is that I need raw code, as a binary blob, and not the ELF binary that as (the GNU Assembler) provides.

This led me to a well-accepted solution from the interwebs:

as one.S -o one.intermediate.bin
objcopy -O binary one.intermediate.bin one.bin

But then I hit a wonderful issue: I realised that the as installed on my RPi targets ARMv8 in 64-bit mode. The easiest way to get 32-bit output (arguably not easier, but less of a headache) was to install the gcc-arm-none-eabi package, and use its as.

So with that out of the way, we can quickly whip up an example program, and get:

mov r0, #0x10000
mov r1,#0x1000
mov r2,#0x4500
ldr r3,[r2]
add r1,r2,r3
str r1,[r2]
WFI

I’m not sure how to get a HALT or equivalent out of the machine, but I get these wonderful logs if I run the machine twice (two exits from MMIO; one for the ldr and one for the str):

Yay! MMIO works, and the logging handler receives it.

Logging

As evident above, working with logs is getting a bit tedious, especially from using cout and printf, so I linked up this wonderful library spdlog to handle the logging.

So with this, I can tweak the log levels to see the data I want, and more importantly, the logging api to define them is much cleaner.

Summary

There’s a lot more things to do, but I will probably have to read more to understand which section can be tackled next. It might be a good idea to have a look at the Timers and Interrupts before going onto other topics.

Categories
advpi

advpi – Week 7 – UI

Well, that was a long break.

As a way to keep myself on track, I’ll create logs as I’m working on things. That way, I can be more regular with work. Anyways, this week will be about getting a GUI.

GUI – getting a screen.

Well, here’s the plan. I’ll have to create the PPU (The GBA’s equivalent of a GPU and the CPU timers), so I can get an output. For that, I need a window.

SDL2 to the rescue. I’ve only used SDL2 in Rust (ref: porcel8).

Seems it was pretty easy to get it integrated after installing the SDL libraries.

find_package(SDL2 REQUIRED CONFIG REQUIRED COMPONENTS SDL2)

# link the library to our executable
target_link_libraries(advpi PRIVATE SDL2::SDL2)

And it builds successfully!

The screen

With some X11 forwarding, and adding some code to create a screen on output, we get an output!

Of course, for those of us using Fedora, which uses Wayland, XWayland needs to be enabled, and then on the Pi, X11 forwarding needs to be enabled.

# Set this in /etc/ssh/sshd_config
X11Forwarding yes

Once that’s enabled, it’s just ssh-ing into the system, and then seeing a wonderful screen

ssh -X geemax # geemax is my test Pi

And, window!

Short end of a week

That’s it for now, I’m afraid.

In retrospect, working with the PI is a bit annoying and I’d like a totally local setup, and also decide between working on the PPU or the timers, whichever gets me to the boot splash screen faster!

Categories
Software

BytePusher – A Gentle Shove into Emulating

Emulators have always felt fascinating to me, and while trying to find ways to learn about them, I chanced upon BytePusher – an esolang by Javamannen. Let’s talk a bit about the Bytepusher emulator I built.

What’s Bytepusher?

tl;dr – Bytepusher is a specification for a single instruction set computer with a limited input, 2D graphics and audio.

It has one instruction, and runs like a champ.
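As a rough illustration of how small that one instruction is, here is a sketch in C++ (not the Rust the emulator is written in). It follows my reading of the BytePusher specification: an instruction is three 24-bit big-endian addresses A, B and C, and executing it copies the byte at A to B and jumps to C. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Read a 24-bit big-endian address stored at position `at`.
inline uint32_t readAddress(const std::vector<uint8_t>& memory, uint32_t at) {
    return (uint32_t(memory[at]) << 16) |
           (uint32_t(memory[at + 1]) << 8) |
           uint32_t(memory[at + 2]);
}

// Execute the single instruction found at `pc` and return the next pc.
// Layout: A (3 bytes), B (3 bytes), C (3 bytes)
// Effect: memory[B] = memory[A]; continue at C.
inline uint32_t step(std::vector<uint8_t>& memory, uint32_t pc) {
    const uint32_t source = readAddress(memory, pc);
    const uint32_t target = readAddress(memory, pc + 3);
    memory[target] = memory[source];
    return readAddress(memory, pc + 6);
}
```

Everything else (graphics, audio, input) is just the host reading or writing fixed places in that same memory between frames.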

Emulator firsts

Well, it’s a virtual machine to be honest, but I assume there’s enough in the specification to make it a complete physical machine.

This was one of my first works in Rust, and it has a single thread system that processes the instruction, then goes to render the graphics and audio.

One of the trickier things is that the audio queue should be as small as possible, yet can never be empty. If it runs empty, you get stutters; if it grows too long, you get high latency. For now, I synchronise with the audio queue to maintain an expected speed of execution.

What does it look like?

We get audio, input output and graphics!

Check out the work here! I apologise, but I only have Linux builds available; executables for other platforms can be built from source.

Categories
advpi

advpi – Week 6 – Code Execution

This week was hyper-focused on getting normal, proper code execution.

Code layout

The first thing to get right was the code execution.

I found this wonderful project kvm-arm32-on-arm64 that provided a style for doing this.

So, it turns out that the ARM converter can print the code in the byte order it has in memory (little endian). To write it correctly, I took the big endian output instead, which reads as the plain 32-bit instruction word, and stored each word in the code as a uint32_t. This way, the compiler lays it out in little endian format as required, and I don’t need to think about endianness for a while.
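For the curious, this is roughly what the compiler is doing for me on a little endian host. The sketch below (C++, with a made-up helper name) spells the byte layout out explicitly, so it gives the same result regardless of the host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lay out 32-bit instruction words as little-endian bytes,
// lowest byte first, independent of the host byte order.
inline std::vector<uint8_t> toLittleEndianBytes(const std::vector<uint32_t>& words) {
    std::vector<uint8_t> bytes;
    bytes.reserve(words.size() * 4);
    for (uint32_t word : words) {
        for (int shift = 0; shift < 32; shift += 8) {
            bytes.push_back(static_cast<uint8_t>(word >> shift));
        }
    }
    return bytes;
}
```

So the nop word 0xE320F000 ends up in memory as the bytes 00 F0 20 E3, which is the order the guest CPU expects.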

So if I take this bit of ARM code, that loops,

nop
nop
nop
mov r0,0x0
mov r1,0x1000
mov r2,0x1
add r3,r1,r2
str r3,[r1]
mov r15,0x2000000

We convert it to the following array.

uint32_t CODE[] = {
    0xE320F000,
    0xE320F000,
    0xE320F000,
    0xE3A00000,
    0xE3A01A01,
    0xE3A02001,
    0xE0813002,
    0xE5813000,
    0xE3A0F402,
};
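As a sanity check on the hex, the immediates can be decoded back by hand. ARM data-processing instructions with an immediate operand store an 8-bit value in bits 7..0 and a rotation in bits 11..8, applied as a rotate right by twice that amount. A small sketch in C++ (the helper names are mine):

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value right by `amount` bits.
inline uint32_t rotateRight(uint32_t value, unsigned amount) {
    amount &= 31u;
    return amount == 0 ? value : (value >> amount) | (value << (32u - amount));
}

// Decode the immediate operand of a data-processing instruction:
// bits 7..0 are the value, bits 11..8 the rotation in steps of two.
inline uint32_t decodeImmediate(uint32_t instruction) {
    const uint32_t imm8 = instruction & 0xFFu;
    const unsigned rotation = ((instruction >> 8) & 0xFu) * 2u;
    return rotateRight(imm8, rotation);
}

// The destination register (Rd) lives in bits 15..12.
inline unsigned destinationRegister(uint32_t instruction) {
    return (instruction >> 12) & 0xFu;
}
```

Running 0xE3A01A01 through this gives register 1 and the value 0x1000, and 0xE3A0F402 gives register 15 and 0x2000000, matching the assembly above.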

I would expect it to exit from MMIO (as it attempts to write to a read only page) at 0x1000, and exit per the program. If it loops or does anything else, something is wrong.

Thankfully, we get this wonderful(ly confusing) output:

Hello, from advpi!
Opened bios
Attempted mmio
Attempted write=yes of value=4097 and at address=4096
Register(0)=0
Register(1)=0
Register(2)=0
Register(3)=0
Register(4)=0
Register(5)=0
Register(6)=0
Register(7)=0
Register(8)=0
Register(9)=0
Register(10)=0
Register(11)=0
Register(12)=0
Register(13)=0
Register(14)=0
Register(15)=0
Closing the Virtual Machine

So, the MMIO output was correct, but we’re not able to see the registers. I’m thinking it is more related to the exception levels and register banks that are available, but I shall check more on it as required.

So although technically we can’t see the registers, we can get some rudimentary output through MMIO, or shared memory.

If I remove the MMIO causing statement, we get an endless loop – showing that we’re stopping as required.

What’s next

This week was short as well. Following this, I’ll be setting up a GBA cart, and the video device. I’m thinking of using SDL2 to show a video device and handle device I/O.

Categories
advpi

advpi – Week 3,4,5 – Mapping the BIOS and shifting to C++

This week was a non-amusing transfer from C to C++ and adding the GBA BIOS.

BIOS Map

The Nintendo BIOS contains the basic code needed to initialize and use the GBA. After that the BIOS hands over control to the game that’s plugged in. My initial thought was to load the file into the game, then I remembered the golden word which haunted me last week – mmap. It was used to create the memory which is mapped to the guest VM.

So, I mmap-ed the BIOS file into the memory, then added it in.

void* biosRom =
        mmap(0, BIOS_SIZE, PROT_READ | PROT_EXEC, MAP_SHARED, biosFd, 0);

Now it’s up to the kernel to manage it, and not my headache (for now).

Then I tried mapping the BIOS into the memory of the VM, and it segfaulted instantly. I changed some of the parameters to allow writes, and then it didn’t segfault, but the mmap didn’t go through.

After looking into the documentation, here’s the mistake I was making.

    struct kvm_userspace_memory_region memory_region = {
        .slot = 0,
        .userspace_addr = (unsigned long long)gbaMemory->onboardMemory,
        .guest_phys_addr = 0x02000000,
        .memory_size = ONBOARD_MEM_SIZE};

I was setting the slot to 0 for both the onboard memory (or where I was putting my code), and the BIOS page. They are actually different slots of memory, and need to be initialized as separate slots.
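A sketch of what the fix looks like, in C++: each region gets its own slot number. The helper name, the sizes and the choice to mark the BIOS read-only with KVM_MEM_READONLY are assumptions for illustration, not necessarily what advpi does. Only the struct and its fields come from linux/kvm.h.

```cpp
#include <cassert>
#include <cstdint>
#include <linux/kvm.h>

// Build one memory region description. Each call must use a distinct slot;
// reusing a slot replaces the earlier region instead of adding a new one.
inline kvm_userspace_memory_region makeRegion(uint32_t slot, uint64_t guestAddress,
                                              uint64_t size, void* hostMemory,
                                              bool readOnly) {
    kvm_userspace_memory_region region = {};
    region.slot = slot;
    region.flags = readOnly ? KVM_MEM_READONLY : 0;  // assumption: BIOS is read-only
    region.guest_phys_addr = guestAddress;
    region.memory_size = size;
    region.userspace_addr =
        static_cast<uint64_t>(reinterpret_cast<uintptr_t>(hostMemory));
    return region;
}
```

Each region is then handed to the VM with its own ioctl(vmFd, KVM_SET_USER_MEMORY_REGION, &region) call, for example slot 0 for the onboard memory at 0x02000000 and slot 1 for the BIOS at 0x0.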

C++

I’m not used to C (and KVM and any of this) and it shows, so it’s probably a good idea to shift to C++ while I still can.

I wanted exceptions to handle unexpected failures, but I also didn’t want to model for releasing resources – a job better left for the compiler. C++ seems to be a better idea. I removed all the GOTOs, and got around to C++, then finally put an exception that helped me with my sanity.

class InitializationError : public std::exception {
    private:
    std::string message;
    public:
    InitializationError(std::string);
};

Being able to use exceptions along with constructors is useful. I’m aware that there’s a performance penalty, but it should be fine as long as I spend minimal time processing in my code (the kernel handles running the guest VM, not my code).
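Here is a minimal sketch of why the combination is handy. A constructor either gives back a usable object or throws, and the destructor does the cleanup that the GOTOs used to do. The FileDescriptor wrapper is hypothetical, and for brevity this version of InitializationError derives from std::runtime_error rather than std::exception as in the declaration above.

```cpp
#include <cassert>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <unistd.h>

class InitializationError : public std::runtime_error {
public:
    explicit InitializationError(const std::string& message)
        : std::runtime_error(message) {}
};

// Owns a file descriptor and closes it when the object goes out of scope.
class FileDescriptor {
    int fd;

public:
    FileDescriptor(const std::string& path, int flags)
        : fd(::open(path.c_str(), flags)) {
        if (fd < 0) {
            throw InitializationError("could not open " + path);
        }
    }
    ~FileDescriptor() {
        if (fd >= 0) ::close(fd);
    }
    // Copying would close the same descriptor twice, so forbid it.
    FileDescriptor(const FileDescriptor&) = delete;
    FileDescriptor& operator=(const FileDescriptor&) = delete;

    int get() const { return fd; }
};
```

Opening /dev/kvm or the BIOS file through something like this means a failure anywhere later in setup unwinds and closes everything that was already opened.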

What’s next

The registers are still seemingly useful and garbage at the same time, but we shall see. Next week will mostly be travel and continued refactoring while I try to learn more, so next week will be week 5 essentially.

Categories
advpi

advpi – Week 2 – Initialization and Code

This week is considering three things:

  1. Checking out KVM and working with it on x64
  2. Setting up the Raspberry PI workspace
  3. Hello world in RPI.

KVM – x64

While searching the interwebs, I followed this (amazing?) tutorial that explains how the KVM API for Linux works.

I created a C project and followed it up, and this is the basic breakdown:

  1. Open the KVM API.
  2. Create a VM
  3. Query and assert that your device supports all the APIs you will call.
  4. Create a vCPU (The thing that will run your code)
  5. Create memory mappings
  6. Put some code in, set the vCPU registers
  7. Run the vCPU

What did I make it run? 2+2=4 of course. Here’s the program.

const uint8_t code[] = {
    0xba, 0xf8, 0x03, /* mov $0x3f8, %dx */
    0x00, 0xd8,       /* add %bl, %al */
    0x04, '0',        /* add $'0', %al */
    0xee,             /* out %al, (%dx) */
    0xb0, '\n',       /* mov $'\n', %al */
    0xee,             /* out %al, (%dx) */
    0xf4,             /* hlt */
};

Here, the sum of the al and bl registers is turned into an ASCII digit and written to the serial port at 0x3f8, followed by a newline.

The fun part is that in KVM, output is a special exit, and KVM yielded us control back.

switch (vcpuKvmRun->exit_reason) {
    case KVM_EXIT_IO:
        if (vcpuKvmRun->io.direction == KVM_EXIT_IO_OUT &&
            vcpuKvmRun->io.size == 1 && vcpuKvmRun->io.port == 0x3f8 &&
            vcpuKvmRun->io.count == 1)
            putchar(
                *(((char*)vcpuKvmRun) + vcpuKvmRun->io.data_offset));
...

Summarizing what happens above, we “emulate” a serial output device by printing what the VM wants printed onto the console.

Cool, it works. We’re ready to start porting this onto the RPI.

Wait, how do I work with a Raspberry Pi Zero 2 W?

Device setup

I had just bought this device. So let’s see what we need.

  1. Some way to work with the project.
  2. Running and testing on the RPI.

To run the project, I thought the easiest way would be cross-compiling. So let’s get cross-compiling set up in Fedora.

And checking on Google, there’s a gcc arm64 cross compiler package:

Well, that’s a no then. It’s not possible to cross compile actual userspace executables for ARM64 on x64 machines with that package.

Then, let’s run it on the PI itself.

So, I decided that using some kind of remote development server is good. I tried VSCode, and the poor PI Zero 2 W collapsed given the weight of the remote server.

Well, then zed comes to the rescue. With a bit of tweaking, I connect to the PI over Tailscale, and open it using zed. Perfect.

I’m missing out on debugging, which will make it a pain, but that’s okay. Been through worse. Maybe it’ll finally push me to learn gdb.

Getting to run the Thing

Once that’s in and all things are done, we get to the first bit – compile errors all over the place. Some things are wrong.

Since the KVM API is different, the register names and the structures are different. Now it won’t compile.

KVM_GET_REGS and the corresponding KVM_SET_REGS don’t work here. For some reason, on ARM it’s required to use KVM_SET_ONE_REG (and the corresponding KVM_GET_ONE_REG) for each register individually.
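The per-register calls take a 64-bit register ID instead of a struct. As far as I can tell from the KVM API documentation, a core register ID on arm64 is built from an architecture tag, a size tag, a "core" tag and the register's offset inside struct kvm_regs counted in 32-bit units. The sketch below copies the constant values out of the arm64 headers so it compiles on any machine; the constant and function names are my own.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the arm64 KVM UAPI headers (names are local to this sketch).
constexpr uint64_t kRegArm64   = 0x6000000000000000ULL;  // KVM_REG_ARM64
constexpr uint64_t kRegSizeU64 = 0x0030000000000000ULL;  // KVM_REG_SIZE_U64
constexpr uint64_t kRegArmCore = 0x0010ULL << 16;        // KVM_REG_ARM_CORE

// X0..X30 are 64-bit registers at the start of struct kvm_regs,
// so Xn sits at index 2 * n when counting in 32-bit units.
constexpr uint64_t coreRegisterId(unsigned n) {
    return kRegArm64 | kRegSizeU64 | kRegArmCore | (2ULL * n);
}

// sp follows X30 and pc follows sp, which puts pc at index 2 * 32.
constexpr uint64_t kProgramCounterId =
    kRegArm64 | kRegSizeU64 | kRegArmCore | (2ULL * 32);
```

The ID then goes into the id field of a struct kvm_one_reg, with addr pointing at a 64-bit variable, and the pair is passed to ioctl(vcpuFd, KVM_GET_ONE_REG, &reg) or KVM_SET_ONE_REG.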

Okay, all fixed.

There are still a few things to fix. Namely, starting the CPU in 32 bit mode, and setting the initial PC and code.

Starting in 32 bit

The Cortex A53 starts in 64 bit as it is… a 64 bit processor. How do I instruct it to start in 32 bit mode? Rummaging through the KVM docs (Ctrl+F for “32-bit”), turns out before starting the CPU, I also need to initialize it with featuresets.

Specifically, we need to do this bit:

struct kvm_vcpu_init cpuInit = {};
int preferredTarget = ioctl(vmFd, KVM_ARM_PREFERRED_TARGET, &cpuInit);
cpuInit.features[0] |= 1 << KVM_ARM_VCPU_EL1_32BIT;
int vcpuInitResult = ioctl(this->fd, KVM_ARM_VCPU_INIT, &cpuInit);

Why is it features[0]? I’m not sure I want to know why it’s addressed that way after spending way too much time trying to find out. The initialization also gives a bunch of useful features to initialize based on what’s required for us – I don’t want them as of now.

Now that the CPU is ready, something needs to be there to execute code.

Executable code

The days of writing code in assembly are gone (for the regular Software Engineer), but now, we need it again. Regular assemblers are too complex (for me as of now) to generate only the tiny snippets I need. This awesome online tool – “Online ARM to HEX Converter” does it all online. I put a small piece of code and let’s see what it does:

Wonderful – something. So I put this in an array and hope it works.

After all this, it still didn’t work (hint: endianness).

I got an MMIO, but was confused what it meant so I added this hideous snippet:

for (int r = 0; r < 16; r++){
  printf("Got Registers: %d(%ld)\n", r,
    getRegisterValue(gameboyKvmVM.vcpuFd, r));
}

We get this sad output and the program hangs; and I have no idea what’s going on:

Hello, from advpi!
Attempted mmio
Got Registers: 0(0)
Got Registers: 1(5000)
Got Registers: 2(6000)
Got Registers: 3(0)
Got Registers: 4(25)
Got Registers: 5(0)
Got Registers: 6(4108)
Got Registers: 7(0)
Got Registers: 8(0)
Got Registers: 9(0)
Got Registers: 10(0)
Got Registers: 11(0)
Got Registers: 12(0)
Got Registers: 13(0)
Got Registers: 14(0)
Got Registers: 15(0)

So the registers were loaded correctly – and the code here was executed! But then R15 is 0? and the next LDR also didn’t generate an MMIO. Strange.

What’s next

As the week comes to a close, I’m getting a bit annoyed at managing resources, and would like (at least a bit of) help from the language. I’ve been coding in a “write the code then pray it works” style, and have been ignoring cleanup, refactoring and reading the code again. Let’s take the time to do that and fix any issues. Hopefully I fix any faults in the program by cleaning up, or at least learn enough by then that I can.

Categories
advpi Software

advpi – Week 1 – Intro

Why?

I recently bought a Nintendo DS, and a Raspberry PI. Turns out the Nintendo DS can run Game Boy Advance (GBA) games, because they have the same CPU architecture.

NDS and a RPI Zero 2W

Well, the Raspberry PI isn’t too far off too. Does it mean I can just run Gameboy Advance Games on the RPI too?

Well, that’s what this series will go to show. This week is about understanding feasibility and what I will be using (and why). The later weeks should explore things in a bit more technical detail.

Can it GameBoy Advance?

Let’s see what we need, to run it on the Raspberry Pi Zero 2 W. I’ve chosen it because it’s very small, but runs the same chip as the RPi 3.

This beautiful article “Game Boy Advance Architecture” by Rodrigo Copetti explains what’s required for a GBA, but, let’s break it down and compare it to the RPI Zero 2 W.

Also, let’s think about the weird stuff like Display, Audio and Input later. Being able to run stuff is more important.

GBA Architecture – Rodrigo Copetti

So, let’s go by this order:

  1. CPU
  2. Memory
  3. I/O
CPU

The Zero 2 W has a CPU with 4 Cortex A53 cores, whereas the GBA used an ARM7TDMI chip. The A53 might be ancient, but the ARM7TDMI is comparatively prehistoric.

But, delving into the A53 Technical Reference Manual, it is entirely backwards compatible with the ARM7TDMI! The AArch32 instruction set seems to be a renamed edition of what the older GBA CPU used to use.

Awesome, so the RPI CPU can natively run all these instructions, no problems.

Memory
Memory – Rodrigo Copetti

Oof, it gets complicated here. There are a few things:

  1. The AGB RAMs – They’re basically two different RAM chips, one smaller (32KB) and faster chip, and one larger (96KB) and slower chip.
  2. The even slower (but massive) EWRAM
  3. GamePAK memory – the Game Data

All of these items are laid out in the memory address space for the GBA CPU to access.

The GBA CPU accesses data through 32-bit addresses, and there’s a full address-layout map for fetching and storing data into these addresses.

Here’s the memory map from the no$gba documentation. (Ignore the Display Memory and I/O registers for now)

Intuitively, what this means is when the CPU tries to load data from 0x2000000, it loads the first byte of data from On-board WRAM. If it tries to use 0x3000000, it takes the first byte of data from the On-Chip Work RAM.
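In code, the lookup the hardware does for every access looks something like the sketch below (C++). The ranges are my reading of the general memory map in the no$gba documentation and deliberately leave out display memory, I/O registers and mirrors; the struct and function names are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct Region {
    uint32_t start;
    uint32_t end;  // inclusive
    const char* name;
};

// A partial view of the GBA address space (assumed ranges, see lead-in).
constexpr Region kRegions[] = {
    {0x00000000, 0x00003FFF, "BIOS"},
    {0x02000000, 0x0203FFFF, "On-board WRAM"},
    {0x03000000, 0x03007FFF, "On-chip WRAM"},
    {0x08000000, 0x09FFFFFF, "Game Pak ROM"},
};

// Name the region an address falls into, or "unmapped" if none matches.
inline std::string regionFor(uint32_t address) {
    for (const Region& region : kRegions) {
        if (address >= region.start && address <= region.end) {
            return region.name;
        }
    }
    return "unmapped";
}
```

On real hardware this routing is wired into the bus, so whatever runs GBA code elsewhere has to reproduce the same table somehow.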

The RPI, unfortunately, has none of these mappings. It does, however, have 512MB of RAM, which runs circles around what the GBA has for memory. Perhaps there’s a way to create or simulate these mappings for the CPU to use?

I/O

We ignored the Display and I/O memory mapping earlier, so we can discuss it now. The display and the I/O are all memory-mapped. This means the data is directly available and usable via the memory.

The RPI, again, has none of this I/O. But it does have a lot of GPIO pins and display capabilities already, perhaps it can be “massaged” into the right shape and plugged into the GBA memory layout?

The initial hypothesis – Virtualization

Well, we had three findings:

  • The GBA and RPI can both execute ARM code
  • They differ in IO.
  • They differ in memory mapping.

Well, in these cases, people usually use virtualization. Hmm… That seems awfully inefficient. I might as well use an existing emulator then, where it’ll also replicate the behaviour of the CPU. Although…

Hardware-Assisted Virtualization

Hardware assisted virtualization is a featureset of newer CPUs, where the CPUs can run guest code directly, skipping the need to emulate CPUs in software. Intel provides VT-x, AMD provides AMD-V, and ARM provides… well, something similar, but not sure what the name is. Let’s see how to use it in practi-

Terror

Looking into how to use it. I found – Trapping and Emulation of Instructions and ARM Virtualization Overview.

How I felt reading those documents

No, I can’t understand any of that. Understandable.

Looking online how it works, I remembered/found this neat thing – KVM. Linux takes all these features, and abstracts these architectures into one API, the KVM API. This allows Linux Hosts to run Guest machines with Hardware Accelerated execution.

KVM allows setting memory spaces, setting device specifications and runs code on a virtualized CPU that runs on the machine using the Virtualization Extensions on the CPU. Any devices need virtualization? KVM gives control back to you to implement it yourself.

What’s better, the RPI Zero 2 W supports KVM.

$ ls /dev/kvm

crw-rw---- 1 root kvm 10, 232 Dec 7 19:29 /dev/kvm

The Final Problem Statement

So, let’s distill all of this into two questions:

  1. Can the GBA be run as a virtual machine on the Raspberry PI Zero 2 W?
  2. Is there any benefit to virtualization, instead of complete emulation?

Thus, the project advpi is planned to run GBA code on the PI Zero 2 W, by using a virtual machine.

The IO will need to be adapted and emulated by the software as well.

The display can be shown in a window on-screen, and the sound can be played on the Pi as it has sound capabilities as well (given a speaker is attached to it).

Anything else will be figured out as it comes. Let’s see how that goes.

What’s next week?

  • Setting up a workspace on a RPI.
  • Testing the KVM API to run some x86 code, as it is well-documented.
Categories
Personal

Hello world!

Hello Everyone! I will be posting various things that I am working on in this blog.

My older blog’s content for HSQL has been moved over here as an archive.

Categories
HSQL

Week 12: Dedup & Documentation (& Examples)

This week focused on finishing up documentation, a little bit of cleanup and one final task of implementation that would be helpful.

DISTINCT

The SELECT DISTINCT clause currently uses a DEDUP(x,ALL), which is notoriously slow for large tasks. Instead an alternative clause was suggested

TABLE(x,{<cols>},<cols>,MERGE);

//eg. for a t1 table with c1 and c2
TABLE(t1,{t1},c1,c2,MERGE);

This looks good!

But, the question is – how to get the columns c1 and c2 for the table t1? It may not always be known, or worse, the known type information may not be complete. Here, we are presented with two ways in which we can proceed with this information:

  1. Stick with DEDUP all the time. It is slow, but it will work.
  2. Use the new TABLE format, falling back to the DEDUP variant if type information given is not sufficient.

This is great, and it also incentivizes typing out ECL declarations, but it still feels like a compromise.

Template Language

Everything changed when I realised that ECL has a wonderful templating language. For a quick idea on what the point of a template language is, it can be used in a way similar to the preprocessor directives in C – macros that can be used to write ECL.

So, what can we do here? The good thing about writing macros is that since they are based off the same solution, the macro processor can work with data types in ECL very well, and also, can make ECL code.

So, can we write an expression that creates the c1,c2 expression, given that the table t1 is given?

__dedupParser(la):= functionmacro
    #EXPORTXML(layoutelements,la);
    #declare(a)
    #set(a,'')
    #for(layoutelements)
        #for(field)
            #append(a,',');
            #append(a,%'{@label}'%);
            #end
        return %'a'%;
    #end
endmacro

Although I won’t go into the details of this interesting function macro (A macro whose contents are scoped), in essence, it can take a record, and put out a snippet of code that contains the columns delimited by comma.

Using __dedupParser

Although the naming isn’t accurate, we can inspect what the macro does by an example

Given a record R1, which contains two fields, f1 and f2 (Assume both as integer), then __dedupParser(R1) will create an ECL code snippet of “,f1,f2” (Notice the leading comma, since the macro appends a comma before every field name). This works nicely with the RECORDOF declaration, which can help get the record associated with a recordset definition.

So, this brings something really useful, as we can now have this general code syntax –

TABLE(<t1>,{ <t1> }#expand(__dedupParser(RECORDOF( <t1> ))),MERGE);

This simple expression generalizes the entire TABLE function pretty effectively.

Adding this into code, it is important to remember that the function macro needs to be inserted separately, and most importantly, only once in an ECL file. This is better done by tracking whether a DISTINCT clause has been used in the program (using an action like we had done earlier), and inserting the functionmacro definition at the top in this case. And with this, a good new improvement can be made to the SELECT’s DISTINCT clause.

Distinct now has shinier code!

This should perform much better, and work more efficiently.

Documentation

So far, some of the syntax was stored and referred to personally through notes, memory and the grammar. Writing this down is essential as a way of keeping a record (more for others, and even more importantly, for yourself). So, keeping a document to denote the syntax available to the user is rather important.

Docs for OUTPUT!

Here, it’s important to lay out the syntax, but also present some usable examples that can explain what is going on with that code segment. (Yeap, for every single statement!)

Winding up!

With documentation, that ends a lot of the work that was there. In fact, it’s already Week 12, although it is still surprising how quickly time has passed during this period. My code thankfully doesn’t need extensive submission procedures as I was working on the dev branch, and work will continue on this “foundation”. It has been a wonderful 12 weeks of a learning and working process for me, and I would like to thank everybody at the HPCC Systems team for making it such an enjoyable process! Although this is all for now, I can only hope that there’s more to come!