Heads up! Work in progress.
I sometimes stream the development here
This all started when I decided to make my own language. Again.
I wanted to make a Rust-like language with an effect system and a few more features. You can see it on GitHub.
I've tried many approaches, but settled on this pipeline: tokenizing text → parsing it into an AST → building an Intermediate Representation (IR) from this AST → converting this IR into raw bytes → packaging those bytes into an executable format such as ELF or PE.
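To make the pipeline concrete, here is a minimal sketch in Rust of the first four stages for a hypothetical toy language that only supports integer literals and +. The function names (tokenize, parse, lower, encode) and the stack-machine IR are my own assumptions for illustration, not the design of the actual compiler, and the packaging step (ELF/PE) is left out. The emitted bytes form an x86_64 function that returns the result in RAX.

```rust
// Hypothetical toy pipeline: only integer literals and `+` are supported.
#[derive(Debug, PartialEq)]
enum Token {
    Num(i32),
    Plus,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i32),
    Add(Box<Expr>, Box<Expr>),
}

// Stack-machine IR: Const pushes a value, Add pops two and pushes the sum.
#[derive(Debug, PartialEq)]
enum Ir {
    Const(i32),
    Add,
}

/// Stage 1: text -> tokens.
fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_ascii_digit() {
            let mut n: i32 = 0;
            while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                n = n
                    .checked_mul(10)
                    .and_then(|n| n.checked_add(d as i32))
                    .ok_or("number too large")?;
                chars.next();
            }
            tokens.push(Token::Num(n));
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(tokens)
}

/// Stage 2: tokens -> AST. Grammar: expr := NUM ('+' NUM)*, left-associative.
fn parse(tokens: &[Token]) -> Result<Expr, String> {
    let mut iter = tokens.iter();
    let mut expr = match iter.next() {
        Some(Token::Num(n)) => Expr::Num(*n),
        other => return Err(format!("expected number, got {other:?}")),
    };
    while let Some(tok) = iter.next() {
        if *tok != Token::Plus {
            return Err(format!("expected '+', got {tok:?}"));
        }
        match iter.next() {
            Some(Token::Num(n)) => expr = Expr::Add(Box::new(expr), Box::new(Expr::Num(*n))),
            other => return Err(format!("expected number, got {other:?}")),
        }
    }
    Ok(expr)
}

/// Stage 3: AST -> IR via a post-order walk (operands first, then the operator).
fn lower(expr: &Expr, out: &mut Vec<Ir>) {
    match expr {
        Expr::Num(n) => out.push(Ir::Const(*n)),
        Expr::Add(lhs, rhs) => {
            lower(lhs, out);
            lower(rhs, out);
            out.push(Ir::Add);
        }
    }
}

/// Stage 4: IR -> raw x86_64 bytes for a function that returns its value in RAX.
fn encode(ir: &[Ir]) -> Vec<u8> {
    let mut code = Vec::new();
    for inst in ir {
        match inst {
            Ir::Const(n) => {
                code.push(0x68); // push imm32 (sign-extended to 64 bits)
                code.extend_from_slice(&n.to_le_bytes());
            }
            Ir::Add => code.extend_from_slice(&[
                0x59, // pop rcx
                0x58, // pop rax
                0x48, 0x01, 0xC8, // add rax, rcx (REX.W + 01 /r)
                0x50, // push rax
            ]),
        }
    }
    code.extend_from_slice(&[0x58, 0xC3]); // pop rax; ret
    code
}
```

For example, "1 + 2" lowers to Const(1), Const(2), Add and encodes to 18 bytes: two push imm32 instructions, then pop rcx; pop rax; add rax, rcx; push rax, and finally pop rax; ret. A stack-based IR keeps the encoder trivial at the cost of many redundant pushes and pops; a real backend would allocate registers instead.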
The first problem I ran into with the backend itself was compiling to x86_64.
It was the first architecture I tried to implement, and I got stuck right away.
It turns out that the x86_64 instruction set is really difficult (IT'S ALL ABOUT LEGACY AND BACKWARDS COMPATIBILITY!!!). It has some weird things, like REX prefixes (register extensions) and shortcut encodings. For example, add AL, 5 takes 2 bytes (encoded as 04 05), while add BL, 5 takes 3 bytes (encoded as 80 C3 05).
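The AL shortcut is easy to see in code. Below is a minimal sketch (the function name and the register numbering are my own, hypothetical choices) that encodes add r8, imm8: it picks the 2-byte 04 ib form for AL and the general 80 /0 ib form with a ModRM byte otherwise, adding a REX prefix where 64-bit mode requires one. AH, CH, DH and BH are deliberately not handled, since they cannot be encoded together with a REX prefix.

```rust
/// Encodes `add r8, imm8` in 64-bit mode (hypothetical helper).
/// `reg` uses the REX-aware numbering: 0=AL, 1=CL, 2=DL, 3=BL, 4=SPL, 5=BPL,
/// 6=SIL, 7=DIL, 8..=15 = R8B..R15B. AH/CH/DH/BH are not handled here.
fn encode_add_r8_imm8(reg: u8, imm: u8) -> Vec<u8> {
    assert!(reg < 16, "only 16 byte registers are addressable with REX");
    // Short form: `add AL, imm8` has its own opcode (04 ib) and no ModRM byte.
    if reg == 0 {
        return vec![0x04, imm];
    }
    let mut out = Vec::with_capacity(4);
    // A REX prefix (0100WRXB) is needed for R8B..R15B (REX.B extends the
    // ModRM.rm field) and for SPL/BPL/SIL/DIL, which would otherwise be
    // decoded as AH/CH/DH/BH.
    if reg >= 4 {
        let rex_b = (reg >> 3) & 1;
        out.push(0x40 | rex_b);
    }
    // General form: 80 /0 ib. ModRM = mod 11 (register operand),
    // reg field 000 (the /0 opcode extension), rm = low 3 bits of the register.
    let modrm = 0b11_000_000 | (reg & 0b111);
    out.extend_from_slice(&[0x80, modrm, imm]);
    out
}
```

encode_add_r8_imm8(0, 5) yields 04 05 and encode_add_r8_imm8(3, 5) yields 80 C3 05, matching the bytes above, while SIL (6) becomes 40 80 C6 05 and R9B (9) becomes 41 80 C1 05. Note that R8B cannot use the short form: 04 ib always targets AL, so it falls back to 41 80 C0 ib.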
I found this great page to use as a reference for x86_64 opcodes: X86-64 Instruction Encoding (the OSDev Wiki page is basically the same). This table also helped me a lot.
Once I had taken my first steps with x86_64, I had to pack the output into some sort of executable, which turned out to be a big challenge. I couldn't find any Rust crates that write ELF executables, only ones that write relocatable object files, and I want to integrate a linker into my backend. This video helped me A LOT with reading/writing ELF files. But be aware: it has some wrong types! For example (the only one I found), the sizes in the program header. Check Wikipedia for those!
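For reference, here is a minimal sketch of writing an ELF64 executable by hand, with field sizes taken from the ELF64 specification: in the program header, p_filesz and p_memsz (like p_offset, p_vaddr, p_paddr and p_align) are 64-bit, which gives a 56-byte program header after the 64-byte file header. The build_elf helper and its single read+execute PT_LOAD segment mapping the whole file at base are assumptions for illustration; this is not the API of orecc-elf.

```rust
// ELF64 constants from the ELF specification.
const ET_EXEC: u16 = 2;
const EM_X86_64: u16 = 62;
const PT_LOAD: u32 = 1;
const PF_X: u32 = 1;
const PF_R: u32 = 4;
const EHDR_SIZE: u16 = 64; // size of Elf64_Ehdr
const PHDR_SIZE: u16 = 56; // size of Elf64_Phdr

/// Elf64_Phdr. In ELF64 only p_type and p_flags are 32-bit; the offset,
/// addresses, sizes and alignment are all 64-bit.
struct ProgramHeader {
    p_type: u32,
    p_flags: u32,
    p_offset: u64,
    p_vaddr: u64,
    p_paddr: u64,
    p_filesz: u64,
    p_memsz: u64,
    p_align: u64,
}

impl ProgramHeader {
    /// Serializes the header in little-endian byte order, in field order.
    fn write(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.p_type.to_le_bytes());
        out.extend_from_slice(&self.p_flags.to_le_bytes());
        for field in [self.p_offset, self.p_vaddr, self.p_paddr, self.p_filesz, self.p_memsz, self.p_align] {
            out.extend_from_slice(&field.to_le_bytes());
        }
    }
}

/// Builds a minimal static x86_64 ELF executable (hypothetical helper):
/// one loadable segment maps the whole file (headers + code) at `base`,
/// and the entry point is the first byte of `code`.
fn build_elf(code: &[u8], base: u64) -> Vec<u8> {
    let code_offset = u64::from(EHDR_SIZE) + u64::from(PHDR_SIZE);
    let file_size = code_offset + code.len() as u64;
    let mut out = Vec::with_capacity(file_size as usize);

    // e_ident: magic, 64-bit class, little-endian, version 1, System V ABI, padding.
    out.extend_from_slice(&[0x7F, b'E', b'L', b'F', 2, 1, 1, 0]);
    out.extend_from_slice(&[0; 8]);
    out.extend_from_slice(&ET_EXEC.to_le_bytes()); // e_type
    out.extend_from_slice(&EM_X86_64.to_le_bytes()); // e_machine
    out.extend_from_slice(&1u32.to_le_bytes()); // e_version
    out.extend_from_slice(&(base + code_offset).to_le_bytes()); // e_entry
    out.extend_from_slice(&u64::from(EHDR_SIZE).to_le_bytes()); // e_phoff
    out.extend_from_slice(&0u64.to_le_bytes()); // e_shoff (no section headers)
    out.extend_from_slice(&0u32.to_le_bytes()); // e_flags
    out.extend_from_slice(&EHDR_SIZE.to_le_bytes()); // e_ehsize
    out.extend_from_slice(&PHDR_SIZE.to_le_bytes()); // e_phentsize
    out.extend_from_slice(&1u16.to_le_bytes()); // e_phnum
    out.extend_from_slice(&[0; 6]); // e_shentsize, e_shnum, e_shstrndx
    debug_assert_eq!(out.len(), EHDR_SIZE as usize);

    let phdr = ProgramHeader {
        p_type: PT_LOAD,
        p_flags: PF_R | PF_X,
        p_offset: 0,
        p_vaddr: base,
        p_paddr: base,
        p_filesz: file_size,
        p_memsz: file_size,
        p_align: 0x1000,
    };
    phdr.write(&mut out);

    out.extend_from_slice(code);
    out
}
```

Mapping the file from offset 0 keeps p_offset and p_vaddr congruent modulo p_align as long as base is page-aligned (for example 0x400000). The code bytes are assumed to be position-dependent machine code that starts at the entry point.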
Once I had implemented minimal functionality for my ELF crate, I tried to compile a simple program and got a segmentation fault. This video finally helped me figure it out: essentially, the entry point has some strict rules. Finally, I released the crate: orecc-elf.
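The post doesn't spell out which entry-point rules were the culprit, so as an assumption here is one well-known rule on Linux x86_64: the kernel jumps to e_entry rather than calling it, so there is no return address on the stack (RSP points at argc) and a plain ret crashes. The hypothetical helper below prepends a 14-byte _start stub that calls a body returning its result in EAX, then passes that value to the exit syscall (number 60).

```rust
/// Wraps a function body (x86_64 code that returns its result in EAX)
/// in a tiny `_start` stub for Linux (hypothetical helper). The entry point
/// is jumped to, not called, so the stub must never `ret`; instead it calls
/// the body and then exits via the `exit` syscall.
fn with_start_stub(body: &[u8]) -> Vec<u8> {
    const STUB_LEN: i32 = 14;
    let mut code = Vec::with_capacity(STUB_LEN as usize + body.len());
    // call rel32: the displacement is relative to the end of the 5-byte call,
    // and the body starts right after the stub.
    code.push(0xE8);
    code.extend_from_slice(&(STUB_LEN - 5).to_le_bytes());
    code.extend_from_slice(&[0x89, 0xC7]); // mov edi, eax (exit status = return value)
    code.extend_from_slice(&[0xB8, 60, 0, 0, 0]); // mov eax, 60 (SYS_exit)
    code.extend_from_slice(&[0x0F, 0x05]); // syscall
    debug_assert_eq!(code.len(), STUB_LEN as usize);
    code.extend_from_slice(body);
    code
}
```

With the body mov eax, 42; ret (B8 2A 00 00 00 C3), the resulting program should exit with status 42 once it is packaged so that e_entry points at the first stub byte. Calling the body also gives it the usual ABI stack alignment, because the System V ABI guarantees a 16-byte-aligned RSP at process entry and call pushes 8 bytes.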
Started writing: Sep 30, 2023
Last edit: Jun 18, 2024