I. Putting the "Mental" in "Fundamentals"

Welcome to this blog about writing a BIOS for PlayStation 2 emulators in Rust.

By studying this process, you should get a greater appreciation of how much effort goes on behind the scenes to boot your computer.

Legals

This book is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The source code is licensed under the terms of the GNU General Public License version 3.0 or (at your option) any later version.

The PS2 architecture

The PlayStation 2 is an unusually laid out computer compared to the Intel x86 PC or ARM phone/tablet you're probably reading this on.

It has two CPUs in it, the R5900 contained in the Emotion Engine chip (this chip contains a lot of other processors, which is why I will call the CPU itself the R5900), and the R3051 contained in the Input/Output Processor. "Emotion Engine" and "Input/Output Processor" are quite long names, so I will call them the "EE" and "IOP" respectively.

It also has a custom GPU called the Graphics Synthesizer, which I will call the "GS".

These are connected together like this:

High level diagram showing the EE and IOP connected, and the EE and GS connected

These CPUs both use the MIPS instruction set, though the IOP uses 32-bit MIPS I, and the EE uses 64-bit MIPS III.

The EE is significantly faster than the IOP, so the IOP is used to offload slow tasks like input/output, and then notify the EE when something has happened through a communication link.

Each of these chips has its own embedded memory; the EE has 32 MiB of system memory, the IOP has its own 2 MiB of memory, and the GS has 4 MiB of embedded memory.

The boot process

At a very high level, the PS2 BIOS boot process looks like this:

  • Both CPUs start from the same BIOS ROM.
  • Figure out if you are the EE (Emotion Engine CPU) or the IOP (Input/Output Processor CPU).
  • If you are the EE:
    • Load and run the EE kernel.
    • Set up the processor and memory.
    • Set up the EE side of the communication link.
    • Synchronise with the IOP through it.
  • If you are the IOP:
    • Load and run the IOP kernel.
    • Set up the processor and memory.
    • Set up the IOP side of the communication link.
    • Synchronise with the EE through it.
  • When both CPUs are set up and ready:
    • Play a pretty logo.
    • Check if there is a disc in the drive.
    • If there is, do something reasonable about it:
      • Run a PlayStation 2 game on the EE.
      • Run a PlayStation 1 game on the IOP.
      • Play a DVD or CD.
      • Complain about an unrecognised disc.
    • If there isn't, load the BIOS interface.

And all of this in 4 megabytes of ROM. Quite impressive, isn't it?

Now, since we are running on emulators, we can remove parts of this: people will watch DVDs and CDs with their media player of choice, and use a dedicated PlayStation 1 emulator for PS1 games. That gives us a little extra room for debugging or fancy graphics if we desire.

Some bad news

At the time of writing, LLVM - the code generator behind rustc - does not support the MIPS I instruction set, which the IOP uses. This means you can't use Rust on the IOP at present, unless you target MIPS II, which is a superset of the MIPS I instruction set. That carries the risk of your code randomly breaking because LLVM decided to use an instruction the IOP does not support, so I decided not to bother with it. Still, I will document what the IOP Rust code would look like if it had native support.

Equally, the EE is a quirky chip which LLVM does not support directly, because LLVM uses 64-bit pointers for a 64-bit MIPS CPU, while the EE only has a 32-bit address space. Fortunately, we can pretend that the EE is a 32-bit MIPS II CPU, which is supported by LLVM, and this is what we will do.

I. Targets: Ready... Aim...

If we tried to compile something right now, Rust would probably spit out an x86 ELF/Mach-O/PE executable. It wouldn't run for a few reasons:

  • The PlayStation 2 doesn't understand these formats; it'd just try to execute it as a binary blob and trip up on their magic numbers.
  • Even if it did understand these files, they would be targeted for the wrong architecture.

Rust needs to be told how to emit code for the PS2. For that we need to define a target file.

Target files

rustc includes its own target files for each architecture; you can look at the available targets with the following command:

rustc --print target-list

Each target file is in JSON format, which you can inspect with the following command:

rustc -Z unstable-options --print target-spec-json --target $target

For the EE, we'll start from the mipsel-unknown-linux-gnu target, which looks like this:

{
  "arch": "mips",
  "cpu": "mips32r2",
  "data-layout": "e-m:m-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64",
  "dynamic-linking": true,
  "env": "gnu",
  "executables": true,
  "features": "+mips32r2,+fpxx,+nooddspreg",
  "has-elf-tls": true,
  "has-rpath": true,
  "is-builtin": true,
  "linker-flavor": "gcc",
  "linker-is-gnu": true,
  "llvm-target": "mipsel-unknown-linux-gnu",
  "max-atomic-width": 32,
  "os": "linux",
  "position-independent-executables": true,
  "pre-link-args": {
    "gcc": [
      "-Wl,--as-needed",
      "-Wl,-z,noexecstack"
    ]
  },
  "relro-level": "full",
  "target-c-int-width": "32",
  "target-endian": "little",
  "target-family": "unix",
  "target-pointer-width": "32",
  "vendor": "unknown"
}

This contains a lot of unnecessary (and inaccurate) things, such as this target being for MIPS32r2. Let's change it a bit.

{
  "arch": "mips",
  "cpu": "mips2",
  "data-layout": "e-m:m-p:32:32-i8:8:32-i16:16:32-i64:64-n32-S64",
  "dynamic-linking": false,
  "executables": true,
  "features": "+mips2",
  "linker": "mipsel-none-elf-ld",
  "linker-flavor": "ld",
  "llvm-target": "mipsel-none-elf",
  "llvm-args": "-mxgot",
  "max-atomic-width": 32,
  "os": "none",
  "panic-strategy": "abort",
  "position-independent-executables": false,
  "relro-level": "full",
  "soft-float": true,
  "target-c-int-width": "32",
  "target-endian": "little",
  "target-family": "unix",
  "target-pointer-width": "32",
  "vendor": "unknown"
}

Some important changes:

  • "cpu": "mips2" - We need LLVM to target the MIPS II instruction set.
  • "soft-float": true - The R5900 has a single-float FPU, which LLVM has quite a few bugs with, so we pretend it doesn't have one to work around them.
  • "linker": "mipsel-none-elf-ld"/"linker-flavor": "ld" - We will need to use the GNU linker to build this, because LLD seems to have a nasty habit of optimising out our code.

The correct settings here would be "cpu": "mips3" for the R5900 and "cpu": "mips1" for the R3051, but as mentioned previously, LLVM support for these needs to mature.

I will refer to this target file as ee.json, and you should put it in your crate/workspace root.

Building cross binutils

This isn't directly Rust related, but we need a linker for our code, and binutils has proven to be very reliable in my experiments. One day, I hope LLD is stable enough to use.

You'll need a GNU-compatible host C compiler (gcc/clang will do fine, but not MSVC++), and a copy of the binutils source. I'm using binutils 2.31.

After extracting your source, you can build it with a standard-ish method:

mkdir build
cd build
../configure --target="mipsel-none-elf" 
make
sudo make install

And then you can test it installed correctly by running mipsel-none-elf-ld --version.

One final thing

To get rustc to build for a native target, we use cargo build; but Cargo doesn't currently work well with cross-compilation, because it expects the various libraries to be already installed.

This may change with std-aware Cargo.

We can get around this through the cargo-xbuild wrapper, which you can grab with a simple cargo install cargo-xbuild. This allows you to build your code with cargo xbuild --target ee.json, and also wraps Clippy.

Don't forget the .json for --target; I had some problems where it would build your code fine without it (i.e. --target ee), but fail to build any library crates your code depended on.

II. Wellington Bootloader

This chapter is written to show you what would be possible with compiler support. However, it does talk about the architecture of MIPS, and you'll need it later.

Remember the steps to boot the console? Let's put those into action.

"Both CPUs start from the same BIOS ROM"

Both CPUs start at the same fixed address in the virtual memory space: BFC0'0000, the start of BIOS ROM.

Both CPUs effectively mask the virtual address with 1FFF'FFFF. This results in BFC0'0000 being mapped to 1FC0'0000 by both chips; the two addresses are interchangeable and reference the same data. As you will see later, BFC0'0000 seems to have practical problems in binutils (it seems like the pointer is treated as signed, leading to strange behaviour), so I will often refer to 1FC0'0000, which does not have these issues.

Also worth mentioning is that in MIPS, everything with the virtual 8000'0000 bit set is kernel memory space and everything without it set is user memory. This is a mostly theoretical distinction when working with a bare metal MIPS target like the PlayStation 2 though, because the console is almost always in kernel mode when playing a game.
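The address masking and the kernel/user split are both single bit operations, so here is a minimal sketch of them that runs on any host. The names PHYS_MASK, RESET_VECTOR, virt_to_phys and is_kernel_space are my own inventions for illustration.

```rust
/// Mask that both CPUs effectively apply to a virtual address.
/// (The constant and function names are made up for this sketch.)
pub const PHYS_MASK: u32 = 0x1FFF_FFFF;

/// Where both CPUs start executing, in the virtual address space.
pub const RESET_VECTOR: u32 = 0xBFC0_0000;

/// Drops the top three bits of a virtual address.
pub const fn virt_to_phys(virt: u32) -> u32 {
    virt & PHYS_MASK
}

/// True if the 8000'0000 bit is set, i.e. the address is in kernel space.
pub const fn is_kernel_space(virt: u32) -> bool {
    virt & 0x8000_0000 != 0
}
```

Feeding RESET_VECTOR through virt_to_phys gives 1FC0'0000, the alias I use in linker scripts.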

"Figure out if you are the EE or IOP"

MIPS has a four-coprocessor interface baked into the ISA; this provides a standard method of accessing custom features of each chip without needing a new assembler for each of them. These coprocessors were left defined but unspecified; however, MIPS has conventions for them:

  • Coprocessor 0 (COP0) is the system control coprocessor; essentially a set of registers containing processor state. Because it contains such important information, it is mandatory and found in all MIPS processors.
  • Coprocessor 1 (COP1) is the floating point unit, and all floating point math goes through it. The IOP does not have a floating point unit, and the EE's floating point unit is very nonstandard, which is why our code doesn't use them.
  • Coprocessor 2 (COP2) is left for custom accelerators, and both the EE and IOP use it.
  • Coprocessor 3 was originally for more custom accelerators, but it got repurposed into more floating point operations. Neither CPU has this coprocessor.

We will need COP0 for this, and it too has conventions for register names and contents, although not specifically what the register contains. The specific register we need is COP0 register 15, which has the mnemonic "PRid" for "Processor Identification".

The PRid register looks like this in both CPUs:

[fancy diagram showing the least significant byte being marked "revision number" and the second least significant byte being marked "model number"; the other bytes are "reserved"]

On the EE, the model number is 0x2E, while on the IOP the model number is 0x00 (it was a much earlier core), which means we just need to check what the model number field is and jump to the appropriate function.

To get a register from coprocessor 0, we use mfc0 <dest reg>, <cop0 reg>.

So we could write a function that looks like this:


#![no_std]
#![no_main]
#![feature(asm)]

mod cop0 {
    /// Reads COP0 register 15 (PRid).
    pub fn prid() -> u32 {
        let prid;
        unsafe { asm!("mfc0 $0, $$15" : "=r" (prid)) };
        prid
    }
}

fn ee_setup() -> ! {
    unimplemented!("EE code goes here");
}

fn iop_setup() -> ! {
    unimplemented!("IOP code goes here");
}

// Keep the symbol name unmangled so the linker can find the entry point.
#[no_mangle]
pub extern "C" fn _start() -> ! {
    let prid = cop0::prid();
    // The model number is the second least significant byte.
    let model = prid & 0xFF00;

    match model {
        0x2E00 => ee_setup(),
        0x0000 => iop_setup(),
        _ => unimplemented!("Couldn't detect host processor"),
    }
}

If we could compile code for the IOP, anyway. Note that it won't compile on the Rust Playground because the Playground runs on x86.

"Load and run the EE/IOP kernel"

Here's where things become a little painful.

II. EE Booting: An Emotional Experience

The EE kernel gets a megabyte of reserved memory - specifically from 0000'0000 to 0010'0000. In that megabyte, you need to store the library of system calls that the kernel provides, and act on outside events.

To start the EE kernel we need to first load it from ROM, putting specific parts of the ROM in particular places in RAM.

If only there was a convenient, standard format for loading data into a specific location in memory.

Magical elves

Fortunately, there is: the Executable and Linkable Format, or ELF.

The following paragraph is a lie that I will change at some point: I will need to discuss how to parse ELF, because the IOP kernel depends on it.

I'm not going to go into much detail about how to implement ELF loading, because the specification is easy enough to understand, plus there are guides for writing homebrew ELF loaders, and even crates for it.

But all of these must get their ELF data from somewhere.

The naive solution is to append the ELF to the ROM at a fixed address. A more efficient format called "ROMDIR" will be discussed later; PCSX2 requires that format to recognise your ROM as valid. Fortunately, DobieStation is not as picky, and we will use that for testing.
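To give a feel for how little of ELF a loader really needs, here is a rough host-side sketch that pulls the entry point and the loadable segments out of a 32-bit little-endian ELF image. The field offsets come from the ELF specification; LoadSegment and load_plan are names I made up, and it uses Vec, so treat it as a tool for poking at your kernel ELF on a PC rather than as bootloader code.

```rust
/// One loadable segment: copy `file_size` bytes from `offset` in the image
/// to `vaddr`, then zero-fill up to `mem_size`. (Hypothetical type.)
#[derive(Debug, PartialEq, Eq)]
pub struct LoadSegment {
    pub offset: u32,
    pub vaddr: u32,
    pub file_size: u32,
    pub mem_size: u32,
}

/// Reads a little-endian u16 at `at`, or None if it is out of bounds.
fn read_u16(data: &[u8], at: usize) -> Option<u16> {
    let b = data.get(at..at.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// Reads a little-endian u32 at `at`, or None if it is out of bounds.
fn read_u32(data: &[u8], at: usize) -> Option<u32> {
    let b = data.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the entry point and the PT_LOAD segments of an ELF32 LE image.
pub fn load_plan(elf: &[u8]) -> Option<(u32, Vec<LoadSegment>)> {
    const PT_LOAD: u32 = 1;

    // e_ident: magic, then EI_CLASS = 1 (32-bit) and EI_DATA = 1 (little-endian).
    if elf.get(..6)? != b"\x7FELF\x01\x01" {
        return None;
    }
    let entry = read_u32(elf, 24)?; // e_entry
    let phoff = read_u32(elf, 28)? as usize; // e_phoff
    let phentsize = read_u16(elf, 42)? as usize; // e_phentsize
    let phnum = read_u16(elf, 44)? as usize; // e_phnum

    let mut segments = Vec::new();
    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        if read_u32(elf, ph)? != PT_LOAD {
            continue; // Not something we need to copy into RAM.
        }
        segments.push(LoadSegment {
            offset: read_u32(elf, ph + 4)?,     // p_offset
            vaddr: read_u32(elf, ph + 8)?,      // p_vaddr
            file_size: read_u32(elf, ph + 16)?, // p_filesz
            mem_size: read_u32(elf, ph + 20)?,  // p_memsz
        });
    }
    Some((entry, segments))
}
```

A real loader would then copy each segment and jump to the entry point; this sketch stops at working out what to copy.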

Stick to the script

The ELF that you compile needs certain functions at specific addresses to handle MIPS exceptions. I will explain them later, but for now, your kernel should leave the area from 0000'0000 until 0000'0280 clean. To do this, we need to tell the linker not to put data there through a linker script.

We also need to do this for the bootloader, before you ask.

A linker script contains two main parts: we need to tell the linker where we start executing code from, and where to put code/data.

We tell the linker where to execute code by telling it which symbol to treat as the start of the program. This is the ENTRY(<symbol>) command.

We tell the linker where to put code/data using the SECTIONS command. SECTIONS is a block of, well, program sections, such as .text (your code), .data (global variables) and .bss (zeroed global variables, taking up no binary space).

The easiest solution is to just tell the linker to offset your code by 0x280 bytes.

/* Set the start point to _start */
ENTRY(_start)

/* The sections of the program */
SECTIONS {
    /* "section address :" means "start section at address" */
    .text 0x00000000 : {
        /* 
         * "." refers to the current memory pointer. In this case, ". = foo" sets the current
         * memory pointer to `foo + address` (see above).
         */
        . = 0x280;

        /* Then include all symbols in .text and its subsections. */
        *(.text .text.*);
    }

    /* Without the address, the linker just aligns it after the end of the previous section. */
    .data : {
        *(.data .data.*);
    }

    .bss : {
        *(.bss .bss.*);
    }
}

And then we can use this for a very, very simplistic program.


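#![no_std]
#![no_main]

use core::panic::PanicInfo;

/// `no_std` binaries must supply their own panic handler; this one just hangs.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

/// The entry point named in the linker script.
#[no_mangle]
pub extern "C" fn _start() -> ! {
    loop {}
}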

II. IOP Booting: Hell on Earth

This page is a work in progress; I'm not happy with it so far, but the information has to be put down somewhere before it can be made pretty.

The IOP kernel has a unique, modular architecture, based around relocatable ELF modules called "IOP Relocatable Executables", or "IRX"es. This makes it microkernel-esque, but with no userspace.

So, you could use your ELF parser for the bootloader for the IOP too, right?

Not without accommodating the quirks of the IRX format.

The .iopmod section

First of all, the IRX uses a header that a common-sense ELF parser would reject as invalid (possibly intentionally): it uses the "Processor Specific" region of ELF types, as opposed to the standardised types between 1 and 4. This custom header is used to detect an IRX file.
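As a rough sketch of that check: the ELF specification reserves e_type values FF00 to FFFF (ET_LOPROC to ET_HIPROC) for processor-specific use, so a loader can test for that range before falling back to the standard types. The function names are mine, and I've deliberately left out the exact value an IRX uses.

```rust
/// Bounds of the processor-specific `e_type` range in the ELF specification.
const ET_LOPROC: u16 = 0xFF00;
const ET_HIPROC: u16 = 0xFFFF;

/// Reads the little-endian `e_type` field, which sits at offset 16 in the
/// ELF header. Returns None if the buffer is too short.
pub fn e_type(elf: &[u8]) -> Option<u16> {
    let b = elf.get(16..18)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// True if the file claims a processor-specific type, as an IRX does.
/// (Hypothetical helper; a real loader would compare against the exact value.)
pub fn has_processor_specific_type(elf: &[u8]) -> bool {
    matches!(e_type(elf), Some(ET_LOPROC..=ET_HIPROC))
}
```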

Each IRX has a specific section - .iopmod (section type 0x70000080) - which contains the IRX's metadata and looks like this:


/// `.iopmod` section
#[repr(C)]
pub struct Metadata {
    /// "module structure" pointer
    module: usize,
    /// Start offset, relative to the beginning of the executable.
    start: usize,
    /// Heap start
    heap: usize,
    /// Text section size
    text_size: usize,
    /// Data section size
    data_size: usize,
    /// BSS section size
    bss_size: usize,
    /// Major/minor version in binary coded decimal, e.g. 0x0102 for 1.2.
    version: u32,
    /// Module name
    name: [u8; 8],
}

/// The IOP module metadata.
///
/// The 0xDEADBEEF magic numbers indicate data fields that will be changed after compile.
#[link_section = ".iopmod"]
static IOPMOD: Metadata = Metadata {
    module: 0xDEADBEEF,
    start: 0xDEADBEEF,
    heap: 0xDEADBEEF,
    text_size: 0xDEADBEEF,
    data_size: 0xDEADBEEF,
    bss_size: 0xDEADBEEF,
    version: 0x0100,
    name: *b"Example\0",
};

Searching for this data requires combing through the ELF section table until you find an entry with the name .iopmod. If you don't find this entry, it's probably an invalid IRX.

The IRX export table

IRX modules contain an export table, which lists the functions that the IRX module provides. This table looks like this:


/// An IRX export table.
#[repr(C)]
struct Export {
    /// Magic number 0x41c0'0000, used for recognising the export table.
    magic: u32,
    /// Always zero. If this isn't zero, it's possibly a false positive.
    zero: u32,
    /// Version in binary-coded decimal.
    version: u32,
    /// Name of this module.
    name: [u8; 8],
    /// Offsets of exported functions, terminated with a zero reference.
    exports: [usize],
}

Searching for the export table involves searching for the export table magic number 41C0'0000 (chosen because it isn't a valid MIPS instruction), and then parsing the table as above.

I've encoded the export number into the struct, but I'm not sure how to parse a table into this.
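One way to sidestep the unsized field is to not build the struct in place at all, and instead copy the fixed-size header and the zero-terminated list out of the raw bytes. This is only a sketch: it assumes the table is word-aligned and little-endian, follows the field layout above, and uses Vec, so it is host-side code. ExportTable and find_export_table are hypothetical names.

```rust
/// Magic number that marks the start of an export table.
const EXPORT_MAGIC: u32 = 0x41C0_0000;

/// An owned copy of an export table. (Hypothetical type for this sketch.)
#[derive(Debug)]
pub struct ExportTable {
    pub version: u32,
    pub name: [u8; 8],
    /// Offsets of exported functions, without the zero terminator.
    pub exports: Vec<u32>,
}

/// Reads a little-endian word at `at`, or None if it is out of bounds.
fn word(data: &[u8], at: usize) -> Option<u32> {
    let b = data.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Scans `image` a word at a time for the magic number followed by a zero
/// word, then copies out the header fields and the export list.
pub fn find_export_table(image: &[u8]) -> Option<ExportTable> {
    let mut at = 0;
    while let Some(magic) = word(image, at) {
        if magic == EXPORT_MAGIC && word(image, at + 4) == Some(0) {
            let version = word(image, at + 8)?;
            let mut name = [0u8; 8];
            name.copy_from_slice(image.get(at + 12..at + 20)?);

            // Entries run until the first zero word.
            let mut exports = Vec::new();
            let mut entry = at + 20;
            loop {
                match word(image, entry)? {
                    0 => break,
                    offset => exports.push(offset),
                }
                entry += 4;
            }
            return Some(ExportTable { version, name, exports });
        }
        at += 4;
    }
    None
}
```

If the image ends before a terminating zero is found, the function gives up and returns None rather than guessing.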

The IRX import table

IRX modules can contain arbitrarily many module import tables, which list the numbered functions the module requires. This table looks like this:


/// An IRX function stub.
#[repr(C)]
struct FunctionStub {
    /// Jump instruction.
    jump: u32,
    /// Function number.
    func: u32,
}

/// An IRX import table.
#[repr(C)]
struct Import {
    /// Magic number 0x41e0'0000, used for recognising the import table.
    magic: u32,
    /// Always zero. If this isn't zero, it's possibly a false positive.
    zero: u32,
    /// Version of the module in binary-coded decimal.
    version: u32,
    /// Name of the module.
    name: [u8; 8],
    /// Imported function stub, followed by an all-zero stub.
    stubs: [FunctionStub],
}

Each stub is a very minimal two-instruction "do nothing" function that looks like this in the assembly:

03e00008        jr      $ra             # Return to caller
240000NN        li      $zero,NN        # Write to an always-zero register the function reference.

li is actually a pseudo-instruction. The actual instruction there is addiu $zero, $zero, NN, but adding zero to a number is the same as putting that number in the destination register.

For this section you will need to know the encodings of the j, jr and addiu instructions, which are:

[fancy diagram marked J - jump to address with the leftmost six bits as 000010, and the other 26 bits marked as "absolute address"]

Since each MIPS instruction is four-byte aligned, the address is stored right-shifted by two bits, so the 26-bit field covers a jump range of 2^28 bytes (256 MiB).

As an example, to jump to the address 0321'1234, you shift the address right by two bits to get 00C8'448D, AND the result with 03FF'FFFF to clear the six most significant bits, and then OR in the j opcode in the six most significant bits (0800'0000) to produce 08C8'448D.
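The shift, mask and OR steps are easy to get wrong by hand, so here is a small sketch that does them mechanically. encode_j and J_OPCODE are names I picked for illustration.

```rust
/// The `j` opcode (000010) in the six most significant bits: 0800'0000.
const J_OPCODE: u32 = 0b000010 << 26;

/// Encodes `j target`: shift the address right by two bits, keep the low
/// 26 bits, and OR in the opcode. (Hypothetical helper name.)
pub const fn encode_j(target: u32) -> u32 {
    J_OPCODE | ((target >> 2) & 0x03FF_FFFF)
}
```

For the address 0321'1234 this produces 08C8'448D.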

[fancy diagram marked JR - jump to register with the leftmost six bits all zero, the next five bits marked as "source register", the next 15 bits all zero, and the rightmost six bits as 001000]

[fancy diagram marked ADDIU - add immediate without overflow with the leftmost six bits as 001001, the next five bits marked as "source register", the next five bits marked as "destination register", and the 16 rightmost bits marked as "signed immediate"]

When an IRX is loaded into memory, you will need to overwrite the jr $ra stubs with j <addr> instructions. This means you will need to keep track of the function addresses, or alternatively look them up again after storing the module start and end addresses.

The index of the function address is given in the least significant byte of the following li $zero, NN instruction, for the module listed in the import table's module name.
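Since I suspect the prose is hard to follow, here is the patching step as a sketch. It assumes you already have the exporting module's function addresses as absolute addresses, indexed by function number; patch_stub, PatchError and that slice are all hypothetical.

```rust
/// Encoding of `jr $ra`, the first word of an unpatched stub.
const JR_RA: u32 = 0x03E0_0008;
/// `addiu $zero, $zero, NN` with the function number masked off.
const ADDIU_ZERO: u32 = 0x2400_0000;

#[derive(Debug, PartialEq, Eq)]
pub enum PatchError {
    /// The two words do not look like an import stub.
    NotAStub,
    /// The exporting module has no function with this number.
    UnknownFunction(u8),
}

/// Rewrites one import stub in place so that it jumps to the imported function.
///
/// `export_addrs` is assumed to hold the absolute addresses of the exporting
/// module's functions after loading, indexed by function number.
pub fn patch_stub(stub: &mut [u32; 2], export_addrs: &[u32]) -> Result<(), PatchError> {
    if stub[0] != JR_RA || (stub[1] & 0xFFFF_FF00) != ADDIU_ZERO {
        return Err(PatchError::NotAStub);
    }

    // The function number is the least significant byte of the `li`.
    let number = (stub[1] & 0xFF) as u8;
    let addr = *export_addrs
        .get(usize::from(number))
        .ok_or(PatchError::UnknownFunction(number))?;

    // Replace `jr $ra` with `j addr`; the `li` is left where it is.
    stub[0] = 0x0800_0000 | ((addr >> 2) & 0x03FF_FFFF);
    Ok(())
}
```

The li stays in the jump's delay slot, where writing to $zero still does nothing.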

I'm well aware this is quite messy and possibly explained badly.

III. Every Rule Has An Exception...

This post is kind of an information dump; it'll be needed for the next chapter when we actually write some exception handlers.

When something sufficiently unusual happens, a processor will raise an exception for the kernel to deal with. Each architecture handles them differently; Philipp Oppermann has an excellent post about how x86 handles exceptions through its Interrupt Descriptor Table, and the Embedded Rust Book has a section about using the cortex-m crate family for handling ARM exceptions.

Both x86 and ARM use tables of function pointers at a fixed location in memory, with additional bells and whistles for x86 such as interrupt stacks. MIPS takes a different approach, which simplifies processor exception handling, but makes software a bit more complex.

In MIPS, you get a 32-instruction (128-byte) area to store an exception handler. The specific location and handler types depend on the processor, but they are located near the beginning of ROM or RAM, depending on a configuration bit.

This limited space means your handler will usually use a jump table to handle exceptions, and the coprocessor registers are designed with this in mind: the exception code in COP0 register 13 occupies bits 6 to 2, so it is already multiplied by four, which makes loading the relevant offset from a table of 32-bit addresses a simple AND instruction.
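Here is that trick as a sketch, assuming the standard MIPS Cause layout with a five-bit exception code and a table of 32 four-byte handler addresses. The function names and the table are made up for illustration.

```rust
/// Extracts the exception code from a Cause value.
pub const fn exception_code(cause: u32) -> u32 {
    (cause >> 2) & 0x1F
}

/// Byte offset into a table of 32-bit handler addresses. Because the code
/// already sits two bits up, a single AND gives `code * 4`.
pub const fn handler_offset(cause: u32) -> usize {
    (cause & 0x7C) as usize
}

/// Looks up the handler address for a Cause value in a 32-entry table.
/// (Hypothetical table; a real handler would do this in assembly.)
pub fn lookup(table: &[u32; 32], cause: u32) -> u32 {
    table[handler_offset(cause) / 4]
}
```

In the real handler the division disappears, because the offset is added straight to the table's base address.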

Additionally, MIPS uses a dedicated handler for a commonly occurring exception - "TLB Miss", where the processor doesn't know how to map a virtual memory address to physical memory address - which speeds up exception handling in that situation.

Exception handling process

When an exception occurs:

  • The processor switches to kernel mode.
  • The exception code is written to part of COP0 register 13 (the exception cause register, or Cause).
  • The current program counter is written to COP0 register 14 (the exception program counter, or EPC). If the exception happened in a branch delay slot (very rare), the previous instruction program counter is written instead and a branch delay bit is set in Cause.
  • If the exception is related to memory, the address that caused it is written to COP0 register 8 (the bad virtual address register, or BadVAddr).
  • The processor then jumps to a fixed address in memory that depends on the exception and chip, and starts executing code there.

Additionally:

  • The EE sets an exception indicator bit in COP0 register 12 (processor status, or Status).
  • The EE has multiple levels of exceptions: "level 1 exceptions" are the ones we're going to talk about, but there are also "level 2 exceptions", which include processor reset, non-maskable interrupts, performance counter overflow and debug exceptions.
  • The IOP has a 3-level stack of interrupt/mode state. When an exception occurs, the current state is pushed down the stack, and the new current state is kernel mode with interrupts disabled. At the end of an exception handler, the interrupt/mode state is popped from the stack, restoring the state from before the exception.
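The IOP's state stack is easier to see as bit shuffling. This sketch is based on my understanding of the R3000-family Status register, where the low six bits hold three mode/interrupt-enable pairs (old, previous, current) and a zero pair means kernel mode with interrupts disabled. Treat the bit positions as an assumption and check them against a datasheet; the function names are mine.

```rust
/// The six bits of Status holding the three mode/interrupt pairs.
const STACK_MASK: u32 = 0x3F;

/// What the processor does to Status on an exception: previous becomes old,
/// current becomes previous, and the new current pair is zeroed
/// (kernel mode, interrupts disabled).
pub const fn on_exception(status: u32) -> u32 {
    (status & !STACK_MASK) | ((status << 2) & STACK_MASK)
}

/// What happens on return from exception: previous becomes current and old
/// becomes previous. The old pair itself is left unchanged.
pub const fn on_return(status: u32) -> u32 {
    (status & !0x0F) | ((status >> 2) & 0x0F)
}
```

With only three levels, a third nested exception would overwrite the oldest pair, so a handler that wants to nest has to save Status first.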

The only level 2 exception you need to care about is the Reset exception, and that's simply when your code starts executing, so you handle it anyway. The other three are reserved mostly for the PlayStation 2 development console, called the TOOL, where it would be useful to examine memory at a particular point in the program.

Note that the processor does not save register state for you; you must do this yourself. For this purpose, MIPS ABIs reserve registers $k0 and $k1 for kernel exception bootstrapping. I suggest putting the kernel stack pointer in $k0, and using $k1 as a scratch register.

Got all that? No? I'll keep going then.

Exception codes

Speaking of those exception codes, here they are (for both CPUs):

  • 0: Processor Interrupt (we'll cover these next chapter)
  • 1: TLB Modified (*)
  • 2: TLB Miss (Load) / TLB Invalid (Load) (*/**)
  • 3: TLB Miss (Store) / TLB Invalid (Store) (*/**)
  • 4: Address Error (Load)
  • 5: Address Error (Store)
  • 6: Bus Error (Instruction)
  • 7: Bus Error (Data)
  • 8: System Call (SYSCALL instruction)
  • 9: Breakpoint (BREAK instruction)
  • 10: Reserved Instruction
  • 11: Coprocessor Unusable
  • 12: Arithmetic Overflow
  • 13: Trap

*: The TLB is not emulated by either PCSX2 or DobieStation, so you can safely stub them.
**: TLB Miss exceptions go in their own handler to differentiate them from the others.
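In Rust, the list above maps naturally onto an enum, which lets the compiler check that a handler covers every code. The type and variant names here are my own choices, not from any SDK.

```rust
/// The exception codes listed above. (Names are made up for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Exception {
    Interrupt = 0,
    TlbModified = 1,
    TlbLoad = 2,
    TlbStore = 3,
    AddressErrorLoad = 4,
    AddressErrorStore = 5,
    BusErrorInstruction = 6,
    BusErrorData = 7,
    Syscall = 8,
    Breakpoint = 9,
    ReservedInstruction = 10,
    CoprocessorUnusable = 11,
    Overflow = 12,
    Trap = 13,
}

impl Exception {
    /// Converts a raw exception code, returning None for codes not listed.
    pub const fn from_code(code: u32) -> Option<Self> {
        Some(match code {
            0 => Self::Interrupt,
            1 => Self::TlbModified,
            2 => Self::TlbLoad,
            3 => Self::TlbStore,
            4 => Self::AddressErrorLoad,
            5 => Self::AddressErrorStore,
            6 => Self::BusErrorInstruction,
            7 => Self::BusErrorData,
            8 => Self::Syscall,
            9 => Self::Breakpoint,
            10 => Self::ReservedInstruction,
            11 => Self::CoprocessorUnusable,
            12 => Self::Overflow,
            13 => Self::Trap,
            _ => return None,
        })
    }
}
```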

Unlike x86, MIPS - at least the versions of MIPS we're using - has no double fault handler, so if you cause an exception in an exception handler, the processor will invoke the relevant exception handler again. If that's because you caused a bus error in the bus error exception handler, your code will infinitely loop. Be careful.

Other MIPS processors would have an exception code for floating point exceptions, but the IOP does not have a floating point unit, and the EE's floating point unit does not raise exceptions.

Exception handler addresses

Where these exception handlers go depends on the processor, and on a bit in Status called "Bootstrap Exception Vectors" (BEV) which is used for exception handlers in the ROM.

I will use the physical address conventions for these memory addresses. Remember that 0000'0000 is the start of RAM, and 1FC0'0000 is the start of ROM.

For the IOP:

  • TLB Miss exceptions go to 1FC0'0100 in BEV mode, or 0000'0000 normally.
  • All other exceptions go to 1FC0'0180 in BEV mode, or 0000'0080 normally.

For the EE:

  • TLB Miss exceptions go to 1FC0'0200 in BEV mode, or 0000'0000 normally.
  • Performance Counter Overflow exceptions go to 1FC0'0280 in BEV mode, or 0000'0080 normally.
  • Debug exceptions go to 1FC0'0300 in BEV mode, or 0000'0100 normally.
  • Interrupt exceptions go to 1FC0'0400 in BEV mode, or 0000'0200 normally.
  • All other exceptions go to 1FC0'0380 in BEV mode, or 0000'0180 normally.
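The two lists above boil down to a lookup table, sketched here with physical addresses. Cpu, Vector and vector_address are names I invented. I return None for vectors the IOP doesn't have, and treat IOP interrupts as "all other exceptions".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cpu {
    Ee,
    Iop,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Vector {
    TlbMiss,
    Counter,
    Debug,
    Interrupt,
    Common,
}

/// Physical address of an exception vector, or None if the CPU has no such
/// vector. `bev` is the Bootstrap Exception Vectors bit in Status.
pub fn vector_address(cpu: Cpu, vector: Vector, bev: bool) -> Option<u32> {
    // Each entry is (address in BEV mode, address normally).
    let (rom, ram) = match (cpu, vector) {
        (Cpu::Iop, Vector::TlbMiss) => (0x1FC0_0100, 0x0000_0000),
        (Cpu::Iop, Vector::Interrupt | Vector::Common) => (0x1FC0_0180, 0x0000_0080),
        (Cpu::Iop, Vector::Counter | Vector::Debug) => return None,
        (Cpu::Ee, Vector::TlbMiss) => (0x1FC0_0200, 0x0000_0000),
        (Cpu::Ee, Vector::Counter) => (0x1FC0_0280, 0x0000_0080),
        (Cpu::Ee, Vector::Debug) => (0x1FC0_0300, 0x0000_0100),
        (Cpu::Ee, Vector::Interrupt) => (0x1FC0_0400, 0x0000_0200),
        (Cpu::Ee, Vector::Common) => (0x1FC0_0380, 0x0000_0180),
    };
    Some(if bev { rom } else { ram })
}
```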

You may note that the EE's ROM exception handlers conveniently occur after the IOP's ROM exception handlers. It's one of the (few) advantages of the EE being a custom CPU.