Page Tables

OS in Rust

Announcements

  • Action Items:
    • Welcome back.
    • Weird 2 week split on paging and page tables.
    • No HW last week, no lab this week.

Citations

Intro

Recall

  • Be thinking about the malloc assignment.

Recall

  • We motivated paging by comparing it with segmentation.
  • We introduced the multi-level page table.
  • We found out that the bootloader already set up a page table.
  • We now make page table frames accessible to our kernel.

Crates for days

  • To implement the approach, we will need support from the bootloader, so we’ll configure it first.
  • Afterward, we will implement a function that traverses the page table hierarchy in order to translate virtual to physical addresses.
  • Finally, we learn how to create new mappings in the page tables and how to find unused memory frames for creating new page tables.

Accessing Page Tables

  • Accessing the page tables from our kernel is non-trivial.
  • Examine the example 4-level page table hierarchy.
  • Visual to follow.

Visually

Description

  • Keys are virtual addresses.
  • Levels are radices (9-bit digit groups) of the numerical virtual address.
  • Values are physical addresses (also numerical).
    • In non-leaves, they are addresses of other page tables.
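As a concrete sketch (plain host Rust, not kernel code; `split` is a made-up helper, not the x86_64 crate API), the radices can be extracted like this:

```rust
// Hypothetical helper: split a 48-bit x86_64 virtual address into its four
// 9-bit page table indices (level 4 first) and the 12-bit page offset.
fn split(addr: u64) -> ([u16; 4], u16) {
    let offset = (addr & 0xfff) as u16;
    let mut indices = [0u16; 4];
    for level in 0..4 {
        // Level 4 index lives in bits 39..48, level 3 in 30..39, and so on.
        let shift = 12 + 9 * (3 - level);
        indices[level] = ((addr >> shift) & 0o777) as u16;
    }
    (indices, offset)
}

fn main() {
    // Level 4 and 3 indices are 0; level 2 is 4; level 1 is 1; offset 0x234.
    let (indices, offset) = split(0x0080_1234);
    println!("{:?} {:#x}", indices, offset);
}
```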

Problem

  • We can’t directly access physical addresses from our kernel
    • Our kernel uses virtual addressing
  • Say we want to access physical address 4 KiB.
    • It is almost certainly not located at virtual 4 KiB (the probability that the two coincide tends to \(0\)).
    • So we have no existing way to dereference a pointer to a physical address.

Identity Mapping

  • Wait, why are virtual and physical addresses ever different anyway?
    • You know the answer to this, but for now we pretend we don’t.
  • We can identity map all page table addresses
    • We can do this because we, the OS, load the page tables.

Visually

Pros & Cons

  • The physical addresses of page tables are also valid virtual addresses.
    • Easily access the page tables of all levels starting from the CR3 register.
  • Clutters the virtual address space.
    • More difficult to find continuous memory regions of larger sizes.

Example

  • We need a 1000 KiB sized region.
  • Starting at 28 KiB collides with the mapped page at 1004 KiB.
  • The first workable start is 1008 KiB.

  • Similar to fragmentation problem.
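The slide's numbers can be checked with a toy first-fit search over identity-mapped pages (an illustrative sketch, not kernel code; all addresses in KiB):

```rust
// First-fit search for a `size` KiB region of unmapped virtual memory,
// given the KiB addresses of identity-mapped 4 KiB pages.
fn first_fit(mapped_pages_kib: &[u64], start: u64, size: u64) -> u64 {
    let mut candidate = start;
    'outer: loop {
        for &page in mapped_pages_kib {
            // A mapped page inside [candidate, candidate + size) forces a
            // restart just past that page (pages are 4 KiB).
            if page >= candidate && page < candidate + size {
                candidate = page + 4;
                continue 'outer;
            }
        }
        return candidate;
    }
}

fn main() {
    // The slide's example: a page mapped at 1004 KiB pushes a 1000 KiB
    // region that wanted to start at 28 KiB up to 1008 KiB.
    println!("{}", first_fit(&[1004], 28, 1000));
}
```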

Another Con

  • Restricts where page tables can be placed.
    • Potential conflicts with both physical and virtual addresses.
    • Boo!

Example

  • Say we place a 1000 KiB sized region at virtual 1008 KiB.
  • Then we can’t use any frame with a physical address between 1000 KiB and 2008 KiB.

Options

Fixed offset

  • Instead, use a separate virtual memory region for page table mappings.
    • Map each page table at its physical address plus a fixed offset, perhaps the size of physical memory.
    • Or one billion terabytes.
    • The exact offset doesn’t matter!
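In code, the whole scheme is a single addition. A minimal sketch, assuming a made-up 10 TiB offset (the names here are illustrative, not our kernel's API):

```rust
// Fixed-offset scheme: a page table at physical address `phys` becomes
// reachable at virtual address `phys + OFFSET`. Any otherwise-unused
// virtual region works; we pick 10 TiB as in the slide.
const OFFSET: u64 = 10 << 40;

fn page_table_virt(phys: u64) -> u64 {
    phys + OFFSET
}

fn main() {
    println!("{:#x}", page_table_virt(0x1000));
}
```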

Visually

This works

  • On x86_64, the 48-bit address space is 256 TiB, a size sometimes referred to as “large”.
    • 48 bits
  • So using e.g. a 10 TiB offset - no problem.
  • Some machines may have that much physical memory, but I don’t.
    • So - no collisions!

Limitations

  • We need to create a new mapping whenever we create a new page table.
    • So we must also track which offset a given collection of tables uses.
  • Doesn’t allow accessing page tables of other address spaces
    • Mostly useful when creating a new process.

Options

Full Map

  • What about mapping the complete physical memory?
    • That is, not only the page tables.
  • Allows our kernel to access arbitrary physical memory, including page tables.
  • Reserved virtual memory range has the same size as before
    • But no longer contains unmapped pages

Visually

Hang on

  • Doesn’t that… defeat the purpose?
    • We no longer achieve memory isolation.
    • We no longer can use “local” addresses.
    • We “need” to store page tables somewhere.

Shout out x86_64

  • x86_64 can address ENORMOUS pages of 2 MiB (or even 1 GiB).
    • “ENORMOUS”
  • This cuts down on overhead.
  • Recall: Setting addressable sizes for malloc.
    • There: words vs. bytes.
  • Throwback time!
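A quick sanity check of the overhead claim, as plain arithmetic (nothing here is kernel code):

```rust
// A 2 MiB huge page spans the same range as 512 small 4 KiB pages,
// i.e. everything one full level 1 table would map.
fn huge_page_bytes() -> u64 {
    512 * 4096
}

// So every huge page saves one entire level 1 table of metadata:
// 512 entries at 8 bytes each.
fn l1_table_bytes_saved() -> u64 {
    512 * 8
}

fn main() {
    println!("{} {}", huge_page_bytes(), l1_table_bytes_saved());
}
```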

TLB

  • We recall the translation lookaside buffer or TLB.
  • From here
    • Systems in Rust week 9, on the stack.

The MMU uses the TLB, a small, fast cache, to store recent translations.

  • Bigger pages mean fewer TLB entries, which means more hits and faster lookups.

Think, Pair, Share

  • Turn to your nearest fellow human and describe what the TLB is, as if you were asked during a job interview.

Options

Temporary Mapping

  • With sufficiently little memory, say enough to fit under a single level 1 page table, an OS can utilize a temporary map.

Visually

Explanation

  • The table controls the first 2 MiB of addresses.
  • It is reachable from CR3 by following index 0 in the higher-level page tables.
  • Then, entry 8 maps 32 KiB (virtual) to 32 KiB (physical).
  • This identity maps the page table itself.

Capabilities

  • Given typical page table sizes, this allows up to \(2^9 - 1 = 511\) mappings.
  • We showed two:
    • Mapping the 0th entry of the level 1 table to the frame at 24 KiB creates a temporary mapping of the virtual page at 0 KiB to the physical frame of the level 2 page table.

Capabilities

  • Given typical page table sizes, this allows up to \(2^9 - 1 = 511\) mappings.
  • We showed two:
    • Mapping the 9th entry of the level 1 table to the frame at 4 KiB creates a temporary mapping of the virtual page at 36 KiB to the physical frame of the level 4 page table.

Use

  • Now the kernel can access the level 2 page table by writing to page 0 KiB, and…
  • Access the level 4 page table by writing to page 36 KiB.

Process

  • Search for a free entry in the identity-mapped level 1 table.
  • Map that entry to the physical frame of the page table that we want to access.
  • Access the target frame through the virtual page that maps to the entry.
  • Set the entry back to unused, thereby removing the temporary mapping again.
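The four steps can be sketched in host-testable Rust, with a plain array standing in for the identity-mapped level 1 table (entry values model physical frame addresses, 0 meaning "unused"; names and representation are illustrative, not the x86_64 crate API):

```rust
// Temporarily map `target_frame`, returning the virtual page through
// which it was reachable (entry i of the level 1 table maps the page at
// i * 4 KiB within the first 2 MiB).
fn with_temporary_mapping(l1: &mut [u64; 512], target_frame: u64) -> Option<u64> {
    // 1. Search for a free entry in the identity-mapped level 1 table.
    let index = l1.iter().position(|&e| e == 0)?;
    // 2. Map that entry to the physical frame we want to access.
    l1[index] = target_frame;
    // 3. The frame is now reachable through this entry's virtual page:
    //    the kernel would read or write it here.
    let virt = (index as u64) * 4096;
    // 4. Set the entry back to unused, removing the temporary mapping.
    l1[index] = 0;
    Some(virt)
}

fn main() {
    let mut l1 = [0u64; 512];
    l1[8] = 32 * 1024; // entry 8 identity maps the table itself, as in the example
    // Temporarily map the level 2 table's frame at 24 KiB: the first free
    // entry is 0, so it appears at virtual page 0 KiB.
    println!("{:?}", with_temporary_mapping(&mut l1, 24 * 1024));
}
```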

Pros & Cons

  • Reuses a mere 512 virtual pages.
  • Requires only 4 KiB of physical memory.
  • However, cumbersome.
    • A new mapping might require modifications to multiple table levels.

Options

Recursion

  • Folks, it’s back.
  • I never thought I’d see recursion used in a recursive data structure like a radix tree.
  • By the way, have any of you written more than 2-3 for loops this term?

Map oneself

  • Allow a page table to map itself.
  • Requires no new tables and consumes only a single “slot” (mapping).
  • Roundabout way of reserving virtual memory.
  • We follow an example.

Visually

First steps

  • We begin with the original example for today, but…
  • Map entry 511 to the level 4 table’s own frame at 4 KiB.

First steps

  • If followed, the CPU finds the same table again.
  • But! The CPU assumes a change-of-levels.
  • Tables have a level-agnostic layout on x86_64.

Trickery

  • This shortens the number of levels that the CPU traverses.
  • Following the recursive entry changes the perceived radix each time.
  • So we can treat the level 2 table as a level 1 table and the level 1 table as the mapped frame.
  • We can now read and write page tables because the CPU thinks that it is the mapped frame.

Recurse!

  • Note the recursive “Step 1”

Recurse!!

  • Or recurse twice
    • Modify a higher radix page table

Recurse!!!

  • Or recurse thrice
    • Running out of radices pretty quick here.

Learn more

  • This would’ve been a great lab but we have SSRD.
  • Real ones will do recursive page table address calculation.
  • Blog
Note

Left as an exercise to the interested student.
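For the interested student, here is a host-testable sketch of that address calculation, assuming the recursive entry sits at index 511 (octal 777) of the level 4 table as in our example. Following the recursive entry once shifts every index up one slot, so a table's address is built by prefixing the recursive index:

```rust
const R: u64 = 0o777; // recursive entry index (511)

// Sign-extend a 48-bit address: bits 48..64 copy bit 47.
fn sign_extend(addr: u64) -> u64 {
    (((addr << 16) as i64) >> 16) as u64
}

// Virtual address through which the level 1 table of `addr` is visible:
// one pass through the recursive entry, then the three upper indices.
fn level_1_table_addr(addr: u64) -> u64 {
    let p4 = (addr >> 39) & 0o777;
    let p3 = (addr >> 30) & 0o777;
    let p2 = (addr >> 21) & 0o777;
    sign_extend((R << 39) | (p4 << 30) | (p3 << 21) | (p2 << 12))
}

// The level 4 table itself: recurse four times.
fn level_4_table_addr() -> u64 {
    sign_extend((R << 39) | (R << 30) | (R << 21) | (R << 12))
}

fn main() {
    println!("{:#x}", level_4_table_addr()); // 0xffff_ffff_ffff_f000
    println!("{:#x}", level_1_table_addr(0));
}
```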

Pros & Cons

  • Uses too much virtual memory (512 GiB).
    • Irrelevant in practice.
  • It only allows accessing the currently active address space easily.
    • Wait is this a con?
  • It heavily relies on the page table format of x86_64.
    • To my knowledge, recursive data structures are usually implemented recursively…

Options

Bootloading

Cheating

  • Each of these approaches requires write access to the page table and fine-grained control of physical memory.
  • We can’t create these required mappings without an existing way to access the page tables.

Crates for eons

  • We “need” the ““help”” of the bootloader.
    • Which stole the privilege of writing page tables from us!
  • The bootloader has access to the page tables, so it can create any mappings that we need.
  • We use “cargo features” (I’m not linking that) to do so.

Two options

  • The map_physical_memory feature maps the complete physical memory somewhere into the virtual address space.
  • With the recursive_page_table feature, the bootloader maps an entry of the level 4 page table recursively.

Maps

  • My two greatest joys - key-value storage and recursion - cannot both be appeased at once.
  • I use map_physical_memory and shed but a single (1) tear for the recursion lost.
Cargo.toml
[dependencies]
bootloader = { version = "0.9", features = ["map_physical_memory"]}

Onward!

  • Forthwith the bootloader maps the complete physical memory to some unused virtual address range.
  • To communicate the virtual address range to our kernel, the bootloader passes a boot information structure.

Passes (as an argument)

  • Probably a good time to check your work.
    • Make your own (informed) decision about whether to e.g. initialize the IDT before or after printing this.
src/main.rs
#[unsafe(no_mangle)]
pub extern "C" fn _start(boot_info: &'static bootloader::BootInfo) -> ! {
    println!("{:?}", boot_info);
    loop {} // never return, per the diverging signature
}

What I got

  • Not super human readable, but not bad either.
BootInfo { memory_map: [MemoryRegion { range: FrameRange(0x0..0x1000), region_type: FrameZero }, MemoryRegion { range: FrameRange(0x1000..0x5000), region_type: PageTable }, MemoryRegion { range: FrameRange(0x5000..0x15000), region_type: Bootloader }, MemoryRegion { range: FrameRange(0x15000..0x16000), region_type: BootInfo }, MemoryRegion { range: FrameRange(0x16000..0x1d000), region_type: Kernel }, MemoryRegion { range: FrameRange(0x1d000..0x9f000), region_type: KernelStack }, MemoryRegion { range: FrameRange(0x9f000..0xa0000), region_type: Reserved }, MemoryRegion { range: FrameRange(0xf0000..0x100000), region_type: Reserved }, MemoryRegion { range: FrameRange(0x100000..0x27e000), region_type: KernelStack }, MemoryRegion { range: FrameRange(0x27e000..0x400000), region_type: Usable }, MemoryRegion { range: FrameRange(0x400000..0x431000), region_type: Kernel }, MemoryRegion { range: FrameRange(0x431000..0x83d000), region_type: PageTable }, MemoryRegion { range: FrameRange(0x83d000..0x7fe0000), region_type: Usable }, MemoryRegion { range: FrameRange(0x7fe0000..0x8000000), region_type: Reserved }, MemoryRegion { range: FrameRange(0xfffc0000..0x100000000), region_type: Reserved }, MemoryRegion { range: FrameRange(0xfd00000000..0x10000000000), region_type: Reserved }], physical_memory_offset: 1649267441664, tls_template: TlsTemplate { start_addr: 0, file_size: 0, mem_size: 0 }, _non_exhaustive: 0 }

One field

  • memory_map
    • How much physical memory is available
    • Which memory regions are reserved for devices such as the VGA hardware.
    • Queried from the BIOS or UEFI firmware during boot.

Two fields

  • memory_map
  • physical_memory_offset
    • Virtual start address of the physical memory mapping.
    • Configurable by adding a [package.metadata.bootloader] table in Cargo.toml and setting the field physical-memory-offset = "0x0000f00000000000"
    • Higher values are less likely to explode (collide with existing mappings).

Implementation

Memory Access

  • Let’s begin to implement our memory management.
  • Easiest within src/memory.rs
src/lib.rs
pub mod memory;

It Begins

  • First, we simply access the page table (and return it).
    • Read from CR3
    • Offset
    • A billion casts.

In Rust

  • Like this:
src/memory.rs
pub unsafe fn active_level_4_table(offset: x86_64::VirtAddr)
-> &'static mut x86_64::structures::paging::PageTable
{
    unsafe {
        // CR3 holds the physical frame of the active level 4 table.
        let (frame, _) = x86_64::registers::control::Cr3::read();

        let phys = frame.start_address();
        let virt = offset + phys.as_u64();
        let ptr: *mut x86_64::structures::paging::PageTable = virt.as_mut_ptr();

        return &mut *ptr;
    }
}

Peep it

  • We can now traverse the table within src/main.rs
src/main.rs
let offset = x86_64::VirtAddr::new(boot_info.physical_memory_offset);
let table = unsafe { osirs::memory::active_level_4_table(offset) };

for (i, entry) in table.iter().enumerate() {
    if !entry.is_unused() {
        println!("L4 Entry {:02x}: {:?}", i, entry);
    }
}

println!("FIN");

Look, a table

  • I’m guessing this will be the same for you.

What’s there?

  • Various non-empty entries, which map different level 3 tables.
    • Kernel code
    • Kernel stack
    • Physical memory mapping
    • Boot information

Tree Traversal

  • Presumably you know how to traverse a tree.
    • Just have a billion casts.
  • Left as an exercise but not a particularly highly recommended one.
    • More fun if you do it with recursive tables.

Translation

VirtAddr -> PhysAddr

  • Translate a virtual to a physical address.
    • Traverse the four-level page table
    • Return the mapped frame.
  • Let’s create a function that performs this translation:

Translate

src/memory.rs
pub unsafe fn translate_addr(addr: x86_64::VirtAddr, offset: x86_64::VirtAddr)
    -> Option<x86_64::PhysAddr>
{
    // Do thing.
    return None;
}

Sketch

  • Read level 4
  • Break addresses in radices.
  • Descend 4->3->2->1
  • Return

Read level 4

  • We’ll be changing frames as we descend.
src/memory.rs
pub unsafe fn translate_addr(addr: x86_64::VirtAddr, offset: x86_64::VirtAddr)
    -> Option<x86_64::PhysAddr>
{
    let (l4, _) = x86_64::registers::control::Cr3::read();
    let mut frame = l4;
    return None;
}

Radices

  • Surprised this is idiomatic but okay.
src/memory.rs
pub unsafe fn translate_addr(addr: x86_64::VirtAddr, offset: x86_64::VirtAddr)
    -> Option<x86_64::PhysAddr>
{
    let (l4, _) = x86_64::registers::control::Cr3::read();
    let mut frame = l4;
    let indices = [addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()];
    return None;
}

Descend

  • Get the relevant table
src/memory.rs
pub unsafe fn translate_addr(addr: x86_64::VirtAddr, offset: x86_64::VirtAddr)
    -> Option<x86_64::PhysAddr>
{
    let (l4, _) = x86_64::registers::control::Cr3::read();
    let mut frame = l4;
    let indices = [addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()];
    for &index in &indices {
        let virt = offset + frame.start_address().as_u64();
        let ptr: *const x86_64::structures::paging::PageTable = virt.as_ptr();
        let table = &*ptr;
    }
    return None;
}

Continue Descend

  • Get the relevant frame
src/memory.rs
pub unsafe fn translate_addr(addr: x86_64::VirtAddr, offset: x86_64::VirtAddr)
    -> Option<x86_64::PhysAddr>
{
    let (l4, _) = x86_64::registers::control::Cr3::read();
    let mut frame = l4;
    let indices = [addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()];
    for &index in &indices {
        let virt = offset + frame.start_address().as_u64();
        let ptr: *const x86_64::structures::paging::PageTable = virt.as_ptr();
        frame = match (&*ptr)[index].frame() {
            Ok(f) => f,
            Err(_) => return None,
        };
    }
    return None;
}

Return

  • Compose the address from offset and start address.
src/memory.rs
pub unsafe fn translate_addr(addr: x86_64::VirtAddr, offset: x86_64::VirtAddr)
    -> Option<x86_64::PhysAddr>
{
    let (l4, _) = x86_64::registers::control::Cr3::read();
    let mut frame = l4;
    let indices = [addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()];
    for &index in &indices {
        let virt = offset + frame.start_address().as_u64();
        let ptr: *const x86_64::structures::paging::PageTable = virt.as_ptr();
        frame = match (&*ptr)[index].frame() {
            Ok(f) => f,
            Err(_) => return None,
        };
    }
    return Some(frame.start_address() + u64::from(addr.page_offset())); 
}

It’s unsafe btw.

src/memory.rs
pub unsafe fn translate_addr(addr: x86_64::VirtAddr, offset: x86_64::VirtAddr)
    -> Option<x86_64::PhysAddr>
{
    unsafe {
        let (l4, _) = x86_64::registers::control::Cr3::read();
        let mut frame = l4;
        let indices = [addr.p4_index(), addr.p3_index(), addr.p2_index(), addr.p1_index()];
        for &index in &indices {
            let virt = offset + frame.start_address().as_u64();
            let ptr: *const x86_64::structures::paging::PageTable = virt.as_ptr();
            frame = match (&*ptr)[index].frame() {
                Ok(f) => f,
                Err(_) => return None,
            };
        }
        return Some(frame.start_address() + u64::from(addr.page_offset())); 
    }
}

Test it

  • Modify main slightly to look up some addresses.
src/main.rs
let offset = x86_64::VirtAddr::new(boot_info.physical_memory_offset);
// Example addresses: null, the VGA buffer, and the physical memory mapping.
let addresses = [0x0, 0xb8000, boot_info.physical_memory_offset];

for &address in &addresses {
    let virt = x86_64::VirtAddr::new(address);
    let phys = unsafe { osirs::memory::translate_addr(virt, offset) };
    println!("{:?} -> {:?}", virt, phys);
}

I got

  • Naturally have a problem with address 0.

More Cheating

Using OffsetPageTable

  • Translating virtual to physical addresses is (1) common and (2) tedious.
  • The x86_64 crate provides an abstraction for it.
  • It supports huge pages (which caused our error) and several other page table functions apart from translate_addr.

Traits

  • It uses (sigh) traits

Traits

Interface only

  • This is only an interface.
    • Big Object Oriented is trying to get us to orient objects without orienting any objects itself.
    • This is fine and I’m not mad.
  • x86_64 provides 3 types that implement the traits, one of which is…

Init

  • At this point easier to just use a lifetime than work around it.
  • <'static> means “lasts as long as the program is running”.
src/memory.rs
pub unsafe fn init(
    offset: x86_64::VirtAddr,
) -> x86_64::structures::paging::OffsetPageTable<'static> {
    unsafe {
        let l4 = active_level_4_table(offset);
        x86_64::structures::paging::OffsetPageTable::new(l4, offset)
    }
}

Use it

  • We can now use the Translate::translate_addr method:
src/main.rs
let offset = x86_64::VirtAddr::new(boot_info.physical_memory_offset);
let mapper = unsafe { osirs::memory::init(offset) };
// Example addresses: null, the VGA buffer, and the physical memory mapping.
let addresses = [0x0, 0xb8000, boot_info.physical_memory_offset];

use x86_64::structures::paging::Translate;
for &address in &addresses {
    let virt = x86_64::VirtAddr::new(address);
    let phys = mapper.translate_addr(virt);
    println!("{:?} -> {:?}", virt, phys);
}

A Small Change

  • Note that we can now map the zero address.
    • Cool. Neat, even.
    • We could’ve solved that problem ourselves but now we don’t have to.

By using the translation function of the MappedPageTable type, we can spare ourselves the work of implementing huge page support. We also have access to other page functions, such as map_to, which we will use in the next section.

Fin