A tour of the Mini-OS kernel

Satya Popuri
Graduate Student
University of Illinois at Chicago
Chicago, IL 60607
spopur2 [at] uic [dot] edu


Mini-OS is a small OS kernel distributed with the Xen hypervisor sources. I have documented some of the basic parts of this kernel as a reference for people porting their OSes to Xen, and for people writing new OSes for Xen. This work is not yet complete. The present document covers initialization and page table setup. Watch this space for more on event channels, grant tables, Xen bus, etc.

Mini-OS initialization

Mini-OS boots at the symbol _start in arch/x86/x86_32.S (see the linker script arch/x86/minios-x86_32.lds). It begins by loading the SS (stack segment) and ESP (stack pointer) registers from the value stored at stack_start. KERNEL_SS selects a default segment descriptor provided in the GDT by Xen. Since the stack grows downwards on x86 processors, ESP is set to the address stack+8192, that is, just past the end of the stack array. Read the documentation of the LSS instruction to see how a single instruction loads both registers.

The variable stack is allocated in arch/x86/setup.c as a global array, as follows:

/*
 * Just allocate the kernel stack here. SS:ESP is set up to point here
 * in head.S.
 */
char stack[8192];  /* allocated in kernel bss */
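
The "stack+8192" arithmetic can be sketched in a few lines of C. This is only an illustration: the names demo_stack, far_ptr and initial_stack_pointer() are hypothetical and do not appear in Mini-OS, and the struct merely models the offset/selector pair that an instruction such as LSS reads from memory.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 8192               /* same size as the stack[] array above */

static char demo_stack[STACK_SIZE];   /* hypothetical stand-in for stack[] */

/* Hypothetical model of the memory operand: an offset followed by a selector. */
struct far_ptr {
    uintptr_t offset;     /* value that would end up in ESP */
    uint16_t  selector;   /* value that would end up in SS  */
};

/*
 * The initial stack pointer is one past the highest byte of the array,
 * because a push first decrements the stack pointer and then stores.
 */
static struct far_ptr initial_stack_pointer(uint16_t kernel_ss)
{
    struct far_ptr p;

    p.offset = (uintptr_t)demo_stack + STACK_SIZE;
    p.selector = kernel_ss;
    return p;
}
```

Calling initial_stack_pointer() with any selector value yields an offset exactly 8192 bytes above the start of the array, which is the lowest address the first push will never touch.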

The ESI register is then pushed onto the stack and start_kernel() is called. This is the main function that sets the ball rolling. Since start_kernel() accepts a start_info_t pointer as its parameter, ESI must be pointing to a start_info_t structure made available to the kernel by the domain creator (xm). The definition of this structure is in $XEN_SRC/xen/include/public/xen.h.
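
As a rough sketch of what a kernel might do with that pointer, the C fragment below checks the magic string and converts the page count into bytes. The struct is a trimmed, hypothetical stand-in with only two of the real fields, and the demo_ names and the 4096-byte page size are assumptions made for this illustration.

```c
#include <assert.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed x86 page size */

/* Trimmed, hypothetical stand-in for start_info_t: two fields only. */
struct demo_start_info {
    char magic[32];             /* begins with "xen-" */
    unsigned long nr_pages;     /* total pages allocated to this domain */
};

/* Returns 1 if the magic string starts with "xen-", otherwise 0. */
static int demo_magic_ok(const struct demo_start_info *si)
{
    return strncmp(si->magic, "xen-", 4) == 0;
}

/* Memory given to the domain, in bytes. */
static unsigned long long demo_memory_bytes(const struct demo_start_info *si)
{
    return (unsigned long long)si->nr_pages * DEMO_PAGE_SIZE;
}
```

For example, a domain started with nr_pages equal to 8192 has 8192 * 4096 = 33554432 bytes (32MB) of memory.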

Start of day memory layout (notes from xen.h)

/*
 * Start-of-day memory layout:
 *  1. The domain is started within contiguous virtual-memory region.
 *  2. The contiguous region ends on an aligned 4MB boundary (in Mini-OS it ends at 4MB).
 *  3. This the order of bootstrap elements in the initial virtual region:
 *      a. relocated kernel image
 *      b. initial ram disk              [mod_start, mod_len]
 *      c. list of allocated page frames [mfn_list, nr_pages]
 *      d. start_info_t structure        [register ESI (x86)]
 *      e. bootstrap page tables         [pt_base, CR3 (x86)]
 *      f. bootstrap stack               [register ESP (x86)]
 *  4. Bootstrap elements are packed together, but each is 4kB-aligned.
 *  5. The initial ram disk may be omitted.
 *  6. The list of page frames forms a contiguous 'pseudo-physical' memory
 *     layout for the domain. In particular, the bootstrap virtual-memory
 *     region is a 1:1 mapping to the first section of the pseudo-physical map.
 *  7. All bootstrap elements are mapped read-writable for the guest OS. The
 *     only exception is the bootstrap page table, which is mapped read-only.
 *  8. There is guaranteed to be at least 512kB padding after the final
 *     bootstrap element. If necessary, the bootstrap virtual region is
 *     extended by an extra 4MB to ensure this.
 */
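
Rules 2, 4 and 8 above amount to a little alignment arithmetic, sketched here in C. The function names are hypothetical, and the sketch assumes addresses are measured from a 4MB-aligned start of the region; it is not the code the domain builder actually runs.

```c
#include <assert.h>

#define KB(x) ((unsigned long)(x) << 10)
#define MB(x) ((unsigned long)(x) << 20)

/* Round addr up to the next multiple of align (a power of two). */
static unsigned long demo_round_up(unsigned long addr, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Rule 4: elements are packed together, but each starts 4kB-aligned. */
static unsigned long demo_next_element_start(unsigned long prev_end)
{
    return demo_round_up(prev_end, KB(4));
}

/*
 * Rules 2 and 8: the region ends on a 4MB boundary, and is extended by
 * another 4MB if less than 512kB of padding follows the last element.
 */
static unsigned long demo_region_end(unsigned long last_element_end)
{
    unsigned long end = demo_round_up(last_element_end, MB(4));

    if (end - last_element_end < KB(512))
        end += MB(4);
    return end;
}
```

With these definitions, a last element ending at 1MB gives a region end of 4MB, while one ending 64kB short of 4MB pushes the region end out to 8MB.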

There are 3 different types of address spaces in a Xen guest: machine, pseudo-physical, and virtual.

The start_info structure provides initial boot-time information for the domU kernel:

struct start_info {
    char magic[32];             /* "xen-<version>-<platform>".            */
    unsigned long nr_pages;     /* Total pages allocated to this domain.  */
    unsigned long shared_info;  /* MACHINE address of shared info struct. */
    uint32_t flags;             /* SIF_xxx flags.                         */
    xen_pfn_t store_mfn;        /* MACHINE page number of shared page.    */
    uint32_t store_evtchn;      /* Event channel for store communication. */
    union {
        struct {
            xen_pfn_t mfn;      /* MACHINE page number of console page.   */
            uint32_t  evtchn;   /* Event channel for console page.        */
        } domU;
        struct {
            uint32_t info_off;  /* Offset of console_info struct.         */
            uint32_t info_size; /* Size of console_info struct from start.*/
        } dom0;
    } console;
    unsigned long pt_base;      /* VIRTUAL address of page directory.     */
    unsigned long nr_pt_frames; /* Number of bootstrap p.t. frames.       */
    unsigned long mfn_list;     /* VIRTUAL address of page-frame list.    */
    unsigned long mod_start;    /* VIRTUAL address of pre-loaded module.  */
    unsigned long mod_len;      /* Size (bytes) of pre-loaded module.     */
    int8_t cmd_line[MAX_GUEST_CMDLINE];
};

The information in this structure is filled in by the domain loader (xm) when creating a new domain.

The shared_info structure is shared between a domain's kernel and the Xen hypervisor (the header comment describes it as "Xen/kernel shared data"). This structure provides one means of communication between the two:

/*
 * Xen/kernel shared data -- pointer provided in start_info.
 * This structure is defined to be both smaller than a page, and the
 * only data on the shared page, but may vary in actual size even within
 * compatible Xen versions; guests should not rely on the size
 * of this structure remaining constant.
 */
struct shared_info {
    struct vcpu_info vcpu_info[MAX_VIRT_CPUS];
    unsigned long evtchn_pending[sizeof(unsigned long) * 8];
    unsigned long evtchn_mask[sizeof(unsigned long) * 8];

    /*
     * Wallclock time: updated only by control software. Guests should base
     * their gettimeofday() syscall on this wallclock-base value.
     */
    uint32_t wc_version;      /* Version counter: see vcpu_time_info_t. */
    uint32_t wc_sec;          /* Secs  00:00:00 UTC, Jan 1, 1970.  */
    uint32_t wc_nsec;         /* Nsecs 00:00:00 UTC, Jan 1, 1970.  */

    struct arch_shared_info arch;
};
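
The evtchn_pending and evtchn_mask arrays are bitmaps with one bit per event channel port. The C sketch below shows how such a bitmap can be indexed; it uses a trimmed, hypothetical copy of the structure and plain, non-atomic bit operations, whereas a real kernel would need atomic ones because the page is written concurrently. All demo_ names are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Trimmed, hypothetical stand-in for struct shared_info: bitmaps only. */
struct demo_shared_info {
    unsigned long evtchn_pending[sizeof(unsigned long) * 8];
    unsigned long evtchn_mask[sizeof(unsigned long) * 8];
};

/* Set the bit for `port`: word index is port / bits, bit index is port % bits. */
static void demo_set_bit(unsigned long *bitmap, unsigned int port)
{
    bitmap[port / DEMO_BITS_PER_LONG] |= 1UL << (port % DEMO_BITS_PER_LONG);
}

/* Returns 1 if the bit for `port` is set, otherwise 0. */
static int demo_test_bit(const unsigned long *bitmap, unsigned int port)
{
    return (int)((bitmap[port / DEMO_BITS_PER_LONG]
                  >> (port % DEMO_BITS_PER_LONG)) & 1UL);
}

/* A port needs handling when it is pending and not masked. */
static int demo_port_deliverable(const struct demo_shared_info *s,
                                 unsigned int port)
{
    return demo_test_bit(s->evtchn_pending, port) &&
           !demo_test_bit(s->evtchn_mask, port);
}
```

Setting the pending bit for a port makes demo_port_deliverable() return 1 for that port until its mask bit is also set.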

The start_kernel() function

This is where Mini-OS sets the ball rolling. It calls a series of initialization functions and then sets up three kernel threads to run. The non-preemptive scheduler provided with Mini-OS then schedules these threads one after another. The functions called by start_kernel() are documented (_roughly_) below.

arch_init() [arch/x86/setup.c]

trap_init() [arch/x86/traps.c]

This function registers a trap handler table with Xen using the set_trap_table() hypercall.

void trap_init(void)
{
    HYPERVISOR_set_trap_table(trap_table);
}

The trap table is defined as follows:

/*
 * Submit a virtual IDT to the hypervisor. This consists of tuples
 * (interrupt vector, privilege ring, CS:EIP of handler).
 * The 'privilege ring' field specifies the least-privileged ring that
 * can trap to that vector using a software-interrupt instruction (INT).
 */
static trap_info_t trap_table[] = {
    {  0, 0, __KERNEL_CS, (unsigned long)divide_error                },
    {  1, 0, __KERNEL_CS, (unsigned long)debug                       },
    {  3, 3, __KERNEL_CS, (unsigned long)int3                        },
    {  4, 3, __KERNEL_CS, (unsigned long)overflow                    },
    {  5, 3, __KERNEL_CS, (unsigned long)bounds                      },
    {  6, 0, __KERNEL_CS, (unsigned long)invalid_op                  },
    {  7, 0, __KERNEL_CS, (unsigned long)device_not_available        },
    {  9, 0, __KERNEL_CS, (unsigned long)coprocessor_segment_overrun },
    { 10, 0, __KERNEL_CS, (unsigned long)invalid_TSS                 },
    { 11, 0, __KERNEL_CS, (unsigned long)segment_not_present         },
    { 12, 0, __KERNEL_CS, (unsigned long)stack_segment               },
    { 13, 0, __KERNEL_CS, (unsigned long)general_protection          },
    { 14, 0, __KERNEL_CS, (unsigned long)page_fault                  },
    { 15, 0, __KERNEL_CS, (unsigned long)spurious_interrupt_bug      },
    { 16, 0, __KERNEL_CS, (unsigned long)coprocessor_error           },
    { 17, 0, __KERNEL_CS, (unsigned long)alignment_check             },
    { 19, 0, __KERNEL_CS, (unsigned long)simd_coprocessor_error      },
    {  0, 0,           0, 0                           }
};

The entry points for these handlers (the code that runs when a trap fires) are mostly defined in arch/x86/x86_32.S.
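
To illustrate how a table of this shape can be read, here is a small C sketch that walks a table up to its all-zero last entry and applies the 'privilege ring' rule from the comment. The struct, the table contents and the demo_ names are hypothetical stand-ins (the handler addresses are placeholder numbers), and the sketch assumes that an entry with a zero address marks the end of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for trap_info_t, with the four fields used above. */
struct demo_trap_info {
    uint8_t       vector;    /* interrupt vector                          */
    uint8_t       ring;      /* least-privileged ring allowed to use INT  */
    uint16_t      cs;        /* code segment selector of the handler      */
    unsigned long address;   /* handler entry point; 0 ends the table     */
};

#define DEMO_CS 0x10         /* placeholder selector, not a real value */

static const struct demo_trap_info demo_table[] = {
    {  0, 0, DEMO_CS, 0x1000UL },   /* divide error (placeholder address) */
    {  3, 3, DEMO_CS, 0x2000UL },   /* int3         (placeholder address) */
    { 14, 0, DEMO_CS, 0x3000UL },   /* page fault   (placeholder address) */
    {  0, 0, 0,       0        }    /* terminator                          */
};

/* Find the entry for `vector`, or NULL if the table has none. */
static const struct demo_trap_info *
demo_find_trap(const struct demo_trap_info *t, unsigned int vector)
{
    for (; t->address != 0; t++)
        if (t->vector == vector)
            return t;
    return NULL;
}

/* May code running in `caller_ring` raise `vector` with an INT instruction? */
static int demo_int_allowed(const struct demo_trap_info *table,
                            unsigned int vector, unsigned int caller_ring)
{
    const struct demo_trap_info *e = demo_find_trap(table, vector);

    return e != NULL && caller_ring <= e->ring;
}
```

With this table, ring 3 code may raise vector 3 (the ring field is 3) but not vector 14 (the ring field is 0), which mirrors the entries for int3 and page_fault in the Mini-OS table.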

init_mm() [mm.c]

This function initializes memory management in Mini-OS. Provisional start-of-day page tables have already been set up by the domain loader. The Mini-OS kernel's task now is to map the rest of the memory.

init_mm() first calls arch_init_mm(&start_pfn, &max_pfn); both arguments are output parameters that arch_init_mm() fills in before returning. This function does a number of important things.

Watch this space!

More documentation coming on Xen grant tables, event channels, process scheduling etc.

Thanks to Google code prettify project for their syntax highlighting script.