NetBSD code study

TODO:

It would be a good idea to explain segments (data, code, bss, and so on) here, since the boot programs are written in assembly and use these keywords.

Good pages are:

Wikipedia page about Data_segment

Wikipedia page about bss

Wikipedia page about Code_segment

Booting

Boot introduction

http://www.khmere.com/freebsd_book/html/ch02.html

The flow of the boot process is:

1. Turn the computer on.

2. BIOS loads mbr.S to memory.

3. mbr.S loads pbr.S from the beginning of the NetBSD partition.

4. pbr.S loads sector 1 from disk

TODO: describe stuff in terms of bootstrap programs primary, secondary and zero (mbr.S).

Useful man pages for understanding how the boot process works and how NetBSD deals with it are:

installboot(8): explains the primary and secondary bootstrap programs.

disklabel(8): information about the disklabel structure and the utility of the same name.

fdisk(8): explains how the MBR works, how NetBSD deals with it, and the different bootstrap stages used by NetBSD.

boot(8): overview of how the bootstrapping procedure works.

From disklabel man page:

On systems that expect to have disks with MBR partitions (see fdisk(8)) disklabel will find, and update if requested, labels in the first 8k of type 169 (NetBSD) MBR labels and within the first 8k of the physical disk. On other systems disklabel will only look at the start of the disk. The offset at which the labels are written is also system dependent.

These man pages describe NetBSD's boot programs. Like FreeBSD (see the link above), NetBSD splits the boot process into several programs:

The very first one is a really small program that must fit in the MBR (remember we are talking about i386). It is /usr/mdec/mbr. It loads the second program, bootxx, and passes control to it.

The second program, bootxx, is in /usr/mdec/bootxx_FSTYPE. TODO: where is it installed? The third one is a much bigger program (52 KB here), /usr/mdec/boot, which is copied to /. TODO: is this the program that loads the /boot.cfg file?

We will start from MBR.

MBR structure

The MBR, according to [MBR], consists of 512 bytes laid out as follows:

# of bytes  Meaning
446         Bootstrap code area
64          Information about partitions
2           Boot signature (bytes 0x55, 0xaa on disk)

NetBSD sees it a bit differently: it takes some space from the code area and fills it with "bootsel" information. This is NetBSD's more detailed view of the MBR:

# of bytes  Meaning
400         Bootstrap code area
40          The bootsel (includes nametab)
4           NT Drive Serial Number
2           bootsel magic
16          Partition entry 1
16          Partition entry 2
16          Partition entry 3
16          Partition entry 4
2           Boot signature (bytes 0x55, 0xaa on disk; 0xaa55 when read as a little-endian 16-bit value)

All of this is declared in src/sys/sys/bootblock.h, in the mbr_sector structure:

File src/sys/sys/bootblock.h, starting at line 718:

    /*
     * MBR boot sector.
     * This is used by both the MBR (Master Boot Record) in sector 0 of the disk
     * and the PBR (Partition Boot Record) in sector 0 of an MBR partition.
     */
    struct mbr_sector {
                                            /* Jump instruction to boot code.  */
                                            /* Usually 0xE9nnnn or 0xEBnn90 */
            uint8_t                 mbr_jmpboot[3];
                                            /* OEM name and version */
            uint8_t                 mbr_oemname[8];
            union {                         /* BIOS Parameter Block */
                    struct mbr_bpbFAT12     bpb12;
                    struct mbr_bpbFAT16     bpb16;
                    struct mbr_bpbFAT32     bpb32;
            } mbr_bpb;
                                            /* Boot code */
            uint8_t                 mbr_bootcode[310];
                                            /* Config for /usr/mdec/mbr_bootsel */
            struct mbr_bootsel      mbr_bootsel;
                                            /* NT Drive Serial Number */
            uint32_t                mbr_dsn;
                                            /* mbr_bootsel magic */
            uint16_t                mbr_bootsel_magic;
                                            /* MBR partition table */
            struct mbr_partition    mbr_parts[MBR_PART_COUNT];
                                            /* MBR magic (0xaa55) */
            uint16_t                mbr_magic;
    } __packed;

Notice that in this structure the space left for the bootstrap program (mbr_bootcode, 310 bytes) is much smaller than the 400 bytes we just listed. This is because other fields are declared at the top of the structure, just before mbr_bootcode. When programming the MBR the distinction between those fields is irrelevant: they are simply used as part of the bootstrap program. The comment at the top of the structure says that it describes both the MBR and the PBR, and those extra fields matter only for the PBR, which we are going to see later.

struct mbr_bootsel is defined in the same file:

File src/sys/sys/bootblock.h, starting at line 693:

    struct mbr_bootsel {
            uint8_t         mbrbs_defkey;
            uint8_t         mbrbs_flags;
            uint16_t        mbrbs_timeo;
            char            mbrbs_nametab[MBR_PART_COUNT][MBR_BS_PARTNAMESIZE + 1];
    } __packed;

It has some parameters, such as a timeout (mbrbs_timeo), and finally the mbrbs_nametab array, which allows naming up to four partitions (MBR_PART_COUNT = 4) with entries nine bytes wide (MBR_BS_PARTNAMESIZE = 8 characters, + 1 for the null character). That gives 1 + 1 + 2 + 4 * 9 = 40 bytes, matching the bootsel row of the table above. Those constants are defined in the same file.
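The sizes in the NetBSD table can be cross-checked against the two structures with a small C sketch. The struct names below are hypothetical stand-ins, not the real NetBSD definitions; the 79-byte size of the BPB union is an assumption derived from the arithmetic (400 - 3 - 8 - 310), and `__attribute__((packed))` is the GCC/Clang spelling of `__packed`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBR_PART_COUNT      4
#define MBR_BS_PARTNAMESIZE 8

/* Stand-in for struct mbr_bootsel, same field order as the original. */
struct bootsel_sketch {
        uint8_t  defkey;
        uint8_t  flags;
        uint16_t timeo;
        char     nametab[MBR_PART_COUNT][MBR_BS_PARTNAMESIZE + 1];
} __attribute__((packed));

/* Stand-in for struct mbr_sector; bpb[79] is an assumed size (see above). */
struct sector_sketch {
        uint8_t  jmpboot[3];
        uint8_t  oemname[8];
        uint8_t  bpb[79];
        uint8_t  bootcode[310];
        struct bootsel_sketch bootsel;
        uint32_t dsn;
        uint16_t bootsel_magic;
        uint8_t  parts[MBR_PART_COUNT][16];
        uint16_t magic;
} __attribute__((packed));

/* These checks mirror the rows of the NetBSD layout table. */
static_assert(sizeof(struct bootsel_sketch) == 40, "bootsel is 40 bytes");
static_assert(offsetof(struct sector_sketch, bootsel) == 400, "code area is 400 bytes");
static_assert(offsetof(struct sector_sketch, parts) == 446, "partition table at 446");
static_assert(sizeof(struct sector_sketch) == 512, "one sector");

/* Offset of the name table inside the sector: 400 + 4. */
static size_t nametab_offset(void)
{
        return offsetof(struct sector_sketch, bootsel) +
            offsetof(struct bootsel_sketch, nametab);
}
```

If the file compiles, every row of the table adds up; `nametab_offset()` gives the position of the name table inside the sector, which is useful later when reading the loop in mbr.S.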

Finally, each partition entry, according to [MBR], has the following structure:

Offset (bytes)  Length  Meaning
0               1       Status (0x80, i.e., bit 7 set, means active or bootable; some old MBRs accept only 0x80)
1               3       CHS address of first sector in partition
4               1       Partition type
5               3       CHS address of last sector in partition
8               4       LBA address of first sector in partition
12              4       Number of sectors in partition

CHS
Cylinder-head-sector address https://en.wikipedia.org/wiki/Cylinder-head-sector
LBA
Logical block addressing https://en.wikipedia.org/wiki/Logical_block_addressing
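As an illustration of the table above, here is a minimal C sketch that decodes one 16-byte partition entry. The names `ptn_info`, `le32` and `decode_ptn` are hypothetical and are not taken from NetBSD; the multi-byte fields are read as little-endian values, as on i386.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of one partition entry (hypothetical helper type). */
struct ptn_info {
        bool     active;     /* status byte is 0x80 */
        uint8_t  type;       /* partition type, e.g. 169 for NetBSD */
        uint32_t lba_start;  /* LBA of the first sector */
        uint32_t nsectors;   /* number of sectors */
};

/* Read a 32-bit little-endian value byte by byte. */
static uint32_t le32(const uint8_t *p)
{
        return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
            (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode the 16 bytes of one entry, using the offsets from the table. */
static struct ptn_info decode_ptn(const uint8_t entry[16])
{
        struct ptn_info info;

        assert(entry != NULL);
        info.active = (entry[0] == 0x80);
        info.type = entry[4];
        info.lba_start = le32(entry + 8);
        info.nsectors = le32(entry + 12);
        return info;
}
```

The CHS fields at offsets 1 and 5 are skipped on purpose, to keep the sketch short.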

First program: mbr.S

Introduction: This is where all the magic begins. The mbr.S program is the assembly code that is assembled and stored in the first 512 bytes of the disk. The BIOS reads it and executes it. This program finds the NetBSD partition, reads its first sector, where the next boot program resides, and executes it.

The first program is found at src/sys/arch/i386/stand/mbr/mbr.S. This is the code that generates the binary program at /usr/mdec/mbr and its variants, but we'll study the most basic version for now. That is, we consider that none of the optional features (such as BOOTSEL, BOOT_EXTENDED, NO_CHS or COM_PORT) is defined.

*Note*: For information on this very first program and different flavours of it, it is recommended to take a look at the mbr(8) man page.

mbr(8) man page

*Note*: You might want to compile the mbr program from the mbr.S source. To do this, first build the tools with the top-level build.sh script:

        $ ./build.sh -u -U -T /tmp/tools -O /tmp/objs tools

This builds the set of tools necessary to build NetBSD. Then, cd to the directory where the mbr Makefile is:

        $ cd sys/arch/i386/stand/mbr/mbr
        $ /tmp/tools/bin/nbmake MACHINE_GNU_ARCH=i486 TOOLDIR=/tmp/tools

It was necessary to set MACHINE_GNU_ARCH=i486 in my case, because make was looking for binaries prefixed with i386.

This creates the mbr.o intermediate file and the final mbr file, which is the one found at /usr/mdec/mbr.

The MBR is usually not updated. Users who want to change the MBR program need to do it by hand (sysinst also does that). But there is the very useful fdisk(8) utility to update the partition table only.

Back to the mbr.S program: the first lines define constants that are used throughout the program. Later, we find the beginning of the program (just after ENTRY(start)). ENTRY() is nothing more than a macro, declared in src/sys/arch/i386/include/asm.h.

File src/sys/arch/i386/include/asm.h, starting at line 174:

    #define ENTRY(y)        _ENTRY(_C_LABEL(y)); _PROF_PROLOGUE

File src/sys/arch/i386/include/asm.h, starting at line 96:

    #define _ENTRY(x) \
            .text; _ALIGN_TEXT; .globl x; .type x,@function; x:

So, we see that ENTRY() is just a macro that inserts some GNU Assembler directives, including a label to specify the program entry point (start). We are not going into details about this for now.

Let's first give some explanation of the MBR code. The BIOS reads the first 512 bytes of the disk (the MBR) into address 0x7c00 and executes them. But the following code actually copies these 512 bytes elsewhere (address 0x8800) and jumps there. Why? Because, later, the second phase of the boot procedure will be loaded at 0x7c00. If you would like to understand more about the MBR sector and how operating systems deal with it and with the BIOS, please refer to [BLU2010].

Nice explanation on how NetBSD boot procedure works

Address 0x8800 doesn't show up anywhere in the program itself, but we know it because all addresses use 0x8800 as their starting point: this is the load address the binary is linked for. If you are curious about how the MBR code is built, take a look at the following piece of the Makefile. Note that the load address is passed to the linker.

File src/sys/arch/i386/stand/mbr/Makefile.mbr, starting at line 43:

    LOADADDR=       0x8800

    AFLAGS.mbr.S= ${${ACTIVE_CC} == "clang":?-no-integrated-as:}
    AFLAGS.gpt.S= ${${ACTIVE_CC} == "clang":?-no-integrated-as:}

    ${PROG}: ${OBJS}
            ${_MKTARGET_LINK}
            ${CC} -o ${PROG}.tmp ${LDFLAGS} -Wl,-Ttext,${LOADADDR} ${OBJS}
            @ set -- $$( ${NM} -t d ${PROG}.tmp | grep '\<mbr_space\>' \
                        | ${TOOL_SED} 's/^0*//'  ); \
                    echo "#### There are $$1 free bytes in ${PROG}"
            ${OBJCOPY} -O binary ${PROG}.tmp ${PROG}
            rm -f ${PROG}.tmp

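So, back to our asm program: it really starts at `ENTRY(start)` with the
following code.  This code copies everything from the `mbr` label to the end of the program to the address it was linked for, zeroes the bss, and jumps to the relocated copy.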

File `src/sys/arch/i386/stand/mbr/mbr.S`, starting at line 127:

    ENTRY(start)
            xor     %ax, %ax
            mov     %ax, %ss
            movw    $BOOTADDR, %sp
            mov     %ax, %es
            mov     %ax, %ds
            movw    $mbr, %di
            mov     $BOOTADDR + (mbr - start), %si
            push    %ax                     /* zero for %cs of lret */
            push    %di
            movw    $(bss_start - mbr), %cx
            rep
            movsb                           /* relocate code */
            mov     $(bss_end - bss_start + 511)/512, %ch
            rep
            stosw                           /* zero bss */
            lret                            /* Ensures %cs == 0 */

Let's understand it part by part:

1. ENTRY(start): we already described it just above.

2. The first line (xor) just clears the %ax register.

3. The content of %ax is moved to %ss, i.e., %ss is zeroed. (%ss stands for "stack segment".) (The assembler syntax used here is the GNU Assembler one, since this is the assembler used by the NetBSD Project. For this reason, the destination register is on the right side of the operation.)

Register Naming

MOV src, dest (or) MOV dest, src?

4. The next line sets the %sp (stack pointer) to the address 0x7c00. BOOTADDR is a constant defined in line 77 of the current file with this value. According to [BLU2010] (page 15), "(...) BIOS likes always to load the boot sector to the address 0x7c00 (...)" and so we have to tell our MBR program where we are :-). Address 0x7c00 to 0x7e00 (512 bytes) are then reserved for this very first program. Remember that the stack grows downwards ([BLU2010], page 17), so to an address lower than 0x7c00, hence not touching our code.

TODO: maybe describe other important memory regions, like the picture on page 14 of [BLU2010]?

5. %es is zeroed.

6. %ds is zeroed. (Modern operating systems usually point all these registers to the same place, effectively disabling their use. That is what is happening here (TODO: confirm))

X86 Architecture: Segment_Registers

7. Move the address of the mbr program to %di. The mbr program is written just below the block we are analyzing.

movw moves a 16-bit integer (which can be an address), i.e., a word. There are also the movb, movl and movq opcodes. They move, respectively, a BYTE, a DWORD and a QWORD. A BYTE is 8 bits; a WORD, 16 bits; a DWORD, 32 bits; and a QWORD, 64 bits.

What's the difference between mov and movl?

Wikipedia page about what a word is in the context of computer architecture

8. $BOOTADDR + (mbr - start) is the address of the mbr program where the BIOS loaded it in main memory. Move this address to %si. Remember: it is *not* just 0x7c00 + (mbr - 0) but something like 0x7c00 + (mbr - 0x8800), since this code was linked with its load address at 0x8800 (as we just explained), which offsets every address in the program.

9. Push the value of %ax (zero) onto the stack. It will be popped into %cs by the lret at the end.

10. Push the value of %di (the address of mbr) onto the stack. It will be the offset that the lret at the end jumps to.

11. bss_start is defined at the end of the file and contains the address of the end of this program. Hence %cx will hold the number of bytes between the mbr label and the end of the program; it is used as a counter. The full definition is bss_start = . (a lone dot means "the current address").

GNU Assembler documentation: The Special Dot Symbol

For more information about the "BSS section", see the following links:

Wikipedia entry about Data_segment

Wikipedia entry about .bss

Let's take a look at the end of the file.

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 658:

    bss_off = 0
    bss_start = .
    #define BSS(name, size) name = bss_start + bss_off; bss_off = bss_off + size
            BSS(ptn_list, 256 * 4)              /* long[]: boot sector numbers */
            BSS(dump_eax_buff, 16)
            BSS(bss_end, 0)

We see that "bss" is a region that starts just after the boot program, with the following components:

ptn_list
1 KB (256 entries of 4 bytes). TODO: purpose? Maybe for the next program?
dump_eax_buff
16 bytes. TODO: purpose?

I.e., the "bss" section is a region of 1024 + 16 = 1040 bytes.

12. rep repeats the next string instruction %cx times.

13. We set %di, %si and %cx above in order to call this operation. movsb ("Move String Byte") copies one byte from address %ds:%si to address %es:%di and then advances both registers (incrementing them, with the direction flag clear); with the rep prefix it does so %cx times. Both segment registers were zeroed above, so plain offsets are used. That is, it copies the rest of the program (from the mbr label to bss_start) from where the BIOS loaded it, BOOTADDR + (mbr - start), to the address it was linked for, mbr.

Information about %es and %ds usage and string handling in Assembly

More information about string handling in Assembly

14. We explained, in step 11, that bss_start has the address of the end of the current program. bss_end is defined at the end of the file.

We see that BSS() is a macro that defines variables relative to bss_start. bss_end is, therefore, the end of this region called bss (TODO: what does it stand for? what are its purposes?). So %ch will hold the size of the bss in 512-byte blocks, rounded up. Some examples: a bss of 1 to 512 bytes gives 1, 513 to 1024 bytes gives 2, and the 1040 bytes we have here give 3.

Let's remember that %ch stores the 8 most significant bits of %cx, i.e., %cx = %ch * 256 + %cl.

15. rep again: repeat the next string instruction %cx times.

16. Zero the memory region just after the current program, i.e., zero the bss. stosw stores the value in %ax (zero) at the address in %di and advances %di by two. With the rep prefix (which is exactly our case) it uses %cx to know how many times to repeat, but remember: since stosw stores *words* (16 bits), %cx must hold half the number of bytes to be written. E.g., suppose we want to write byte 0x0 100 times. %ax will be 0x0 and %cx will be just 50 if we use stosw (not stosb).

This explains the code in step 14. The "bss" size is 1040 bytes, so (bss_end - bss_start + 511)/512 is 3. Since it is stored in %ch and %cx = %ch * 256 + %cl, we get %cx = 3 * 256 + %cl = 768 + %cl; %cl is zero at this point, because the previous rep movsb counted %cx down to zero. 768 is lower than the "bss" size in bytes, but since stosw writes *words, not bytes*, it covers twice as many bytes (1536), which is enough to zero the whole region.

TODO: why not just movw $(bss_end - bss_start)/2, %cx? Ask dsl?

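17. Finally, lret makes a far return: it pops the offset (the mbr address pushed at step 10) and then the segment (the zero pushed at step 9) from the stack, i.e., it jumps to the relocated mbr label with %cs = 0.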
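The counter arithmetic from steps 14 and 16 can be checked with a few lines of C. This is only a sketch of the arithmetic, not of the instructions themselves, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value loaded into %ch: bss size in 512-byte blocks, rounded up. */
static uint8_t bss_blocks(unsigned bss_size)
{
        assert(bss_size <= 255u * 512u);   /* result must fit in 8 bits */
        return (uint8_t)((bss_size + 511u) / 512u);
}

/* %cx as seen by "rep stosw": %ch is the high byte, %cl the low byte. */
static uint16_t cx_from(uint8_t ch, uint8_t cl)
{
        return (uint16_t)((ch << 8) | cl);
}

/* Bytes written by "rep stosw": one 16-bit word per iteration. */
static unsigned bytes_zeroed(uint16_t cx)
{
        return 2u * cx;
}
```

With the 1040-byte bss of mbr.S, `bss_blocks(1040)` is 3, `cx_from(3, 0)` is 768 and `bytes_zeroed(768)` is 1536, which covers the whole region with some slack.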

*What does this piece of code do?*

We moved all the rest of the MBR code to the place it was linked to run at (load address 0x8800), zeroed the "bss" section (just after the program) and made a far jump to the relocated copy. Now, let's resume where the MBR code is! In the source code, it comes just below the code we just studied. This is exactly what typical bootstrapping code does first, according to [MBRX86].

We move ourselves out of the way because we'll load the next program, pbr.S, at address 0x7c00. Some might ask: why not just run this program where it is and load pbr.S elsewhere? We'll see later that pbr.S can be loaded directly in some situations, without a previous program like mbr.S, in which case the BIOS loads it at 0x7c00. It also has to be loaded there because its addresses are linked for that location.

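Contents of registers (just before lret)
Register  Value
%ax       0

Contents of stack (just before lret, top first)
Value  Observation
mbr    Address of the mbr label where the MBR program truly begins; popped by lret as the jump offset.
0      Popped by lret into %cs.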
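The whole relocation block can be modelled in C on a plain byte array that stands for the first 64 KB of memory. This is a sketch under stated assumptions: `relocate` is a hypothetical name, and `mbr_off` and `bss_off` are made-up offsets of the mbr and bss_start labels from start, since the real values depend on the assembled binary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BOOTADDR 0x7c00u   /* where the BIOS loads the sector */
#define LOADADDR 0x8800u   /* link address from Makefile.mbr */

/*
 * mem models segment 0 (64 KB).  mbr_off and bss_off are the offsets of
 * the mbr and bss_start labels from start (hypothetical values chosen by
 * the caller); bss_size is bss_end - bss_start.
 * Returns the offset that lret would jump to.
 */
static uint16_t relocate(uint8_t *mem, unsigned mbr_off, unsigned bss_off,
    unsigned bss_size)
{
        unsigned count, words;

        assert(mbr_off <= bss_off);
        count = bss_off - mbr_off;                 /* %cx for rep movsb */
        words = ((bss_size + 511u) / 512u) << 8;   /* %cx for rep stosw */

        /* rep movsb: copy from the BIOS load address to the link address */
        memmove(mem + LOADADDR + mbr_off, mem + BOOTADDR + mbr_off, count);
        /* rep stosw: %di now points at bss_start; each word is two bytes */
        memset(mem + LOADADDR + bss_off, 0, 2u * words);

        return (uint16_t)(LOADADDR + mbr_off);     /* popped by lret */
}
```

Calling `relocate(mem, 0x20, 0x200, 1040)` on a 64 KB buffer copies the bytes at 0x7c20..0x7dff to 0x8820..0x89ff, zeroes 1536 bytes from 0x8a00 and returns 0x8820.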

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 145:

    /*
     * Sanity check the drive number passed by the BIOS. Some BIOSs may not
     * do this and pass garbage.
     */
    mbr:
            cmpb        $MAXDRV, %dl            /* relies on MINDRV being 0x80 */
            jle 1f
            movb        $MINDRV, %dl            /* garbage in, boot disk 0 */
    1:
            push        %dx                     /* save drive number */
            push        %dx                     /* twice - for err_msg loop */

We will consider the standard mbr program, where no options are defined.

One important thing about this code is that it uses *local labels*. They consist only of numbers, and a jump refers to them with the suffix f (forward) or b (backward).

Information about local labels

Nice example on using local labels

We should pay attention to the %dl register. This is the drive number MBR was loaded from and according to [MBRX86] this is the only important number BIOS passes to the MBR. It is 0x0, 0x1 etc. for floppy drives and 0x80, 0x81 etc. for hard disk drives.

Example: Pintos Operating System loader in Assembly with information about drive numbers

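1. It first compares %dl against $MAXDRV, which is 0x8f, the biggest possible value (TODO: references?). If the number is less than or equal to 0x8f, jump to the next 1 label. If not, force the value 0x80 (the first hard drive) into %dl. Note that jle is a signed comparison: as signed bytes, 0x80 to 0x8f are the most negative values, so only they pass the test, and anything else (including floppy numbers such as 0x0) is replaced by 0x80. This is what the comment "relies on MINDRV being 0x80" refers to.

2. It then pushes the value of %dx onto the stack twice. Remember that %dl holds the lowest 8 bits of the %dx register.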

The rest of the current piece of code is about the serial port and printing message to the user, things we are not interested in.

At the end of this piece of code, our important registers look the same, but the stack has changed.

*What does this piece of code do?*

It just checks the %dl register. This is where the BIOS stores the number of the drive we booted from (HDD, floppy disk, etc.). If it is an invalid value, force 0x80 (the first hard drive).

Contents of registers
Register  Value
%ax       0

Contents of stack (top first; lret already popped mbr and 0)
Value         Observation
drive number  0x80 probably
drive number  0x80 probably
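The drive number check is easy to get wrong when reading the assembly, because jle compares signed values. A minimal C sketch of the same test follows; `sanitize_drive` is a hypothetical name, and the casts to `int8_t` assume the usual two's complement conversion (which GCC and Clang provide).

```c
#include <assert.h>
#include <stdint.h>

#define MINDRV 0x80
#define MAXDRV 0x8f

/* The trick only works because MINDRV is the most negative signed byte. */
static_assert(MINDRV == 0x80, "relies on MINDRV being 0x80");

/* Mirror of: cmpb $MAXDRV, %dl; jle 1f; movb $MINDRV, %dl */
static uint8_t sanitize_drive(uint8_t dl)
{
        /* jle is a signed test, so compare both bytes as int8_t. */
        if ((int8_t)dl <= (int8_t)MAXDRV)
                return dl;
        return MINDRV;   /* garbage in, boot disk 0 */
}
```

With this model, 0x80 to 0x8f are returned unchanged, while 0x00, 0x90 and 0xff all become 0x80.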

Let's then take a look at more complex code.

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 172:

    /*
     * Walk through the selector (name) table printing used entries.
     *
     * Register use:
     * %ax                  temp
     * %bx  nametab[]       boot seletor menu
     * %ecx                 base of 'extended' partition
     * %edx                 next extended partition
     * %si                  message ptr (etc)
     * %edi                 sector number of this partition
     * %bp  parttab[]       mbr partition table
     */
    bootsel_menu:
            movw    $nametab, %bx
    #ifdef BOOT_EXTENDED
            xorl    %ecx, %ecx              /* base of extended partition */
    next_extended:
            xorl    %edx, %edx              /* for next extended partition */
    #endif
            lea     parttab - nametab(%bx), %bp
    next_ptn:
            movb    4(%bp), %al             /* partition type */
    #ifdef NO_CHS
            movl    8(%bp), %edi            /* partition sector number */
    #ifdef BOOT_EXTENDED
            cmpb    $MBR_PTYPE_EXT, %al     /* Extended partition */
            je      1f
            cmpb    $MBR_PTYPE_EXT_LBA, %al /* Extended LBA partition */
            je      1f
            cmpb    $MBR_PTYPE_EXT_LNX, %al /* Linux extended partition */
            jne     2f
    1:      movl    %edi, %edx              /* save next extended ptn */
            jmp     4f
    2:
    #endif
            addl    lba_sector, %edi        /* add in extended ptn base */
    #endif
            test    %al, %al                /* undefined partition */
            je      4f
            cmpb    $0x80, (%bp)            /* check for active partition */
            jne     3f                      /* jump if not... */
    #define ACTIVE  (4 * ((KEY_ACTIVE - KEY_DISK1) & 0xff))
    #ifdef NO_CHS
            movl    %edi, ptn_list + ACTIVE /* save location of active ptn */
    #else
            mov     %bp, ptn_list + ACTIVE
    #endif
    #undef ACTIVE
    3:
    #ifdef BOOTSEL
            cmpb    $0, (%bx)               /* check for prompt */
            jz      4f
            /* output menu item */
            movw    $prefix, %si
            incb    (%si)
            call    message                 /* menu number */
            mov     (%si), %si              /* ':' << 8 | '1' + count */
            shl     $2, %si                 /* const + count * 4 */
    #define CONST   (4 * ((':' << 8) + '1' - ((KEY_PTN1 - KEY_DISK1) & 0xff)))
    #ifdef NO_CHS
            movl    %edi, ptn_list - CONST(%si)     /* sector to read */
    #else
            mov     %bp, ptn_list - CONST(%si)      /* partition info */
    #endif
    #undef CONST
            mov     %bx, %si
            call    message_crlf                    /* prompt */
    #endif
    4:
            add     $0x10, %bp
            add     $TABENTRYSIZE, %bx
            cmpb    $(nametab - start - 0x100) + 4 * TABENTRYSIZE, %bl
            jne     next_ptn

This piece of the program seems very confusing at first glance, but it is not: there are blocks we are simply going to ignore, because of the macros we assumed to be undefined at the start of this section.

1. The first line moves the address of the nametab string to the %bx register. The comment at the top of the code says nametab[] stores the "boot selector menu". Let's take a look at the nametab definition.

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 636:

    nametab:
            .fill       MBR_PART_COUNT * (MBR_BS_PARTNAMESIZE + 1), 0x01, 0x00

This .fill directive repeats the byte 0x0, which has size 0x01, *n* times (where *n* is MBR_PART_COUNT * (MBR_BS_PARTNAMESIZE + 1)). MBR_PART_COUNT and MBR_BS_PARTNAMESIZE are defined in src/sys/sys/bootblock.h. We already know, from section "MBR structure", that *nametab* holds names 8 characters wide (plus the null character) for up to 4 partitions, i.e., the .fill directive reserves 4 * 9 = 36 bytes.

The .fill directive works like a loop. Its syntax is .fill count, size, value, i.e., repeat *value*, which has size *size* bytes, *count* times.

Assembler directives

The *bootsel* part (which includes the *nametab* strings), according to mbr(8), allows the user to choose which partition to boot from but, for the purpose of understanding the basics of this program, we assume that *bootsel* is disabled, which produces a much simpler MBR program.

There is also parttab that we are going to use at the next step. It is defined just below.

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 650:

            . = start + MBR_PART_OFFSET
    parttab:
            .fill       0x40, 0x01, 0x00

We see that the null byte is repeated 64 times (hex 0x40), the exact size for the partition table just before the magic number.

Detailed contents of a partition entry

2. The next line starts with the lea opcode. lea is similar to mov but differs from it in several important ways. For this example, it is enough to know that, given a memory operand as the source, mov copies the content stored at that memory address to the destination, while lea copies the address itself.

Explanation and more about LEA can be found here:

Why LEA was created and why it is used

It also uses a relative address mode. Addressing modes are well explained in [BARTLETT2009], page 35. Translating, the following expression:

    parttab - nametab(%bx)

is equivalent to:

    (parttab - nametab) + %bx

(Note that the subtraction is done at assembly time. So, when the code executes, it is something like x(%bx) where x = parttab - nametab.)

But %bx is nametab, from step 1. This leads us to the expression:

    parttab - nametab + nametab

Why did the programmer write such a complicated line just to store the address of parttab in %bp? When BOOT_EXTENDED is defined, this is the first line of a loop that traverses other partitions: %bx is changed later and execution comes back to the next_extended label, just above. Since we are not interested in that case, the current instruction is equivalent to:

    mov $parttab, %bp

So %bp just holds the address of parttab.

Before moving on to the next step, note that, from the comment at the beginning of the current block, %bx stores the address of nametab[] -- boot selector menu and %bp stores the address of parttab[] -- mbr partition table.

3. Like the previous instruction, this one also uses a displacement to specify the address. The 4(%bp) part means "take the address in %bp, add 4, and use the content at the resulting address"; movb stores that byte in %al. Because %bp holds the address of a partition table entry, %bp + 4 points to the byte that stores the partition type.

4. Since NO_CHS is undefined, we go straight to the test instruction. test %al, %al performs a bitwise *AND* of its operands and sets the Zero Flag if the result is zero; with the same register on both sides, that happens exactly when %al is zero. The next je instruction jumps if the Zero Flag is set. In summary, it jumps if %al is zero. Remember that %al holds the partition type, so the jump is taken only if the type of the current partition is 0x0, i.e., *<UNUSED>*.

The point of test %eax %eax

Zero flag

5. The next comparison, cmpb $0x80, (%bp), checks whether the current partition is active. It uses indirect addressing through register %bp, which holds the address of the partition entry: to fetch the value at that address, we surround the register with parentheses (in GNU asm). The first byte of the entry is the status, so this compares it with 0x80 (bootable). If it is not equal, jump forward to the 3 label.

6. The mov instruction just below, in the #else block (because NO_CHS is undefined), is executed only if the current partition is active (see the line we just analyzed). It stores the value of %bp, i.e., the address of the active partition entry, at the memory location ptn_list + ACTIVE. Remember ptn_list? It is defined using the BSS macro and points to a 1 KB region just after the end of the MBR program in memory. The ACTIVE macro is defined just above:

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 213:

    #define     ACTIVE  (4 * ((KEY_ACTIVE - KEY_DISK1) & 0xff))

The definitions of the KEY_ACTIVE and KEY_DISK1 constants are at the beginning of the file and, for the version we are analyzing (COM_PORT undefined), they are:

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 89:

    #define     SCAN_ENTER      0x1c
    #define     SCAN_F1         0x3b
    #define     SCAN_1          0x2

    #define     KEY_ACTIVE      SCAN_ENTER
    #define     KEY_DISK1       SCAN_F1
    #define     KEY_PTN1        SCAN_1

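So, ACTIVE is calculated as follows (note that the & 0xff is applied before the multiplication by 4):

    4 * ((0x1c - 0x3b) & 0xff)
    4 * ((28 - 59) & 0xff)
    4 * ((-31) & 0xff)
    4 * 225
    900

Its value is 900, so this instruction stores the location of the active partition (%bp) 900 bytes past the start of the ptn_list region (i.e., 900 bytes after the end of the MBR in memory). In other words, ptn_list is a table of 256 four-byte slots indexed by (key code - KEY_DISK1) & 0xff, and the active partition goes in slot 225.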

We skip the whole long block between #ifdef BOOTSEL and its closing #endif. Let's then go to the next line that is valid for our study.

7. The next line we study adds the value 0x10 to the %bp register. %bp points to an entry of the MBR partition table. So, by adding 0x10 (i.e., 16), we make it point to the next entry in the MBR partition table.

8. Likewise, add the width of one entry of the *nametab* structure to %bx, making %bx point to the next entry in *nametab*.

9. The next line makes a comparison. Remember that %bx held the address of an entry in *nametab* and, after the previous line, it now points to the next entry; %bl holds its lowest byte. The expression that comes first is not as difficult as it looks: nametab - start is the offset of *nametab* from the start of the program, without the load address. Subtracting 0x100 drops the high byte of that offset, so that the result fits in a byte (the low byte of the load address 0x8800 is zero, so it does not matter here). Adding 4 times TABENTRYSIZE then gives the low byte of the address just past the end of the *nametab* region. If %bl is not equal to it, jump back to the next_ptn label and do everything again with the next partition, until all four have been analysed.
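The loop termination test of step 9 can be reproduced in C to confirm that it stops after exactly four entries. The constants below are assumptions for illustration: the nametab offset of 404 follows from the layout tables (400 bytes of code plus the 4 bytes of bootsel fields that precede the names), and TABENTRYSIZE is taken to be MBR_BS_PARTNAMESIZE + 1 = 9.

```c
#include <assert.h>
#include <stdint.h>

#define LOADADDR     0x8800u
#define NAMETAB_OFF  404u   /* assumed value of nametab - start */
#define TABENTRYSIZE 9u     /* assumed: MBR_BS_PARTNAMESIZE + 1 */

/* Immediate operand of the cmpb: (nametab - start - 0x100) + 4 * TABENTRYSIZE */
static uint8_t end_marker(void)
{
        return (uint8_t)((NAMETAB_OFF - 0x100u) + 4u * TABENTRYSIZE);
}

/* Count how many times the loop body runs before %bl matches the marker. */
static unsigned count_iterations(void)
{
        uint16_t bx = (uint16_t)(LOADADDR + NAMETAB_OFF);   /* movw $nametab, %bx */
        unsigned n = 0;

        do {
                bx = (uint16_t)(bx + TABENTRYSIZE);   /* add $TABENTRYSIZE, %bx */
                n++;
        } while ((uint8_t)bx != end_marker());        /* cmpb ..., %bl; jne next_ptn */
        return n;
}
```

Under these assumptions the marker is 0xb8 and the loop body runs four times, once per partition entry.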

*What does this piece of code do?*

It looks complicated, but the only thing this code does is traverse the partition table looking for the active partition. It stores the address of the active partition entry at ptn_list + ACTIVE.

Contents of registers
Register  Value           Observation
%al       partition type  Type of the last entry examined
%bp       parttab + 0x40  Just past the last partition entry

Contents of stack (top first)
Value         Observation
drive number  0x80 probably
drive number  0x80 probably

Contents of memory
Address         Value
ptn_list + 900  Address of the active partition entry in the partition table
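The scan described above can be summarised as a short C sketch for the configuration we are studying (BOOTSEL, NO_CHS and BOOT_EXTENDED undefined). `scan_parttab` is a hypothetical name; the caller passes the address at which the partition table lives, since the sketch works on a copy of the table rather than on real memory.

```c
#include <assert.h>
#include <stdint.h>

#define MBR_PART_COUNT 4
#define KEY_ACTIVE     0x1c   /* SCAN_ENTER */
#define KEY_DISK1      0x3b   /* SCAN_F1 */
#define ACTIVE_SLOT    ((KEY_ACTIVE - KEY_DISK1) & 0xff)

static_assert(ACTIVE_SLOT == 225, "slot index of the active partition");
static_assert(4 * ACTIVE_SLOT == 900, "byte offset, i.e. the ACTIVE macro");

/*
 * parttab:      the 64-byte partition table
 * parttab_addr: address of the table in memory (what %bp starts with)
 * ptn_list:     the 256-slot table in the bss, zeroed beforehand
 */
static void scan_parttab(const uint8_t parttab[64], uint16_t parttab_addr,
    uint32_t ptn_list[256])
{
        for (int i = 0; i < MBR_PART_COUNT; i++) {
                const uint8_t *entry = parttab + 16 * i;   /* (%bp) */

                if (entry[4] == 0)        /* test %al, %al; je 4f */
                        continue;
                if (entry[0] != 0x80)     /* cmpb $0x80, (%bp); jne 3f */
                        continue;
                /* mov %bp, ptn_list + ACTIVE (16-bit store, upper half stays 0) */
                ptn_list[ACTIVE_SLOT] = (uint32_t)(parttab_addr + 16 * i);
        }
}
```

If more than one entry is marked active, the last one wins, exactly as in the assembly loop.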

We have just finished this part of the code. Now we know how partitions are traversed in the MBR partition table, how to find the partition type and how to tell whether a partition is bootable. Let's take a look at the next line:

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 271:

    #ifndef BOOTSEL
            mov     $(KEY_ACTIVE - KEY_DISK1) & 0xff, %ax
    #else

We've already investigated the values of these constants and a similar expression.

So, that piece of code is equivalent to:

    mov     $(0x1c - 0x3b) & 0xff, %ax

So, it moves the value (-31) & 0xff = 225 (0xe1) to register %ax.

Let's now take a look at the next part of the code, just below the whole BOOTSEL block that we are not interested in.

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 344:

    /*
     * Boot requested partition.
     * Use keycode to index the table we generated when we scanned the mbr
     * while generating the menu.
     *
     * We very carfully saved the values in the correct part of the table.
     */

    boot_ptn:
            shl $2, %ax
            movw        %ax, %si
    #ifdef NO_CHS
            movl        ptn_list(%si), %ebp
            testl       %ebp, %ebp
            jnz boot_lba
    #else
            mov ptn_list(%si), %si
            test        %si, %si
            jnz boot_si
    #endif

Let's take a look at it line by line:

1. The shl instruction shifts bits to the left (each position shifted multiplies the value by two), equivalent to the << operator in C. Since it shifts two positions, the value is multiplied by four. %ax is a 16-bit register, so it changes from 225 (0x00e1) to 900 (0x0384), which is 4 * ((KEY_ACTIVE - KEY_DISK1) & 0xff). For the lookup below to find the slot we wrote earlier, this has to be the same offset as ACTIVE. (132, or 0x84, is only the low byte of this result.)

2. The second line just moves the value in %ax to register %si.

3. Later, after the #else directive (we are not interested in the block that exists if NO_CHS is defined), it moves the content of the memory region ptn_list(%si) to %si itself. What is in ptn_list(%si)? It is the memory at address ptn_list + %si, i.e., ptn_list + ACTIVE. What do we have there? Remember the ACTIVE calculation above? We stored in ptn_list + ACTIVE the content of the %bp register, which holds the address of the active partition.

4. Again we see the test opcode using the same register (%si) for both operands. This checks whether %si is zero. If it is not (the desirable case for our analysis), the jnz below jumps to the boot_si label.
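The four lines above can be mimicked in Python to check the numbers. This is only a sketch: shl16 is a made-up helper, the dictionary stands in for the ptn_list memory region, and the entry address 0x7DBE is invented. Note that %ax is 16 bits wide, so 225 shifted left twice gives 900 (0x384), whose low byte is 0x84 (132).

```python
def shl16(value: int, count: int) -> int:
    """Shift left inside a 16-bit register, discarding bits that fall off."""
    return (value << count) & 0xFFFF


# Hypothetical memory image: offset inside ptn_list -> 16-bit value stored there.
ACTIVE_ENTRY_ADDR = 0x7DBE            # invented address of a partition table entry
ptn_list = {shl16(0xE1, 2): ACTIVE_ENTRY_ADDR}

ax = 0x00E1                           # value loaded by the mov before boot_ptn
ax = shl16(ax, 2)                     # shl $2, %ax
si = ax                               # movw %ax, %si
si = ptn_list.get(si, 0)              # mov ptn_list(%si), %si
take_boot_si = si != 0                # test %si, %si ; jnz boot_si

print(hex(ax), hex(si), take_boot_si)
```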

*What does this piece of code do?*

It just stores in register %si the address of the active partition.

Register | Contents | Observation
%ax | ACTIVE | Offset of the active partition slot inside ptn_list
%si | Contents of ptn_list + ACTIVE | Address of the active partition in the partition table

Contents of the stack | Observation
drive number | 0x80 probably
drive number | 0x80 probably
mbr | Address of the mbr label where the MBR program truly begins.

Let's now check the boot_si part of the program:

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 409:

    /*
     * Active partition pointed to by si.
     * Read the first sector.
     *
     * We can either do a CHS (Cylinder Head Sector) or an LBA (Logical
     * Block Address) read.  Always doing the LBA one
     * would be nice - unfortunately not all systems support it.
     * Also some may contain a separate (eg SCSI) bios that doesn't
     * support it even when the main bios does.
     *
     * There is also the additional problem that the CHS values may be wrong
     * (eg if fdisk was run on a different system that used different BIOS
     * geometry).  We convert the CHS value to a LBA sector number using
     * the geometry from the BIOS, if the number matches we do a CHS read.
     */
    boot_si:
            movl    8(%si), %ebp            /* get sector # */

            testb   $MBR_BS_READ_LBA, flags
            jnz     boot_lba                /* fdisk forced LBA read */

            pop     %dx                     /* collect saved drive... */
            push    %dx                     /* ...number to dl */
            movb    $8, %ah
            int     $0x13                   /* chs info */

Let's analyse this code:

1. %si stores the address of the active partition entry. 8(%si) is %si + 8 and means "go to the active partition entry and look eight bytes ahead". That is where the LBA address of the partition (four bytes) is stored. The movl instruction moves a long word, four bytes, to the 32-bit %ebp register, i.e., %ebp now holds the LBA number of the partition.

2. The next testb and jnz instructions test the MBR_BS_READ_LBA flag against the data stored at the flags label. Since the MBR_BS_READ_LBA flag is deprecated (according to a comment in bootblock.h) and the flags label holds other information, we just skip that line.

3. The next two lines, pop and push on register %dx, look a little mysterious at first glance. Why pop and push again? The explanation I found is that we need %dx to hold the drive number (in %dl, the lowest byte) while keeping it at the top of the stack. So we pop it from the stack to %dx and push it back. Some might notice that %dx never changed in this analysis, but it may be changed in code where BOOTSEL or COM_PORT are defined.

4. Finally, we move 8 to register %ah and call *INT 13H*, the BIOS interrupt that provides disk services such as reading and writing with CHS addressing. When %ah is 8, it reads the drive parameters. %dl must hold the drive number (the first hard disk is 0x80). It returns parameters in several registers; the important ones (for this study) are:

%dh: logical last index of heads, i.e., number of heads - 1 (because the index starts at 0).

%cx: logical last index of cylinders, i.e., number of cylinders - 1 (because the index starts at 0), and logical last index of sectors per track, which equals the number of sectors per track (because the index starts at 1). The sector value is in the six least significant bits of %cl; the cylinder value uses %ch for its low eight bits and the two most significant bits of %cl for its high bits.

More information about INT 13 interruption
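To make the register layout concrete, here is a small Python sketch that unpacks the values returned by this call. The function name and the sample register values are made up for illustration; only the bit layout comes from the description above.

```python
def decode_drive_params(cx: int, dh: int) -> dict:
    """Unpack the geometry returned by INT 13h with %ah = 8 (illustrative)."""
    ch = (cx >> 8) & 0xFF
    cl = cx & 0xFF
    # Cylinder index: two high bits come from the top of %cl, low eight from %ch.
    last_cylinder = ((cl & 0xC0) << 2) | ch
    # Sector numbers start at 1, so the last index is also the count.
    sectors_per_track = cl & 0x3F
    return {
        "cylinders": last_cylinder + 1,
        "heads": dh + 1,
        "sectors_per_track": sectors_per_track,
    }


# Invented sample: %cx = 0xfeff and %dh = 0xfe.
params = decode_drive_params(cx=0xFEFF, dh=0xFE)
print(params)
```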

*What does this piece of code do?*

Asks the BIOS for information about the HDD: the number of heads, cylinders and sectors, returned in registers %cx and %dh. It also stores the LBA of the partition in register %ebp.

Contents of registers | Value | Observation
%cx | last cylinder index and sectors per track | (see above)
%dh | logical last index of heads (number of heads - 1) | (see above)
%dl | number of drives | Overwritten by the BIOS call; the drive number (probably 0x80) is still on the stack
%si | | Address of the current partition in partition table
%ebp | | LBA number of the partition

The next block of code, which comes almost immediately after the last one, follows.

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 435:

    /*
     * Validate geometry, if the CHS sector number doesn't match the LBA one
     * we'll do an LBA read.
     * calc: (cylinder * number_of_heads + head) * number_of_sectors + sector
     * and compare against LBA sector number.
     * Take a slight 'flier' and assume we can just check 16bits (very likely
     * to be true because the number of sectors per track is 63).
     */
            movw    2(%si), %ax             /* cylinder + sector */
            push    %ax                     /* save for sector */
            shr     $6, %al
            xchgb   %al, %ah                /* 10 bit cylinder number */
            shr     $8, %dx                 /* last head */
            inc     %dx                     /* number of heads */
            mul     %dx
            mov     1(%si), %dl             /* head we want */
            add     %dx, %ax
            and     $0x3f, %cx              /* number of sectors */
            mul     %cx
            pop     %dx                     /* recover sector we want */
            and     $0x3f, %dx
            add     %dx, %ax
            dec     %ax

            cmp     %bp, %ax
            je      read_chs

1. We know that %si holds the address of the current partition entry. So, 2(%si) looks two bytes past the beginning of the entry. If we look at the structure of an entry in the partition table, we see that bytes 1-3 (3 bytes) represent the CHS address of the start of the partition. According to [MBR] these 3 bytes are organized the following way:

Byte | Bits distribution
1 | h7-h6-h5-h4-h3-h2-h1-h0
2 | c9-c8-s5-s4-s3-s2-s1-s0
3 | c7-c6-c5-c4-c3-c2-c1-c0

The first byte (h7-h0 bits) is reserved for the head index, up to 255, but that normally is a much lower number. To discover how many heads your HDD has, use the fdisk(8) command. Also, the -v flag will tell you the CHS addressing information about partitions.

TODO: fdisk(8) outputs both the NetBSD concept of the disk geometry and the BIOS concept. Why the difference?

*Note:* According to [MBR], the CHS conception does not correspond to modern drives.

The second byte has a composite layout. The number of cylinders is high, much bigger than 255, so we need more than eight bits. The two most significant bits of this byte (c9-c8) are reserved for the cylinder index. On the other hand, the number of sectors is low, so six bits (s5-s0) are enough.

The third byte has all its eight bits (c7-c0) reserved for the least significant part of the cylinder index.

So, all the first line of code does is load the cylinder and sector bytes (bytes 2 and 3) into %ax.

2. Then, push %ax to the stack. We will modify %ax to find the cylinder number, but we need to save the original value somewhere because we'll need it to find the sector later.

3. The shr instruction seems strange, but it is not. The cylinder number is stored in 10 bits: the two most significant bits of %al (c9-c8) and all eight bits of %ah (c7-c0), as we saw previously. So, we shift %al six positions to the right, which discards the sector bits and leaves c9-c8 in the two lowest bits of %al.

Why is byte 2 in %al and byte 3 in %ah? Because we fetch this word from memory at 2(%si), so endianness applies: x86 is little-endian, and the byte at the lower address goes to the low half of the register.

Wikipedia page about Endianness

4. Again because of endianness, we need to swap the contents of %al and %ah. Now, %ax has the 10-bit cylinder number we need.

Data swap with xchg

5. Then, we shift %dx eight positions to the right, because %dh has the index of the last head and we need it in the low byte of %dx.

6. On the next line, we increment %dx by one to get the number of heads. Remember that %dx held the index of the last head, which is number_of_heads - 1.

7. Then, we multiply %ax by %dx. %ax has the cylinder number. The result is stored in %dx:%ax (the most significant part in %dx and the least significant part in %ax). We have just started to compute the formula described in the comment, i.e., we are converting CHS to LBA to see if both match. So this part is just "cylinder * number_of_heads".

More about the mul instruction

8. The next line is a bit strange. It moves the head number to %dl. Some would argue that it overwrites the result of the multiplication (stored in %dx:%ax), and that is right. The explanation I found is that the numbers are expected to be so small that %dx is zero, so we can reuse %dl without worries.

9. Add %dx to %ax. We now have "cylinder * number_of_heads + head".

10. Remember that %cx holds the cylinder and sector information that we fetched earlier with *INT 13H*? Remember also that %cx is 16 bits wide and that the sector information is stored only in the six least significant bits? How do we extract them? This line makes an and with 0x3f, which is decimal 63, or 00111111 in binary.

11. Now, multiply %ax by %cx. We now have "(cylinder * number_of_heads + head) * number_of_sectors".

12. Let's pop the two bytes that represent the cylinder and sector numbers, which we saved at the top of this piece of code. We pop them to %dx.

13. Apply and again to keep only the sector bits.

14. And add it to %ax. In %ax, we now finally have "(cylinder * number_of_heads + head) * number_of_sectors + sector"!

15. We actually have to decrement the result by one, because CHS sector numbers start at one while LBA starts at zero.

16. Finally we compare both values. From the setting of %ebp in the last piece of code, we know it holds the LBA of the NetBSD partition. We only compare 16 bits, so we compare %bp with %ax.

17. If they are equal, je jumps to read_chs and we do a CHS read.
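The seventeen steps above boil down to a few arithmetic operations. The Python sketch below follows them under the same assumption the code makes, namely that intermediate results fit in 16 bits. The helper partition_entry, the geometry (255 heads, 63 sectors per track) and the sample partition are invented for this example.

```python
import struct


def chs_to_lba16(entry: bytes, bios_cx: int, bios_dh: int) -> int:
    """Convert the CHS start of a partition entry to LBA, keeping 16 bits."""
    head = entry[1]                                # mov 1(%si), %dl
    sector = entry[2] & 0x3F                       # and $0x3f, %dx
    cylinder = ((entry[2] >> 6) << 8) | entry[3]   # shr $6, %al ; xchgb %al, %ah
    heads = bios_dh + 1                            # shr $8, %dx ; inc %dx
    sectors_per_track = bios_cx & 0x3F             # and $0x3f, %cx
    lba = (cylinder * heads + head) * sectors_per_track + sector - 1
    return lba & 0xFFFF


def partition_entry(head: int, sector: int, cylinder: int, lba_start: int) -> bytes:
    """Build a 16-byte MBR partition entry (hypothetical test helper)."""
    byte2 = ((cylinder >> 8) << 6) | sector
    return struct.pack("<BBBBBBBBII", 0x80, head, byte2, cylinder & 0xFF,
                       169, 0, 0, 0, lba_start, 0)


entry = partition_entry(head=1, sector=1, cylinder=0, lba_start=63)
lba_from_entry = struct.unpack_from("<I", entry, 8)[0]   # movl 8(%si), %ebp
computed = chs_to_lba16(entry, bios_cx=0xFEFF, bios_dh=0xFE)
use_chs = computed == (lba_from_entry & 0xFFFF)          # cmp %bp, %ax ; je read_chs
print(computed, use_chs)
```

If the two numbers differ, the code falls through to the LBA read instead.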

*What does this piece of code do?*

Converts CHS numbers into LBA. If the converted CHS and the recorded LBA match, do a CHS read.

Contents of stack | Observation
drive number | 0x80 probably
mbr | Address of the mbr label where the MBR program truly begins.
0 |

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 522:

    /*
     * Sector below CHS limit
     * Do a cylinder-head-sector read instead.
     */
    read_chs:
            pop     %dx                     /* recover drive # */
            movb    1(%si), %dh             /* head */
            movw    2(%si), %cx             /* ch=cyl, cl=sect */
            movw    $BOOTADDR, %bx          /* es:bx is buffer */
            movw    $0x201, %ax             /* command 2, 1 sector */
            jmp     do_read

This is simple: pop the stack so that %dx holds the drive number, load the head into %dh, the cylinder and sector into %cx and the buffer address (BOOTADDR) into %bx, move the value 0x201 to %ax and jump to the do_read label! 0x201 is just 0x2 in %ah and 0x1 in %al.

*What does this piece of code do?*

Just stores the CHS values in the right registers to make a CHS read later.
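As a rough illustration, the register set prepared by read_chs can be written down in Python. The function name and the sample partition entry are invented; BOOTADDR is 0x7c00 in mbr.S, as discussed in this study.

```python
BOOTADDR = 0x7C00   # buffer address used by mbr.S


def read_chs_registers(entry: bytes, drive: int) -> dict:
    """Register values set up by read_chs before INT 13h (illustrative)."""
    ax = 0x0201
    return {
        "ah": ax >> 8,                      # 0x02: read sectors
        "al": ax & 0xFF,                    # 0x01: one sector
        "bx": BOOTADDR,                     # offset of the buffer inside %es
        "cx": entry[2] | (entry[3] << 8),   # movw 2(%si), %cx (little-endian load)
        "dh": entry[1],                     # movb 1(%si), %dh
        "dl": drive,                        # drive number popped from the stack
    }


# Invented entry: active flag, head 1, sector 1, cylinder 0, type 169.
entry = bytes([0x80, 1, 1, 0, 169, 0, 0, 0]) + bytes(8)
regs = read_chs_registers(entry, drive=0x80)
print(regs)
```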

Let's quickly take a look at the do_read label:

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 481:

    /*
     * Save sector number (passed in %ebp) into lba parameter block,
     * read the sector and leap into it.
     */
    boot_lba:
            movl    %ebp, lba_sector        /* save sector number */
            movw    $lba_info, %si
            movb    $0x42, %ah
            pop     %dx                     /* recover drive # */
    do_read:
            push    %dx                     /* save drive */
            int     $0x13

            set_err(ERR_READ)
            jc      err_msg

Before going on to the do_read label, it is important to note that just before it there is the boot_lba code, which is much simpler than calculating and using CHS. It is worth noting that, from the last code we analyzed, if CHS and LBA don't match, LBA is preferred.

It simply pushes %dx again to save the drive number and calls *INT 13*, which reads or writes the disk. We know that %ah = 0x2, which means "read sectors from the disk". %al is the number of sectors we want to read, and we know %al = 0x01. *INT 13* then reads the sector to %es:%bx.

More information about the "Read Sectors From Drive" operation

The %es:%bx is a notation for *segmentation*. In Real Mode an address is formed from two registers, a segment register and an offset, and each segment spans 64 kb. In this example, the interrupt uses the *extra segment* register, %es, and %bx. %es selects the segment and %bx is the offset inside that segment (up to 64 kb). The math is: %es * 0x10 + %bx. We know that %es is zero and %bx is BOOTADDR (0x7c00), so 0 * 0x10 + 0x7c00 = 0x7c00. The first sector of the partition will be loaded at address 0x7c00.
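The segment arithmetic is easy to try out. A minimal Python sketch follows (the function name is made up). It also shows that different segment:offset pairs can name the same address.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real Mode address translation: segment * 0x10 + offset."""
    return segment * 0x10 + offset


load_address = linear_address(0x0000, 0x7C00)   # %es = 0, %bx = BOOTADDR
same_place = linear_address(0x07C0, 0x0000)     # another way to name that address
print(hex(load_address), hex(same_place))
```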

More information on segmentation

Finally, we check for an error (take a look at the set_err macro) and jump to the error message printing code if something went wrong.

*What does this piece of code do?*

Reads the first sector of the current partition into address 0x7c00.

After that, continue the next piece of code.

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 497:

    /*
     * Check signature for valid bootcode
     */
            movb    BOOTADDR, %al           /* first byte non-zero */
            test    %al, %al
            jz      1f
            movw    BOOTADDR + MBR_MAGIC_OFFSET, %ax
    1:      cmp     $MBR_MAGIC, %ax
            set_err(ERR_NOOS)
            jnz     err_msg

We now have the first sector of the partition loaded in memory. We first check (with test and jz) whether the first byte of the sector is zero. It must not be, so if it is, we jump to label 1. Otherwise, we load the last two bytes of the loaded sector into %ax. At label 1 we check whether %ax is MBR_MAGIC (i.e., 0xaa55) and, if it is not, jump to err_msg, which prints an error message to the user and halts.

The cmp works by subtracting one operand from the other. If the result is zero, it sets the Zero Flag, so jnz will branch if the Zero Flag is not set, i.e., if MBR_MAGIC and %ax don't match.

More information on the `cmp` and `jz` instructions

jz and je have the same meaning, and so do jnz and jne.

*Note:* The 1: label here is tricky. When reading this piece of code I thought it was wrong, because jz should branch directly to the set_err place, so the 1: label should be in front of set_err, right? I was wrong, the label is in the right place. Since we loaded the first byte into %al, if %al is zero the low byte of %ax is zero too, so %ax cannot be equal to MBR_MAGIC: cmp will still fail and jnz will branch. If the first byte is right, but %ax (the MBR magic number) is wrong, it will also fail. It works.

If 1: were in front of set_err and jz jumped there, the Zero Flag would still be set by test, so jnz would not jump. So the cmp is needed anyway. (Thanks jakllsch -at- netbsd.org for this explanation!)
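The control flow discussed in this note can be emulated in Python to see that both failure cases end in the same branch. This is a sketch under assumptions: the function name is invented, the magic is taken to be in the last two bytes of a 512-byte sector (offset 510), and the ah parameter stands for whatever %ah holds when the first byte is zero.

```python
MBR_MAGIC = 0xAA55
MBR_MAGIC_OFFSET = 510   # assumption: last two bytes of a 512-byte sector


def bootcode_ok(sector: bytes, ah: int = 0) -> bool:
    """Emulate the signature check; True means we do NOT jump to err_msg."""
    al = sector[0]                       # movb BOOTADDR, %al
    ax = (ah << 8) | al
    if al != 0:                          # test %al, %al ; jz 1f
        ax = sector[MBR_MAGIC_OFFSET] | (sector[MBR_MAGIC_OFFSET + 1] << 8)
    return ax == MBR_MAGIC               # 1: cmp $MBR_MAGIC, %ax ; jnz err_msg


good = bytes([0xEB]) + bytes(509) + bytes([0x55, 0xAA])
zero_first = bytes(510) + bytes([0x55, 0xAA])
bad_magic = bytes([0xEB]) + bytes(511)

print(bootcode_ok(good), bootcode_ok(zero_first), bootcode_ok(bad_magic))
```

When the first byte is zero, the low byte of %ax is 0x00 and can never equal the 0x55 of the magic number, whatever %ah contains.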

*What does this piece of code do?*

Checks that the first byte of the loaded sector is not zero and that the MBR magic number matches. If everything is ok, proceed.

So, if everything worked, we just jump to the next and final piece of code!

File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 508:

    /* We pass the sector number through to the next stage boot.
     * It doesn't have to use it (indeed no other mbr code will generate) it,
     * but it does let us have a NetBSD pbr that can identify where it was
     * read from!  This lets us use this code to select between two
     * NetBSD system on the same physical driver.
     * (If we've read the mbr of a different disk, it gets a random number
     * - but it wasn't expecting anything...)
    */
            movl    %ebp, %esi
            pop     %dx                     /* recover drive # */
            jmp     BOOTADDR

%ebp has the LBA of the NetBSD partition. Move it to %esi so the next code can use it (if it wants to). Also, pop the stack so %dx stores the current disk number. Finally, jump to the next program, loaded in address BOOTADDR, which is pbr.S :-)!

Second program: pbr.S

Before diving into some more asm code, let's take a look at two of the comments at the beginning of this file.

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 32:

    /*
     * i386 partition boot code
     *
     * This code resides in sector zero of the netbsd partition, or sector
     * zero of an unpartitioned disk (eg a floppy).
     * Sector 1 is assumed to contain the netbsd disklabel.
     * Sectors 2 until the end of the track contain the next phase of bootstrap.
     * Which know how to read the interactive 'boot' program from filestore.
     * The job of this code is to read in the phase 1 bootstrap.
     *
     * Makefile supplies:
     * PRIMARY_LOAD_ADDRESS:        Address we load code to (0x1000).
     * BOOTXX_SECTORS:              Number of sectors we load (15).
     * X86_BOOT_MAGIC_1:            A random magic number.
     *
     * Although this code is executing at 0x7c00, it is linked to address 0x1000.
     * All data references MUST be fixed up using R().
     */

And also:

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 89:

    /*
     * This code is loaded to addresss 0:7c00 by either the system BIOS
     * (for a floppy) or the mbr boot code.  Since the boot program will
     * be loaded to address 1000:0, we don't need to relocate ourselves
     * and can load the subsequent blocks (that load boot) to an address
     * of our choosing. 0:1000 is a not unreasonable choice.
     *
     * On entry the BIOS drive number is in %dl and %esi may contain the
     * sector we were loaded from (if we were loaded by NetBSD mbr code).
     * In any case we have to re-read sector zero of the disk and hunt
     * through the BIOS partition table for the NetBSD partition.
     *
     * Or, we may have been loaded by a GPT hybrid MBR, handoff state is
     * specified in T13 EDD-4 annex A.
     */

First: this code is loaded by mbr.S, but it does not need to be. This code is supposed to be at the beginning of a NetBSD partition, but there can be unpartitioned devices (such as a floppy disk or even an HDD) where pbr.S is the program loaded by the BIOS. It is important to know that, because we'll see that all the work we did in mbr.S to find the boot partition will be done again here in pbr.S.

Since we already walked carefully through the assembly instructions, explaining them from the beginning, we'll go a little faster in this section.

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 107:

    ENTRY(start)
            /*
             * The PC BIOS architecture defines a Boot Parameter Block (BPB) here.
             * The actual format varies between different MS-DOS versions, but
             * apparently some system BIOS insist on patching this area
             * (especially on LS120 drives - which I thought had an MBR...).
             * The initial jmp and nop are part of the standard and may be
             * tested for by the system BIOS.
             */
            jmp     start0
            nop
            .ascii  "NetBSD60"              /* oemname (8 bytes) */

            . = start + MBR_BPB_OFFSET      /* move to start of BPB */
                                            /* (ensures oemname doesn't overflow) */

            . = start + MBR_AFTERBPB        /* skip BPB */

It seems that the BPB is not used by NetBSD, but pbr.S has to allow for it. The first line jumps to the start0 label, and the directives after it make sure to leave some bytes for the BPB in case some program wants to patch it.

BPB means "Boot Parameter Block", aka BIOS Parameter Block (no relationship with the BIOS we know). It mixes code and data and was created for the DOS operating system.

BIOS parameter block

A great explanation of what is BPB and its history

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 124:

    start0:
            xor     %cx, %cx                /* don't trust values of ds, es or ss */
            mov     %cx, %ss
            mov     %cx, %sp
            mov     %cx, %es
    #ifndef BOOT_FROM_FAT
            cmpl    $0x54504721, %eax       /* did a GPT hybrid MBR start us? */
            je      boot_gpt
    #endif
            mov     %cx, %ds
            xor     %ax, %ax

            /* A 'reset disk system' request is traditional here... */
            push    %dx                     /* some BIOS zap %dl here :-( */
            int     $0x13                   /* ah == 0 from code above */
            pop     %dx

            /* Read from start of disk */
            incw    %cx                     /* track zero sector 1 */
            movb    %ch, %dh                /* dh = head = 0 */
            call    chs_read

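The first lines after start0 just clean registers: %cx is zeroed and copied to %ss, %sp, %es and %ds. We suppose BOOT_FROM_FAT is not defined, so the code inside the #ifndef block is compiled: it checks whether %eax holds 0x54504721 (the bytes "!GPT" in memory order), which a GPT hybrid MBR would leave there. We were started by mbr.S, so we suppose je does not branch and we ignore boot_gpt.

First, it pushes %dx to the stack before calling *INT 13*. *INT 13* is then called with %ax zero, i.e., it resets the disk system, putting the head at the beginning of the disk. Next, it pops the value back. Then, it sets the registers that are parameters for disk reading: incw makes %cx equal to 1 (cylinder 0, sector 1) and %dh is set to zero (head 0).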

More information on *INT 13*, when `%ax` is zero

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 361:

    chs_read:
            movw    $BOOTADDR, %bx                  /* es:bx is buffer */
            pusha
            movw    $0x200 + BOOTXX_SECTORS, %ax    /* command 2, xx sectors */
            jmp     do_read

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 351:

    do_read:
            int     $0x13
            popa

            set_err(ERR_READ)
            jc      error
            ret

They are not very different from the code in mbr.S, but there are some details that change. Let's flesh this out.

First, it moves BOOTADDR to %bx, which means that the result of the read operation will be stored at address %es * 0x10 + %bx. pusha pushes the contents of all the general purpose registers (%ax, %cx, %dx, %bx, %sp, %bp, %si and %di) onto the stack. Note that BOOTADDR in pbr.S is **different** from the one found in mbr.S. In mbr.S it was 0x7c00. In pbr.S it is 0x1000, from the early defines in the file.

More information on pusha

First, where is the BOOTXX_SECTORS definition? It is in Makefile.bootxx and is 15. We'll show the Makefile later when we talk about the "Compiling bootxx" section.

So, the value that will be loaded into %ax is just 0x2 for %ah and BOOTXX_SECTORS for %al.

Finally, the parameters for reading the disk are like the table below:

Register contents for disk reading | Observation
%cx | Cylinder and sector information: cylinder 0 and sector 1.
%dh | Head 0.
%dl | Popped from the stack earlier. It is delivered to us from mbr.S.
%ah | 0x2 (read the disk!).
%al | BOOTXX_SECTORS (read BOOTXX_SECTORS sectors, i.e., 15).
%es | zero.
%bx | BOOTADDR (0x1000).

*Important:* It is important to note that the very first sector of the disk (which in LBA would be 0x00000000) is at cylinder, head, sector = 0, 0, 1. Yes, sector 1 (not zero!). Keep this in mind to avoid getting confused about which sectors are being loaded this time. We are just loading the MBR again!

The CHS addressing scheme starting at sector 1 is due to historical reasons
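The relation between LBA 0 and CHS (0, 0, 1) can be seen by doing the conversion in the opposite direction. The Python sketch below uses an invented geometry of 255 heads and 63 sectors per track, and both function names are made up.

```python
def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> tuple:
    """Inverse of the CHS to LBA formula; sector numbers start at 1."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(rest, sectors_per_track)
    return cylinder, head, sector_index + 1


def pack_cx(cylinder: int, sector: int) -> int:
    """Pack cylinder and sector the way INT 13h expects them in %cx."""
    return ((cylinder & 0xFF) << 8) | ((cylinder >> 8) << 6) | sector


first = lba_to_chs(0, heads=255, sectors_per_track=63)
cx = pack_cx(first[0], first[2])    # same value that 'incw %cx' produces from zero
print(first, cx)
```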

Then, a jump is made to do_read and *INT 13* is triggered.

After that, we just pop registers back. We suppose everything is OK (no error exists) and return to the main code. Next piece of code follows.

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 146:

    /* See if this is our code, if so we have already loaded the next stage */

            xorl    %ebp, %ebp              /* pass sector 0 to next stage */
            movl    (%bx), %eax             /* MBR code shouldn't even have ... */
            cmpl    R(start), %eax          /* ... a jmp at the start. */
            je      pbr_read_ok1

So, it zeroes %ebp and moves the first four bytes at the address held in %bx (i.e., BOOTADDR) to %eax. Remember that BOOTADDR is the buffer where we stored what we just loaded, which are the first BOOTXX_SECTORS sectors of the disk, beginning with sector 1. Then, it compares them with the first four bytes of the current program (at label start, fixed up with R()).

This comparison exists because, on an unpartitioned disk, pbr.S can be at the start of the disk and be loaded directly. That did not happen in our example, so the comparison will fail (because the MBR doesn't have a jmp at the start), je will not branch, and we resume at the next block of code.
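A tiny Python sketch of this test follows. All byte values are invented placeholders; the only point is that four bytes of the loaded image are compared with four bytes of the running code.

```python
def starts_with_own_code(loaded: bytes, own_start: bytes) -> bool:
    """cmpl R(start), %eax: compare the first four bytes of both images."""
    return loaded[:4] == own_start[:4]


# Invented images: a jmp, a nop and the OEM name for pbr.S, and some
# arbitrary bytes that are not a jmp for the MBR.
pbr_image = bytes([0xEB, 0x3C, 0x90]) + b"NetBSD60"
mbr_image = bytes([0x31, 0xC0, 0x8E, 0xD0])

loaded_directly = starts_with_own_code(pbr_image, pbr_image)   # unpartitioned disk
loaded_by_mbr = starts_with_own_code(mbr_image, pbr_image)     # our example
print(loaded_directly, loaded_by_mbr)
```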

MBR now is loaded to address BOOTADDR. So, the next block follows.

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 153:

    /* Now scan the MBR partition table for a netbsd partition */

            xorl    %ebx, %ebx              /* for base extended ptn chain */
    scan_ptn_tbl:
            xorl    %ecx, %ecx              /* for next extended ptn */
            movw    $BOOTADDR + MBR_PART_OFFSET, %di
    1:      movb    4(%di), %al             /* mbrp_type */
            movl    8(%di), %ebp            /* mbrp_start == LBA sector */
            addl    lba_sector, %ebp        /* add base of extended partition */
    #ifdef BOOT_FROM_FAT
            cmpb    $MBR_PTYPE_FAT12, %al
            je      5f
            cmpb    $MBR_PTYPE_FAT16S, %al
            je      5f
            cmpb    $MBR_PTYPE_FAT16B, %al
            je      5f
            cmpb    $MBR_PTYPE_FAT32, %al
            je      5f
            cmpb    $MBR_PTYPE_FAT32L, %al
            je      5f
            cmpb    $MBR_PTYPE_FAT16L, %al
            je      5f
    #else
            cmpb    $MBR_PTYPE_NETBSD, %al
    #endif
            jne     10f
    5:      testl   %esi, %esi              /* looking for a specific sector? */
            je      boot
            cmpl    %ebp, %esi              /* ptn we wanted? */
            je      boot

The comment at the top explains everything: we should find the NetBSD partition. First it zeroes the %ebx register and, after label scan_ptn_tbl, the %ecx register. Then, it moves the address of the beginning of the partition table (start of the MBR + MBR_PART_OFFSET, which is 446) to register %di.

After that, it moves the byte holding the partition type (at offset 4) to %al and four bytes to %ebp. The four bytes start at offset 8 in the partition table entry, i.e., they are the LBA address of the partition.

The line just below adds lba_sector to %ebp. What is this? It is a define from the top of the file pointing to a label called _lba_sector, and we see it is just the following:

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 403:

    /* Control block for int-13 LBA read. */
    _lba_info:
            .word       0x10                            /* control block length */
            .word       BOOTXX_SECTORS                  /* sector count */
            .word       BOOTADDR                        /* offset in segment */
            .word       0                               /* segment */
    _lba_sector:
            .quad       0                               /* sector # goes here... */

It is just zero! Does that make the addl line useless? It seems so, but there might be a reason why this instruction is there, maybe when producing a different bootxx program. The comments in this code give us hints (the addl line itself says it adds the base of an extended partition). Looking at the commit history of the file is also always a good idea. We see, from the diff log, that this was added at the time GPT support was added. Maybe this is a GPT thing?

Diff of the pbr.S file, revision 1.18 to 1.19
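The layout of the control block shown above can be reproduced with Python's struct module. This is only a sketch: the function name is made up, and the values 15 and 0x1000 are the ones this study uses for BOOTXX_SECTORS and BOOTADDR.

```python
import struct

BOOTXX_SECTORS = 15   # from Makefile.bootxx, according to this study
BOOTADDR = 0x1000     # value of BOOTADDR in pbr.S, according to this study


def build_lba_info(lba_sector: int) -> bytes:
    """Four 16-bit words followed by one 64-bit sector number, little-endian."""
    return struct.pack("<HHHHQ", 0x10, BOOTXX_SECTORS, BOOTADDR, 0, lba_sector)


packet = build_lba_info(63)
# The .quad can also be seen as two 32-bit halves: lba_sector and lba_sector + 4.
low, high = struct.unpack_from("<II", packet, 8)
print(len(packet), low, high)
```

The block is 16 bytes long, which matches the 0x10 stored in its first word, and the sector number occupies its last eight bytes.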

Let's step back to the last block we were analyzing (and didn't finish) and consider BOOT_FROM_FAT undefined, so we don't need to look at the code that reads NetBSD from a FAT partition. We go directly to the line just below the #else directive. It checks whether %al has the number that matches the code for the NetBSD partition. If they are equal, jne will not jump, so we resume at label 5:. Note that the BOOT_FROM_FAT define is also used to surround GPT code.

We saw that the testl instruction makes je jump if the register, %esi, is zero. We received %esi from mbr.S and it contains the LBA sector number of the NetBSD partition. Since it is not zero, je will not jump and we go on to the next instruction. The next instruction is a cmpl comparison of %ebp and %esi. We suppose the NetBSD partition is the first one, so %ebp and %esi are the same and we jump to label boot:. If you are curious, the rest of this block that we are ignoring here is about extended partition handling.
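The scan described above can be sketched in Python. This is a simplified illustration, not the real algorithm: it ignores extended partitions and the lba_sector base, the function names are invented, and the sample partition table is made up. The offsets (446, 4 and 8) and the type 169 are the ones mentioned in this study.

```python
import struct

MBR_PART_OFFSET = 446    # start of the partition table inside the MBR
MBR_PTYPE_NETBSD = 169   # NetBSD partition type
ENTRY_SIZE = 16


def find_netbsd_partition(mbr: bytes, wanted_lba: int = 0):
    """Return (index, start LBA) of the NetBSD partition to boot, or None."""
    for index in range(4):
        base = MBR_PART_OFFSET + index * ENTRY_SIZE
        ptype = mbr[base + 4]                                # movb 4(%di), %al
        start = struct.unpack_from("<I", mbr, base + 8)[0]   # movl 8(%di), %ebp
        if ptype != MBR_PTYPE_NETBSD:                        # cmpb ; jne 10f
            continue
        if wanted_lba == 0 or wanted_lba == start:           # testl ; cmpl ; je boot
            return index, start
    return None


def make_mbr(entries) -> bytes:
    """Build a 512-byte image holding only types and start sectors (test helper)."""
    image = bytearray(512)
    for index, (ptype, start) in enumerate(entries):
        base = MBR_PART_OFFSET + index * ENTRY_SIZE
        image[base + 4] = ptype
        struct.pack_into("<I", image, base + 8, start)
    image[510:512] = b"\x55\xaa"
    return bytes(image)


mbr = make_mbr([(0x0B, 63), (MBR_PTYPE_NETBSD, 20000)])
found = find_netbsd_partition(mbr, wanted_lba=20000)
print(found)
```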

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 227:

    /*
     * Active partition pointed to by di.
     *
     * We can either do a CHS (Cylinder Head Sector) or an LBA (Logical
     * Block Address) read.  Always doing the LBA one
     * would be nice - unfortunately not all systems support it.
     * Also some may contain a separate (eg SCSI) BIOS that doesn't
     * support it even when the main BIOS does.
     *
     * The safest thing seems to be to find out whether the sector we
     * want is inside the CHS sector count.  If it is we use CHS, if
     * outside we use LBA.
     *
     * Actually we check that the CHS values reference the LBA sector,
     * if not we assume that the LBA sector is above the limit, or that
     * the geometry used (by fdisk) isn't correct.
     */
    boot:
            movl    %ebp, lba_sector        /* to control block */
            testl   %ebx, %ebx              /* was it an extended ptn? */
            jnz     boot_lba                /* yes - boot with LBA reads */

    /* get CHS values from BIOS */
            push    %dx                             /* save drive number */
            movb    $8, %ah
            int     $0x13                           /* chs info */

    /*
     * Validate geometry, if the CHS sector number doesn't match the LBA one
     * we'll do an LBA read.
     * calc: (cylinder * number_of_heads + head) * number_of_sectors + sector
     * and compare against LBA sector number.
     * Take a slight 'flier' and assume we can just check 16bits (very likely
     * to be true because the number of sectors per track is 63).
     */
            movw    2(%di), %ax                     /* cylinder + sector */
            push    %ax                             /* save for sector */
            shr     $6, %al
            xchgb   %al, %ah                        /* 10 bit cylinder number */
            shr     $8, %dx                         /* last head */
            inc     %dx                             /* number of heads */
            mul     %dx
            mov     1(%di), %dl                     /* head we want */
            add     %dx, %ax
            and     $0x3f, %cx                      /* number of sectors */
            mul     %cx
            pop     %dx                             /* recover sector we want */
            and     $0x3f, %dx
            add     %dx, %ax
            dec     %ax
            pop     %dx                             /* recover drive nmber */

            cmp     %bp, %ax
            je      read_chs

We have already seen this problem in "First program: mbr.S". We cannot trust the CHS values stored on disk, so we fetch the CHS geometry of the disk from the BIOS. We convert the partition CHS to LBA and compare the LBA we just calculated against the LBA the partition entry holds. If both match, we do a CHS read; if not, it means that the partition may be beyond what CHS can represent, and an LBA read is preferred. Let's assume they match and we perform a CHS read, so we jump to the read_chs: label.

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 214:

    /*
     * Sector below CHS limit
     * Do a cylinder-head-sector read instead
     * I believe the BIOS should do reads that cross track boundaries.
     * (but the read should start at the beginning of a track...)
     */
    read_chs:
            movb    1(%di), %dh                     /* head */
            movw    2(%di), %cx                     /* ch=cyl, cl=sect */
            call    chs_read
    pbr_read_ok1:
            jmp     pbr_read_ok

So, as we saw before, the following registers are set: %dh holds the head number and %cx holds both the cylinder and the sector. These values describe the start of the NetBSD partition, remember? Then we call chs_read, which we have already seen. It reads 15 sectors starting at the address we just set, i.e., it reads the first 15 sectors of the NetBSD partition into BOOTADDR. Finally, we jump to pbr_read_ok:.
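To make the register layout concrete, here is a small Python sketch that unpacks the two values loaded from the partition entry: the byte at 1(%di) and the 16-bit word at 2(%di). The function name decode_partition_chs is hypothetical; the bit layout follows the comments in the listings (low 6 bits of %cl are the sector, the top 2 bits of %cl are the high bits of the 10-bit cylinder, %ch is the low 8 bits of the cylinder).

```python
def decode_partition_chs(head_byte, cyl_sect_word):
    """Split the packed CHS fields into (cylinder, head, sector)."""
    cl = cyl_sect_word & 0xFF          # low byte of the word, ends up in %cl
    ch = (cyl_sect_word >> 8) & 0xFF   # high byte of the word, ends up in %ch
    sector = cl & 0x3F                 # bits 0-5: sector number (1-based)
    cylinder = ((cl >> 6) << 8) | ch   # bits 6-7 of %cl are cylinder bits 8-9
    return cylinder, head_byte, sector


if __name__ == "__main__":
    # Head 1, sector 1, cylinder 0: a typical first partition.
    print(decode_partition_chs(0x01, 0x0001))
    # All bits set in the word: the largest cylinder/sector pair, here with head 254.
    print(decode_partition_chs(0xFE, 0xFFFF))
```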

File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 333:

    /*
     * Check magic number for valid stage 2 bootcode
     * then jump into it.
     */
    pbr_read_ok:
            cmpl    $X86_BOOT_MAGIC_1, bootxx_magic
            set_err(ERR_NO_BOOTXX)
            jnz     error

            movl    %ebp, %esi                      /* %esi ptn base, %dl disk id */
            movl    lba_sector + 4, %edi            /* %edi ptn base high */
            jmp     $0, $bootxx                     /* our %cs may not be zero */

The comment is self-explanatory. Every boot stage has a magic number to be checked, so we know whether it is safe to run it. X86_BOOT_MAGIC_1 is defined in bootblock.h. bootxx_magic is a label in the bootxx.S file (which we'll look at in the next section). Since bootxx.S, pbr.S, label.S and boot1.c are linked together, the bootxx_magic label is resolved at link time.
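The idea of the check can be sketched in Python as follows. This is a minimal illustration, not NetBSD code: the magic value and the offset below are placeholders (the real value is X86_BOOT_MAGIC_1 from bootblock.h and the real location is wherever bootxx_magic ends up after linking), and has_magic is a hypothetical helper. The only fact relied upon is that cmpl compares a 32-bit value stored in little-endian byte order.

```python
import struct

# Placeholder values, NOT the real X86_BOOT_MAGIC_1 or its real offset.
PLACEHOLDER_MAGIC = 0x12345678
PLACEHOLDER_OFFSET = 8


def has_magic(image, offset, magic):
    """Return True if the 32-bit little-endian word at `offset` equals `magic`."""
    if offset < 0 or offset + 4 > len(image):
        return False  # image too short: treat as "no valid stage found"
    (value,) = struct.unpack_from("<I", image, offset)
    return value == magic


if __name__ == "__main__":
    # Fake "loaded sectors": 8 filler bytes followed by the magic word.
    image = bytes(PLACEHOLDER_OFFSET) + struct.pack("<I", PLACEHOLDER_MAGIC)
    if has_magic(image, PLACEHOLDER_OFFSET, PLACEHOLDER_MAGIC):
        print("magic ok, would jump to the next stage")
    else:
        print("bad magic, would report an error")
```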

TODO: ask jakllsch about line 343 (lba_sector + 4, %edi): this was added for GPT support ([http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/i386/stand/bootxx/pbr.S.diff?r1=1.18&r2=1.19&only_with_tag=MAIN&f=h]). Shouldn't it be protected by some guard (header) or at least have a comment?

We assume there is no error. We move %ebp (which stores the LBA address of the partition) to %esi and load the high half of the partition base (lba_sector + 4) into %edi. %dl holds the drive number (from the last pop instruction). Finally, we jump to the bootxx: label, which is declared in the bootxx.S file.

Third program: bootxx.S

Before continuing, let's first understand what the programs inside the bootxx directory are. They are different programs, yet they live in the same directory. Why? As we already said, they are all linked together into one bigger program: bootxx.S, pbr.S, label.S and boot1.c, but not only them. The contents of the src/sys/arch/i386/stand/lib directory are linked in too, and that directory has a lot of important files that we'll look at later.

Let's take a look at an important part of src/sys/arch/i386/stand/bootxx/Makefile.bootxx (which is included by the Makefile in each bootxx_* directory).

File src/sys/arch/i386/stand/bootxx/Makefile.bootxx, starting at line 94:

    I386_STAND_DIR?= $S/arch/i386/stand

    ### find out what to use for libi386
    I386DIR= ${I386_STAND_DIR}/lib
    .include "${I386DIR}/Makefile.inc"
    LIBI386= ${I386LIB}

    ### find out what to use for libsa
    SA_AS= library
    SAMISCMAKEFLAGS+="SA_USE_LOADFILE=yes"
    .include "${S}/lib/libsa/Makefile.inc"
    LIBSA= ${SALIB}

    ### find out what to use for libkern
    KERN_AS=        library
    .include "${S}/lib/libkern/Makefile.inc"
    LIBKERN=        ${KERNLIB}

    LDSCRIPT ?= $S/arch/i386/conf/stand.ldscript

    cleandir distclean: .WAIT cleanlibdir

    cleanlibdir:
            -rm -rf lib

    LIBLIST= ${LIBI386} ${LIBSA} ${LIBKERN} ${LIBI386} ${LIBSA}

    CLEANFILES+= ${PROG}.sym ${PROG}.map

    ${PROG}: ${OBJS} ${LIBLIST} ${LDSCRIPT}
            ${_MKTARGET_LINK}
            ${CC} -o ${PROG}.sym ${LDFLAGS} -Wl,-Ttext,${PRIMARY_LOAD_ADDRESS} \
                    -T ${LDSCRIPT} -Wl,-Map,${PROG}.map -Wl,-cref ${OBJS} ${LIBLIST}
            ${OBJCOPY} -O binary ${PROG}.sym ${PROG}

Looking at the ${CC} command, we see that the build framework built on top of make (at the bottom of the file you'll see .include <bsd.prog.mk>) has already compiled the source files into object files (all *.S and *.c files become *.o files). This line links them together into a single executable, ${PROG}.sym. But look at the last variable of this command, ${LIBLIST}: the libraries it names are linked in as well! The final ${OBJCOPY} line then converts ${PROG}.sym into the raw binary ${PROG}.

More information about the build framework

${LIBLIST} is defined just above. ${LIBI386} is ${I386LIB}, which in turn is defined in ${I386DIR}/Makefile.inc. ${I386DIR} is just the directory ../lib relative to the bootxx directory.

So, it includes ../lib/Makefile.inc. Let's take a look at an important part of it.

File src/sys/arch/i386/stand/lib/Makefile.inc, starting at line 16:

    # Default values:
    I386DST?=               ${.OBJDIR}/lib/i386

    #I386DIR=               $S/arch/i386/stand/lib
    I386LIB=                ${I386DST}/libi386.a

    CWARNFLAGS.clang+=      -Wno-tautological-compare

    I386MAKE= \
            cd ${I386DIR} && MAKEOBJDIRPREFIX= && unset MAKEOBJDIRPREFIX && \
                MAKEOBJDIR=${I386DST} ${MAKE} \
                CC=${CC:Q} CFLAGS=${CFLAGS:Q} \
                AS=${AS:Q} AFLAGS=${AFLAGS:Q} \
                LD=${LD:Q} STRIP=${STRIP:Q} \
                MACHINE=${MACHINE} MACHINE_ARCH=${MACHINE_ARCH:Q} \
                I386CPPFLAGS=${CPPFLAGS:S@^-I.@-I../../.@g:Q} \
                I386MISCCPPFLAGS=${I386MISCCPPFLAGS:Q} \
                ${I386MISCMAKEFLAGS}

    ${I386LIB}:             .NOTMAIN __always_make_i386lib
            @echo making sure the i386 library is up to date...
            @${I386MAKE} libi386.a
            @echo done

The ${I386LIB} target just enters ${I386DIR} and calls ${MAKE} there, i.e., the main Makefile of that directory is read. Finally, let's see a part of it.

File src/sys/arch/i386/stand/lib/Makefile, starting at line 22:

    SRCS= pcio.c conio.S comio.S comio_direct.c biosvideomode.S
    SRCS+= getsecs.c biosgetrtc.S biosdelay.S biosreboot.S gatea20.c
    SRCS+= biosmem.S getextmemx.c biosmemx.S printmemlist.c
    SRCS+= pread.c menuutils.c parseutils.c
    SRCS+= bootinfo.c bootinfo_biosgeom.c bootinfo_memmap.c
    SRCS+= startprog.S multiboot.S
    SRCS+= biosgetsystime.S cpufunc.S bootmenu.c
    SRCS+= realprot.S message.S message32.S dump_eax.S pvcopy.S putstr.S putstr32.S
    SRCS+= rasops.c vbe.c biosvbe.S
    .if (${I386_INCLUDE_DISK} == "yes")
    SRCS+= biosdisk.c biosdisk_ll.c bios_disk.S
    .endif
    .if (${I386_INCLUDE_DOS} == "yes")
    SRCS+= dosfile.c dos_file.S
    .endif
    .if (${I386_INCLUDE_DISK} == "yes") || (${I386_INCLUDE_DOS} == "yes")
    SRCS+= diskbuf.c
    .endif
    .if (${I386_INCLUDE_BUS} == "yes")
    SRCS+= biospci.c bios_pci.S isapnp.c isadma.c
    .endif
    .if (${I386_INCLUDE_PS2} == "yes")
    SRCS+= biosmca.S biosmemps2.S
    .endif

    .include <bsd.own.mk>
    .undef DESTDIR
    .include <bsd.lib.mk>

    lib${LIB}.o:: ${OBJS:O}
            @echo building standard ${LIB} library
            @rm -f lib${LIB}.o
            @${LD} -r -o lib${LIB}.o `lorder ${OBJS} | tsort`
            @echo done

It adds all the listed .S and .c files to our library, which is finally built by the lib${LIB}.o target. Its transformation into libi386.a is done by the build framework again (this time through .include <bsd.lib.mk>).
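The chain of variables that leads from ${LIBLIST} to the libi386.a archive can be traced with a toy expander. The Python sketch below is only an approximation of how make substitutes ${VAR} references: it ignores modifiers such as :Q, conditionals and includes. The expand function is hypothetical, and the value given to .OBJDIR is a made-up path; the other assignments are copied from the two Makefile excerpts above.

```python
import re

_VAR = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def expand(text, variables, depth=0):
    """Recursively replace ${NAME} with its value; unknown names are left as-is."""
    if depth > 20:
        raise ValueError("variable references nest too deeply (possible cycle)")

    def replace(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        return expand(variables[name], variables, depth + 1)

    return _VAR.sub(replace, text)


VARIABLES = {
    ".OBJDIR": "/hypothetical/obj",  # assumption: depends on the build setup
    "I386DST": "${.OBJDIR}/lib/i386",
    "I386LIB": "${I386DST}/libi386.a",
    "LIBI386": "${I386LIB}",
    "LIBSA": "${SALIB}",      # SALIB comes from libsa's Makefile.inc (not shown)
    "LIBKERN": "${KERNLIB}",  # KERNLIB comes from libkern's Makefile.inc (not shown)
    "LIBLIST": "${LIBI386} ${LIBSA} ${LIBKERN} ${LIBI386} ${LIBSA}",
}

if __name__ == "__main__":
    print(expand("${LIBI386}", VARIABLES))
    print(expand("${LIBLIST}", VARIABLES))
```

Names that are not in the table, such as ${SALIB}, are printed unexpanded, which makes it easy to see which definitions live in Makefiles that were not quoted here.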

For GDT possible configurations, see [INTEL2015], Vol 3A, p. 3-2.

src/sys/arch/i386/include/segments.h has GDT stuff.

GDT segments are later configured in the kernel (see the machdep.c file).

See:

https://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/book.html

Fourth program: boot1.c

Compiling bootxx

http://nxr.netbsd.org/xref/src/sys/arch/i386/stand/boot/

TODO: read the following link:

http://www.freenix.no/arkiv/daemonnews/200009/sb.html

Interesting mail-list entries about booting in i386

https://mail-index.netbsd.org/port-i386/2007/01/23/0008.html

https://mail-index.netbsd.org/port-i386/2004/10/15/0005.html

Filesystems Overview

The NetBSD whole partition

The NetBSD partition is the letter "c" on i386, amd64 and some other architectures. I.e., if your drive is /dev/wd0, wd0d refers to the whole disk, while wd0c refers only to the NetBSD partition, which can coexist with partitions of other operating systems. On some other architectures, though, this is not possible, so wd0c refers to both the NetBSD partition and the whole disk.

We saw in the "Booting" section that the very first 512 bytes of the disk are reserved for the MBR. The first boot program (mbr.S) finds an active partition and loads its first sector. The first 4096 bytes of the active partition are then reserved for the boot process.

The NetBSD partition:

Offset  Size  Description
0       4 KB  Boot block for the partition, loaded from the MBR
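As a quick sanity check of the layout, the byte ranges can be computed with a short Python sketch. It uses the sizes stated above (512 bytes for the MBR, 4096 bytes for the partition boot block) and assumes 512-byte sectors; the helper name boot_regions and the partition start sector 63 in the example are hypothetical.

```python
SECTOR_SIZE = 512       # assumption: classic 512-byte sectors
MBR_SIZE = 512          # first 512 bytes of the disk
BOOT_BLOCK_SIZE = 4096  # boot block at the start of the NetBSD partition


def boot_regions(partition_start_lba):
    """Return {name: (first_byte, end_byte_exclusive)} as whole-disk offsets."""
    boot_start = partition_start_lba * SECTOR_SIZE
    return {
        "mbr": (0, MBR_SIZE),
        "partition_boot_block": (boot_start, boot_start + BOOT_BLOCK_SIZE),
    }


if __name__ == "__main__":
    # Hypothetical disk whose NetBSD partition starts at sector 63.
    for name, (start, end) in boot_regions(63).items():
        print(f"{name}: bytes {start}..{end - 1}")
```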

References

[BARTLETT2009]

[BLU2010]

[INTEL2015]

[MBRX86]

[MBR]