NetBSD code study
TODO:
It is a good idea to explain segments (data, code, bss, and so on) in order to understand the boot programs, which are written in assembly and make use of these keywords.
Good pages are:
Wikipedia page about Data_segment
Wikipedia page about Code_segment
Booting
Boot introduction
http://www.khmere.com/freebsd_book/html/ch02.html
The flow of the boot process is:
1. Turn the computer on.
2. BIOS loads mbr.S into memory.
3. mbr.S loads pbr.S from the beginning of the NetBSD partition.
4. pbr.S loads sector 1 from disk
TODO: describe stuff in terms of bootstrap programs primary, secondary and zero (mbr.S).
Useful man pages for understanding how the boot process works and how NetBSD deals with it are:
installboot(8): has explanation about primary and secondary bootstrap programs.
disklabel(8): information about disklabel structure and its homonym utility.
boot(8): overview on how bootstrapping procedure works.
From disklabel man page:
On systems that expect to have disks with MBR partitions (see fdisk(8)) disklabel will find, and update if requested, labels in the first 8k of type 169 (NetBSD) MBR labels and within the first 8k of the physical disk. On other systems disklabel will only look at the start of the disk. The offset at which the labels are written is also system dependent.
These man pages describe the NetBSD boot programs. As in FreeBSD (link above), NetBSD divides the boot process into several programs:
The very first program is a really small program that must fit in the MBR (remember we are talking about i386). It is /usr/mdec/mbr. It then loads and passes control to the second program, bootxx.
So, the second program is bootxx, found in /usr/mdec/bootxx_FSTYPE.
TODO: where is it installed? The third one is a much bigger program (52 kB here), /usr/mdec/boot, and is copied to /. TODO: is this the program that loads the /boot.cfg file?
We will start with the MBR.
MBR structure
According to [MBR], the MBR consists of 512 bytes, laid out as follows:
# of bytes | Meaning |
---|---|
446 | Bootstrap code area |
64 | Information about partitions |
2 | Boot signature (0x55aa) |
NetBSD, though, interprets it a bit differently, taking some space from the code area to store "bootsel" information. This is NetBSD's detailed view of the MBR:
# of bytes | Meaning |
---|---|
400 | Bootstrap code area |
40 | the bootsel (includes nametab) |
4 | NT Drive Serial Number |
2 | bootsel magic |
16 | Partition entry 1 |
16 | Partition entry 2 |
16 | Partition entry 3 |
16 | Partition entry 4 |
2 | Boot signature (0x55aa) |
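To make the offsets in this table concrete, here is a minimal C sketch of the same layout as plain byte arrays. The struct and function names are hypothetical and, unlike struct mbr_sector below, it does not split the code area. It also explains the two spellings of the signature: the bytes on disk are 0x55 followed by 0xaa, which an x86 reads as the little-endian 16-bit value 0xaa55, the value used in bootblock.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat view of NetBSD's MBR layout. Only byte arrays are used,
 * so no padding is inserted and the offsets match the table exactly. */
struct mbr_layout_sketch {
    uint8_t bootcode[400];    /* bootstrap code area */
    uint8_t bootsel[40];      /* bootsel config, including nametab */
    uint8_t dsn[4];           /* NT drive serial number */
    uint8_t bootsel_magic[2]; /* bootsel magic */
    uint8_t parts[4][16];     /* four 16-byte partition entries */
    uint8_t magic[2];         /* boot signature: 0x55 then 0xaa on disk */
};

/* Compile-time checks: the fields add up to one 512-byte sector. */
static_assert(sizeof(struct mbr_layout_sketch) == 512, "one sector");
static_assert(offsetof(struct mbr_layout_sketch, parts) == 446, "partition table at 446");
static_assert(offsetof(struct mbr_layout_sketch, magic) == 510, "signature at 510");

/* Read the signature as a little-endian 16-bit value, as an x86 CPU does. */
static uint16_t mbr_signature_le(const uint8_t sector[512])
{
    return (uint16_t)(sector[510] | (sector[511] << 8));
}
```

For a valid sector, mbr_signature_le() returns 0xaa55, even though a hex dump shows "55 aa".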
All of this is declared in the mbr_sector structure in src/sys/sys/bootblock.h:
File src/sys/sys/bootblock.h, starting at line 718:
/*
* MBR boot sector.
* This is used by both the MBR (Master Boot Record) in sector 0 of the disk
* and the PBR (Partition Boot Record) in sector 0 of an MBR partition.
*/
struct mbr_sector {
/* Jump instruction to boot code. */
/* Usually 0xE9nnnn or 0xEBnn90 */
uint8_t mbr_jmpboot[3];
/* OEM name and version */
uint8_t mbr_oemname[8];
union { /* BIOS Parameter Block */
struct mbr_bpbFAT12 bpb12;
struct mbr_bpbFAT16 bpb16;
struct mbr_bpbFAT32 bpb32;
} mbr_bpb;
/* Boot code */
uint8_t mbr_bootcode[310];
/* Config for /usr/mdec/mbr_bootsel */
struct mbr_bootsel mbr_bootsel;
/* NT Drive Serial Number */
uint32_t mbr_dsn;
/* mbr_bootsel magic */
uint16_t mbr_bootsel_magic;
/* MBR partition table */
struct mbr_partition mbr_parts[MBR_PART_COUNT];
/* MBR magic (0xaa55) */
uint16_t mbr_magic;
} __packed;
You will notice that, in this structure, the space left for the bootstrap program is
much smaller than we just predicted. This is because other fields are
declared at the top of the structure, just before the mbr_bootcode
field. When programming the MBR, the distinction between those fields is
irrelevant: their space is actually used by the bootstrap program. The comment
at the top of the structure tells us that it is used for both the MBR and the
PBR. Those strange fields only matter for the PBR, which we are going to see later.
struct mbr_bootsel is defined in the same file:
File src/sys/sys/bootblock.h, starting at line 693:
struct mbr_bootsel {
uint8_t mbrbs_defkey;
uint8_t mbrbs_flags;
uint16_t mbrbs_timeo;
char mbrbs_nametab[MBR_PART_COUNT][MBR_BS_PARTNAMESIZE + 1];
} __packed;
It has some parameters, such as the default key (mbrbs_defkey), the flags
(mbrbs_flags) and the timeout before the default choice is booted
(mbrbs_timeo), and finally the mbrbs_nametab array, which holds names for up
to four partitions (MBR_PART_COUNT = 4), each nine characters wide
(MBR_BS_PARTNAMESIZE = 8, plus 1 for the null character). Those constants
are defined in the same file.
Finally, each partition entry, according to [MBR], has the following structure:
Offset (bytes) | Length | Meaning |
---|---|---|
0 | 1 | Status (bit 7 set, i.e., 0x80, means active or bootable; old MBRs accept only 0x80; 0x00 means inactive) |
1 | 3 | CHS address of first sector in partition |
4 | 1 | Partition type |
5 | 3 | CHS address of last sector in partition |
8 | 4 | LBA address of first sector in partition |
12 | 4 | Number of sectors in partition |
- CHS: Cylinder-head-sector address https://en.wikipedia.org/wiki/Cylinder-head-sector
- LBA: Logical block addressing https://en.wikipedia.org/wiki/Logical_block_addressing
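Decoding an entry is mostly a matter of byte order and bit packing: the LBA fields are little-endian 32-bit integers, and each CHS triple packs the two high cylinder bits into the top of the sector byte. A small C sketch follows; the struct and function names are hypothetical, and only the first CHS triple is decoded.

```c
#include <assert.h>
#include <stdint.h>

/* The status byte's "active" flag is bit 7. */
static_assert(0x80 == (1u << 7), "active flag is bit 7");

/* Hypothetical decoded form of one 16-byte MBR partition entry. */
struct ptn_entry {
    uint8_t  status;                           /* 0x80 = active, 0x00 = inactive */
    uint8_t  type;                             /* partition type, 0 = unused */
    unsigned first_cyl, first_head, first_sec; /* CHS of first sector */
    uint32_t lba_start;                        /* LBA of first sector */
    uint32_t nsectors;                         /* number of sectors */
};

/* Little-endian 32-bit read, as stored on an x86 disk. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one entry; e points at its 16 raw bytes. */
static struct ptn_entry ptn_decode(const uint8_t e[16])
{
    struct ptn_entry d;
    d.status     = e[0];
    d.first_head = e[1];
    d.first_sec  = e[2] & 0x3f;                              /* bits 0-5 */
    d.first_cyl  = ((unsigned)(e[2] & 0xc0) << 2) | e[3];    /* 10-bit cylinder */
    d.type       = e[4];
    d.lba_start  = le32(e + 8);
    d.nsectors   = le32(e + 12);
    return d;
}
```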
First program: mbr.S
Introduction: This is where all the magic begins. The mbr.S program is the assembly
code that is later assembled and stored in the first 512 bytes of the disk. The BIOS
reads it and executes it. This program finds the NetBSD partition, reads its
first sector, where the next boot program resides, and executes it.
The first program is found at src/sys/arch/i386/stand/mbr/mbr.S. This is
the code that will later generate the binary program at /usr/mdec/mbr and
its variants, but we'll study the most basic version for now. That is, we consider
that:
- BOOTSEL is undefined: we are not interested in studying the code that lets the user change which drive to load from. The first hard drive is implied.
- BOOT_EXTENDED is undefined: we are not interested in studying code that allows booting from extended partitions.
- COM_PORT is undefined: we are not interested in studying code that would allow us to boot from a serial line.
- NO_BANNER is *defined*: we are not interested in studying code that shows messages to the user.
- NO_CHS is undefined: so the CHS read code is included. Although it makes things more difficult, the default /usr/mdec/mbr doesn't define it (keeping backwards compatibility, I believe).
*Note*: For information on this very first program and different flavours of it, it is recommended to take a look at the mbr(8) man page.
*Note*: You might want to compile the mbr program from the mbr.S source. To do this, first build the tools with the top-level build.sh script:
$ ./build.sh -u -U -T /tmp/tools -O /tmp/objs tools
It will first compile a set of tools necessary to build NetBSD. Then, cd
to the directory where the mbr Makefile is:
$ cd sys/arch/i386/stand/mbr/mbr
$ /tmp/tools/bin/nbmake MACHINE_GNU_ARCH=i486 TOOLDIR=/tmp/tools
It was necessary to set MACHINE_GNU_ARCH=i486 in my case, because make
was looking for binaries prefixed with i386.
This will then create the intermediate file mbr.o and the final file mbr,
which is the file that can be found in /usr/mdec/mbr.
The MBR is usually not updated. A user who wants to change the MBR program has to do it by hand (sysinst also does this). To update only the partition table, there is the very useful fdisk(8) utility.
Back to the mbr.S program: its first lines define constants that are used
throughout the program. Then comes the beginning of the program, just after
ENTRY(start). ENTRY() is nothing more than a macro, declared in
src/sys/arch/i386/include/asm.h.
File src/sys/arch/i386/include/asm.h, starting at line 174:
#define ENTRY(y) _ENTRY(_C_LABEL(y)); _PROF_PROLOGUE
File src/sys/arch/i386/include/asm.h, starting at line 96:
#define _ENTRY(x) \
.text; _ALIGN_TEXT; .globl x; .type x,@function; x:
So, we see ENTRY() is just a macro that inserts some GNU Assembler
directives, including a label that specifies the program entry point (start).
We are not going into details about this for now.
Let's first give some explanation of the MBR code. The BIOS reads the first 512
bytes of the disk (the MBR) into address 0x7c00 and executes them. But the following
code actually copies these 512 bytes elsewhere (address 0x8800) and jumps
there. Why? Because, later, the second phase of the boot procedure
will be loaded at 0x7c00. If you would like to understand more about
the MBR sector and how operating systems and the BIOS deal with it, please
refer to [BLU2010].
Nice explanation on how NetBSD boot procedure works
Address 0x8800 doesn't show up anywhere in the current program, but we know it
because all addresses use 0x8800 as a starting point, since
this is the load address the binary is linked for. If you are
curious about how the MBR code is compiled, take a look at the following piece of
code. See that the load address is passed to the linker.
File src/sys/arch/i386/stand/mbr/Makefile.mbr, starting at line 43:
LOADADDR= 0x8800
AFLAGS.mbr.S= ${${ACTIVE_CC} == "clang":?-no-integrated-as:}
AFLAGS.gpt.S= ${${ACTIVE_CC} == "clang":?-no-integrated-as:}
${PROG}: ${OBJS}
${_MKTARGET_LINK}
${CC} -o ${PROG}.tmp ${LDFLAGS} -Wl,-Ttext,${LOADADDR} ${OBJS}
@ set -- $$( ${NM} -t d ${PROG}.tmp | grep '\<mbr_space\>' \
| ${TOOL_SED} 's/^0*//' ); \
echo "#### There are $$1 free bytes in ${PROG}"
${OBJCOPY} -O binary ${PROG}.tmp ${PROG}
rm -f ${PROG}.tmp
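The relationship between the link address and the load address is simple arithmetic. The following C sketch (hypothetical helper names; BOOTADDR and LOADADDR take the values given in the text) shows the computation mbr.S performs with $BOOTADDR + (label - start): start is linked at 0x8800 but loaded at 0x7c00, so before the copy, any label sits at 0x7c00 plus its offset from start.

```c
#include <assert.h>
#include <stdint.h>

#define BOOTADDR 0x7c00u /* where the BIOS loads the sector */
#define LOADADDR 0x8800u /* link address passed with -Ttext in Makefile.mbr */

/* The BIOS copy and the relocated copy do not overlap. */
static_assert(BOOTADDR + 512 <= LOADADDR, "the two 512-byte images are disjoint");

/* Offset of a label inside the 512-byte sector on disk. */
static uint16_t file_offset(uint16_t link_addr)
{
    return (uint16_t)(link_addr - LOADADDR);
}

/* Where that label currently is, right after the BIOS load:
 * the $BOOTADDR + (label - start) expression of mbr.S. */
static uint16_t loaded_addr(uint16_t link_addr)
{
    return (uint16_t)(BOOTADDR + file_offset(link_addr));
}
```

For example, a label linked at 0x8810 is at 0x7c10 until the relocation is done.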
So, back to our asm program: it really starts at ENTRY(start) with the
following code. What this code does is copy the rest of the program to another place in memory and clear the bss, as detailed below.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 127:
ENTRY(start)
xor %ax, %ax
mov %ax, %ss
movw $BOOTADDR, %sp
mov %ax, %es
mov %ax, %ds
movw $mbr, %di
mov $BOOTADDR + (mbr - start), %si
push %ax /* zero for %cs of lret */
push %di
movw $(bss_start - mbr), %cx
rep
movsb /* relocate code */
mov $(bss_end - bss_start + 511)/512, %ch
rep
stosw /* zero bss */
lret /* Ensures %cs == 0 */
Let's understand it part by part:
1. ENTRY(start): we already described it just above.
2. The first line (xor) just clears the %ax register.
3. The content of %ax is moved to %ss, i.e., %ss is zeroed (%ss stands for
"stack segment"). (The assembler syntax used here is the one from GNU Assembler,
since this is the assembler used by the NetBSD Project. For this reason, the
destination register is on the right side of the operation.)
MOV src, dest (or) MOV dest, src?
4. The next line sets %sp (stack pointer) to the address 0x7c00.
BOOTADDR is a constant defined in line 77 of the current file with this
value. According to [BLU2010] (page 15), "(...) BIOS likes always to load
the boot sector to the address 0x7c00 (...)" and so we have to tell our MBR
program where we are :-). Addresses 0x7c00 to 0x7e00 (512 bytes) are
then reserved for this very first program. Remember that the stack grows
downwards ([BLU2010], page 17), i.e., towards addresses lower than 0x7c00, so it
does not touch our code.
TODO: maybe describe other important memory regions, like the picture on page 14 of [BLU2010]?
5. %es is zeroed.
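6. %ds is zeroed. (Modern operating systems usually point all these registers to the same place, effectively disabling segmentation. That is what is happening here (TODO: confirm). In real mode, with all segment registers at zero, every 16-bit offset is also the physical address.)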
X86 Architecture: Segment_Registers
7. Move the address of the mbr program to %di. The mbr program
is written just after the current block we are analyzing.
The movw instruction moves a 16-bit integer (which can be an address), i.e., a word. There are
also the movb, movl and movq opcodes. They move, respectively, a BYTE, a DWORD
and a QWORD. A BYTE represents 8 bits; a WORD, 16 bits; a DWORD, 32 bits and a
QWORD, 64 bits.
What's the difference between mov and movl?
Wikipedia page about what is the context of computer architecture
8. $BOOTADDR + (mbr - start) is the address of the mbr program as
loaded into main memory by the BIOS. Move this address to %si. Remember: it is
*not* just 0x7c00 + (mbr - 0) but something like 0x7c00 + (mbr - 0x8800),
since this code was linked with load address 0x8800 (as we
just explained), which shifts all addresses in the current program.
9. Push the value of %ax (zero) to the stack. The lret at the end will pop it into %cs.
10. Push the value of %di (the address of mbr) to the stack. The lret at the end will pop it into %ip.
11. bss_start is defined at the end of the file and contains the address
of the end of this program. Hence %cx will have the number of bytes
between the mbr "program" and the end of the current program, and it is
used as a counter. The full definition is bss_start = . (a lone dot means "the
current address").
GNU Assembler documentation: The Special Dot Symbol
For more information about the "BSS section", see the following links:
Wikipedia entry about Data_segment
Let's take a look at the end of the file.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 658:
bss_off = 0
bss_start = .
#define BSS(name, size) name = bss_start + bss_off; bss_off = bss_off + size
BSS(ptn_list, 256 * 4) /* long[]: boot sector numbers */
BSS(dump_eax_buff, 16)
BSS(bss_end, 0)
We see that "bss" is a region that starts just after the boot program, with the following components:
- ptn_list: 1 KB. TODO: purpose? Maybe for the next program? (Its comment in the source reads "long[]: boot sector numbers".)
- dump_eax_buff: 16 bytes. TODO: purpose?
I.e., the "bss" section is a region of 1040 bytes.
12. Repeat the next command.
13. We just set %di, %si and %cx above to prepare this operation.
We call movsb ("move string byte"), which copies the byte at address %si
to address %di (incrementing both at each iteration), %cx times. The segment
registers are also used: the source is %ds:%si and the destination is %es:%di,
which is why %ds and %es were zeroed above.
That is, it copies the rest of the program (from the mbr label to bss_start)
from where the BIOS loaded it, near 0x7c00, to the address it was linked for,
near 0x8800.
Information about %es and %ds usage and string handling in Assembly
More information about string handling in Assembly
14. We explained, in step 11, that bss_start has the address of the end
of the current program. bss_end is defined at the end of the file.
We see that BSS() is a macro made to define variables relative
to bss_start. bss_end is, therefore, the end of this region called
bss (TODO: what does it stand for? what are its purposes?). Eventually,
we realize that %ch will hold the size of the bss as a number of 512-byte
blocks, rounded up: an incomplete block counts as a whole one, since the
assembler uses integer division.
So, some examples, writing size for bss_end - bss_start:
- size = 1: (1 + 511)/512 = 512/512 = 1 -> %ch = 1
- size = 512: (512 + 511)/512 = 1023/512 = 1 (integer division) -> %ch = 1
- size = 513: (513 + 511)/512 = 1024/512 = 2 -> %ch = 2
Let's remember that %ch stores the 8 most significant bits of %cx, i.e.,
%cx = %ch * 256 + %cl.
15. Repeat the next command.
16. Zero the memory region just after the current program, i.e., zero the
bss. stosw copies the value in %ax (zero) to memory starting
at %di, which, after rep movsb, already points just past the copied code,
i.e., at bss_start. When used with the rep prefix (which is exactly our case), it
uses %cx to know how many times to repeat, but remember,
since stosw moves *words* (16 bits), %cx holds half the number of bytes to
be written. E.g.: suppose we are going to write byte 0x0 100 times.
%ax will be 0x0 and %cx will be just 50 if we use stosw (not stosb).
This explains the code in step 14. The "bss" size is 1040 bytes, so
(bss_end - bss_start + 511)/512 is 3. Since it is stored in %ch and
%cx = %ch * 256 + %cl, then %cx = 3 * 256 + %cl, i.e., %cx = 768 + %cl.
And %cl is 0 here, because the previous rep movsb counted %cx down to zero,
so %cx = 768. This is lower than the "bss" size, but since stosw writes
*words, not bytes*, it covers 768 * 2 = 1536 bytes, enough to zero the
whole region.
TODO: why not just movw $(bss_end - bss_start)/2, %cx? Ask dsl?
17. Finally, lret makes a long return: it pops %ip (the mbr address pushed
in step 10) and then %cs (the zero pushed in step 9), so execution continues
at mbr with %cs == 0.
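The count arithmetic of steps 11 to 16 can be checked with a small C model. It is only a sketch: the sizes are copied from the BSS() lines shown above, the function names are hypothetical, and it assumes %cl is 0 when mov ..., %ch runs, because rep movsb counts %cx down to zero.

```c
#include <assert.h>

/* Sizes from the BSS() lines at the end of mbr.S. */
#define PTN_LIST_SIZE      (256 * 4) /* BSS(ptn_list, 256 * 4) */
#define DUMP_EAX_BUFF_SIZE 16        /* BSS(dump_eax_buff, 16) */
#define BSS_SIZE           (PTN_LIST_SIZE + DUMP_EAX_BUFF_SIZE)

static_assert(BSS_SIZE == 1040, "bss_end - bss_start");

/* mov $(size + 511)/512, %ch : integer division, i.e. whole blocks rounded up. */
static unsigned bss_ch(unsigned size)
{
    return (size + 511) / 512;
}

/* %cx = %ch * 256 + %cl, with %cl == 0 after rep movsb. */
static unsigned bss_cx(unsigned size)
{
    return bss_ch(size) * 256;
}

/* rep stosw stores one 16-bit word (2 bytes) per iteration. */
static unsigned bss_bytes_zeroed(unsigned size)
{
    return bss_cx(size) * 2;
}
```

Setting only %ch therefore means "zero %ch * 512 bytes", which is why the expression is written in 512-byte blocks.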
*What does this piece of code do?*
We moved the rest of the MBR code to the 0x8800 region (where it was linked to run), zeroed the
"bss" section (just after it) and made a long jump to where the
MBR code now is. Now, let's resume where the MBR code is! In the source code,
it is written just below the code we just studied. According to [MBRX86], this
is exactly what typical bootstrapping code does first.
We move ourselves out of the way because we'll load the next program,
pbr.S, at address 0x7c00. Some might think: why not just execute
this program in place and copy pbr.S elsewhere? We'll see later that pbr.S
can be loaded directly in some situations, without a previous program like
mbr.S, so the BIOS would load it at 0x7c00. It also needs to be loaded
there because of the addresses it is linked with.
Contents of registers | Value |
---|---|
%ax | 0 |
Contents of stack (just before lret pops both entries) | Observation |
---|---|
mbr | Address of the mbr label, where the MBR program truly begins. |
0 | Value for %cs |
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 145:
/*
* Sanity check the drive number passed by the BIOS. Some BIOSs may not
* do this and pass garbage.
*/
mbr:
cmpb $MAXDRV, %dl /* relies on MINDRV being 0x80 */
jle 1f
movb $MINDRV, %dl /* garbage in, boot disk 0 */
1:
push %dx /* save drive number */
push %dx /* twice - for err_msg loop */
We will consider the standard mbr program, where no options are defined (except NO_BANNER, as stated above).
One important thing about this code is that it uses *local labels*.
They consist of just a number, and a jump to one is written with the suffix f
(forward) or b (backward), as in jle 1f.
Information about local labels
Nice example on using local labels
We should pay attention to the %dl register. It holds the number of the drive
the MBR was loaded from and, according to [MBRX86], this is the only important
value the BIOS passes to the MBR. It is 0x0, 0x1, etc. for floppy drives and
0x80, 0x81, etc. for hard disk drives.
Example: Pintos Operating System loader in Assembly with information about drive numbers
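1. It first compares %dl against $MAXDRV, which is 0x8f, the biggest
accepted value (TODO: references?). If %dl is less than or equal to 0x8f,
it jumps forward to the next 1 label. If not, it forces the value 0x80
(the first hard drive) into %dl. Note that jle is a *signed* comparison:
as signed bytes, 0x80 to 0x8f are -128 to -113, the smallest possible
values, so only 0x80 to 0x8f are kept and every other value, floppy drive
numbers included, becomes 0x80. This is why the source comment says the
check relies on MINDRV being 0x80.
2. It then pushes the value of %dx to the stack twice. Remember %dl
stores the lowest 8 bits of the %dx register.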
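A minimal C model of the three instructions cmpb $MAXDRV, %dl / jle 1f / movb $MINDRV, %dl, with the signed interpretation of jle made explicit. The helper names are hypothetical; the explicit conversion avoids relying on implementation-defined casts to int8_t.

```c
#include <assert.h>
#include <stdint.h>

#define MINDRV 0x80
#define MAXDRV 0x8f

static_assert(MINDRV == 0x80, "the check relies on MINDRV being 0x80");

/* The signed (two's complement) value jle sees for a byte. */
static int as_signed8(uint8_t b)
{
    return b >= 0x80 ? (int)b - 256 : (int)b;
}

/* cmpb $MAXDRV, %dl ; jle 1f ; movb $MINDRV, %dl */
static uint8_t sanitize_drive(uint8_t dl)
{
    if (as_signed8(dl) <= as_signed8(MAXDRV))
        return dl;      /* 0x80..0x8f (-128..-113): kept */
    return MINDRV;      /* anything else: boot disk 0 */
}
```

So a floppy number such as 0x00 is not kept: it compares greater than -113 and is replaced by 0x80.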
The rest of the current piece of code is about the serial port and printing messages to the user, things we are not interested in.
At the end of this piece of code, our important registers look the same, but the stack has changed.
*What does this piece of code do?*
It just checks the %dl register. This is where the BIOS stores the number
of the drive we booted from (HDD, floppy disk, etc.). If it is an
invalid value, force 0x80 (the first hard drive).
Contents of register | Value |
---|---|
%ax | 0 |
Contents of stack | Observation |
---|---|
drive number | 0x80 probably |
drive number | 0x80 probably |
(The mbr address and the 0 pushed before were already popped by lret.)
Let's then take a look at some more complex code.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 172:
/*
* Walk through the selector (name) table printing used entries.
*
* Register use:
* %ax temp
* %bx nametab[] boot seletor menu
* %ecx base of 'extended' partition
* %edx next extended partition
* %si message ptr (etc)
* %edi sector number of this partition
* %bp parttab[] mbr partition table
*/
bootsel_menu:
movw $nametab, %bx
#ifdef BOOT_EXTENDED
xorl %ecx, %ecx /* base of extended partition */
next_extended:
xorl %edx, %edx /* for next extended partition */
#endif
lea parttab - nametab(%bx), %bp
next_ptn:
movb 4(%bp), %al /* partition type */
#ifdef NO_CHS
movl 8(%bp), %edi /* partition sector number */
#ifdef BOOT_EXTENDED
cmpb $MBR_PTYPE_EXT, %al /* Extended partition */
je 1f
cmpb $MBR_PTYPE_EXT_LBA, %al /* Extended LBA partition */
je 1f
cmpb $MBR_PTYPE_EXT_LNX, %al /* Linux extended partition */
jne 2f
1: movl %edi, %edx /* save next extended ptn */
jmp 4f
2:
#endif
addl lba_sector, %edi /* add in extended ptn base */
#endif
test %al, %al /* undefined partition */
je 4f
cmpb $0x80, (%bp) /* check for active partition */
jne 3f /* jump if not... */
#define ACTIVE (4 * ((KEY_ACTIVE - KEY_DISK1) & 0xff))
#ifdef NO_CHS
movl %edi, ptn_list + ACTIVE /* save location of active ptn */
#else
mov %bp, ptn_list + ACTIVE
#endif
#undef ACTIVE
3:
#ifdef BOOTSEL
cmpb $0, (%bx) /* check for prompt */
jz 4f
/* output menu item */
movw $prefix, %si
incb (%si)
call message /* menu number */
mov (%si), %si /* ':' << 8 | '1' + count */
shl $2, %si /* const + count * 4 */
#define CONST (4 * ((':' << 8) + '1' - ((KEY_PTN1 - KEY_DISK1) & 0xff)))
#ifdef NO_CHS
movl %edi, ptn_list - CONST(%si) /* sector to read */
#else
mov %bp, ptn_list - CONST(%si) /* partition info */
#endif
#undef CONST
mov %bx, %si
call message_crlf /* prompt */
#endif
4:
add $0x10, %bp
add $TABENTRYSIZE, %bx
cmpb $(nametab - start - 0x100) + 4 * TABENTRYSIZE, %bl
jne next_ptn
This piece of the program seems very confusing at first glance, but it is not: there are blocks we are just going to ignore, because of the macros we assumed to be defined or undefined at the start of this section.
1. The first line moves the address of the nametab string to the %bx
register. The comment at the top of the code says nametab[] stores the "boot
selector menu". Let's take a look at the nametab definition.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 636:
nametab:
.fill MBR_PART_COUNT * (MBR_BS_PARTNAMESIZE + 1), 0x01, 0x00
This .fill directive repeats the byte 0x0, which has size 0x01, *n* times
(where *n* is MBR_PART_COUNT * (MBR_BS_PARTNAMESIZE + 1)). MBR_PART_COUNT
and MBR_BS_PARTNAMESIZE are defined in src/sys/sys/bootblock.h. We already
know, from the "MBR structure" section, that *nametab* holds names of 8
characters (plus a null character) for up to 4 partitions, i.e., the .fill
directive reserves 36 bytes.
The .fill directive works like a loop: its syntax is .fill count, size, value,
i.e., repeat the *value*, which has size *size*, *count* times.
The *bootsel* part (which includes the *nametab* strings), according to mbr(8), allows the user to choose which partition to boot from, but, for the purpose of understanding the basics of this program, we assume that *bootsel* is disabled, producing a much simpler MBR program.
There is also parttab, which we are going to use in the next step. It is
defined just below.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 650:
. = start + MBR_PART_OFFSET
parttab:
.fill 0x40, 0x01, 0x00
We see that the null byte is repeated 64 times (hex 0x40), the exact size
of the partition table just before the magic number.
Detailed contents of a partition entry
2. The next line starts with the lea opcode. To understand this line, it
is enough to know that a lea instruction is similar to a mov one
but with an important difference: in the form mov address, register, mov
moves the content of the memory at the source address to the destination,
while lea moves the address itself.
Explanation and more about LEA can be found here:
Why LEA was created and why it is used
It also uses a base plus displacement addressing mode. Addressing modes are well explained in [BARTLETT2009], page 35. Translating, the following expression:
parttab - nametab(%bx)
is equivalent to:
(parttab - nametab) + %bx
(Note that the subtraction is done at assembly time. So, when executing the
code, it will be something like x(%bx), where x = parttab - nametab.)
But %bx is nametab from step 1. This leads us to the expression:
parttab - nametab + nametab
Why did the programmer write such a complicated line to store the address of
parttab in %bp? This is the first line of a loop when extended partitions are
traversed: later, %bx is changed and the program execution comes back to the
label next_extended, just above. Since we are not interested in this, in our
case the current instruction is equivalent to:
mov $parttab, %bp
So %bp just holds the address of parttab.
Before moving to the next step, note that, from the comment at the
beginning of the current block, %bx stores the address of nametab[] (boot
selector menu) and %bp stores the address of parttab[] (mbr partition table).
3. Like the previous instruction, this one also uses a displacement to
specify the address. The 4(%bp) part means "take the address in %bp and
add 4 bytes"; movb then loads the byte at the resulting address into %al.
Because %bp has the address of the partition entry, %bp + 4 points to the
byte that stores the partition type.
4. Since NO_CHS is undefined, we go straight to the test instruction.
test %al, %al performs a bitwise *AND* of its operands (discarding the
result) and sets the Zero Flag if the result is zero; since both operands
are %al, this happens exactly when %al is zero. The next je instruction
jumps if the Zero Flag is set. In summary, it jumps if %al is zero.
Remember %al holds the partition type, so the jump is taken when the
current partition type is 0x0, i.e., *<UNUSED>*, skipping that entry by
jumping forward to label 4.
5. The next comparison, cmpb $0x80, (%bp), checks whether the current partition is
active. Note that it uses indirect addressing through register %bp, because
%bp stores the address of the partition entry. To fetch the value pointed to
by the address, we need to surround the register with parentheses (in GNU as).
If the partition is not active (bootable), jump forward to label 3.
6. The mov instruction just below, in the #else block (because NO_CHS
is undefined), is executed only if the current partition is active
(see the line we just analyzed). It stores the value of %bp, i.e., the
address of the active partition entry, at the memory location
ptn_list + ACTIVE.
Remember ptn_list? It is defined using the BSS macro and points to a 1 KB
region just after the end of the MBR program in memory. The ACTIVE
macro is defined just above:
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 213:
#define ACTIVE (4 * ((KEY_ACTIVE - KEY_DISK1) & 0xff))
The KEY_ACTIVE and KEY_DISK1 constants are defined at the beginning of the
file and, for the version we are analyzing (COM_PORT undefined), they are:
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 89:
#define SCAN_ENTER 0x1c
#define SCAN_F1 0x3b
#define SCAN_1 0x2
#define KEY_ACTIVE SCAN_ENTER
#define KEY_DISK1 SCAN_F1
#define KEY_PTN1 SCAN_1
So, ACTIVE is calculated as follows (note that & 0xff is applied before the multiplication by 4):
4 * ((0x1c - 0x3b) & 0xff) = 4 * ((28 - 59) & 0xff) = 4 * (-31 & 0xff) = 4 * 225 = 900
Its value is 900, so this instruction stores the location of the active
partition (%bp) 900 bytes ahead of the start of the ptn_list region, i.e., in
entry 225 of this array of 256 four-byte entries (still inside its 1 KB).
We skip the whole long block between #ifdef BOOTSEL and its
closing #endif. Let's then go to the next relevant line for our study.
7. The next line we study adds the value 0x10 to the %bp register. %bp
points into the MBR partition table, so by adding 0x10 (i.e., 16, the size
of an entry), we point to the next partition in the MBR partition table.
8. Likewise, add the width of a *nametab* entry (TABENTRYSIZE) to %bx,
making %bx point to the next entry in *nametab*.
9. The next line makes a comparison. Remember that %bx stores the address of
the *nametab* structure but, after the previous line, it now points to the
next entry. %bl holds its lowest byte. The expression that comes first is
not as difficult as it looks: nametab - start is the offset of *nametab*
from the start of the program, i.e., its address without the 0x8800 load
address. Since 0x8800 has 0x00 as its low byte, the low byte of this offset
is also the low byte of the run-time address in %bx. Subtracting 0x100
drops the high byte of the offset (nametab is part of the bootsel area, which
starts at byte 400, i.e., 0x190, of the sector), so that the value fits in the
8-bit operand of cmpb. Then, it adds 4 times TABENTRYSIZE. What does all this
mean? The whole expression is the low byte of the address just past the end of
the *nametab* region, and it is compared with %bl. If they are not equal, jump
back to the next_ptn label, doing everything again with the next partition,
until all four have been analyzed.
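The scan loop above can be modeled in C as follows. This is a sketch, not NetBSD code: it assumes NO_CHS, BOOTSEL and BOOT_EXTENDED are undefined, the function name is hypothetical, it stores offsets within the sector instead of memory addresses, and it treats ptn_list as the 256-entry long[] described by its source comment. Because & 0xff is applied before the multiplication by 4, the active entry lands in slot 225, i.e., 900 bytes into ptn_list.

```c
#include <assert.h>
#include <stdint.h>

#define MBR_PART_OFFSET 446  /* parttab offset inside the sector */
#define KEY_ACTIVE      0x1c /* SCAN_ENTER */
#define KEY_DISK1       0x3b /* SCAN_F1 */
/* Slot of the active entry; ACTIVE in mbr.S is 4 times this, in bytes. */
#define ACTIVE_SLOT     ((KEY_ACTIVE - KEY_DISK1) & 0xff)

static_assert(ACTIVE_SLOT == 225, "slot 225 of 256");
static_assert(4 * ACTIVE_SLOT == 900, "ACTIVE is 900 bytes into ptn_list");

/* Model of the next_ptn loop: four 16-byte entries starting at parttab. */
static void scan_partitions(const uint8_t sector[512], uint32_t ptn_list[256])
{
    for (unsigned bp = MBR_PART_OFFSET; bp < MBR_PART_OFFSET + 4 * 16; bp += 16) {
        uint8_t type = sector[bp + 4];   /* movb 4(%bp), %al */
        if (type == 0)                   /* test %al, %al ; je 4f */
            continue;                    /* unused entry */
        if (sector[bp] == 0x80)          /* cmpb $0x80, (%bp) ; jne 3f */
            ptn_list[ACTIVE_SLOT] = bp;  /* mov %bp, ptn_list + ACTIVE */
    }
}
```

As in the assembly, if several entries are marked active, the last one wins, and an active entry with type 0 is ignored.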
*What does this piece of code do?*
It looks complicated, but all this code does is traverse the
partition table, skipping unused entries. In ptn_list + ACTIVE it
stores the address of the active partition entry in the MBR.
Register contents | Value | Observation |
---|---|---|
%ax | last partition type | Loaded into %al by movb 4(%bp), %al; %ah is still 0 |
%bp | parttab + 0x40 | Just past the partition table, after four additions of 0x10 |
Contents of stack | Observation |
---|---|
drive number | 0x80 probably |
drive number | 0x80 probably |
Contents of memory | Value |
---|---|
ptn_list + 900 | Address of the active partition in the partition table |
We have just finished this part of the code. Now we know how partitions are traversed in the MBR partition table and how to detect their type and whether they are bootable. Let's take a look at the next lines:
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 271:
#ifndef BOOTSEL
mov $(KEY_ACTIVE - KEY_DISK1) & 0xff, %ax
#else
We have already investigated these constants and a similar expression.
So, that piece of code is equivalent to:
mov $(0x1c - 0x3b) & 0xff, %ax
So, it moves the value -31 & 0xff = 225 (decimal, or 0xe1) to register %ax.
Let's now take a look at the next part of the code, just below the whole BOOTSEL block we are not interested in.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 344:
/*
* Boot requested partition.
* Use keycode to index the table we generated when we scanned the mbr
* while generating the menu.
*
* We very carfully saved the values in the correct part of the table.
*/
boot_ptn:
shl $2, %ax
movw %ax, %si
#ifdef NO_CHS
movl ptn_list(%si), %ebp
testl %ebp, %ebp
jnz boot_lba
#else
mov ptn_list(%si), %si
test %si, %si
jnz boot_si
#endif
Let's take a look at it line by line:
1. The shl instruction shifts bits to the left (each position multiplies the
value in the register by two), equivalent to the << operator in C. Since
it shifts the bits two positions, the value %ax holds changes from
225 (0xe1) to 900 (0x384). %ax is a 16-bit register, so no bits are lost,
and the result equals ACTIVE.
2. The second line just moves the value in %ax to register %si.
3. Then, after the #else clause (we are not interested in the block that
exists if NO_CHS is defined), it moves the content of the memory location
ptn_list(%si) to %si itself. What is in ptn_list(%si)? This is the
same as ptn_list(900), i.e., the memory location ptn_list + 900.
What do we have there? Remember the ACTIVE calculation above? In
ptn_list + ACTIVE (which is ptn_list + 900) we stored the content of the %bp
register, i.e., the address of the active partition entry.
4. Again we see the test opcode with the same register (%si) as both
operands. This checks whether %si is zero. If it is not (the desirable case
for our analysis), the jnz below jumps to the boot_si label.
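The two ways of reaching the active slot, the ACTIVE macro used when storing and the shl in boot_ptn used when loading, can be checked against each other in a C sketch (the function names are hypothetical). It also shows where the tempting answer 132 comes from: 132 is only the low byte of 900, which is what an 8-bit register would keep, but %ax is 16 bits wide.

```c
#include <assert.h>
#include <stdint.h>

#define KEY_ACTIVE 0x1c /* SCAN_ENTER */
#define KEY_DISK1  0x3b /* SCAN_F1 */
/* As in mbr.S: the & 0xff is applied before multiplying by 4. */
#define ACTIVE (4 * ((KEY_ACTIVE - KEY_DISK1) & 0xff))

static_assert(ACTIVE == 900, "offset written by the partition scan");

/* boot_ptn: mov $(KEY_ACTIVE - KEY_DISK1) & 0xff, %ax ; shl $2, %ax ; movw %ax, %si */
static uint16_t boot_ptn_si(void)
{
    uint16_t ax = (uint16_t)((KEY_ACTIVE - KEY_DISK1) & 0xff); /* 225 = 0xe1 */
    ax = (uint16_t)(ax << 2);                                 /* 900 = 0x384 */
    return ax;
}

/* Only the low byte, i.e. what an 8-bit register would keep. */
static uint8_t low_byte(uint16_t v)
{
    return (uint8_t)(v & 0xff);
}
```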
*What does this piece of code do?*
It just stores in register %si the address of the active partition entry.
Register contents | Value | Observation |
---|---|---|
%ax | 900 | (KEY_ACTIVE - KEY_DISK1) & 0xff shifted left by 2 |
%si | address of the active partition entry | Loaded from ptn_list + 900 |
Contents of the stack | Observation |
---|---|
drive number | 0x80 probably |
drive number | 0x80 probably |
Let's now check the boot_si part of the program:
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 409:
/*
* Active partition pointed to by si.
* Read the first sector.
*
* We can either do a CHS (Cylinder Head Sector) or an LBA (Logical
* Block Address) read. Always doing the LBA one
* would be nice - unfortunately not all systems support it.
* Also some may contain a separate (eg SCSI) bios that doesn't
* support it even when the main bios does.
*
* There is also the additional problem that the CHS values may be wrong
* (eg if fdisk was run on a different system that used different BIOS
* geometry). We convert the CHS value to a LBA sector number using
* the geometry from the BIOS, if the number matches we do a CHS read.
*/
boot_si:
movl 8(%si), %ebp /* get sector # */
testb $MBR_BS_READ_LBA, flags
jnz boot_lba /* fdisk forced LBA read */
pop %dx /* collect saved drive... */
push %dx /* ...number to dl */
movb $8, %ah
int $0x13 /* chs info */
Let's analyse this code:
1. %si stores the address of the active partition. 8(%si) is %si + 8 and means "go to the active partition entry and look eight bytes ahead". That is where the LBA address of the partition (four bytes) is stored. The movl instruction moves a long word, four bytes, to the 32-bit %ebp register, i.e., now the %ebp register stores the LBA number of the partition.
2. The next testb and jnz instructions test the MBR_BS_READ_LBA flag against the data stored at the flags label. Since the MBR_BS_READ_LBA flag is deprecated (according to a comment in bootblock.h) and the flags label holds other information, we just skip that line.
3. The next two lines, pop and push on the register %dx, looked a little mysterious to me at first glance. Why pop and then push again? The explanation I found is that we need %dx to hold the drive number (in %dl, the lowest byte) while also keeping it at the top of the stack. So we pop it from the stack into %dx and push it back. Some might notice that %dx never changed in this analysis, but it may be changed by the code compiled when BOOTSEL or COM_PORT are defined.
4. Finally, we move 8 to register %ah and call *INT 13H*, the BIOS disk services interrupt, which performs disk read and write operations. When %ah is 8, it reads the drive parameters. %dl must hold the drive number (the first hard disk is 0x80). It returns parameters in several registers; the important ones for this study are:
More information about the INT 13 interrupt
%dh
logical last index of heads (number of heads - 1, because the index starts at 0)
%cx
bits 7-6 of %cl together with the eight bits of %ch: logical last index of cylinders (number of cylinders - 1, because the index starts at 0); bits 5-0 of %cl: logical last index of sectors per track, which equals the number of sectors per track (because the index starts at 1).
*What does this piece of code do?*
It asks the BIOS for information about the HDD (number of heads, cylinders and sectors), returned in registers %cx and %dh. It also stores the LBA of the partition in register %ebp.
Register | Value | Observation |
---|---|---|
%cx | Last cylinder index and sectors per track (see above) | |
%dh | Logical last index of heads (number of heads - 1) (see above) | |
%dl | Probably 0x80 | Number of the current drive |
%si | Address of the current partition in the partition table | |
%ebp | LBA number of the partition | |
The next block of code, which is a near continuation of the last one, follows.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 435:
/*
* Validate geometry, if the CHS sector number doesn't match the LBA one
* we'll do an LBA read.
* calc: (cylinder * number_of_heads + head) * number_of_sectors + sector
* and compare against LBA sector number.
* Take a slight 'flier' and assume we can just check 16bits (very likely
* to be true because the number of sectors per track is 63).
*/
movw 2(%si), %ax /* cylinder + sector */
push %ax /* save for sector */
shr $6, %al
xchgb %al, %ah /* 10 bit cylinder number */
shr $8, %dx /* last head */
inc %dx /* number of heads */
mul %dx
mov 1(%si), %dl /* head we want */
add %dx, %ax
and $0x3f, %cx /* number of sectors */
mul %cx
pop %dx /* recover sector we want */
and $0x3f, %dx
add %dx, %ax
dec %ax
cmp %bp, %ax
je read_chs
1. We know that %si has the address of the current partition. So, 2(%si) looks two bytes ahead of the beginning of its entry. If we look at the structure of an entry in the partition table, we see that bytes 1-3 (3 bytes) represent the CHS address of the start of the partition. According to [MBR], these 3 bytes are organized the following way:
Byte | Bits distribution |
---|---|
1 | h7-h6-h5-h4-h3-h2-h1-h0 |
2 | c9-c8-s5-s4-s3-s2-s1-s0 |
3 | c7-c6-c5-c4-c3-c2-c1-c0 |
The first byte (bits h7-h0) is reserved for the head index, up to 255, but it is normally a much lower number. To discover how many heads your HDD has, use the fdisk(8) command. Also, the -v flag will show the CHS addressing information of the partitions.
TODO: fdisk(8) outputs both the NetBSD concept of the disk geometry and the BIOS concept. Why the difference?
*Note:* According to [MBR], the CHS concept does not correspond to the geometry of modern drives.
The second byte has a composite distribution. The number of cylinders is high, much bigger than 255, so we need more than eight bits. The two most significant bits of this byte (c9-c8) are reserved for the cylinder index. On the other hand, the number of sectors is low, so six bits (s5-s0) are enough.
The third byte has all its eight bits (c7-c0) reserved for the less significant part of the cylinder index.
So, all the first line of code does is load the cylinder and sector part of the entry (bytes 2 and 3) into %ax.
2. Then, push %ax to the stack. We will modify %ax later to find the cylinder, but we need to save it somewhere because we'll need the sector later.
3. The shr instruction seems strange, but it is not. The cylinder is stored in 10 bits: the two most significant bits of %al (byte 2) and the eight bits of %ah (byte 3), as we saw previously. So, we shift %al six positions to the right, which moves c9-c8 down to the two lowest bits and discards the sector bits.
Why is byte 2 in %al and not in %ah? Because we fetch this information from memory, at 2(%si), so endianness applies here: x86 is little-endian, so the byte at the lower address goes to the low half, %al.
Wikipedia page about Endianness
4. Again, because of endianness, we need to swap the contents of %al and %ah. Now, %ax has the 10-bit cylinder number we need.
5. Then, we simply shift %dx eight positions to the right, because %dh has the index of the last head and we need it in the low byte of %dx.
6. Next, we increment %dx by one to get the number of heads. Remember that %dx held the index of the last head, which is number_of_heads - 1.
7. Then, we multiply %ax by %dx. %ax has the cylinder number. The result is stored in %dx:%ax (the most significant part in %dx and the less significant part in %ax). We have just started computing the formula described in the comment, i.e., we are converting CHS to LBA to see if both match. So this part is just "cylinder * number_of_heads".
More about the mul instruction
8. The next line is a bit strange. It moves the head number we want (byte 1 of the entry) to %dl. Some would argue that it overwrites the result of the multiplication (stored in %dx:%ax), and that is right. The explanation I found is that the numbers are expected to be so small that the product fits in %ax and %dx is zero, so we can use %dl and %dh without worries.
9. Add %dx to %ax. We now have "cylinder * number_of_heads + head".
10. Remember that %cx stores the cylinder and sector information we fetched earlier with *INT 13H*? Remember also that %cx is 16 bits wide and the sector information is stored only in its six least significant bits? How do we extract them? This line just performs an and with 0x3f, which is decimal 63, or 00111111 in binary.
11. Now, multiply %ax by %cx. We now have "(cylinder * number_of_heads + head) * number_of_sectors".
12. Let's pop the two bytes that represent the cylinder and sector numbers, which we saved at the top of this piece of code. We pop them into %dx.
13. Apply and again to keep only the sector information.
14. And add it to %ax. In %ax, we now have "(cylinder * number_of_heads + head) * number_of_sectors + sector".
15. We still have to decrement the result by one, because CHS sectors are numbered from 1 while LBA starts at 0. Now %ax finally holds the LBA number.
16. Finally, we compare both values. From the %ebp setting in the last piece of code, we know it stores the LBA of the NetBSD partition. We only need to compare 16 bits, so we compare %bp with %ax.
17. Supposing everything is all right, both values are equal and je jumps to read_chs to do a CHS read.
*What does this piece of code do?*
It converts the CHS numbers to LBA. If the computed LBA matches the recorded one, it does a CHS read.
Contents of the stack | Observation |
---|---|
drive number | 0x80 probably |
mbr | Address of the mbr label where the MBR program truly begins. |
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 522:
/*
* Sector below CHS limit
* Do a cylinder-head-sector read instead.
*/
read_chs:
pop %dx /* recover drive # */
movb 1(%si), %dh /* head */
movw 2(%si), %cx /* ch=cyl, cl=sect */
movw $BOOTADDR, %bx /* es:bx is buffer */
movw $0x201, %ax /* command 2, 1 sector */
jmp do_read
This is simple: pop the stack so %dx holds the drive number again. Then load the head (byte 1 of the entry) into %dh and the packed cylinder/sector word (bytes 2 and 3) into %cx, which is exactly the layout *INT 13H* expects; set %bx to BOOTADDR, move the value 0x201 to %ax and jump to the do_read label! 0x201 is just 0x2 in %ah and 0x1 in %al.
*What does this piece of code do?*
It just stores the CHS values in the right registers to make a CHS read later.
Let's quickly take a look at the do_read label:
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 481:
/*
* Save sector number (passed in %ebp) into lba parameter block,
* read the sector and leap into it.
*/
boot_lba:
movl %ebp, lba_sector /* save sector number */
movw $lba_info, %si
movb $0x42, %ah
pop %dx /* recover drive # */
do_read:
push %dx /* save drive */
int $0x13
set_err(ERR_READ)
jc err_msg
Before going into the do_read label, note that just before it there is the boot_lba code, which is much simpler than using and calculating CHS: it saves the sector number into the lba parameter block, points %si at lba_info, sets %ah to 0x42 and falls through into do_read. It is worth noting that, from the last code we analyzed, if CHS and LBA don't match, LBA is preferred.
do_read simply pushes %dx again to save the drive number and triggers *INT 13H*, which reads or writes disk sectors. We know that %ah = 0x2, which means "read sectors from the disk". %al is the number of sectors we want to read, and we know %al = 0x01. *INT 13H* reads the sector into %es:%bx.
More information about the "Read Sectors From Drive" operation
The %es:%bx notation refers to *segmentation*. In Real Mode, an address is formed from two registers, a segment and an offset, and each segment spans up to 64 kb. In this example, the interrupt uses the *extra segment* register, %es, and %bx. %es selects the segment and %bx the offset inside that segment (up to 64 kb). The math is: %es * 0x10 + %bx. We know that %es is zero and %bx is BOOTADDR (0x7c00), so 0 * 0x10 + 0x7c00 = 0x7c00. The first sector of the partition will be loaded at address 0x7c00.
More information on segmentation
Finally, we check for an error (take a look at the set_err macro): *INT 13H* sets the carry flag on failure, so jc jumps to err_msg, which prints an error message.
*What does this piece of code do?*
It reads the first sector of the current partition into address 0x7c00.
After that, it continues with the next piece of code.
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 497:
/*
* Check signature for valid bootcode
*/
movb BOOTADDR, %al /* first byte non-zero */
test %al, %al
jz 1f
movw BOOTADDR + MBR_MAGIC_OFFSET, %ax
1: cmp $MBR_MAGIC, %ax
set_err(ERR_NOOS)
jnz err_msg
We now have the first sector of the partition loaded in memory. We first check (with test and jz) whether the first byte of the sector is zero. It must not be; if it is, we jump to label 1. Then, we check whether the last two bytes of the loaded sector are MBR_MAGIC (i.e., 0xaa55) and, if they are different, jump to err_msg, which prints an error message to the user and halts.
cmp works by subtracting both sides of the comparison. If the result is zero, it sets the Zero Flag, so jnz branches if the Zero Flag is not set, i.e., if MBR_MAGIC and %ax don't match.
More information on the `cmp` and `jz` instructions
jz and je have the same meaning: they are two mnemonics for the same instruction.
*Note:* The 1: label here is tricky. When reading this piece of code, I thought it was wrong, because jz should branch to the set_err place directly, so the 1: label should be in front of set_err, right? I was wrong: the label is in the right place. %al was loaded with the first byte, so if it is zero, %ax cannot be equal to MBR_MAGIC (whose low byte is 0x55); cmp fails and jnz branches too. If the first byte is non-zero but %ax (the magic number) is wrong, it fails as well. It works.
If 1: were in front of set_err and jz jumped, the Zero Flag would still be set by test, so jnz would not jump. So doing the cmp is needed anyway. (Thanks to jakllsch -at- netbsd.org for this explanation!)
*What does this piece of code do?*
It checks that the first byte of the loaded sector is not zero and that the magic number matches. If everything is OK, it proceeds.
So, if everything worked, we just jump to the next and final piece of code!
File src/sys/arch/i386/stand/mbr/mbr.S, starting at line 508:
/* We pass the sector number through to the next stage boot.
* It doesn't have to use it (indeed no other mbr code will generate) it,
* but it does let us have a NetBSD pbr that can identify where it was
* read from! This lets us use this code to select between two
* NetBSD system on the same physical driver.
* (If we've read the mbr of a different disk, it gets a random number
* - but it wasn't expecting anything...)
*/
movl %ebp, %esi
pop %dx /* recover drive # */
jmp BOOTADDR
%ebp has the LBA of the NetBSD partition. Move it to %esi so the next program can use it (if it wants to). Also, pop the stack so %dx stores the current drive number. Finally, jump to the next program, loaded at address BOOTADDR, which is pbr.S :-)!
Second program: pbr.S
Before diving into some more asm code, let's take a look at two of the comments at the beginning of this file.
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 32:
/*
* i386 partition boot code
*
* This code resides in sector zero of the netbsd partition, or sector
* zero of an unpartitioned disk (eg a floppy).
* Sector 1 is assumed to contain the netbsd disklabel.
* Sectors 2 until the end of the track contain the next phase of bootstrap.
* Which know how to read the interactive 'boot' program from filestore.
* The job of this code is to read in the phase 1 bootstrap.
*
* Makefile supplies:
* PRIMARY_LOAD_ADDRESS: Address we load code to (0x1000).
* BOOTXX_SECTORS: Number of sectors we load (15).
* X86_BOOT_MAGIC_1: A random magic number.
*
* Although this code is executing at 0x7c00, it is linked to address 0x1000.
* All data references MUST be fixed up using R().
*/
And also:
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 89:
/*
* This code is loaded to addresss 0:7c00 by either the system BIOS
* (for a floppy) or the mbr boot code. Since the boot program will
* be loaded to address 1000:0, we don't need to relocate ourselves
* and can load the subsequent blocks (that load boot) to an address
* of our choosing. 0:1000 is a not unreasonable choice.
*
* On entry the BIOS drive number is in %dl and %esi may contain the
* sector we were loaded from (if we were loaded by NetBSD mbr code).
* In any case we have to re-read sector zero of the disk and hunt
* through the BIOS partition table for the NetBSD partition.
*
* Or, we may have been loaded by a GPT hybrid MBR, handoff state is
* specified in T13 EDD-4 annex A.
*/
First: this code is loaded by mbr.S, but it does not have to be. This code is supposed to be at the beginning of a NetBSD partition, but there can be unpartitioned devices (such as a floppy disk or even a HDD) where pbr.S is the program loaded by the BIOS. It is important to know that, because we'll see that all the work we did in mbr.S to find the boot partition will be done again here in pbr.S.
Since we already walked carefully through the assembly instructions, explaining them from the beginning, we'll go a little faster in this section.
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 107:
ENTRY(start)
/*
* The PC BIOS architecture defines a Boot Parameter Block (BPB) here.
* The actual format varies between different MS-DOS versions, but
* apparently some system BIOS insist on patching this area
* (especially on LS120 drives - which I thought had an MBR...).
* The initial jmp and nop are part of the standard and may be
* tested for by the system BIOS.
*/
jmp start0
nop
.ascii "NetBSD60" /* oemname (8 bytes) */
. = start + MBR_BPB_OFFSET /* move to start of BPB */
/* (ensures oemname doesn't overflow) */
. = start + MBR_AFTERBPB /* skip BPB */
It seems that the BPB is not used in NetBSD, but pbr.S has to reserve room for it. The first line jumps to the start0 label, making sure to leave some bytes for the BPB in case some program wants to patch it.
BPB means "Boot Parameter Block", aka BIOS Parameter Block (unrelated to the BIOS firmware we have been discussing). It makes the boot sector mix code and data, and was created for the DOS operating system.
A great explanation of what is BPB and its history
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 124:
start0:
xor %cx, %cx /* don't trust values of ds, es or ss */
mov %cx, %ss
mov %cx, %sp
mov %cx, %es
#ifndef BOOT_FROM_FAT
cmpl $0x54504721, %eax /* did a GPT hybrid MBR start us? */
je boot_gpt
#endif
mov %cx, %ds
xor %ax, %ax
/* A 'reset disk system' request is traditional here... */
push %dx /* some BIOS zap %dl here :-( */
int $0x13 /* ah == 0 from code above */
pop %dx
/* Read from start of disk */
incw %cx /* track zero sector 1 */
movb %ch, %dh /* dh = head = 0 */
call chs_read
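The first lines after start0 just clear registers (%ss, %sp, %es and, a bit later, %ds are set to zero). The #ifndef BOOT_FROM_FAT block checks whether a GPT hybrid MBR started us (%eax holding 0x54504721); we are not booting through GPT, so let's ignore it.
Then, it pushes %dx to the stack before calling *INT 13H*, because some BIOSes clobber %dl. *INT 13H* is called with %ax zero, i.e., "reset disk system", which resets the disk drive. Next, it pops the value back. Then, it sets the registers that are parameters for disk reading: incw %cx makes %cx equal to 1 (cylinder 0, sector 1) and movb %ch, %dh sets %dh (the head) to 0.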
More information on *INT 13*, when `%ax` is zero
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 361:
chs_read:
movw $BOOTADDR, %bx /* es:bx is buffer */
pusha
movw $0x200 + BOOTXX_SECTORS, %ax /* command 2, xx sectors */
jmp do_read
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 351:
do_read:
int $0x13
popa
set_err(ERR_READ)
jc error
ret
These routines are not very different from the code in mbr.S, but some details change. Let's flesh this out.
First, chs_read moves BOOTADDR to %bx, which means that the result of the read operation will be stored at address %es * 0x10 + %bx. pusha pushes the contents of all eight 16-bit general-purpose registers (%ax, %cx, %dx, %bx, %sp, %bp, %si and %di) onto the stack; popa in do_read restores them. Note that BOOTADDR in pbr.S is **different** from the one found in mbr.S. In mbr.S it was 0x7c00. In pbr.S it is 0x1000, from the early defines in the file.
Where is the BOOTXX_SECTORS definition? It is in Makefile.bootxx and is 15. We'll show the Makefile later, in the "Compiling bootxx" section.
So, the value loaded into %ax is just 0x2 for %ah and BOOTXX_SECTORS for %al.
Finally, the parameters for reading the disk are as in the table below:
Register | Contents for disk reading |
---|---|
%cx | Cylinder and sector information: cylinder 0 and sector 1. |
%dh | Head 0. |
%dl | Popped from the stack earlier; it is passed to us by mbr.S (or by the BIOS). |
%ah | 0x2 (read the disk!). |
%al | BOOTXX_SECTORS (read BOOTXX_SECTORS sectors, i.e., 15). |
%es | Zero. |
%bx | BOOTADDR (0x1000). |
*Important:* Note that the very first sector of the disk (which in LBA would be 0x00000000) is at cylinder, head, sector = 0, 0, 1. Yes, sector 1 (not zero!). This is important to avoid confusion about which sectors are being loaded this time. We are just loading the MBR again!
The CHS addressing scheme starting at sector 1 is due to historical reasons
Then, a jump is made to do_read and *INT 13H* is triggered.
After that, we just pop the registers back. We suppose everything is OK (no error occurred) and return to the main code. The next piece of code follows.
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 146:
/* See if this is our code, if so we have already loaded the next stage */
xorl %ebp, %ebp /* pass sector 0 to next stage */
movl (%bx), %eax /* MBR code shouldn't even have ... */
cmpl R(start), %eax /* ... a jmp at the start. */
je pbr_read_ok1
So, it zeroes %ebp and moves the first four bytes at the address held in %bx (i.e., BOOTADDR) to %eax. Remember that BOOTADDR is the buffer where we stored what we just loaded, which is the first BOOTXX_SECTORS sectors of the disk, starting at CHS sector 1 (LBA 0). So, it compares the first four bytes of the current program (at R(start); R() fixes up the address because this code is linked at 0x1000 but runs at 0x7c00) with the ones loaded in memory.
This comparison exists because, on an unpartitioned disk, pbr.S can be at the start of the disk and be loaded directly. That is not the case in our example, so the comparison fails (MBR code doesn't have a jmp at the start), je does not branch and we resume at the next block of code.
The MBR is now loaded at address BOOTADDR. The next block follows.
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 153:
/* Now scan the MBR partition table for a netbsd partition */
xorl %ebx, %ebx /* for base extended ptn chain */
scan_ptn_tbl:
xorl %ecx, %ecx /* for next extended ptn */
movw $BOOTADDR + MBR_PART_OFFSET, %di
1: movb 4(%di), %al /* mbrp_type */
movl 8(%di), %ebp /* mbrp_start == LBA sector */
addl lba_sector, %ebp /* add base of extended partition */
#ifdef BOOT_FROM_FAT
cmpb $MBR_PTYPE_FAT12, %al
je 5f
cmpb $MBR_PTYPE_FAT16S, %al
je 5f
cmpb $MBR_PTYPE_FAT16B, %al
je 5f
cmpb $MBR_PTYPE_FAT32, %al
je 5f
cmpb $MBR_PTYPE_FAT32L, %al
je 5f
cmpb $MBR_PTYPE_FAT16L, %al
je 5f
#else
cmpb $MBR_PTYPE_NETBSD, %al
#endif
jne 10f
5: testl %esi, %esi /* looking for a specific sector? */
je boot
cmpl %ebp, %esi /* ptn we wanted? */
je boot
The comment at the top explains everything: we should find the NetBSD partition. First, it zeroes the %ebx register and, after the scan_ptn_tbl label, the %ecx register. Then, it moves the address of the beginning of the partition table (BOOTADDR + MBR_PART_OFFSET, where MBR_PART_OFFSET is 446) to register %di.
After that, it moves the partition type byte (at offset 4) to %al and four bytes to %ebp. These four bytes start at offset 8 in the partition table entry, i.e., they are the LBA address of the partition.
The line just below adds lba_sector to %ebp. What is this? It is a define from the top of the file pointing to a label called _lba_sector, and we see it is just the following:
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 403:
/* Control block for int-13 LBA read. */
_lba_info:
.word 0x10 /* control block length */
.word BOOTXX_SECTORS /* sector count */
.word BOOTADDR /* offset in segment */
.word 0 /* segment */
_lba_sector:
.quad 0 /* sector # goes here... */
It is just zero! So is the addl line useless? It seems so, but there might be a reason why this instruction is there. The comment on that line ("add base of extended partition") suggests that the value becomes non-zero when the code walks an extended partition chain, or maybe it matters when producing a different bootxx program. Looking at the commit history of the file is always a good tip, too. We see, from the diff log, that this was added when GPT support was added. Maybe this is a GPT thing?
Diff of the pbr.S file, revision 1.18 to 1.19
Let's step back to the block we were analyzing (and didn't finish) and consider BOOT_FROM_FAT undefined, so we don't need to look at the code that reads NetBSD from a FAT partition. We go directly to the line just below the #else directive. It checks whether %al holds the code for the NetBSD partition type. If they are equal, jne does not jump, so we resume at label 5:. Note that the GPT code in start0 is surrounded by #ifndef BOOT_FROM_FAT, so it is only compiled when BOOT_FROM_FAT is not defined.
testl %esi, %esi sets the Zero Flag if %esi is zero, and in that case je jumps to boot (any NetBSD partition will do). We received %esi from mbr.S and it contains the LBA sector number of the NetBSD partition. Since this is not zero, je does not jump and we resume at the next instruction, a cmpl comparison of %ebp and %esi. We suppose the NetBSD partition is the first one, so %ebp and %esi are the same and we jump to the boot: label. If you are curious, the rest of this block that we are ignoring here is about extended partition handling.
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 227:
/*
* Active partition pointed to by di.
*
* We can either do a CHS (Cylinder Head Sector) or an LBA (Logical
* Block Address) read. Always doing the LBA one
* would be nice - unfortunately not all systems support it.
* Also some may contain a separate (eg SCSI) BIOS that doesn't
* support it even when the main BIOS does.
*
* The safest thing seems to be to find out whether the sector we
* want is inside the CHS sector count. If it is we use CHS, if
* outside we use LBA.
*
* Actually we check that the CHS values reference the LBA sector,
* if not we assume that the LBA sector is above the limit, or that
* the geometry used (by fdisk) isn't correct.
*/
boot:
movl %ebp, lba_sector /* to control block */
testl %ebx, %ebx /* was it an extended ptn? */
jnz boot_lba /* yes - boot with LBA reads */
/* get CHS values from BIOS */
push %dx /* save drive number */
movb $8, %ah
int $0x13 /* chs info */
/*
* Validate geometry, if the CHS sector number doesn't match the LBA one
* we'll do an LBA read.
* calc: (cylinder * number_of_heads + head) * number_of_sectors + sector
* and compare against LBA sector number.
* Take a slight 'flier' and assume we can just check 16bits (very likely
* to be true because the number of sectors per track is 63).
*/
movw 2(%di), %ax /* cylinder + sector */
push %ax /* save for sector */
shr $6, %al
xchgb %al, %ah /* 10 bit cylinder number */
shr $8, %dx /* last head */
inc %dx /* number of heads */
mul %dx
mov 1(%di), %dl /* head we want */
add %dx, %ax
and $0x3f, %cx /* number of sectors */
mul %cx
pop %dx /* recover sector we want */
and $0x3f, %dx
add %dx, %ax
dec %ax
pop %dx /* recover drive nmber */
cmp %bp, %ax
je read_chs
We have already seen this problem in "First program: mbr.S". We cannot trust the CHS values stored in the partition table, so we fetch the disk's CHS geometry from the BIOS, convert the partition's CHS to LBA and compare the result against the LBA stored in the partition entry. If both match, we do a CHS read; if not, the partition may be beyond what CHS can represent, so an LBA read is preferred. Let's assume they match and we perform a CHS read, so we jump to the read_chs: label. (Unlike mbr.S, here the drive number is popped back into %dx before the comparison.)
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 214:
/*
* Sector below CHS limit
* Do a cylinder-head-sector read instead
* I believe the BIOS should do reads that cross track boundaries.
* (but the read should start at the beginning of a track...)
*/
read_chs:
movb 1(%di), %dh /* head */
movw 2(%di), %cx /* ch=cyl, cl=sect */
call chs_read
pbr_read_ok1:
jmp pbr_read_ok
So, as we saw before, the following registers are set: %dh holds the head number and %cx both the cylinder and sector information. These are the parameters of the NetBSD partition, remember? Then we call chs_read, which we have already seen. It reads 15 sectors starting at the CHS address we just set, i.e., it reads the first 15 sectors of the NetBSD partition to BOOTADDR. Finally, we jump to pbr_read_ok:.
File src/sys/arch/i386/stand/bootxx/pbr.S, starting at line 333:
/*
* Check magic number for valid stage 2 bootcode
* then jump into it.
*/
pbr_read_ok:
cmpl $X86_BOOT_MAGIC_1, bootxx_magic
set_err(ERR_NO_BOOTXX)
jnz error
movl %ebp, %esi /* %esi ptn base, %dl disk id */
movl lba_sector + 4, %edi /* %edi ptn base high */
jmp $0, $bootxx /* our %cs may not be zero */
The comment is self-explanatory. Every boot stage has a magic number to be checked, so we know whether it is safe to run it. X86_BOOT_MAGIC_1 is defined in bootblock.h. bootxx_magic is a label in the bootxx.S file (which we'll look at in the next section). This works because bootxx.S, pbr.S, label.S and boot1.c are linked together, so the bootxx_magic label is resolved at link time.
TODO: ask jakllsch about line 343 (lba_sector + 4, %edi): this was added for GPT support ([http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/i386/stand/bootxx/pbr.S.diff?r1=1.18&r2=1.19&only_with_tag=MAIN&f=h]). Shouldn't it be guarded by some guard (header) or at least have a comment?
We suppose we have no error. We just move %ebp (which stores the LBA address of the partition) to %esi, and the upper 32 bits of lba_sector to %edi. %dl holds the drive number (from the last pop opcode). Finally, we do a far jump to the bootxx: label (which also sets %cs to zero, since our %cs may not be zero), declared in the bootxx.S file.
Third program: bootxx.S
Before continuing, let's first understand what the programs inside the bootxx directory mean. They are different programs, but they are located inside the same directory. Why? We already said that they are all linked together into a bigger program: bootxx.S, pbr.S, label.S and boot1.c, but not only them. The contents of the src/sys/arch/i386/stand/lib directory are linked in too, and it has a lot of important files we'll take a look at later.
Let's take a look at an important part of src/sys/arch/i386/stand/bootxx/Makefile.bootxx (which is included by the Makefile in each bootxx_* directory).
File src/sys/arch/i386/stand/bootxx/Makefile.bootxx, starting at line 94:
I386_STAND_DIR?= $S/arch/i386/stand
### find out what to use for libi386
I386DIR= ${I386_STAND_DIR}/lib
.include "${I386DIR}/Makefile.inc"
LIBI386= ${I386LIB}
### find out what to use for libsa
SA_AS= library
SAMISCMAKEFLAGS+="SA_USE_LOADFILE=yes"
.include "${S}/lib/libsa/Makefile.inc"
LIBSA= ${SALIB}
### find out what to use for libkern
KERN_AS= library
.include "${S}/lib/libkern/Makefile.inc"
LIBKERN= ${KERNLIB}
LDSCRIPT ?= $S/arch/i386/conf/stand.ldscript
cleandir distclean: .WAIT cleanlibdir
cleanlibdir:
-rm -rf lib
LIBLIST= ${LIBI386} ${LIBSA} ${LIBKERN} ${LIBI386} ${LIBSA}
CLEANFILES+= ${PROG}.sym ${PROG}.map
${PROG}: ${OBJS} ${LIBLIST} ${LDSCRIPT}
${_MKTARGET_LINK}
${CC} -o ${PROG}.sym ${LDFLAGS} -Wl,-Ttext,${PRIMARY_LOAD_ADDRESS} \
-T ${LDSCRIPT} -Wl,-Map,${PROG}.map -Wl,-cref ${OBJS} ${LIBLIST}
${OBJCOPY} -O binary ${PROG}.sym ${PROG}
Looking at the ${CC} command, we see that the build framework built on top of make (at the bottom of the file you'll see .include <bsd.prog.mk>) has already compiled the source files to object files (all *.S and *.c files into *.o). This line links them together into a single executable, ${PROG}.sym, with its text placed at ${PRIMARY_LOAD_ADDRESS}. But see the last variable of this command, ${LIBLIST}: the libraries are linked in too! The final ${OBJCOPY} -O binary step turns ${PROG}.sym into the raw binary ${PROG}.
More information about the build framework
${LIBLIST} is defined just above and ${LIBI386} is ${I386LIB}, which is in turn defined in ${I386DIR}/Makefile.inc. ${I386DIR} is just the directory ../lib relative to the bootxx directory.
So, it includes ../lib/Makefile.inc. Let's take a look at an important part of it.
File src/sys/arch/i386/stand/lib/Makefile.inc, starting at line 16:
# Default values:
I386DST?= ${.OBJDIR}/lib/i386
#I386DIR= $S/arch/i386/stand/lib
I386LIB= ${I386DST}/libi386.a
CWARNFLAGS.clang+= -Wno-tautological-compare
I386MAKE= \
cd ${I386DIR} && MAKEOBJDIRPREFIX= && unset MAKEOBJDIRPREFIX && \
MAKEOBJDIR=${I386DST} ${MAKE} \
CC=${CC:Q} CFLAGS=${CFLAGS:Q} \
AS=${AS:Q} AFLAGS=${AFLAGS:Q} \
LD=${LD:Q} STRIP=${STRIP:Q} \
MACHINE=${MACHINE} MACHINE_ARCH=${MACHINE_ARCH:Q} \
I386CPPFLAGS=${CPPFLAGS:S@^-I.@-I../../.@g:Q} \
I386MISCCPPFLAGS=${I386MISCCPPFLAGS:Q} \
${I386MISCMAKEFLAGS}
${I386LIB}: .NOTMAIN __always_make_i386lib
@echo making sure the i386 library is up to date...
@${I386MAKE} libi386.a
@echo done
It just enters ${I386DIR} and calls ${MAKE} to build libi386.a, i.e., the Makefile in that directory is read. We finally see a part of it.
File src/sys/arch/i386/stand/lib/Makefile, starting at line 22:
SRCS= pcio.c conio.S comio.S comio_direct.c biosvideomode.S
SRCS+= getsecs.c biosgetrtc.S biosdelay.S biosreboot.S gatea20.c
SRCS+= biosmem.S getextmemx.c biosmemx.S printmemlist.c
SRCS+= pread.c menuutils.c parseutils.c
SRCS+= bootinfo.c bootinfo_biosgeom.c bootinfo_memmap.c
SRCS+= startprog.S multiboot.S
SRCS+= biosgetsystime.S cpufunc.S bootmenu.c
SRCS+= realprot.S message.S message32.S dump_eax.S pvcopy.S putstr.S putstr32.S
SRCS+= rasops.c vbe.c biosvbe.S
.if (${I386_INCLUDE_DISK} == "yes")
SRCS+= biosdisk.c biosdisk_ll.c bios_disk.S
.endif
.if (${I386_INCLUDE_DOS} == "yes")
SRCS+= dosfile.c dos_file.S
.endif
.if (${I386_INCLUDE_DISK} == "yes") || (${I386_INCLUDE_DOS} == "yes")
SRCS+= diskbuf.c
.endif
.if (${I386_INCLUDE_BUS} == "yes")
SRCS+= biospci.c bios_pci.S isapnp.c isadma.c
.endif
.if (${I386_INCLUDE_PS2} == "yes")
SRCS+= biosmca.S biosmemps2.S
.endif
.include <bsd.own.mk>
.undef DESTDIR
.include <bsd.lib.mk>
lib${LIB}.o:: ${OBJS:O}
@echo building standard ${LIB} library
@rm -f lib${LIB}.o
@${LD} -r -o lib${LIB}.o `lorder ${OBJS} | tsort`
@echo done
It adds the .S and .c files (some of them conditionally) to our library. The lib${LIB}.o target links all the objects into a single relocatable object (ld -r, with the objects ordered by lorder and tsort), while libi386.a itself is built by the build framework again (this time through .include <bsd.lib.mk>).
For the possible GDT configurations, see [INTEL2015], Vol 3A, p. 3-2. src/sys/arch/i386/include/segments.h has GDT stuff. GDT segments are later configured in the kernel (see the machdep.c file).
See:
https://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/book.html
Fourth program: boot1.c
Compiling bootxx
http://nxr.netbsd.org/xref/src/sys/arch/i386/stand/boot/
TODO: read the following link:
http://www.freenix.no/arkiv/daemonnews/200009/sb.html
Interesting mailing list entries about booting on i386
https://mail-index.netbsd.org/port-i386/2007/01/23/0008.html
https://mail-index.netbsd.org/port-i386/2004/10/15/0005.html
Filesystems Overview
The NetBSD whole partition
The NetBSD partition is the letter "c" on i386, amd64 and some other architectures. I.e., if your drive is /dev/wd0, wd0d refers to the whole disk, while wd0c refers only to the NetBSD partition, which can coexist with partitions of other operating systems. On some other architectures, though, this is not possible, so wd0c refers to both the NetBSD partition and the whole disk.
We saw in the "Booting" section that the very first 512 bytes of the disk are reserved for the MBR. The first boot program (mbr.S) finds an active partition and loads its first sector (512 bytes). The beginning of the active partition is reserved for the boot process.
The NetBSD partition:
Offset | Size | Description |
---|---|---|
0 | 4 kb | Boot block for the partition, loaded from the MBR |