a finer look at elf
first and foremost… to manny pacquiao, congratulations! and mabuhay! godbless the philippines!!!!
the linux binary from scratch program really fascinates me to no end. and now, i’m obsessed! can i now account for every last byte in my program like what brian raiter did here? well, if i take a look at the assembly listing, i guess so. but this time, i’ll try a different approach. i’ll try going through the program at the hexadecimal level, count my way to the last byte and hopefully give some meaning to the gibberish on my screen. besides, this activity will help me familiarize myself with a tool called the hex editor. i’ll be using app-editors/hteditor to browse through the file and sys-apps/hexdump to print its contents at the terminal.
but before actually jumping right in, i just want to verify something essential that will really be useful when looking at a hex dump of a file! it is called the “endianness” of a processor. i’ve read some time ago that intel architectures are “little endian” machines. that is, the least significant byte of a multi-byte value is stored first (at the lowest address), instead of the “big endian” way where the most significant byte goes first.
perhaps an example would illustrate this much more clearly.
; endian.asm
; ----------------------------------
; a simple endianness testing thingy
; how is 'deadbeef' (hex) stored?
dw 0xdead
dw 0xbeef
so how does the intel processor look at deadbeef? let’s check shall we?
amerei@heaven ~/workdir/elf_magic $ nasm -f bin endian.asm
amerei@heaven ~/workdir/elf_magic $ hexdump -C endian
00000000  ad de ef be                                       |....|
well! what do we have here!? ad de ef be? if we inspect it closely, we would notice that ‘ad de ef be’ is just ‘de ad be ef’ with the two bytes of each word swapped! that proves that my machine (intel pentium three) is a little endian architecture. the most significant byte of the hexadecimal number dead is de and its least significant byte is ad. and as i’ve pointed out earlier, least significant bytes come before most significant bytes in little endian architectures. looks like people weren’t lying after all about intel storing word values backward. i should have more faith next time.
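the same experiment can be sketched without an assembler. this is only an illustration using python's standard struct module (not part of the original nasm workflow): it packs the two words from endian.asm both ways so the byte order can be compared side by side.

```python
import struct

# the two words declared in endian.asm: dw 0xdead, dw 0xbeef
words = (0xdead, 0xbeef)

# '<H' packs an unsigned 16-bit value little endian, '>H' big endian
little = b"".join(struct.pack("<H", w) for w in words)
big = b"".join(struct.pack(">H", w) for w in words)

print("little endian:", little.hex(" "))  # ad de ef be
print("big endian:   ", big.hex(" "))     # de ad be ef
```

the little endian line matches what hexdump printed for the nasm output.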
now that i’ve verified my machine’s endianness, i suppose i’m now ready to take a peek inside the elf binary i made.
notes:
‘elfhead’ is the complete binary from scratch program and not just a stale binary file with an elf identification.
read the elf specification alongside this post (see also man 5 elf), as well as my former posts about “understanding elf”.
how big is the program anyway?
amerei@heaven ~/workdir/elf_magic $ wc -c elfhead
342 elfhead
ahh! so i have a total of 342 bytes ahead of me to give meaning to now. seems like a lot of work, but hey! no guts no glory.
page 1 inspecting the elf header
how big is the elf header? perhaps there are tons of ways to do this, but i can only think of three ways as of now:
1) with your basic math skills, refer to struct Elf32_Ehdr at /usr/include/elf.h (or use the elf specification instead) and compute the size of the structure.
2) get the value at the byte offset of e_ehsize in the program itself. (fucking crazy!!)
3) use a program like readelf to do all the work for you.
number three seems the most doable so using readelf, i determined the elf header to be 52 bytes. so let’s print the first 52 bytes of the program shall we?
amerei@heaven ~/workdir/elf_magic $ hexdump -C -n 52 elfhead | cat -n
1 00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
2 00000010 02 00 03 00 01 00 00 00 74 80 04 08 34 00 00 00 |........t...4...|
3 00000020 9f 00 00 00 00 00 00 00 34 00 20 00 02 00 28 00 |........4. ...(.|
4 00000030 04 00 03 00 |...|
before you are the entire 52 bytes that make a file an elf file! how do we know? because the file begins with the 16-byte identification array (e_ident), whose first four bytes are the magic number 7f 45 4c 46, that is, 0x7f followed by “ELF” (see line 1). compare it with the output of readelf below:
amerei@heaven ~/workdir/elf_magic $ readelf -h elfhead | grep -i magic
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
at this point, i realize that explaining the program byte by byte is soo uncool already and a complete waste of time. so i’ll skip ahead to line 4, which has the values 04 00 03 00. this line holds the values for e_shnum and e_shstrndx, each a 16-bit little endian value. we can deduce from here that the program we are looking at has 4 section headers and that the section header string table is located at index 3 of our section header table!
342 - 52 = 290 more bytes to go!
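for the curious, options 1 and 2 from the list above can be sketched in a few lines. this is only a sketch: it assumes the Elf32_Ehdr field order from the elf specification and uses python's struct module, with the 52 bytes copied by hand from the hexdump above.

```python
import struct

# Elf32_Ehdr: e_ident[16], two halfwords, five words, six halfwords
EHDR = struct.Struct("<16sHHIIIIIHHHHHH")

# the 52 header bytes, copied from the hexdump output above
raw = bytes.fromhex(
    "7f454c46010101000000000000000000"
    "02000300010000007480040834000000"
    "9f000000000000003400200002002800"
    "04000300"
)

(e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
 e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
 e_shstrndx) = EHDR.unpack(raw)

print("computed struct size:", EHDR.size)        # option 1
print("e_ehsize in the file:", e_ehsize)         # option 2
print("entry point         :", hex(e_entry))
print("program headers     :", e_phnum, "x", e_phentsize, "bytes at offset", e_phoff)
print("section headers     :", e_shnum, "x", e_shentsize, "bytes at offset", e_shoff)
print("string table index  :", e_shstrndx)
```

both the computed size and the e_ehsize field agree with what readelf reported.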
page 2 inspecting the program headers
after the elf header comes the (optional) program header table. thanks to readelf, we can determine where it starts, the number of program headers in an object file, and the size of each header (all program headers are of equal size).
amerei@heaven ~/workdir/elf_magic $ readelf -h elfhead | grep -i Program
  Start of program headers:          52 (bytes into file)
  Size of program headers:           32 (bytes)
  Number of program headers:         2
aha! so we have two program headers! each 32 bytes wide. consuming a total of 64 bytes inside our elf binary. we now know how to call hexdump so it will show us only the bytes of the program headers.
amerei@heaven ~/workdir/elf_magic $ hexdump -C -s 52 -n 64 elfhead | cat -n
1 00000034 01 00 00 00 00 00 00 00 00 80 04 08 00 80 04 08 |................|
2 00000044 8d 00 00 00 8d 00 00 00 05 00 00 00 00 10 00 00 |................|
3 00000054 01 00 00 00 90 00 00 00 90 90 04 08 90 90 04 08 |................|
4 00000064 12 00 00 00 12 00 00 00 06 00 00 00 00 10 00 00 |................|
as usual, going through the entire program header table byte by byte is uncool and pointless. i’ll skip and just interpret the last two double word values (06 00 00 00 00 10 00 00) of the second program header. they are p_flags and p_align: the second program header has a read/write attribute (0x06 = read + write) and is aligned to a page boundary (0x1000 = 4096 bytes, or 4 kilobytes).
you might be wondering, how come i declared two program headers? well, program headers are used to map regions of a program to a virtual memory address space. there are two program headers: the first maps the region where the .text section resides, giving it a read/execute attribute, and the second maps the region where the .data section resides, giving it a read/write attribute. executable memory should not be writable, since that could give way to code injections and modifications like this.
52 + (32 * 2) = 116
342 - 116 = 226 more bytes to go!
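if you do want every field of both program headers, here is a rough sketch that unpacks the 64 bytes from the hexdump above. it assumes the Elf32_Phdr field order from the elf specification (eight 32-bit words) and the usual flag bits (read = 4, write = 2, execute = 1); the helper name flags_to_str is just something i made up for the illustration.

```python
import struct

# Elf32_Phdr is eight 32-bit little endian words
PHDR = struct.Struct("<8I")
FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
          "p_filesz", "p_memsz", "p_flags", "p_align")

# the 64 program header bytes, copied from the hexdump output above
raw = bytes.fromhex(
    "01000000000000000080040800800408"
    "8d0000008d0000000500000000100000"
    "01000000900000009090040890900408"
    "12000000120000000600000000100000"
)

def flags_to_str(p_flags):
    # hypothetical helper: PF_R = 4, PF_W = 2, PF_X = 1
    return "".join(ch if p_flags & bit else "-"
                   for ch, bit in (("r", 4), ("w", 2), ("x", 1)))

headers = [dict(zip(FIELDS, PHDR.unpack_from(raw, i * PHDR.size)))
           for i in range(len(raw) // PHDR.size)]

for i, h in enumerate(headers):
    print(f"phdr {i}: offset=0x{h['p_offset']:x} vaddr=0x{h['p_vaddr']:08x} "
          f"filesz={h['p_filesz']} flags={flags_to_str(h['p_flags'])} "
          f"align={h['p_align']}")
```

the flags column is where the read/execute versus read/write split shows up.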
page 3 inspecting the .text segment
the .text section is where the executable part of the program resides. we will study this section’s attributes more when we’ve reached the part about “section headers”. for now, it’s enough to say that this section sits inside the segment that the first program header marks as “read/execute”.
for reference, this is the assembly listing inside the .text section again, and i’ve also placed the size of the instructions so we can determine the entire size of the .text section
mov eax, 4 ; 5 bytes
inc ebx ; 1 byte
mov ecx, stringy ; 5 bytes
mov edx, strlen ; 5 bytes
int 0x80 ; 2 bytes
xor eax, eax ; 2 bytes
inc eax ; 1 byte
xor ebx, ebx ; 2 bytes
int 0x80 ; 2 bytes
hmmm.. 25 bytes? what does hexdump have to say?
amerei@heaven ~/workdir/elf_magic $ hexdump -s 116 -n 25 -C elfhead
00000074  b8 04 00 00 00 43 b9 8d  80 04 08 ba 12 00 00 00  |.....C..........|
00000084  cd 80 31 c0 40 31 db cd  80                       |..1.@1...|
now the fun part! let’s interpret those hexcodes for some true low level fun! remember, the byte order is little endian.
:: hex code ::  | instruction
==================================
b8 04 00 00 00  | mov eax, 4
43              | inc ebx
b9 8d 80 04 08  | mov ecx, stringy
ba 12 00 00 00  | mov edx, strlen
cd 80           | int 80h
31 c0           | xor eax, eax
40              | inc eax
31 db           | xor ebx, ebx
cd 80           | int 0x80
116 + 25 = 141
342 - 141 = 201 more bytes to go!
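to see the little endian rule at work inside the instructions themselves, here is a small sketch that pulls the 32-bit immediates out of the three mov instructions. it is not a disassembler: the offsets 0, 6 and 11 are simply counted by hand from the table above, and imm32 is a made-up helper name.

```python
# the 25 bytes of .text, copied from the hexdump output above
text = bytes.fromhex(
    "b804000000"   # mov eax, 4
    "43"           # inc ebx
    "b98d800408"   # mov ecx, stringy
    "ba12000000"   # mov edx, strlen
    "cd80"         # int 0x80
    "31c0"         # xor eax, eax
    "40"           # inc eax
    "31db"         # xor ebx, ebx
    "cd80"         # int 0x80
)

def imm32(code, offset):
    # hypothetical helper: one opcode byte, then a 32-bit little endian immediate
    return int.from_bytes(code[offset + 1:offset + 5], "little")

print("size of .text:", len(text))            # 25
print("eax =", imm32(text, 0))                # 4
print("ecx =", hex(imm32(text, 6)))           # address of stringy
print("edx =", imm32(text, 11))               # strlen
```

the value loaded into ecx, 0x0804808d, is the load address 0x08048000 from the first program header plus the file offset 0x8d (141), which is exactly where the string begins.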
page 4 inspecting the .data segment. my favorite part!!!
we’re now down to my precious data section where i declare a string followed by a newline. this is really basic so let’s just count the string length.
I LOVE STEPHANIE! plus the newline character is 18 bytes wide. now we know how to call hexdump.
amerei@heaven ~/workdir/elf_magic $ hexdump -s 141 -n 18 -C elfhead
0000008d  49 20 4c 4f 56 45 20 53  54 45 50 48 41 4e 49 45  |I LOVE STEPHANIE|
0000009d  21 0a                                             |!.|
we can see that the last byte is 0a and we all know that 0x0a = 10 = newline! we are correct! and we are still on the right track!
hang in there! 141 + 18 = 159
342 - 159 = 183 more bytes to go!
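the count is easy to double check. a tiny sketch (the variable name stringy is borrowed from the assembly listing):

```python
# the string declared in the data section, newline included
stringy = b"I LOVE STEPHANIE!\n"

print(len(stringy))       # 18
print(stringy.hex(" "))   # same bytes as the hexdump output above
```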
page 5 inspecting the section headers
we’ve reached the part of the file where we define section headers! they are like program headers in the sense that they give certain offsets a particular set of attributes, only, they are more concerned with “link view” routines rather than “load/execute view”.
with the now invaluable readelf utility, we shall get more information about our program’s section headers.
amerei@heaven ~/workdir/elf_magic $ readelf -h elfhead | grep -i section
  Start of section headers:          159 (bytes into file)
  Size of section headers:           40 (bytes)
  Number of section headers:         4
  Section header string table index: 3
now it is confirmed that we are definitely on the right track! because our computed offset (141+18=159) is the same as the section header offset (159 bytes into file). each section header is 40 bytes wide! and since there are four (4) section headers, we now know that the total size of all section headers is 160 bytes! now we know how to call hexdump.
amerei@heaven ~/workdir/elf_magic $ hexdump -s 159 -n 160 -C elfhead
0000009f  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000bf  00 00 00 00 00 00 00 00  0b 00 00 00 01 00 00 00  |................|
000000cf  06 00 00 00 74 80 04 08  74 00 00 00 8d 00 00 00  |....t...t.......|
000000df  00 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00  |................|
000000ef  11 00 00 00 01 00 00 00  03 00 00 00 8d 80 04 08  |................|
000000ff  8d 00 00 00 12 00 00 00  00 00 00 00 00 00 00 00  |................|
0000010f  04 00 00 00 00 00 00 00  01 00 00 00 03 00 00 00  |................|
0000011f  00 00 00 00 00 00 00 00  3f 01 00 00 17 00 00 00  |........?.......|
0000012f  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
up to this point, we have been interpreting our elf binary from scratch by reading raw bytes. it’s exhausting to do that, so for this activity, we shall use the readelf utility to interpret those hexcodes for us.
amerei@heaven ~/workdir/elf_magic $ readelf -S elfhead
There are 4 section headers, starting at offset 0x9f:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        08048074 000074 00008d 00  AX  0   0 16
  [ 2] .data             PROGBITS        0804808d 00008d 000012 00  WA  0   0  4
  [ 3] .shstrtab         STRTAB          00000000 00013f 000017 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
hmm, so we have a null section header at index 0, a text section header that defines the executable section of our program at index 1, a data section header that defines a read/write portion where we could declare and modify variables at index 2, and lastly a section header string table section where section names are stored at index 3. that makes four section headers total.
159 + 160 = 319
342 - 319 = 23 more bytes to go!
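for anyone who would rather not lean on readelf, here is a sketch of the same decoding done by hand. it assumes the Elf32_Shdr field order from the elf specification (ten 32-bit words) and only knows the three section types and three flag bits that appear in this program; the bytes are copied from the hexdump above, with the all-zero null header written as bytes(40).

```python
import struct

# Elf32_Shdr is ten 32-bit little endian words
SHDR = struct.Struct("<10I")
FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
          "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")
TYPES = {0: "NULL", 1: "PROGBITS", 3: "STRTAB"}  # only the types used here

raw = bytes(40) + bytes.fromhex(
    # [1] .text
    "0b000000" "01000000" "06000000" "74800408" "74000000"
    "8d000000" "00000000" "00000000" "10000000" "00000000"
    # [2] .data
    "11000000" "01000000" "03000000" "8d800408" "8d000000"
    "12000000" "00000000" "00000000" "04000000" "00000000"
    # [3] .shstrtab
    "01000000" "03000000" "00000000" "00000000" "3f010000"
    "17000000" "00000000" "00000000" "01000000" "00000000"
)

def flags_to_str(sh_flags):
    # SHF_WRITE = 1, SHF_ALLOC = 2, SHF_EXECINSTR = 4
    return "".join(ch for ch, bit in (("W", 1), ("A", 2), ("X", 4))
                   if sh_flags & bit)

sections = [dict(zip(FIELDS, SHDR.unpack_from(raw, i * SHDR.size)))
            for i in range(len(raw) // SHDR.size)]

for i, s in enumerate(sections):
    print(f"[{i}] type={TYPES[s['sh_type']]:<8} addr={s['sh_addr']:08x} "
          f"off={s['sh_offset']:06x} size={s['sh_size']:06x} "
          f"flags={flags_to_str(s['sh_flags']):<2} align={s['sh_addralign']}")
```

the sh_name field is not a string but an offset into the section header string table, which is why no names are printed here.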
but what does section .shstrtab do? if we look closely, it has no address in our program’s virtual memory space! it starts at offset 0x13f (319) in our program, is 0x17 (23) bytes wide, and doesn’t have any attribute like read or write? what’s up with this section anyway? it seems quite useless, don’t you think? hmm, trivially, i suppose we can make do without any section header string table. why? coz this section just defines the part of the file where section name strings like .text and .data are stored! these section name strings must be separated and padded at both ends with a null character, as the elf specification points out. we’ll see if they’re really null padded soon.
again, let’s call hexdump and print the entire string table that the .shstrtab section defines
amerei@heaven ~/workdir/elf_magic $ hexdump -s 0x13f -n 23 -C elfhead
0000013f  00 2e 73 68 73 74 72 74  61 62 00 2e 74 65 78 74  |..shstrtab..text|
0000014f  00 2e 64 61 74 61 00                              |..data.|
yes! the string table starts and ends with a null character (00). and still conforming to the elf standard, each string is also null separated!
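this is also how a section header finds its name: its sh_name field is a byte offset into this table, and the name runs from there up to the next null. a minimal sketch, where the table is rebuilt from the ascii column of the dump, the offsets 0x0b, 0x11 and 0x01 are the sh_name values from the section header dump earlier, and section_name is a made-up helper:

```python
# the 23-byte string table, rebuilt from the ascii column of the dump
shstrtab = b"\0.shstrtab\0.text\0.data\0"

def section_name(table, sh_name):
    # hypothetical helper: read from offset sh_name up to the next null byte
    end = table.index(b"\0", sh_name)
    return table[sh_name:end].decode("ascii")

print(len(shstrtab))                           # 23
for sh_name in (0x00, 0x0b, 0x11, 0x01):
    print(hex(sh_name), repr(section_name(shstrtab, sh_name)))
```

offset 0 lands on the leading null, which gives the null section its empty name.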
319 + 23 = 342
342 - 342 = 0
wtf!? no more bytes left? you guessed it! i’ve reached the last byte! i made it! every last byte in my program has been accounted for! i know exactly what my program is made of now! no hidden addons and other whatnot that i don’t know about are included! just plain and simple, all the bytes that i declared! no more, no less.
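the whole walk can be summed up in one running tally. a small sketch using the sizes found along the way:

```python
# every region of elfhead, in file order, with the sizes found above
regions = [
    ("elf header", 52),
    ("program headers", 2 * 32),
    (".text", 25),
    (".data", 18),
    ("section headers", 4 * 40),
    (".shstrtab", 23),
]

offset = 0
for name, size in regions:
    print(f"{offset:>3} .. {offset + size - 1:>3}  {name} ({size} bytes)")
    offset += size

print("total:", offset)  # 342, the same number wc -c reported
```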
my quest is complete! i hope everyone enjoyed reading this very long article as i’ve enjoyed writing it.

lost! so lost! im lost hahaha!
Comment by lyn — January 24, 2006 @ 2:10 am
which reminds me… of an axn show titled “lost”.. i don’t follow it that much though but i always dream about getting lost and stranded together with steph on an unexplored+unknown+person-less island! that will be sooo absolutely cool and romantic!!
but hey! i don’t mind getting lost and stranded with you on a small island too lyn! :p let’s get lost together now shall we? please?
Comment by sleepy jenkins — January 24, 2006 @ 4:40 am