Rangi!

:: rangi, racket, cli

The friendly neighbourhood volcano, and embarking upon a CLR executable metadata jungle expedition

Rangi Roots

Carrying on from the last post, let’s look at reading the data from a Windows .exe or .dll file that has been compiled by a CLI tool (ilasm, F#, C#, etc).

The intention of this reader portion is that it will eventually be called at phase level 1 in the macro expander, so it had better be damned fast. For this reason I will be restricting myself to only #lang racket/base, a function or two from racket/string and racket/port, generally preferring define over let and NO MACROS!

Yes it will be hard to resist.

The basic idea is to parse the Portable Executable format, and find the CLI header extension that will point us to the CLI metadata. We only care about a few other bits.

; call-with-input-file closes the file port as soon as port->bytes returns
(define all-bytes (call-with-input-file filename port->bytes))
(define port (open-input-bytes all-bytes))

We don’t want to have file system handles hanging around, so we’ll gobble up all the bytes. Then, we can treat the resulting byte string as a port for our parsing adventures.

If you look at the Portable Executable spec you’ll see the file always starts with the old MS-DOS header stub. We can skip over this completely. Since we’ll need to parse a bunch of different areas of mostly boring bytes, we can define some specifications that say how many bytes each field takes, plus a function that reads them into a hash:

(define ms-dos-header-spec
  '((header 128)))

(define pe-file-header-spec
  '(
    (sig 4)
    (machine 2)
    (num-secs 2)
    (stamp 4)
    (sym 4)
    (num-sym 4)
    (opt-header-size 2)
    (characteristics 2)))

(define pe-opt-header-spec
  '(
    ; standard fields
    (magic 2)
    (lmajor 1)
    ; mooar fields 
    (image-base 4)
    ; ...
    ; nt specific fields
    ; ...
    ; data directories
    ; ...
    (cli-header-table-rva 4)
    (cli-header-table-size 4))
    ; ...
    )

(define (read-spec spec port)
  (make-hash
   (map (λ (x) (let ([id (car x)]
                     [size (cadr x)])
                 (cons id (read-bytes size port)))) spec)))

The real version won’t create hash maps that aren’t needed, but this is a handy setup to extract, verify and debug what’s going on alongside a hex editor. The sequence of MS-DOS, PE and optional PE headers can be expressed as a spec of specs.

(define file-spec
  `(
    (ms-dos ,ms-dos-header-spec)
    (pe ,pe-file-header-spec)
    (pe-opt ,pe-opt-header-spec)))

This can be processed by a suspiciously similar-looking function:

(define pe-headers
  (make-hash
   (map (λ (x) (let ([id (car x)]
                     [spec (cadr x)])
                 (cons id (read-spec spec port)))) file-spec)))

Now we have access to cli-header-table-rva and cli-header-table-size. But what is an RVA, I hear you cry! It’s a relative virtual address: an address relative to where the image is loaded in memory, rather than an offset into the file. We’ll come back to that in a moment.

Sections

Next up there will be a number of section headers. These describe the different areas of the program that hold executable code, data, resources etc. The num-secs field we already read in the PE file header holds the number of headers to expect. Up until now we haven’t used any of the data; it’s just sitting in byte strings. PE files are little-endian encoded, and we need a way to take a 2- or 4-byte string and turn it into an integer. Racket handily does this for us:

; unsigned little-endian conversion
(define (to-int bytes) (integer-bytes->integer bytes #f #f))
(define num-secs
  (to-int (hash-ref (hash-ref pe-headers 'pe) 'num-secs)))
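
If you don’t have a hex editor and a real assembly to hand, you can still sanity-check the byte order with a made-up header. The following is only a sketch: fake-header and fake-pe are names invented for this example, the field values are not from a real file, and read-spec is repeated in a slightly condensed form so the snippet runs on its own.

```racket
#lang racket/base

;; Same definitions as in the post, repeated so the sketch runs on its own.
(define pe-file-header-spec
  '((sig 4) (machine 2) (num-secs 2) (stamp 4)
    (sym 4) (num-sym 4) (opt-header-size 2) (characteristics 2)))

(define (read-spec spec port)
  (make-hash
   (map (λ (x) (cons (car x) (read-bytes (cadr x) port))) spec)))

(define (to-int bstr) (integer-bytes->integer bstr #f #f))

;; A hand-built, 24-byte PE file header. The values are invented for
;; illustration: machine #x14c, 3 sections, every other field zeroed.
(define fake-header
  (bytes-append #"PE\0\0"
                (integer->integer-bytes #x14c 2 #f #f) ; machine, little-endian
                (integer->integer-bytes 3 2 #f #f)     ; num-secs, little-endian
                (make-bytes 16 0)))                    ; remaining fields

(define fake-pe (read-spec pe-file-header-spec (open-input-bytes fake-header)))

(hash-ref fake-pe 'num-secs)          ; the raw bytes, low byte first
(to-int (hash-ref fake-pe 'num-secs)) ; decoded as an unsigned integer
(to-int (hash-ref fake-pe 'machine))
```

The raw num-secs value comes back as the two bytes 3 and 0, in that order, and to-int decodes it to 3; the machine field decodes to #x14c.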

Then follow the headers themselves. Of course, we have a spec for that:

(define pe-section-header-spec
  '(
   (name 8)
   (virt-size 4)
   (virt-address 4)
   (size-raw 4)
   (ptr-raw 4)
   (ptr-reloc 4)
   (ptr-linenum 4)
   (num-reloc 2)
   (num-linenum 2)
   (characteristics 4)))

(define section-headers
  (for/list ([i (in-range 0 num-secs)])
    (read-spec pe-section-header-spec port)))

Now we have a nice list of the section metadata. The parts we care about are virt-size, virt-address and ptr-raw.

When the OS loads the process into a new virtual memory space, it will place the program beginning at the image base address found in the PE optional header above (or maybe not, due to ASLR, but we can ignore that). The sections are laid out as per virt-address. The ptr-raw is the location at which the section appears in the physical file.

That’s all very nice but we need to know where the CLI metadata exists in the physical file so we can parse it. We have enough information to work it out, though. Given an rva we need to first find which section it is in by comparing it in order to the location and size of each virtual section. If we then subtract the section’s virtual address from the rva we will be left with an offset that we can apply to the physical location of the section.

(define (calc-file-pos rva)
  (for/first ([header section-headers]
              #:when (< rva (+ (to-int (hash-ref header 'virt-address))
                               (to-int (hash-ref header 'virt-size)))))
    (+ (to-int (hash-ref header 'ptr-raw))
       (- rva (to-int (hash-ref header 'virt-address))))))

We’ll be passing this function around as it is used quite a bit.
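
To make the arithmetic concrete, here is a small worked sketch of the same lookup over a made-up section table. The section struct, example-sections and rva->file-pos are hypothetical names for this example only, and the addresses are invented. Unlike calc-file-pos, this variant also checks the lower bound of each section, which is an assumption on my part about what you’d want for an RVA that falls outside every section.

```racket
#lang racket/base

;; Hypothetical, simplified section record: fields are already decoded
;; integers rather than the raw byte strings read-spec produces.
(struct section (virt-address virt-size ptr-raw))

;; Made-up layout for illustration; the numbers are not from a real assembly.
(define example-sections
  (list (section #x2000 #x0800 #x0200)    ; e.g. .text
        (section #x4000 #x0400 #x0a00)))  ; e.g. .rsrc

;; Like calc-file-pos, but also checks the lower bound so an RVA that
;; falls outside every section yields #f instead of a bogus offset.
(define (rva->file-pos rva sections)
  (for/first ([s (in-list sections)]
              #:when (and (<= (section-virt-address s) rva)
                          (< rva (+ (section-virt-address s)
                                    (section-virt-size s)))))
    (+ (section-ptr-raw s) (- rva (section-virt-address s)))))

(rva->file-pos #x2008 example-sections) ; 8 bytes into the first section
(rva->file-pos #x4010 example-sections) ; #x10 bytes into the second section
(rva->file-pos #x1000 example-sections) ; below every section
```

An RVA of #x2008 sits 8 bytes into the first section, so it maps to file offset #x200 + 8 = #x208. Likewise #x4010 maps to #xa10, and #x1000 matches no section and gives #f.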

CLI

We finally made it to the CLI header. Rejoice! Naturally, we’ll parse it with another spec:

(define cli-header-spec
  '(
    (size 4)
    (major 2)
    (minor 2)
    (metadata-rva 4)
    (metadata-size 4)
    (flags 4)
    (entry-token 4)
    (res-rva 4)
    (res-size 4)
    (strong-sig 8)
    (codeman 8)
    (vtable-fix 8)
    (export-jumps 8)
    (managed-header 8)
    ))

(define cli-header
  (begin
    ; jump in stream
    (file-position port (calc-file-pos (to-int (hash-ref (hash-ref pe-headers 'pe-opt) 'cli-header-table-rva))))
    (read-spec cli-header-spec port)))

Most of this stuff we don’t care about, but there are some tasty looking morsels in metadata-rva and metadata-size. The former points at the metadata root, an important location that a bunch of forthcoming data is offset from.

(define md-root (calc-file-pos (to-int (hash-ref cli-header 'metadata-rva))))   
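
The jump-then-read pattern used for the CLI header works because file-position can reposition a byte-string port just as it can a file port. Here is a minimal sketch of that behaviour on ten made-up bytes; demo-port and the other names are invented for this example.

```racket
#lang racket/base

;; Ten made-up bytes standing in for a file image.
(define demo-port (open-input-bytes (bytes 0 1 2 3 4 5 6 7 8 9)))

(define first-two (read-bytes 2 demo-port))   ; consumes offsets 0 and 1
(file-position demo-port 6)                   ; jump to absolute offset 6
(define after-jump (read-bytes 2 demo-port))  ; bytes at offsets 6 and 7
(define pos-now (file-position demo-port))    ; one-argument form reports the position
```

After the jump, the read picks up the bytes 6 and 7 and leaves the port at position 8. The same call would put the port at md-root before reading from the metadata root.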

And here is where we say goodbye to our old friend read-spec. The way ahead is fraught with great peril, which it is ill-equipped to deal with. We’ll leave the expedition on that cliffhanger; you’ll have to wait until the exciting next episode of adventures into the CLI metadata jungle.

I know what you’re thinking - what’s all this about volcanoes? The answer is Rangitoto, my friendly neighbourhood volcano. Because what do you name things?