The Common Object File Format

Discussing the intricacies of the COFF file format.

The Common Object File Format (COFF) was originally created by AT&T for Unix System V, which was first released in 1983. Since then, it's been adopted and modified for use in many modern operating systems, including Windows.

The COFF format used on Windows today isn't the same as the original, but most of its components remain the same, including:

The file header (sometimes called the COFF header)
The optional header (Microsoft's specification lists it as part of the format, but you'll essentially never see it in a Windows COFF object file; it's usually found within a PE file instead. We'll be skipping this one, as it isn't relevant here.)
The section headers
The sections themselves
The symbol table
The strings table
The relocation entries

The File Header

These components each have a distinct purpose, and we'll discuss them all thoroughly, starting with the file header. If you're at all familiar with the Portable Executable (PE) file format used for executables on Windows, then you should know this one well: it's the same file header that sits alongside the Optional Header inside the NT headers.

typedef struct _IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

On Windows, the file header is defined as shown above. It contains useful metadata about the file, including its characteristics, the number of sections present, and the file offset of the symbol table.
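To make the header a bit more concrete, here's a minimal sketch of reading it out of a buffer that holds an object file. It isn't taken from any particular loader: coff_file_header is a hypothetical, portable mirror of IMAGE_FILE_HEADER (so it compiles without windows.h), coff_read_file_header and coff_section_headers_offset are illustrative names, and the sketch assumes a little-endian host and an x64 object file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of IMAGE_FILE_HEADER; 20 bytes on disk. */
#pragma pack(push, 1)
typedef struct {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} coff_file_header;
#pragma pack(pop)

static_assert(sizeof(coff_file_header) == 20, "unexpected file header size");

#define COFF_MACHINE_AMD64 0x8664 /* same value as IMAGE_FILE_MACHINE_AMD64 */

/* Copies the file header out of `data` and runs a few sanity checks.
 * Returns 0 on success, -1 if the buffer is too small, is not an x64
 * object, or claims a symbol table beyond the end of the buffer. */
static int coff_read_file_header(const uint8_t *data, size_t size,
                                 coff_file_header *out)
{
    if (size < sizeof(*out)) return -1;
    memcpy(out, data, sizeof(*out));
    if (out->Machine != COFF_MACHINE_AMD64) return -1;
    if (out->PointerToSymbolTable > size) return -1;
    return 0;
}

/* The section headers start right after the file header (and after the
 * optional header, in the rare case that one is present). */
static size_t coff_section_headers_offset(const coff_file_header *h)
{
    return sizeof(*h) + h->SizeOfOptionalHeader;
}
```

Copying with memcpy rather than casting the buffer pointer sidesteps any alignment concerns, at the cost of one small copy.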

Sections and Section Headers

#define IMAGE_SIZEOF_SHORT_NAME              8

typedef struct _IMAGE_SECTION_HEADER {
    BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

Again, if you're familiar with the PE format for Windows executables you should immediately recognize this one, as it's identical to the ones found there. There is one section header for each section within the object file, and each contains metadata about its section, including the size of the section's data (SizeOfRawData), its characteristics, the file offset of that data (PointerToRawData; in object files the VirtualAddress member is normally zero), and PointerToRelocations, a crucial member that we'll be using later. The section headers sit directly after the file header, packed one after another, and precede the sections themselves.

The sections themselves, sometimes referred to as image or COFF sections, come after the section headers. These include .text for executable code, .data for modifiable program data, .pdata for information used during stack unwinding, and others.
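As a rough illustration of how the headers lead to the section data, here's a sketch that reads the n-th section header and then locates that section's bytes in the file. The struct coff_section_header is a hypothetical, portable mirror of IMAGE_SECTION_HEADER, the two function names are made up for this example, and the sketch assumes a little-endian host and an object file with no optional header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COFF_FILE_HEADER_SIZE 20 /* sizeof(IMAGE_FILE_HEADER) */

/* Hypothetical mirror of IMAGE_SECTION_HEADER; 40 bytes on disk. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  Name[8];
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} coff_section_header;
#pragma pack(pop)

static_assert(sizeof(coff_section_header) == 40, "unexpected section header size");

/* Copies the index-th (0-based) section header into `out`.
 * Returns 0 on success, -1 if the index or the header is out of range. */
static int coff_read_section(const uint8_t *file, size_t file_size,
                             uint16_t number_of_sections, uint16_t index,
                             coff_section_header *out)
{
    size_t offset = COFF_FILE_HEADER_SIZE + (size_t)index * sizeof(*out);
    if (index >= number_of_sections) return -1;
    if (offset + sizeof(*out) > file_size) return -1;
    memcpy(out, file + offset, sizeof(*out));
    return 0;
}

/* Returns a pointer to the section's bytes inside the file buffer, or NULL
 * if the section has no data on disk or the data would overrun the buffer. */
static const uint8_t *coff_section_data(const uint8_t *file, size_t file_size,
                                        const coff_section_header *s)
{
    if (s->PointerToRawData == 0) return NULL;
    if ((uint64_t)s->PointerToRawData + s->SizeOfRawData > file_size) return NULL;
    return file + s->PointerToRawData;
}
```

A loader would typically call coff_read_section in a loop from 0 to NumberOfSections - 1 and copy each section's data into memory it has allocated for it.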

The Symbol And Strings Tables

The symbol table follows the COFF sections; its file offset is given by PointerToSymbolTable in the file header. It contains data on the symbols used within the object file, including functions and variables. The symbol table is essentially an array of IMAGE_SYMBOL structures, as shown here.

typedef struct _IMAGE_SYMBOL {
    union {
        BYTE    ShortName[8];
        struct {
            DWORD   Short;     // if 0, use LongName
            DWORD   Long;      // offset into string table
        } Name;
        DWORD   LongName[2];   
    } N;
    DWORD   Value;
    SHORT   SectionNumber;
    WORD    Type;
    BYTE    StorageClass;
    BYTE    NumberOfAuxSymbols;
} IMAGE_SYMBOL;

The most important members to take note of are SectionNumber, which indicates which COFF section this symbol belongs to (a one-based index, where 0 marks an undefined, external symbol such as an imported function), Value, which is an offset that can be applied to the base address of that section to reach the symbol, Type, which indicates the symbol's type (Microsoft's tools only use it to tell functions, 0x20, apart from everything else, 0), and a confusing union, called N, which stores the symbol's name.

The Name struct has two members, Short and Long. If Short is non-zero, the symbol's name is stored inline in the ShortName member, an array that is 8 bytes long; shorter names are padded with nulls, and a name of exactly 8 characters has no null terminator. For names that are longer than 8 characters, Short is zero and you can access the name of the symbol by applying Name.Long, which is an offset, to the address directly following the symbol table. Here's some pseudocode to demonstrate this:

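const char* strings_table = (const char*)symbol_table + symbol_table_size; // size = NumberOfSymbols * sizeof(IMAGE_SYMBOL)
const char* symbol_name   = strings_table + symbol.N.Name.Long;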

This offset is accessing the strings table, which directly follows the symbol table. Its first 4 bytes hold the total size of the table, so valid offsets start at 4. The strings table stores the names that don't fit in the 8-byte ShortName member, that is, names longer than 8 characters.

In a hex dump, the strings table generally looks like a myriad of null-terminated ANSI strings packed together at the end of the file. The strings table in TrustedSec's whoami BOF is a good example.
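Putting the short and long cases together, a name lookup could be sketched roughly like this. The struct coff_symbol is a hypothetical, portable mirror of IMAGE_SYMBOL (minus the LongName member), coff_symbol_name is an illustrative helper rather than anything from a real loader, and `strings` is assumed to point at the first byte of the strings table, including its 4-byte size field. A little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of IMAGE_SYMBOL; 18 bytes on disk. */
#pragma pack(push, 1)
typedef struct {
    union {
        uint8_t ShortName[8];
        struct {
            uint32_t Short; /* 0 means the name lives in the strings table */
            uint32_t Long;  /* offset from the start of the strings table */
        } Name;
    } N;
    uint32_t Value;
    int16_t  SectionNumber;
    uint16_t Type;
    uint8_t  StorageClass;
    uint8_t  NumberOfAuxSymbols;
} coff_symbol;
#pragma pack(pop)

static_assert(sizeof(coff_symbol) == 18, "unexpected symbol size");

/* Writes the symbol's name into `out` as a null-terminated string.
 * Returns 0 on success, -1 if `out` is too small to hold it. */
static int coff_symbol_name(const coff_symbol *sym, const char *strings,
                            char *out, size_t out_size)
{
    if (sym->N.Name.Short != 0) {
        /* Inline name: up to 8 bytes, null-padded, and NOT terminated
         * when all 8 bytes are in use. */
        size_t len = 0;
        while (len < 8 && sym->N.ShortName[len] != 0) len++;
        if (len + 1 > out_size) return -1;
        memcpy(out, sym->N.ShortName, len);
        out[len] = '\0';
        return 0;
    }

    /* Long name: a null-terminated string inside the strings table. */
    const char *name = strings + sym->N.Name.Long;
    size_t len = strlen(name);
    if (len + 1 > out_size) return -1;
    memcpy(out, name, len + 1);
    return 0;
}
```

Copying into a caller-supplied buffer is one way to deal with the 8-character case, where ShortName can't be handed to string functions directly because it has no terminator.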

Relocation Table/Entries

There is one final part of the COFF format on Windows we haven't discussed, the relocation entries. Each COFF section has a related array of relocation entries, with each entry using the following structure:

typedef struct _IMAGE_RELOCATION {
    union {
        DWORD   VirtualAddress;
        DWORD   RelocCount;    
    } DUMMYUNIONNAME;
    DWORD   SymbolTableIndex;
    WORD    Type;
} IMAGE_RELOCATION;

Each relocation describes a value that needs to be adjusted so that it's valid at the address where the file is loaded. The IMAGE_RELOCATION structure contains some important information: VirtualAddress, an offset that is applied to the beginning of the section the relocation belongs to in order to reach the value that needs relocating; SymbolTableIndex, which indicates which symbol the relocation refers to; and the most important member, Type, which indicates what kind of relocation to perform. There are several kinds of relocations you'll need to handle in a custom COFF loader. I won't go over all of them here, but I'll provide a few to give a bit of insight into what I'm referring to, so I don't have to go over this too much later.
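Before getting to the individual types, here's a rough sketch of how a section's relocation entries could be read and how the address to patch is derived from VirtualAddress. The struct coff_relocation is a hypothetical, portable mirror of IMAGE_RELOCATION, both function names are made up for this example, `loaded_section` stands for wherever the loader copied the section's bytes, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of IMAGE_RELOCATION; 10 bytes on disk. */
#pragma pack(push, 1)
typedef struct {
    uint32_t VirtualAddress;   /* offset of the patch site within the section */
    uint32_t SymbolTableIndex; /* index of the symbol being referenced */
    uint16_t Type;             /* e.g. IMAGE_REL_AMD64_ADDR64 */
} coff_relocation;
#pragma pack(pop)

static_assert(sizeof(coff_relocation) == 10, "unexpected relocation size");

/* Copies the index-th (0-based) relocation of a section into `out`.
 * `pointer_to_relocations` and `number_of_relocations` come from that
 * section's header. Returns 0 on success, -1 if out of range. */
static int coff_read_relocation(const uint8_t *file, size_t file_size,
                                uint32_t pointer_to_relocations,
                                uint16_t number_of_relocations,
                                uint16_t index, coff_relocation *out)
{
    uint64_t offset = (uint64_t)pointer_to_relocations +
                      (uint64_t)index * sizeof(*out);
    if (index >= number_of_relocations) return -1;
    if (offset + sizeof(*out) > file_size) return -1;
    memcpy(out, file + offset, sizeof(*out));
    return 0;
}

/* The bytes to patch live at the start of the loaded section plus the
 * relocation's VirtualAddress. */
static uint8_t *coff_patch_site(uint8_t *loaded_section,
                                const coff_relocation *r)
{
    return loaded_section + r->VirtualAddress;
}
```

A loader would then switch on Type to decide how many bytes at the patch site to rewrite and which formula to use.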

#define IMAGE_REL_AMD64_ADDR64          0x0001  // 64-bit address (VA).

This one is fairly straightforward. It's a 64-bit relocation, which means you'll need to adjust 64 bits at the relocation's address. Each relocation type has its own formula. For this one, we can do something like this:

uint64_t* needs_relocating  = (uint64_t*)relocation_address; 
*needs_relocating           = *needs_relocating + (uint64_t)section_base;

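Here, we take the 64-bit value already stored at the relocation's address and add to it the base address of the section that contains the symbol the relocation refers to. When the relocation and its symbol live in the same section, that's simply the relocation's own section. If the symbol's Value is non-zero, it needs to be added as well.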

Here's another common one you might encounter:

#define IMAGE_REL_AMD64_REL32           0x0004 // relative 32-bit reloc

This is a 32-bit relative relocation, indicating that we need to adjust 32 bits at the relocation's address. This is usually for a relative jump or call instruction. So for instance, a jump relative to the instruction pointer, such as jmp rip + 80h, would fall under this category. The 32-bit value is measured from the byte that follows the relocated field to the target. So now you have a broad idea of some of the relocations that need to take place when we parse the BOF's relocation entries.
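For comparison with the 64-bit case, a REL32 fix-up could be sketched like this. The function name coff_apply_rel32 and its parameters are illustrative, not from any existing loader: `patch_site` is the address of the 32-bit field being relocated and `target` is the resolved address of the symbol. The sketch assumes a little-endian host and that any value already stored in the field is an addend to preserve.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Applies an IMAGE_REL_AMD64_REL32-style fix-up.
 * Returns 0 on success, -1 if the target is too far away for the
 * displacement to fit in 32 bits. */
static int coff_apply_rel32(uint8_t *patch_site, const uint8_t *target)
{
    /* Whatever the compiler left in the field is treated as an addend. */
    int32_t addend;
    memcpy(&addend, patch_site, sizeof(addend));

    /* The displacement is measured from the byte following the 4-byte
     * field, which is where RIP points once the field has been decoded. */
    int64_t delta = (int64_t)((intptr_t)target - (intptr_t)(patch_site + 4))
                    + addend;
    if (delta < INT32_MIN || delta > INT32_MAX) return -1;

    int32_t disp = (int32_t)delta;
    memcpy(patch_site, &disp, sizeof(disp));
    return 0;
}
```

The range check matters in practice: if the target ends up more than 2 GB away from the code (an imported function's address, for example), the displacement can't be encoded and the loader has to keep the two close together some other way.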

