I was re-reading some articles I had saved a while back, mostly about the internals of ntdll, kernel32 and so on. One of them was Alex Ionescu’s Closing “Heaven’s Gate”, which details how the new Windows 10 exploit mitigation CFG (Control Flow Guard) prevents one from loading a 64 bit DLL in a WoW64 process. The article ended with “Reopening the gate is left as an exercise to the reader 😉”, and I couldn’t find any published solution (which is surprising, considering how trivial the solution is), so I thought: why not take a crack at it?

A short note before we get on with reopening Heaven’s Gate: all DLLs referred to are the 64 bit versions unless otherwise stated.

The problems we have to deal with

Going by the two main articles that discuss DLL loading through Heaven’s Gate, George Nicolaou’s Knockin’ on Heaven’s Gate and the aforementioned article, we have several problems. The first is getting into 64 bit mode and calling functions there, which has already been done for us many times by many people. For this exercise I will be using ReWolf’s library (wow64ext).
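For readers unfamiliar with the mechanism: the switch into 64 bit mode boils down to a far return into the 64 bit code segment, selector 0x33 (0x23 being the 32 bit one under WoW64). The sketch below only builds the bytes of the commonly used entry stub in portable C so the encoding can be inspected. It is an illustration, not ReWolf’s code; `emit_x64_start` is a hypothetical name, and actually executing the stub requires executable memory inside a WoW64 process.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

#define CS_X64 0x33   /* 64 bit user code selector under WoW64 */

/* Hypothetical helper: writes the 32 bit "enter 64 bit mode" stub into out.
   Returns the number of bytes written, or 0 if out is too small.
     push 0x33            ; CS to load on the far return
     call $+5             ; pushes the address of the following add
     add  dword [esp], 5  ; skip the add (4 bytes) and the retf (1 byte)
     retf                 ; pops EIP, then CS = 0x33: execution continues in 64 bit mode */
static size_t emit_x64_start(uint8_t *out, size_t cap)
{
    static const uint8_t stub[] = {
        0x6A, CS_X64,                  /* push 0x33           */
        0xE8, 0x00, 0x00, 0x00, 0x00,  /* call $+5            */
        0x83, 0x04, 0x24, 0x05,        /* add dword [esp], 5  */
        0xCB                           /* retf                */
    };
    if (cap < sizeof stub)
        return 0;
    memcpy(out, stub, sizeof stub);
    return sizeof stub;
}
```

The `add` exists because `call $+5` pushes the address of the `add` itself; bumping it by 5 makes the far return land on whatever follows the `retf`, now running as 64 bit code.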

The real problem starts after that, when we attempt to load kernel32. The first hurdle is defeating CFG: we have to bypass the indirect call check so that LdrpCallInitRoutine is allowed to call DllMain.

The next problem is where kernel32 gets loaded. Nicolaou states that simply loading kernel32 at a different address is not enough, “since certain functions contained within ntdll require numerous structures from the library that are referenced using their absolute address. In addition the kernel32 library’s initialization function KernelBaseDllInitialize would fail to execute and raise an unhandled exception in the process”. We also cannot deal with this simply by unmapping the VAD that blocks the x64 address space, as such VADs are now configured as NoChange and OneSecured.

The solution

There are a few possible solutions to the first problem, CFG, ranging from trivial to non-trivial. To figure them out, let us review how CFG works.

Per Trend Micro, files compiled with CFG enabled (in this case ntdll and kernel32) call the function pointed to by __guard_check_icall_fptr before making an indirect call. At runtime, this points to LdrpValidateUserCallTarget. If the address is not valid, it calls LdrpHandleInvalidUserCallTarget, which in turn calls RtlpHandleInvalidUserCallTarget.

Things are slightly different in the 64 bit ntdll, which is the only one we care about here (Trend Micro’s article was likely based on the 32 bit ntdll): indirect calls are routed through __guard_dispatch_icall_fptr, which at runtime points not to LdrpValidateUserCallTarget but to LdrpDispatchUserCallTarget. LdrpDispatchUserCallTarget takes a single argument in RAX: the indirect call target.

LdrpDispatchUserCallTarget checks whether the target address is valid; if it is, the call is dispatched (it jumps to RAX). Otherwise, it jumps to LdrpHandleInvalidUserCallTarget instead.
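The division of labour can be pictured with a toy model in plain C: the compiler routes every indirect call through a function pointer (standing in for __guard_dispatch_icall_fptr), and the loader later points it at a validating dispatcher. This is only an analogy under stated assumptions: an allowlist replaces the real CFG bitmap, `failfast_hits` counts what would be an RtlFailFast2 instead of terminating, and every name here is hypothetical.

```c
#include <stddef.h>
#include <assert.h>

typedef void (*icall_target)(void);

/* Hypothetical allowlist standing in for the CFG bitmap. */
static icall_target valid_targets[8];
static size_t valid_count;
static int failfast_hits;   /* counts would-be RtlFailFast2 calls */

static void allow_target(icall_target t)
{
    if (valid_count < sizeof valid_targets / sizeof valid_targets[0])
        valid_targets[valid_count++] = t;
}

static int is_valid_target(icall_target t)
{
    for (size_t i = 0; i < valid_count; i++)
        if (valid_targets[i] == t)
            return 1;
    return 0;
}

/* No checking at all: the effect of patching the dispatcher to "jmp rax". */
static void dispatch_nocheck(icall_target t) { t(); }

/* Validate first, like LdrpDispatchUserCallTarget; reject instead of calling. */
static void dispatch_checked(icall_target t)
{
    if (is_valid_target(t))
        t();
    else
        failfast_hits++;
}

/* Stand-in for __guard_dispatch_icall_fptr: every indirect call goes through it. */
static void (*guard_dispatch_icall_fptr)(icall_target) = dispatch_nocheck;

static int calls;
static void dll_main_stub(void) { calls++; }   /* hypothetical DllMain stand-in */
```

In this model, the patch described later amounts to swapping `dispatch_checked` back for `dispatch_nocheck` without touching the allowlist.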

LdrpHandleInvalidUserCallTarget then tries to call RtlpHandleInvalidUserCallTarget, and if the call succeeds it jumps to the target in RAX.

RtlpHandleInvalidUserCallTarget will perform several checks:

// v1 holds the rejected indirect call target
if ( RtlGuardAllowSuppressedCalls && (unsigned __int8)RtlpGuardIsSuppressedAddress() )
{
  RtlpGuardGrantSuppressedCallAccess();
}
else if ( !(unsigned int)LdrControlFlowGuardEnforcedWithExportSuppression()
       || !(unsigned __int8)RtlGuardIsExportSuppressedAddress(v1)
       || (signed int)RtlpUnsuppressForwardReferencingCallTarget(v1) < 0 )
{
  RtlFailFast2(10i64, v1);
}
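Reading the decompiled chain as a decision function makes the possible outcomes explicit. The sketch below is a model, not ntdll code: each boolean field stands for the result of the corresponding routine for one target address (field and type names are hypothetical), and evaluation order and side effects are ignored.

```c
#include <stdbool.h>
#include <assert.h>

/* One field per check in the decompiled snippet (hypothetical names). */
typedef struct {
    bool allow_suppressed_calls;      /* RtlGuardAllowSuppressedCalls */
    bool is_suppressed_address;       /* RtlpGuardIsSuppressedAddress() */
    bool export_suppression_enforced; /* LdrControlFlowGuardEnforcedWithExportSuppression() */
    bool is_export_suppressed;        /* RtlGuardIsExportSuppressedAddress(v1) */
    bool unsuppress_succeeded;        /* RtlpUnsuppressForwardReferencingCallTarget(v1) >= 0 */
} icall_checks;

typedef enum { GRANT_SUPPRESSED_ACCESS, FAIL_FAST, ALLOW } icall_outcome;

/* Mirrors the if / else-if chain of RtlpHandleInvalidUserCallTarget. */
static icall_outcome handle_invalid_target(const icall_checks *c)
{
    if (c->allow_suppressed_calls && c->is_suppressed_address)
        return GRANT_SUPPRESSED_ACCESS;           /* RtlpGuardGrantSuppressedCallAccess() */
    if (!c->export_suppression_enforced
        || !c->is_export_suppressed
        || !c->unsuppress_succeeded)
        return FAIL_FAST;                          /* RtlFailFast2(10, v1) */
    return ALLOW;                                  /* caller then jumps to the target */
}
```

Avoiding RtlFailFast2 without patching therefore requires either the suppressed-call path or all three export-suppression checks to come out favourably.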

RtlpGuardGrantSuppressedCallAccess? Could we not use that (or ZwSetInformationVirtualMemory, which underlies it) to make whatever memory we want a valid icall target? Not quite. There are two main problems with this:

  1. There is no convenient point at which to call it between kernel32 being loaded and LdrpCallInitRoutine checking the validity of the icall address. Well, we could get one by hooking, but that is an overly complicated solution.
  2. Even if we managed to do so anyway, ZwSetInformationVirtualMemory would call MiCommitVadCfgBits to mark the address as valid. However, as Ionescu points out, MiCommitVadCfgBits does a nasty trick: if the address is in the 32 bit address space, it is marked not in the 64 bit bitmap but in the 32 bit one. This is why the entry point of kernel32 is not marked in the first place, even though we loaded it with LdrLoadDll: the wrong bitmap is set.
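What “marking an address” means is easier to see in a toy model. The sketch assumes the layout commonly described for CFG bitmaps: two bits per 16-byte slot, packed into 64-bit words so that one word covers 512 bytes, with the even bit covering a 16-byte-aligned target and the odd bit covering unaligned targets in that slot. Real implementations differ in the details between Windows versions, and `cfg_bitmap`, `cfg_mark` and `cfg_check` are hypothetical names. The point it illustrates is the second problem above: bits set in one bitmap do nothing for a check that consults the other.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <assert.h>

/* Toy CFG bitmap (hypothetical, simplified): 2 bits per 16-byte slot,
   64-bit words, so one word covers 32 slots = 512 bytes. */
typedef struct {
    uint64_t base;      /* first covered address, must be 512-byte aligned */
    uint64_t *words;
    size_t nwords;
} cfg_bitmap;

/* Finds the word and bit that decide whether addr is a valid call target. */
static bool cfg_locate(const cfg_bitmap *bm, uint64_t addr, size_t *word, unsigned *bit)
{
    assert(bm->base % 512 == 0);
    if (addr < bm->base)
        return false;
    uint64_t off = addr - bm->base;
    if ((off >> 9) >= bm->nwords)
        return false;
    *word = (size_t)(off >> 9);
    *bit = (unsigned)((off >> 3) & 63);   /* even bit for a 16-byte-aligned address */
    if (off & 0xF)
        *bit |= 1u;                        /* unaligned targets consult the odd bit */
    return true;
}

/* Marks addr valid; an unaligned addr makes its whole 16-byte slot valid. */
static void cfg_mark(cfg_bitmap *bm, uint64_t addr)
{
    size_t w;
    unsigned b;
    if (!cfg_locate(bm, addr, &w, &b))
        return;
    bm->words[w] |= UINT64_C(1) << b;
    if (b & 1u)
        bm->words[w] |= UINT64_C(1) << (b & ~1u);
}

static bool cfg_check(const cfg_bitmap *bm, uint64_t addr)
{
    size_t w;
    unsigned b;
    return cfg_locate(bm, addr, &w, &b) && ((bm->words[w] >> b) & 1u);
}
```

Marking a DllMain-like address in a `wow64` instance leaves a `native` instance covering the same range untouched, which is exactly the situation where the 64 bit check still rejects the call.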

The other solution at this level would be to make the later checks come out the other way so that RtlFailFast2 is never called. However, tampering with three separate functions is too much work. Thus, we pick the most trivial solution: overwriting ntdll!LdrpDispatchUserCallTarget with a simple jmp rax, so every dispatched call goes straight to its target. Since ntdll’s NtProtectVirtualMemory function does not perform any icalls, we can easily use ReWolf’s existing library to patch LdrpDispatchUserCallTarget, and LdrpValidateUserCallTarget with a ret (the latter being likely unnecessary, but it is better to err on the safe side).

// hNtdll64 and pmemcpy_s (ntdll's memcpy_s) are assumed to be resolved already.
// If GetProcAddress64 cannot find the two Ldrp* routines (they may not be
// exported by ntdll), resolve them some other way before patching.
DWORD old_protect;

// 1) LdrpValidateUserCallTarget -> ret: every check-style validation passes
DWORD64 pLdrpValidateUserCallTarget = GetProcAddress64(hNtdll64, "LdrpValidateUserCallTarget");
VirtualProtectEx64(INVALID_HANDLE_VALUE, pLdrpValidateUserCallTarget, 1, PAGE_EXECUTE_READWRITE, &old_protect);
BYTE ret_instruction = 0xc3;  // ret
X64Call(pmemcpy_s, 4, pLdrpValidateUserCallTarget,
        (DWORD64)1,
        (DWORD64)&ret_instruction,
        (DWORD64)1);
VirtualProtectEx64(INVALID_HANDLE_VALUE, pLdrpValidateUserCallTarget, 1, old_protect, &old_protect);

// 2) LdrpDispatchUserCallTarget -> jmp rax: dispatch to the target without checking it
DWORD64 pLdrpDispatchUserCallTarget = GetProcAddress64(hNtdll64, "LdrpDispatchUserCallTarget");
VirtualProtectEx64(INVALID_HANDLE_VALUE, pLdrpDispatchUserCallTarget, 2, PAGE_EXECUTE_READWRITE, &old_protect);
BYTE jmprax[] = { 0xff, 0xe0 };  // jmp rax
X64Call(pmemcpy_s, 4, pLdrpDispatchUserCallTarget,
        (DWORD64)2,
        (DWORD64)jmprax,
        (DWORD64)2);
VirtualProtectEx64(INVALID_HANDLE_VALUE, pLdrpDispatchUserCallTarget, 2, old_protect, &old_protect);

Voila, our problem with CFG is thrown out of the window.
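For reference, here is a portable sketch of what the two patches write (`ret` is C3; `jmp rax` is FF /4 with ModRM E0), with the original bytes saved so the patch can be undone later. A plain buffer stands in for the ntdll code; in the code above the write goes through X64Call and memcpy_s after VirtualProtectEx64. `patch_record`, `apply_patch` and `undo_patch` are hypothetical helpers, and keeping a backup is an addition not present in the code above.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

static const uint8_t PATCH_RET[]     = { 0xC3 };        /* ret     */
static const uint8_t PATCH_JMP_RAX[] = { 0xFF, 0xE0 };  /* jmp rax */

typedef struct {
    uint8_t *site;       /* where the patch was written */
    uint8_t saved[16];   /* original bytes, for undo_patch */
    size_t len;
} patch_record;

/* Saves the bytes at site, then overwrites them with patch. Returns -1 if too long. */
static int apply_patch(patch_record *rec, uint8_t *site, const uint8_t *patch, size_t len)
{
    if (len > sizeof rec->saved)
        return -1;
    rec->site = site;
    rec->len = len;
    memcpy(rec->saved, site, len);
    memcpy(site, patch, len);
    return 0;
}

/* Restores the bytes saved by apply_patch. */
static void undo_patch(const patch_record *rec)
{
    memcpy(rec->site, rec->saved, rec->len);
}
```

Only the first one or two bytes of each function change; the rest of the original body stays in place but is never reached.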

What about the address base problem? Well, for an unknown reason it appears to no longer apply on Windows 10. All three important DLLs (kernel32, kernelbase and ntdll) have the same default image base (0x0000000180000000) and are all truly relocatable.
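Both properties can be read from a DLL’s PE header: ImageBase holds the preferred address, and the DYNAMIC_BASE flag (0x0040) in DllCharacteristics marks the image as relocatable under ASLR. The same field also carries GUARD_CF (0x4000), which says whether the image was built with CFG. A minimal sketch, assuming a PE32+ image in a byte buffer; `read_pe64_info` is a hypothetical helper, while the offsets and flag values are those of the PE format.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>
#include <assert.h>

#define PE32PLUS_MAGIC        0x20B
#define DLLCHAR_DYNAMIC_BASE  0x0040   /* IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE */
#define DLLCHAR_GUARD_CF      0x4000   /* IMAGE_DLLCHARACTERISTICS_GUARD_CF */

typedef struct {
    uint64_t image_base;
    uint16_t dll_characteristics;
} pe64_info;

/* Little-endian readers, independent of host byte order. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (uint32_t)rd16(p + 2) << 16; }
static uint64_t rd64(const uint8_t *p) { return rd32(p) | (uint64_t)rd32(p + 4) << 32; }

/* Reads ImageBase and DllCharacteristics; returns false if img is not PE32+. */
static bool read_pe64_info(const uint8_t *img, size_t size, pe64_info *out)
{
    if (size < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return false;
    uint32_t nt = rd32(img + 0x3C);               /* e_lfanew */
    /* "PE\0\0" (4) + IMAGE_FILE_HEADER (20) + optional header through DllCharacteristics (72) */
    if (nt > size || size - nt < 4 + 20 + 72)
        return false;
    if (memcmp(img + nt, "PE\0\0", 4) != 0)
        return false;
    const uint8_t *opt = img + nt + 4 + 20;
    if (rd16(opt) != PE32PLUS_MAGIC)
        return false;
    out->image_base = rd64(opt + 24);             /* OptionalHeader.ImageBase */
    out->dll_characteristics = rd16(opt + 70);    /* OptionalHeader.DllCharacteristics */
    return true;
}
```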

However, should the DLLs’ ImageBase values differ and kernel32 be required to load at its preferred address (which should pretty much never happen due to ASLR), a naive (and possibly very unsafe) solution would be to disassemble every instruction in ntdll’s .text section and relocate the affected pointers to the appropriate address. But since it is unlikely that Microsoft will roll back ASLR compatibility for kernel32/kernelbase any time soon, we should not have to worry about this issue.

With all of our problems resolved, we can load kernel32 successfully.

To verify that the loaded DLL functions as expected, we call kernel32!CreateProcessW to launch a simple program (in this case a FASM hello world).
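Because the call is made from 32 bit code through X64Call, the structures handed to CreateProcessW must use the 64 bit layout: the 32 bit STARTUPINFOW (68 bytes) and PROCESS_INFORMATION (16 bytes) do not match what the 64 bit kernel32 expects. A sketch of the 64 bit layouts with fixed-width fields and explicit padding, so they come out the same whichever compiler mode builds them; the type names are hypothetical, and the caller is assumed to pass their addresses to X64Call as DWORD64 values.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* 64 bit STARTUPINFOW layout (hypothetical name): pointers and handles are uint64_t. */
typedef struct {
    uint32_t cb;                  /* set to sizeof(startupinfow64), i.e. 104 */
    uint32_t pad0;
    uint64_t lpReserved, lpDesktop, lpTitle;
    uint32_t dwX, dwY, dwXSize, dwYSize;
    uint32_t dwXCountChars, dwYCountChars, dwFillAttribute, dwFlags;
    uint16_t wShowWindow, cbReserved2;
    uint32_t pad1;
    uint64_t lpReserved2;
    uint64_t hStdInput, hStdOutput, hStdError;
} startupinfow64;

/* 64 bit PROCESS_INFORMATION layout (hypothetical name). */
typedef struct {
    uint64_t hProcess, hThread;
    uint32_t dwProcessId, dwThreadId;
} process_information64;
```

Zero-initialising a `startupinfow64` and setting `cb` to its size is enough for a plain launch; the handles returned in `process_information64` are 64 bit values that the 32 bit side has to keep as such.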

We could now also load shellcode that in turn maps a 64 bit binary compiled with CFG enabled, without any problems.

Some issues remain

This technique is not perfect and several issues remain that need to be addressed.

  1. It doesn’t work under Visual Studio’s debugger. I don’t know why (and I do not intend to debug a debugger any time soon), but LdrLoadDll always returns 0xC0000142 (STATUS_DLL_INIT_FAILED).
  2. Loading kernel32 into a console process results in a second console window being created. Not a major problem, but it could be very annoying.
  3. The solution might not be 100% safe. If a different ntdll version passes the call target in a register other than RAX, the jmp rax patch will break.
  4. It does not work, and will crash, on machines other than Windows 10, so perhaps a more comprehensive library could be made combining this and Nicolaou’s unmapping method.
  5. While CreateProcessW works, some functions still do not. A quick test with user32!MessageBoxW resulted in a crash.

The cause of the crash is unknown, but it is unlikely that anyone would specifically want to create a user interface through Heaven’s Gate anyway, and if they do (and succeed), it would be great to see the solution.
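For issue 3, one way to fail safely instead of crashing is to compare the bytes at the patch site with a pattern verified for a known ntdll build, and skip patching when they differ. A minimal sketch: `byte_signature` and `matches_signature` are hypothetical, the mask lets RIP-relative displacements vary, and the bytes in any pattern you use must come from your own analysis (those in the test are placeholders, not real ntdll bytes).

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <assert.h>

typedef struct {
    const uint8_t *bytes;
    const uint8_t *mask;   /* 0xFF = must match, 0x00 = wildcard */
    size_t len;
} byte_signature;

/* True if the first sig->len bytes of code match sig under its mask. */
static bool matches_signature(const uint8_t *code, size_t avail, const byte_signature *sig)
{
    if (avail < sig->len)
        return false;
    for (size_t i = 0; i < sig->len; i++)
        if ((code[i] & sig->mask[i]) != (sig->bytes[i] & sig->mask[i]))
            return false;
    return true;
}
```

The bytes would be read from the 64 bit address first (for example with an X64Call to memcpy_s into a local buffer) and the patch applied only on a match.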

View the full code on GitLab.

Edit

I tried running the code without the patch and, somehow, it still works! According to Ionescu’s article this should not be the case, as MiSelectCfgBitMap should not allow it to happen!


Any private memory allocations below the 64-bit boundary will be marked only in the 32-bit bitmap, while the opposite applies to the 64-bit bitmap […] in a CFG-aware NTDLL.DLL, is that LdrpCallInitRoutines will perform a CFG bitmap check before calling the DllMain of this DLL. As the DLL will be loaded in 32-bit address space, the WoW64 CFG bitmap will be marked, and not the Native CFG bitmap — causing the 64-bit NTDLL to believe that DllMain is not a valid relative call target, and crash the process.

From Ionescu’s post

It turns out that we both missed one of the checks in there: MiSelectBitMapForImage. This function returns 3 if the image is 64 bit, regardless of where it was loaded.

This would in turn cause MiSelectCfgBitMap to select the correct 64 bit bitmap.
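A simplified model of the selection logic as described here and in Ionescu’s quote: image mappings get the bitmap matching the image’s bitness regardless of address, while private allocations are assigned by address. It assumes that “below the 64-bit boundary” means below 4 GB and that 32 bit images use the WoW64 bitmap; the enum and function names are hypothetical and merely stand in for MiSelectBitMapForImage and MiSelectCfgBitMap.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef enum { BITMAP_NATIVE, BITMAP_WOW64 } cfg_bitmap_kind;
typedef enum { MEM_PRIVATE_ALLOC, MEM_IMAGE_MAPPING } mem_kind;

/* Hypothetical model for a WoW64 process, not kernel code. */
static cfg_bitmap_kind select_cfg_bitmap(mem_kind kind, bool image_is_64bit, uint64_t address)
{
    if (kind == MEM_IMAGE_MAPPING)   /* the image's bitness decides, not its address */
        return image_is_64bit ? BITMAP_NATIVE : BITMAP_WOW64;
    /* private memory: assumed split at the 4 GB line */
    return address < UINT64_C(0x100000000) ? BITMAP_WOW64 : BITMAP_NATIVE;
}
```

The second case is the one that hurts a manual mapper: a 64 bit DLL copied into private memory in the low 4 GB is treated like any other 32 bit allocation.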

As a result, loading DLLs through LdrLoadDll still works, but manually mapping DLLs that have CFG enabled would fail. Because of this, the code is primarily useful for exactly that case: allowing manual mappers for 64 bit libraries to run inside 32 bit processes (provided the libraries are position independent and do not import from anything other than ntdll). So this patching code isn’t necessary at all for calling LdrLoadDll, but it is required for loading DLLs that were never dropped to disk.
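If an image is not fully position independent but carries a .reloc section, the mapper also has to apply its base relocations before running it. A minimal sketch, assuming a little-endian host and an image already copied section by section into `image`. Only the two relocation types a 64 bit DLL typically uses are handled (ABSOLUTE padding and DIR64). `apply_dir64_relocs` is a hypothetical helper; resolving imports and CFG marking are separate steps not shown.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>
#include <assert.h>

#define REL_BASED_ABSOLUTE 0    /* IMAGE_REL_BASED_ABSOLUTE: padding, skipped */
#define REL_BASED_DIR64   10    /* IMAGE_REL_BASED_DIR64: 64 bit absolute address */

/* Walks IMAGE_BASE_RELOCATION blocks and adds delta (actual base - preferred
   ImageBase) to every DIR64 slot. Returns false on malformed input. */
static bool apply_dir64_relocs(uint8_t *image, size_t image_size,
                               const uint8_t *reloc, size_t reloc_size, uint64_t delta)
{
    size_t pos = 0;
    while (reloc_size - pos >= 8) {
        uint32_t page_rva, block_size;
        memcpy(&page_rva, reloc + pos, 4);        /* VirtualAddress */
        memcpy(&block_size, reloc + pos + 4, 4);  /* SizeOfBlock */
        if (block_size < 8 || block_size > reloc_size - pos)
            return false;
        for (size_t off = 8; off + 2 <= block_size; off += 2) {
            uint16_t entry;
            memcpy(&entry, reloc + pos + off, 2);
            unsigned type = entry >> 12;                       /* high 4 bits */
            uint64_t rva = (uint64_t)page_rva + (entry & 0x0FFF); /* low 12 bits */
            if (type == REL_BASED_ABSOLUTE)
                continue;
            if (type != REL_BASED_DIR64 || rva + 8 > image_size)
                return false;
            uint64_t value;
            memcpy(&value, image + rva, 8);
            value += delta;
            memcpy(image + rva, &value, 8);
        }
        pos += block_size;
    }
    return true;
}
```

Rejecting unknown relocation types, rather than skipping them, keeps a half-relocated image from ever being executed.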