[SAFE-ID: JIWO-2024-3230] Author: 大猪 Posted on: [2022-12-18]
This post describes the work we’ve done on fuzzing the Windows RDP client and server, the challenges of doing so, and some of the results.
The Remote Desktop Protocol (RDP) by Microsoft continues to receive attention from the security community, from several critical vulnerabilities discovered in 2019 that had the potential to compromise millions of internet-facing servers, to RDP being used as one of the main initial access vectors by attackers. These risks have been further amplified by pandemic-driven work from home.
Our initial interest in this project was VM related. Because the default way of connecting to an Azure Windows machine or a Hyper-V virtual machine is RDP, we decided RDP was the perfect target.
Like most successful projects, ours too began with some intense Googling, and we quickly stumbled upon a great BlackHat Europe 2019 talk on RDP fuzzing by Park, Jang, Kim, and Lee. The speakers found a couple of vulnerabilities in a matter of just a few hours using a not-so-fast fuzzer, so we decided to take on the challenge to build on top of their work, expand the fuzzing capability to other channels, improve its performance and find our very own RDP Remote Code Execution (RCE).
Unfortunately (or fortunately, depending on your point of view), not all fuzzing projects result in critical vulnerabilities (see Survivorship Bias). We were unable to find our RDP RCE (yet), but we did manage to find a handful of bugs, and gain a better understanding of the protocol, its components, and the fuzzing process and tools. The fuzzing infrastructure we created is generic enough to be helpful in fuzzing other targets.
In this post, we’re going to share our process of doing all that’s mentioned above. First, we’ll provide an overview of RDP and the fuzzing setup we created, then we’ll share the challenges we faced and how we dealt with them, and finally, we’ll review a couple of bugs we uncovered through this process.
The Remote Desktop Protocol is a popular protocol for remote access to Windows machines. MalwareTech recently described it as a “protocol for protocols”. RDP allows multiple channels to operate in every connection, and each one of them has a different purpose. This means that every channel has its own code that handles its data, its own structure definitions, and its own data flows. In effect, there are multiple protocols within RDP.
RDP channels can be either static or dynamic, but the distinction between the two is not crucial for the purpose of this article. If you’d like to know more about the inner workings of an RDP connection, we recommend reading our previous post on it. In case you want to learn more about the fuzzing process, you came to the right place.
Challenges in RDP Fuzzing
In a “standard” fuzzing scenario, you have a program that reads input controlled by the fuzzer, which could be a file or a stream of data of any sort. The program then handles the data while the fuzzer monitors the code coverage the data generated. Based on that coverage, the fuzzer mutates the input, sends the mutated input to the program again, and the process repeats.
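The loop described above can be sketched in a few lines. This is a toy illustration only: `toy_target`, its branch names, and the single-byte mutation strategy are hypothetical stand-ins, not part of WinAFL or of our setup.

```python
import random

def toy_target(data: bytes) -> frozenset:
    """Hypothetical instrumented target: returns the set of branch IDs it hit."""
    hit = {"entry"}
    if len(data) > 0 and data[0] == 0x41:
        hit.add("first_is_A")
        if len(data) > 1 and data[1] == 0x42:
            hit.add("second_is_B")
    return frozenset(hit)

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Keep every mutated input that reaches a branch not seen before."""
    corpus = [seed]
    seen = set(toy_target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = toy_target(candidate)
        if not coverage <= seen:  # at least one new branch was reached
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen
```

The important property is that each recorded coverage set belongs to exactly one input, which is the assumption that a live RDP connection breaks.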
RDP fuzzing is different since we must have an RDP connection active at all times. The data the fuzzer can input to the program needs to be sent as a Protocol Data Unit (PDU) on top of a specific channel (which should also be active during the fuzzing time) in an open connection. As mentioned before, each channel is its own protocol, so the fuzzing needs to happen on a channel-by-channel basis. This introduces the following challenges to the fuzzing process (which may also apply to other protocol/networking-related fuzzers):
These four challenges were the main ones we anticipated coming into this project. In the next section, we’ll discuss how we overcame these challenges, as well as additional ones that arose at a later stage of the work.
Technical Details: The Challenges and How to Overcome Them
In this section, we’ll go over the implementation details, and the challenges we encountered during the work process. We’ll also explain how we tackled them in order to get a working fuzzing setup. We have created a project repo on GitHub that contains all the code we wrote during this project. If you’re interested in this area, you may find this helpful when getting started.
In this project, we wanted to fuzz Windows’ RDP server as well as its RDP client. The motivation for fuzzing the RDP server is obvious: an attacker could use it to compromise a Windows server remotely and gain access to it. The motivation for fuzzing the RDP client is different. Think of a scenario in which an attacker has already compromised an RDP server. They then prepare their exploit and wait until a victim RDP client connects to it. Once the victim is connected, the attacker can compromise the victim’s machine as well through the RDP client. This can happen when an administrator connects to a server they manage, and it can even be used as a VM escape, because Hyper-V uses RDP to access its virtual machines (using the “enhanced session” feature).
A fuzzer for RDP needs to have the following basic components:
As you may imagine, the fuzzing setup for the client and the server ought to be quite different, but there are some similarities. We should start with those.
On the target side we have the following:
See next section for the functionality added to WinAFL and DynamoRIO in our custom builds
And on the input sender side we have one component:
The idea of the setup is quite simple: we try to send our fuzzed inputs on a “live” RDP connection, mocking (virtually) nothing.
To do so, we decoupled the generation of inputs from their transmission to the target, allowing the inputs to move from one side of the RDP connection to the other before being processed by the target. Additionally, we used WinAFL’s in-app instrumentation mode to not interrupt the normal execution flow.
In order for coverage-guided fuzzing to work, there must be a one-to-one correspondence between inputs and the code paths they trigger. To achieve that, we developed “background fuzzing”, which differentiates the fuzzer PDUs from the regular PDUs and tracks code paths only for the former. This was essential because we only want the fuzzer to track the coverage of our own test cases rather than that of unrelated PDUs sent over the connection.
To illustrate that, let’s see what this might look like when fuzzing the RDPSND virtual channel, which redirects audio from the RDP server to the client. According to the docs, the first byte of every PDU represents the type of message sent.
Supported values for the msgType field are 0x01 through 0x0D. In this case we can use the most significant bit of the first byte as our fuzzing marker in the following manner:
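A minimal sketch of how such a marker could be applied and checked follows. The function names are hypothetical, and where the marker gets stripped (here, before the PDU reaches the original handler) is an assumption made for illustration.

```python
FUZZ_MARKER = 0x80  # most significant bit of the msgType byte; unused by 0x01..0x0D

def mark_pdu(pdu: bytes) -> bytes:
    """Sender side: flag a PDU as coming from the fuzzer."""
    if not pdu:
        raise ValueError("empty PDU")
    return bytes([pdu[0] | FUZZ_MARKER]) + pdu[1:]

def is_fuzzer_pdu(pdu: bytes) -> bool:
    """Target side: should coverage be tracked for this PDU?"""
    return bool(pdu) and bool(pdu[0] & FUZZ_MARKER)

def strip_marker(pdu: bytes) -> bytes:
    """Target side: restore the original msgType before normal handling."""
    return bytes([pdu[0] & ~FUZZ_MARKER & 0xFF]) + pdu[1:]
```

For example, a PDU with msgType 0x02 goes over the wire as 0x82. Regular PDUs never have the high bit set, so they pass through untracked.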
After seeing the similarities between the client setup and the server setup, let’s look at how they differ, starting with the client.
Client Fuzzing Setup
The Windows RDP client is mstsc.exe, but most of the logic that handles virtual channel data is in mstscax.dll which the client loads.
Note that the remote access client of Hyper-V virtual machines, vmconnect.exe, also uses mstscax.dll for its core functionality
For simplicity and efficiency, we executed both client and server (the target and the agent) on the same machine (i.e., using the client to connect to localhost/127.0.0.1). In order to allow parallel fuzzing, we also used mimikatz to patch the server so that it allows concurrent RDP connections.
Using those components, we were able to achieve a modest execution speed of ~50-100 executions per second. This is by no means fast, but it was faster than the speed shown in the aforementioned Park et al. research, so we were OK with that.
Server Fuzzing Setup
In order to find the target binary that holds the main logic of the RDP server, we can simply look into the Remote Desktop Services service.
The main logic of the RDP server is indeed in termsrv.dll, which is loaded into a svchost.exe process with the following command line: C:\Windows\System32\svchost.exe -k NetworkService -s TermService.
The original plan for fuzzing the server was similar to the client’s, that is to fuzz a few instances of the target in parallel on the same machine which requires running multiple instances of TermService. This turned out to be quite a challenging task since Windows does not support it by default. When we tried to do that manually, we even saw a few hardcoded strings within termsrv.dll that point to TermService and its registry keys, so we decided to focus our efforts elsewhere and just use multiple VMs to fuzz the server in parallel.
In the client fuzzing setup, we used the server-side API call WTSVirtualChannelWrite() to send the fuzzer inputs to the target. Unfortunately, we couldn’t find a similar API that would let us send inputs to the server in an RDP connection. Hence, we chose to use a custom-built FreeRDP (a popular open-source RDP client) from an Ubuntu machine to send inputs to the fuzzed server. Note that this is not an ideal setup for fuzzing, and these constraints left server fuzzing running at roughly 1/10 of the speed of client fuzzing.
These are the components of the setup when fuzzing the server:
When fuzzing the first channel (on the client side), we encountered a problem: as soon as two invalid messages were detected on the client side, it would terminate the connection immediately.
To avoid this issue, we introduced some protocol grammar enforcement in the agent’s logic, i.e., limiting the space of allowed inputs. Among other things, we implemented:
We extracted some of the RDP grammar from Microsoft’s documentation of RDP and its extensions, some from reverse engineering the relevant binaries, and the rest from tracing failed target executions.
For example, this logic can be used to only allow messages that begin with one of the supported msgTypes, followed by the size of the PDU and a unique identifier, and whose total size is between 22 and 122 and leaves a remainder of 2 modulo 4.
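As a rough sketch, a filter of this kind might look like the following. The set of supported msgTypes reuses the RDPSND range mentioned earlier purely as an assumption, and the checks on the size field and the unique identifier are left out because their layout depends on the channel.

```python
# Assumption for illustration: the RDPSND msgType range 0x01..0x0D.
SUPPORTED_MSG_TYPES = frozenset(range(0x01, 0x0E))
MIN_SIZE, MAX_SIZE = 22, 122

def is_allowed(pdu: bytes) -> bool:
    """Return True if the agent should forward this PDU to the target."""
    if not pdu or pdu[0] not in SUPPORTED_MSG_TYPES:
        return False  # unknown message type
    if not MIN_SIZE <= len(pdu) <= MAX_SIZE:
        return False  # total size out of range
    return len(pdu) % 4 == 2  # size must leave a remainder of 2 modulo 4
```

An agent can either drop inputs that fail such a check or repair them before sending. Dropping is simpler, but it wastes fuzzer executions.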
It’s important to mention that by enforcing these rules, you effectively limit the ability of the mutation engine to change the test cases as it pleases, thus potentially missing interesting mutations. For that reason, we tried to enforce as little as we could, while still ensuring the connection would not get closed too often.
Another important point here is that those grammar enforcements are not the only option when dealing with these kinds of issues. In the case of one specific channel (GFX), we resorted to patching the actual target function we were fuzzing so that it wouldn’t close the connection in case of an invalid set of messages. This allowed us to continue fuzzing invalid messages and keep the connection open at all times. Here, too, you are at risk of finding crashes that will not reproduce in the original code (without the patch). This is a great example of the delicate balance fuzzing requires between getting sufficient execution speed and maintaining both the original functionality of the target program and the freedom of the mutation engine.
It stands to reason that most logic, and hence most bugs, depend on a sequence of messages rather than a single one. It was not a coincidence that a majority of the bugs we found involved at least two messages.
To uncover these bugs, we introduced multi-input fuzzing. We used a fuzzer dictionary that identifies the start of a new message and its type. The agent would then split the input into multiple PDUs according to those dictionary words, and send them one after the other.
So, a multi-input input might look something like this:
which the agent translates to three messages with msgType 7, 2, and 3, and their respective content.
To maintain the one-to-one correspondence between the inputs the fuzzer created and the code coverage they triggered, we introduced a second marker that identified the last message in a sequence. WinAFL completes the cycle and creates the next input only once a call carrying this “last in sequence” marker returns.
While multi-input fuzzing was crucial (and fruitful) to our efforts, we also found it necessary to limit the number of PDUs for each test case. That is because the fuzzer is drawn to inputs that lead to a different code sequence, and repeating the same message 100 times leads to a different code sequence than sending it once.
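The splitting, the “last in sequence” marker, and the PDU cap can be sketched together as follows. The dictionary word `##MSG`, the marker bit values, and the cap of 8 are all hypothetical choices made for this example; the real values live in the agent and the fuzzer dictionary.

```python
from typing import List

TOKEN = b"##MSG"     # hypothetical dictionary word that starts a new message
FUZZ_MARKER = 0x80   # "this PDU came from the fuzzer"
LAST_MARKER = 0x40   # "this is the last PDU of the test case"
MAX_PDUS = 8         # cap on PDUs per test case

def split_input(data: bytes) -> List[bytes]:
    """Turn one fuzzer input into a bounded sequence of marked PDUs."""
    # Each part is a msgType byte followed by content; bytes before the
    # first token are ignored, and empty parts are dropped.
    parts = [p for p in data.split(TOKEN)[1:] if p][:MAX_PDUS]
    pdus = []
    for i, part in enumerate(parts):
        first = (part[0] & 0x3F) | FUZZ_MARKER
        if i == len(parts) - 1:
            first |= LAST_MARKER  # tells the target side the cycle is complete
        pdus.append(bytes([first]) + part[1:])
    return pdus
```

With this layout, an input carrying msgTypes 7, 2, and 3 becomes three PDUs, and only the third one ends the fuzzing cycle.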
After about one week of fuzzing, the first crash appeared. However, the crash didn’t reproduce when we tried running the same input again. This happened quite often, and it was likely due to the stateful nature of the protocol. In other words, one test case got the client to a specific state, which was then “utilized” by a subsequent test case to crash the target.
To understand unreproducible crashes, we modified WinAFL to create a memory dump of the target process whenever a crash is detected.
Crash Analysis Automation
Creating dumps on crashes solved one problem, but created another: once a crash is found, it is very likely to be encountered repeatedly. Generally, WinAFL tries to detect identical crashes and notify only on “unique crashes”, but our multiple message fuzzing made this detection very hard. Think about a case where a single message causes the target to crash. The fuzzer can create any set of messages with this one at the end. Each of these message sets will crash the target, and will also result in a different coverage bitmap, since the messages and their handling are different (except for the last one that actually matters). This will cause WinAFL to report a unique crash every time.
We had to automate the analysis of the crashes for two reasons. First, it is tedious work to analyze each crash manually (only to find that it is old news), and second, the disk was quickly filled up with memory dumps.
To overcome this problem, we wrote a WinDBG script that analyzes a crash and extracts the crashing stack from it. We then ran a PowerShell script that periodically analyzes the crashes and keeps only those that contain new stacks (and emails us the good news).
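Our scripts were written for WinDBG and PowerShell, but the deduplication idea itself is language independent. Below is a minimal Python sketch of it, assuming the crashing stack has already been extracted as a list of `module!function+offset` strings. The frame format, the choice of five frames, and the function names are assumptions for illustration.

```python
import hashlib
import re

TOP_FRAMES = 5  # assumption: the top few frames are enough to identify a bug

def stack_signature(frames):
    """Hash the top frames with instruction offsets removed."""
    normalized = [
        re.sub(r"\+0x[0-9a-fA-F]+$", "", frame.strip())
        for frame in frames[:TOP_FRAMES]
    ]
    return hashlib.sha256("\n".join(normalized).encode()).hexdigest()

def keep_new_crashes(crashes, seen):
    """crashes maps a dump name to its frames; returns dumps with unseen stacks.

    The `seen` set is updated in place so it can persist across periodic runs.
    """
    fresh = []
    for name, frames in crashes.items():
        signature = stack_signature(frames)
        if signature not in seen:
            seen.add(signature)
            fresh.append(name)
    return fresh
```

Dumps that are not returned by `keep_new_crashes` can be deleted, which also takes care of the disk space problem.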
Long Startup Time
In the client fuzzing setup, from the moment the target (mstsc.exe) was created by the fuzzer, to the moment the connection was established and the first message could be sent, it took more than 10 seconds. Hence, it was crucial to perform as many iterations as possible without restarting the target. We achieved this by using the -fuzz_iterations parameter of AFL-Fuzz and supplying as many iterations as possible (before things start to break).
Like multi-input fuzzing, some of the logic requires a sequence of messages on different channels. For example, sending camera data from the client to the server is supported using multiple channels, as explained in the docs.
Thus, if we wish to fuzz the server by sending camera inputs, we must do so on at least two different channels.
Our solution was also similar — the fuzzer dictionary also determined the channel on which the message was to be sent.
Locating relevant code
Since RDP has many different components in Windows, it can be challenging to even locate the target function we need to fuzz.
To do that quickly, we created a small database of all the symbols that might be related to our project.
The idea was to download all the PDBs that are relevant to our version of Windows, extract all function names from them and dump them into a file (linking back to the exe/sys/dll) so that we can quickly search for function names and locate the function related to our current target channel.
This helped a lot. Since almost all of the dynamic channel receive functions match the following pattern C<class-name>::OnDataReceived, we could quickly look at the list of those functions and figure out what is probably related to the channel we are targeting.
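Searching such a dump for receive handlers takes only a few lines. The sketch below assumes a hypothetical file layout of one `module<TAB>symbol` pair per line; the actual layout of our database may differ.

```python
import re

# Matches symbols such as CSomeChannel::OnDataReceived (non-templated classes).
ON_DATA_RECEIVED = re.compile(r"^C\w+::OnDataReceived$")

def find_receive_handlers(lines):
    """Return (module, symbol) pairs whose symbol looks like a channel handler."""
    matches = []
    for line in lines:
        module, _, symbol = line.rstrip("\n").partition("\t")
        if ON_DATA_RECEIVED.match(symbol):
            matches.append((module, symbol))
    return matches
```

Passing an open file object works as well as a list, since the function only iterates over lines.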
Bug Case Studies
In this section, we’ll share technical details of two of the bugs we found during this project.
AUDIO_PLAYBACK Channel (Server → Client)
The AUDIO_PLAYBACK_DVC virtual channel is used to play sounds from the server on the client. Its normal flow consists of two sequences: initialization and data transfer. In normal usage of the protocol, the initialization sequence occurs once at the beginning, followed by many data transfer sequences.
The Wave and WaveInfo PDUs contain an index to the format array exchanged in the initialization sequence that determines the format of the transmitted audio data.
When a format change occurs — i.e., a Wave or WaveInfo PDU arrives with an index different than the last one used — the client verifies that the new index is valid.
However, as long as the format index remains the same, this verification is skipped (OnNewFormat() is not called, and the verification code is in it). Here is pseudocode of the relevant parts.
The vulnerable flow that triggers this bug in the client is as follows:
This allows an attacker to cause a reallocation of the format array using an additional Server Audio Formats PDU and then specify an invalid index that was previously valid and used, causing the client to read out of bounds of the formats array and crash.
Note that this bug relies heavily on multi-input fuzzing, and we could not have found it without this feature.
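To make the flow concrete, here is a simplified Python model of the logic described above. It is not the client’s actual code: the class and method names are hypothetical, and Python raises an IndexError where native code would silently read past the end of the array.

```python
class AudioClientModel:
    """Toy model of the client-side format handling."""

    def __init__(self):
        self.formats = []
        self.last_index = None

    def on_server_audio_formats(self, formats):
        # The format array is replaced, but last_index is left untouched.
        # That stale state is what makes the bug reachable.
        self.formats = list(formats)

    def on_wave(self, index):
        if index != self.last_index:
            self._on_new_format(index)  # validation happens only on a change
        return self.formats[index]      # unchecked when the index is unchanged

    def _on_new_format(self, index):
        if not 0 <= index < len(self.formats):
            raise ValueError("invalid format index")
        self.last_index = index
```

Sending five formats, a Wave PDU with index 4, then only two formats, and finally another Wave PDU with index 4 reaches the unchecked access with an index that is no longer valid.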
AUDIO_INPUT Channel (Client → Server)
The AUDIO_INPUT virtual channel is used to send sound input from the client to the server. On the server side, the audio input data is handled by the elevated audiodg.exe process.
As in the AUDIO_PLAYBACK_DVC channel, the client and server first exchange an array of sound formats they support.
The Sound Formats PDU begins with a header of nine bytes, which includes the command, number of formats, and size of the packet, followed by an array of formats, each of variable length (plus an optional field of extra data).
The code that handles the Sound Formats PDU is in rdpendp.dll. It first verifies that the packet size is at least nine bytes, and then reads the header and verifies that the size from the header is not larger than the size of the packet.
The same function then subtracts nine from the size it read from the header and reads the number of formats specified in the header, as long as the remaining length is sufficiently large.
The size from the header is not protected against an integer underflow, which may cause this subtraction to wrap around and will result in the program reading “formats” from after the end of the packet.
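The arithmetic can be illustrated with a short sketch. The 32-bit unsigned width and the function names are assumptions made for this example; the source only tells us that the subtraction is unchecked.

```python
HEADER_SIZE = 9
U32_MASK = 0xFFFFFFFF  # assumption: the size is handled as an unsigned 32-bit value

def checks_pass(packet_len: int, header_size_field: int) -> bool:
    """The two validations described above."""
    return packet_len >= HEADER_SIZE and header_size_field <= packet_len

def remaining_after_header(header_size_field: int) -> int:
    """Unchecked unsigned subtraction, as native code would perform it."""
    return (header_size_field - HEADER_SIZE) & U32_MASK

def remaining_after_header_safe(header_size_field: int) -> int:
    """Same computation with the missing lower-bound check added."""
    if header_size_field < HEADER_SIZE:
        raise ValueError("size field smaller than the header")
    return header_size_field - HEADER_SIZE
```

A nine-byte packet whose size field is 4 passes both checks, yet the remaining length wraps around to 0xFFFFFFFB, so the "is the remaining length sufficiently large" test succeeds for every format the header claims to contain.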
Note that we have no control over the initialization of the audiodg process. Thus, we could not have found this bug without the DynamoRIO attach functionality.
In this blog post, we presented our process of trying to tackle a challenging fuzzing target: Windows’ RDP client and server. We wanted to share our process for a few reasons.
First, we believe it’s important to share the process even if you were unable to get to your original goal (an RCE for example). This can help you reflect on your own process — what seemed to work well and what could have been improved. This can also help the security community learn from past experiences.
Second, even though setting up a fuzzing environment may be a complex process, we think it’s a goal worth pursuing — even with more challenging targets like the one we presented here. RDP is a very complex protocol with many components and different code bases. Combining this with the fact that it is so popular (over 4 million internet-facing servers based on Shodan.io) makes it a very lucrative target for attackers. This means that as the security community, we need to make great efforts to make it more secure.
Even though Microsoft has been doing some great work securing their products in recent years, we believe that there are still more vulnerabilities in RDP components waiting to be found. With the understanding we gained during this process and with the information shared here, we can drive future research forward. Examples of projects worth pursuing include improving the fuzzing performance, expanding it to other channels we didn’t have the chance to fuzz, using an emulator-based fuzzer, or even performing manual analysis on interesting parts of the code.
Finally, to show the generality of our fuzzing solution, we have since applied it to several RPC servers with promising preliminary results. Expect to see more on that in the future.