[SAFE-ID: JIWO-2024-3259] Author: 大猪 Posted on: [2023-01-13]
Instead of introduction
We can't imagine Windows without section objects (file mapping objects in Windows API terms), and it's hard to find a Windows kernel subsystem that doesn't rely on them. The great idea behind section objects is that instead of calling the Windows file APIs to work with a file, you read virtual memory to get file data and write virtual memory to change it. But this simple concept isn't simple under the hood. To make this difficult topic easier to follow, we take the x86 edition of Windows with 32-bit pointers. Don't worry if you can't understand everything; even skilled Windows Internals readers may have difficulties with this topic. I recommend reading the corresponding chapter of the Windows Internals book, because this blog post is heavy on technical detail and describes fairly low-level things.
The basic terms
So if you're ready, let's get started. First, we need to take a quick look at some technical terms, because without understanding each of them we can't get the full picture. Next, we'll focus on each of them in detail.
Diving deeper into the Section kernel objects
A Section is a kernel object that is created and maintained by the VMM. The MmCreateSection function creates the kernel object: it allocates memory for it from the paged pool, initializes its fields, and creates the Control Area and Segment structures if needed (see MiCreateImageFileMap, MiCreateDataFileMap). To create an object, the caller of MmCreateSection must provide a pointer to a FileObject that describes the file to be mapped. Using the FileObject, the functions mentioned above initialize the Control Area and Segment structures.
MmCreateSection is responsible not only for initializing a Section object, but also for initializing and maintaining the important SECTION_OBJECT_POINTERS structure that FILE_OBJECT->SectionObjectPointer points to. You can see its definition below.
typedef struct _SECTION_OBJECT_POINTERS {
    PVOID DataSectionObject;
    PVOID SharedCacheMap;
    PVOID ImageSectionObject;
} SECTION_OBJECT_POINTERS;
As you can see, each of these three fields points to a structure needed for a certain type of file operation. The SECTION_OBJECT_POINTERS structure is created by the FSD when it gets a request to create (open) a file. The Cache Manager deals with .SharedCacheMap. Even if there are no sections for the file object (i.e. .DataSectionObject and .ImageSectionObject are NULL), .SharedCacheMap is almost always initialized for disk files, because the Cache Manager caches parts of the file to provide quick access to its data. To create .DataSectionObject and .ImageSectionObject, the VMM uses the functions MiCreateDataFileMap and MiCreateImageFileMap.
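As a minimal sketch of how these three pointers can be interpreted, the fragment below redeclares the structure with standard C types so that it builds outside the kernel and reports which structures already exist for a file. The flag names and the helper `query_file_mappings` are hypothetical and are not part of any Windows header.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel structure, redeclared so the sketch builds in user mode. */
typedef struct _SECTION_OBJECT_POINTERS {
    void *DataSectionObject;   /* set when the file has a data (binary) mapping */
    void *SharedCacheMap;      /* owned by the Cache Manager */
    void *ImageSectionObject;  /* set when the file has an image (executable) mapping */
} SECTION_OBJECT_POINTERS;

/* Hypothetical flags describing what has been initialized for a file. */
enum {
    FILE_HAS_DATA_SECTION  = 1 << 0,
    FILE_HAS_CACHE_MAP     = 1 << 1,
    FILE_HAS_IMAGE_SECTION = 1 << 2
};

/* Returns a bitmask of the structures that are already initialized. */
unsigned query_file_mappings(const SECTION_OBJECT_POINTERS *sop)
{
    unsigned flags = 0;

    assert(sop != NULL);
    if (sop->DataSectionObject != NULL)  flags |= FILE_HAS_DATA_SECTION;
    if (sop->SharedCacheMap != NULL)     flags |= FILE_HAS_CACHE_MAP;
    if (sop->ImageSectionObject != NULL) flags |= FILE_HAS_IMAGE_SECTION;
    return flags;
}
```

A file that has only been opened and cached would typically yield just `FILE_HAS_CACHE_MAP`; the other two bits appear once the file is mapped as data or as an image.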
NTSTATUS MmCreateSection(OUT PVOID *SectionObject, IN ACCESS_MASK DesiredAccess, IN POBJECT_ATTRIBUTES ObjectAttributes OPTIONAL, IN PLARGE_INTEGER MaximumSize, IN ULONG SectionPageProtection, IN ULONG AllocationAttributes, IN HANDLE FileHandle OPTIONAL, IN PFILE_OBJECT File OPTIONAL)
The arguments have the same meaning as those of NtCreateSection.
Take a look at the Control Area structure
Segment control area (or just Control Area, CA) is a structure containing the information necessary to perform I/O operations with a section. It's stored in the nonpaged pool and is described by the following structure.
Control Area contains all the necessary data to perform I/O operations with the section.
The Control Area structure contains flags that indicate what kind of data is addressed by the section. When the VMM creates a CA object for an executable file using MiCreateImageFileMap, its size is equal to the size of the CA structure plus the size of one Subsection structure multiplied by the number of subsections (i.e. the number of PE sections + 1 for the PE header). It's important to note that, in the case of an image file map, all _SUBSECTION structures are located immediately after the Control Area and their number is stored in the NumberOfSubsections field. The subsections of one section (Control Area) are linked into a list via .NextSubsection. The !ca command of WinDbg prints information about a Control Area.
We can also explore these structures manually for the first three subsections.
Further, we'll discuss this output in more detail.
As mentioned earlier, the FILE_OBJECT structure refers to a very important structure called _SECTION_OBJECT_POINTERS. This structure addresses two CAs, one for the binary mapping type and a second one used if the file is mapped as executable (the same file can be mapped as both binary and executable). These CAs point to different Segments with their own PPTE tables. This structure is maintained by the FSD.
Subsections are allocated in virtual memory immediately after the CA structure. For example, if the Control Area describes an executable view, then ControlArea = ExAllocatePoolWithTag (NonPagedPool, sizeof(CONTROL_AREA) + (sizeof(SUBSECTION) * SubsectionsAllocated), 'iCmM').
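The single-allocation layout can be sketched in portable C as follows. The two structures are heavily simplified stand-ins (the real ones have many more fields), `calloc` replaces ExAllocatePoolWithTag, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct CONTROL_AREA;

/* Simplified stand-ins for the kernel structures. */
typedef struct SUBSECTION {
    struct CONTROL_AREA *ControlArea;
    unsigned long StartingSector;
    unsigned long PtesInSubsection;
    struct SUBSECTION *NextSubsection;
} SUBSECTION;

typedef struct CONTROL_AREA {
    void *Segment;                      /* unused here, kept for layout */
    unsigned long NumberOfSubsections;
} CONTROL_AREA;

/* The first subsection starts right after the Control Area. */
SUBSECTION *first_subsection(CONTROL_AREA *ca)
{
    return (SUBSECTION *)(ca + 1);
}

/* One allocation holds the Control Area followed by its subsections.
   pe_sections is the number of sections in the PE file; one extra
   subsection is reserved for the PE header. Returns NULL on failure. */
CONTROL_AREA *alloc_image_control_area(unsigned long pe_sections)
{
    unsigned long count = pe_sections + 1;
    CONTROL_AREA *ca = calloc(1, sizeof(CONTROL_AREA) + sizeof(SUBSECTION) * count);
    if (ca == NULL)
        return NULL;

    ca->NumberOfSubsections = count;
    SUBSECTION *sub = first_subsection(ca);
    for (unsigned long i = 0; i < count; i++) {
        sub[i].ControlArea = ca;
        sub[i].NextSubsection = (i + 1 < count) ? &sub[i + 1] : NULL;
    }
    return ca;
}

/* Walks the .NextSubsection list and counts its entries. */
unsigned long count_linked_subsections(CONTROL_AREA *ca)
{
    unsigned long n = 0;
    for (SUBSECTION *s = first_subsection(ca); s != NULL; s = s->NextSubsection)
        n++;
    return n;
}
```

For a PE file with five sections, `alloc_image_control_area(5)` produces six linked subsections, and the whole block is released with a single `free`.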
A few words about Subsections
Subsection (_SUBSECTION) is a data structure containing the information necessary to calculate file offsets for the mapped file using the PPTEs. In the case of a binary mapping type, there's only one subsection, but if the file is mapped as executable, there are as many subsections as there are sections in the executable, plus one for the PE header. Since all the PTEs describing one PE section have the same page protection bits (copy-on-write, read only, etc.), it is logical to maintain one data structure for all these PTEs. This data structure is called Subsection. All PPTEs point to their corresponding subsection for both binary and executable mapping types. Moreover, the subsection contains the starting sector of the PE section. It's taken from the PE header as Raw_section_offset/SECTOR_SIZE. The subsection also stores a pointer to the first PPTE in the segment's PPTE table and the number of PTEs for this subsection (i.e. the number of virtual pages for this PE section, its VirtualSize rounded up to a multiple of PAGE_SIZE). Having the address of the structure (executable mapping type), we can easily calculate the offset in the PE file that a given PPTE describes, using the distance between the base and current PTEs. If Pte is a pointer to a PPTE, then the formula is:
((((PUCHAR)Pte - (PUCHAR)Subsection->SubsectionBase) / sizeof(PTE)) << PAGE_SHIFT) + Subsection->StartingSector * SECTOR_SIZE
or for x86
((((PUCHAR)Pte - (PUCHAR)Subsection->SubsectionBase) / 4) << 12) + Subsection->StartingSector * SECTOR_SIZE
If Subsection is a pointer to the subsection, then the first PTE that describes it is FirstPte = &Subsection->SubsectionBase[0], and its boundary is LastPte = &Subsection->SubsectionBase[Subsection->PtesInSubsection]. In other words, a PPTE pointer Pte belongs to this subsection if &Subsection->SubsectionBase[0] <= Pte < &Subsection->SubsectionBase[Subsection->PtesInSubsection].
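The calculation above can be checked with a small user-mode sketch. Addresses are kept as 32-bit integers so that the arithmetic matches x86 on any host; the `SUBSECTION32` type and the helper names are hypothetical, and a 512-byte sector is assumed. Note the explicit parentheses around the shift: in C, `+` binds tighter than `<<`.

```c
#include <assert.h>
#include <stdint.h>

#define X86_PAGE_SHIFT 12u                      /* 4 KB pages */
#define X86_PAGE_SIZE  (1u << X86_PAGE_SHIFT)
#define X86_PTE_SIZE   4u                       /* sizeof(PTE) on x86 without PAE */
#define SECTOR_SIZE    512u                     /* assumed sector size */

/* Simplified subsection holding only the fields used by the formula. */
typedef struct {
    uint32_t SubsectionBase;    /* address of the first PPTE */
    uint32_t PtesInSubsection;  /* number of PPTEs */
    uint32_t StartingSector;    /* first sector of the PE section in the file */
} SUBSECTION32;

/* Fills a subsection from the values found in a PE section header. */
SUBSECTION32 subsection_from_pe_section(uint32_t raw_offset, uint32_t virtual_size,
                                        uint32_t first_ppte)
{
    SUBSECTION32 s;
    s.SubsectionBase = first_ppte;
    s.PtesInSubsection = (virtual_size + X86_PAGE_SIZE - 1) >> X86_PAGE_SHIFT;
    s.StartingSector = raw_offset / SECTOR_SIZE;
    return s;
}

/* Range check: FirstPte <= pte < LastPte. */
int ppte_in_subsection(const SUBSECTION32 *s, uint32_t pte)
{
    uint32_t last = s->SubsectionBase + s->PtesInSubsection * X86_PTE_SIZE;
    return pte >= s->SubsectionBase && pte < last;
}

/* File offset described by the PPTE at address pte. */
uint32_t ppte_to_file_offset(const SUBSECTION32 *s, uint32_t pte)
{
    assert(ppte_in_subsection(s, pte));
    return (((pte - s->SubsectionBase) / X86_PTE_SIZE) << X86_PAGE_SHIFT)
           + s->StartingSector * SECTOR_SIZE;
}
```

For instance, a section with raw offset 0x400 and VirtualSize 0x2345 gets three PPTEs and starting sector 2, so its second PPTE maps file offset 0x1400.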
Exploring the Segment structure
Unlike the Control Area structure, which is designed to perform I/O operations with a file, the Segment stores information about a PE file that was taken from its PE header. In the case of a binary file, this data isn't used. The Segment also stores the Proto-PTE table (array) that addresses the offsets from the beginning of the mapped file through the Subsection structures. For example, if the VMM needs to load file data from the mapped file into virtual memory, it takes the invalid hardware PTE that caused the page fault from the page table and uses it to locate the corresponding Proto-PTE entry in the Segment table. Next, using the Control Area structure and the calculated file offset, the VMM reads data from the file into virtual memory.
MmCreateSection creates segments using the functions shown below. This happens only if the file is mapped for the first time; otherwise the function gets a pointer to the existing segment via the FileObject. Note that no matter how many sections have been created for the file object, there's always only one Segment structure per type of mapping (binary, executable) for all of them. The same applies to Control Area structures: there's only one Control Area per type of mapping regardless of the number of created sections.
NTSTATUS MiCreateImageFileMap (IN PFILE_OBJECT File, OUT PSEGMENT *Segment)
NTSTATUS MiCreateDataFileMap (IN PFILE_OBJECT File, OUT PSEGMENT *Segment, IN PUINT64 MaximumSize, IN ULONG SectionPageProtection, IN ULONG AllocationAttributes, IN ULONG IgnoreFileSizing)
As you can see, MiCreateImageFileMap accepts fewer arguments, because it reads all the necessary information from the PE header of the executable file to be mapped. The other arguments are described in the NtCreateSection documentation.
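The "one Control Area per mapping type" rule can be illustrated with a toy model. Everything below is a simplification written for this sketch: the structures carry only the fields needed, the toy section points straight at its Control Area (the real layout differs), and `toy_create_section` is a hypothetical stand-in for the real creation path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct CONTROL_AREA {
    unsigned NumberOfSectionReferences; /* sections sharing this Control Area */
    int IsImage;                        /* 1 = executable mapping, 0 = data */
} CONTROL_AREA;

typedef struct SECTION_OBJECT_POINTERS {
    CONTROL_AREA *DataSectionObject;
    void *SharedCacheMap;
    CONTROL_AREA *ImageSectionObject;
} SECTION_OBJECT_POINTERS;

typedef struct FILE_OBJECT {
    SECTION_OBJECT_POINTERS *SectionObjectPointer;
} FILE_OBJECT;

typedef struct SECTION {
    CONTROL_AREA *ControlArea;          /* toy shortcut to the shared structure */
} SECTION;

/* Creates a new section; the Control Area is created only on the first
   mapping of the given type and reused afterwards. Returns NULL on failure. */
SECTION *toy_create_section(FILE_OBJECT *file, int as_image)
{
    assert(file != NULL && file->SectionObjectPointer != NULL);

    CONTROL_AREA **slot = as_image ? &file->SectionObjectPointer->ImageSectionObject
                                   : &file->SectionObjectPointer->DataSectionObject;
    SECTION *section = malloc(sizeof *section);
    if (section == NULL)
        return NULL;

    if (*slot == NULL) {                /* first mapping of this type */
        *slot = calloc(1, sizeof **slot);
        if (*slot == NULL) {
            free(section);
            return NULL;
        }
        (*slot)->IsImage = as_image;
    }
    (*slot)->NumberOfSectionReferences++;
    section->ControlArea = *slot;
    return section;
}
```

Creating two data sections and one image section for the same file yields three distinct section objects but only two Control Areas, one per mapping type.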
The following structure describes Segment.
Perhaps the following image gives you a better understanding.
Behind the curtain of Section PTEs
As mentioned several times above, PPTEs and the hardware PTEs pointing to them are key to understanding virtual address translation for mapped sections. The difference between them is that the former are stored in the Segment object, while the latter are stored in the process's page table. Both can be in two major states, valid and invalid (the P bit in the structure). A cleared P bit means that the mapped page is absent from physical memory and signals the VMM that its content should be read from disk. If the P bit is set, the virtual page is resident in physical memory and no additional actions are required from the VMM. An invalid PTE has a flag signaling that it points to a PPTE, i.e. that it belongs to a memory-mapped file. Once a thread tries to access an invalid memory page, a page fault exception occurs and the VMM exception handler analyzes the PTE to learn what kind of page it describes. There are several types of invalid PTEs, but we won't discuss this topic here. Also note that for a resident virtual page the VMM stores a pointer to the PPTE and its value in the PFN database. Let's take a look at the format of these structures. You can see the format of the PTE pointing to a PPTE in the following picture.
Once you get the ProtoIndex, you can calculate the PPTE address with this formula: PrototypePteAddress = MmPagedPoolStart + (ProtoIndex << 2).
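A hedged sketch of that decoding step is shown below. The bit positions are an assumption based on the x86 non-PAE layout, in which the prototype index is split into a 7-bit low part (bits 1-7) and a 21-bit high part (bits 11-31) with the Prototype flag at bit 10; check them against the PTE format of your target system. The helper names and the paged pool base passed in are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86 non-PAE layout of an invalid PTE that refers to a PPTE:
   bit 0       Valid (0 for this kind of PTE)
   bits 1-7    low 7 bits of the prototype index
   bit 10      Prototype flag
   bits 11-31  high 21 bits of the prototype index */
#define PTE_VALID     0x00000001u
#define PTE_PROTOTYPE 0x00000400u

/* True if the PTE is invalid and marked as pointing to a PPTE. */
int pte_points_to_ppte(uint32_t pte)
{
    return (pte & PTE_VALID) == 0 && (pte & PTE_PROTOTYPE) != 0;
}

/* Reassembles the prototype index from its two parts. */
uint32_t pte_proto_index(uint32_t pte)
{
    uint32_t low  = (pte >> 1) & 0x7Fu;
    uint32_t high = pte >> 11;
    return (high << 7) | low;
}

/* PPTE address = paged pool start + index * sizeof(PTE). */
uint32_t pte_to_ppte_address(uint32_t pte, uint32_t paged_pool_start)
{
    assert(pte_points_to_ppte(pte));
    return paged_pool_start + (pte_proto_index(pte) << 2);
}
```

With these assumptions, the PTE value 0x00012468 carries index 0x1234, which maps to the PPTE at paged pool start + 0x48D0.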
Below you can see PPTE format.
SubsectionAddress = MmSubsectionBase + (PrototypeIndex << 3). MmSubsectionBase is usually equal to MmNonPagedPoolStart, because the WhichPool bit is usually set to 1.
Note that the PTE format in the prototype array can vary and is not always a _MMPTE_SUBSECTION.
Now, using our knowledge, we can put all the pieces together and make a complete picture of the actions for getting file data when a thread tries to access a virtual page belonging to a mapped file.
A little practice
Let's get to the Proto-PTE table. Take a random process, dump its basic information and go to the table.
We can go a bit deeper and calculate the offsets manually. To explore these structures it's better to take the information from the cache slots, because for ordinary user-mode processes the kernel can delay the creation of the Proto-PTE table until a thread addresses the mapped file data. I got a list of the cache slots on my system and selected one describing the registry hive file NTUSER.DAT. Since it's a data file, there's only one subsection for its Control Area.
Now we can calculate the file offset at which the part of the file mapped to the cache slot starts, using this formula.
FileOffset_LSN = ((((PUCHAR)Pte - (PUCHAR)Subsection->SubsectionBase) / 4) << 12) + Subsection->StartingSector * SECTOR_SIZE
(0xE15B7208 - 0xE15B7008) / 4 * 0x1000 + 0 = 0x80000; you can see this value in the VACB structure above (Offset: 0x00080000).
Here's another example.
Now look at a more interesting case with PE files, whose Control Area has more than one subsection (one Subsection per PE section). We can simplify our task and skip the first steps, starting with the Control Areas. The !memusage command can help us.
We can see the addresses of the Control Area structures in the first column. Print it for ole32.dll.
For clarity, copy the results to the table. If we open the PE file in the Cerbero PE Insider tool, we'll see that it has five sections. Note that the first subsection is allocated for the PE header.
Let's check out the formula mentioned above in practice. Take the third subsection, which describes the ole32.dll section starting at offset 0x8FA (in sectors).
Dispatching #PF exceptions for mapped files
As we know, the I/O Manager and the VMM minimize performance overhead by performing most of their operations asynchronously and on demand. Windows application developers may not be aware of this principle, because synchronous operation is the default behavior of the Windows API, while for the Native and kernel APIs the situation is reversed. This principle also applies to section objects. When the MapViewOfFile Windows API returns control to the calling thread, it doesn't mean that these subsystems have already copied the file data into virtual memory. Likewise, when a thread modifies mapped file data in virtual memory, it doesn't mean that the changes are immediately flushed to the file on disk. Instead, the VMM delays the actual I/O operation until a thread of the process tries to access the file data by reading virtual memory. Once that happens, a #PF exception occurs and the VMM initiates an I/O operation to read the file data into virtual memory. Most of the work in this case falls on the shoulders of the MiDispatchFault function.
There are several possible situations for sections describing file-backed data. Note that a section PPTE can be in the same states as a hardware PTE.
MiDispatchFault calls MiResolveProtoPteFault, passing it pointers to the PTE and the PPTE. MiResolveProtoPteFault works with a PPTE in the same way as with a usual PTE, because a PPTE can be in the same states as a hardware PTE. The function starts by validating the PPTE, i.e. checking whether it's located in physical memory or not.
After checking the access rights to the page, the function checks the case when the PPTE is marked as Demand Zero and its hardware PTE is marked as copy-on-write. In this case, the VMM resolves the fault by calling MiResolveDemandZeroFault and passing it a pointer to the real PTE. Further, MiResolveProtoPteFault makes the PPTE valid; before that, the PPTE can be in one of the following states: Demand Zero, Transition, Page File, Pagefile-backed, File-backed.
The MiResolveProtoPteFault and MiResolveMappedFileFault functions perform important steps: they reserve a page frame (physical page), initialize the corresponding entry in the PFN database, and prepare an MDL structure and a special ReadBlock structure for the subsequent disk read operation. You can see the entire process in the following diagram.
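The on-demand behavior can be modeled with a toy user-mode sketch. It is not how the kernel is implemented: a byte array stands in for the file on disk, a per-page flag stands in for the valid bit, and all names (`TOY_VIEW`, `toy_map_view`, `toy_read_byte`) are hypothetical. The point is only that mapping performs no reads and that each page is read once, on first access.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_PAGES 16u

typedef struct {
    const unsigned char *file_data;                      /* stands in for the file on disk */
    size_t file_size;
    unsigned char memory[TOY_MAX_PAGES][TOY_PAGE_SIZE];  /* "physical" pages */
    unsigned char resident[TOY_MAX_PAGES];               /* 1 = page is valid */
    unsigned page_reads;                                 /* simulated disk reads */
} TOY_VIEW;

/* "Maps" the file: only bookkeeping, no data is copied yet. */
void toy_map_view(TOY_VIEW *v, const unsigned char *data, size_t size)
{
    assert(size <= (size_t)TOY_MAX_PAGES * TOY_PAGE_SIZE);
    memset(v, 0, sizeof *v);
    v->file_data = data;
    v->file_size = size;
}

/* Simulated page fault: read one page of file data and mark it valid. */
static void toy_page_fault(TOY_VIEW *v, size_t page)
{
    size_t start = page * TOY_PAGE_SIZE;
    size_t n = v->file_size - start;
    if (n > TOY_PAGE_SIZE)
        n = TOY_PAGE_SIZE;
    memcpy(v->memory[page], v->file_data + start, n);    /* the tail stays zeroed */
    v->resident[page] = 1;
    v->page_reads++;
}

/* Reads one byte of the view, faulting the page in if necessary. */
unsigned char toy_read_byte(TOY_VIEW *v, size_t offset)
{
    assert(offset < v->file_size);
    size_t page = offset / TOY_PAGE_SIZE;
    if (!v->resident[page])
        toy_page_fault(v, page);
    return v->memory[page][offset % TOY_PAGE_SIZE];
}
```

After `toy_map_view`, `page_reads` is still zero; the first `toy_read_byte` on a page raises it by one, and further reads from the same page leave it unchanged.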
Inside the Page Writers
In the last part of this blog post we're going to discuss the Mapped Page Writer subsystem (thread), which is a part of the Modified Page Writer subsystem (or just thread). At this point we already know how the VMM and the I/O Manager read mapped file data into a process's virtual memory in order to provide access to it. But what about writing file data? As mentioned above, the VMM minimizes performance overhead by performing operations that involve disk I/O on demand. That's why the actual read operation on a memory-mapped file only happens when a thread tries to access a file view and not when MapViewOfFile is executed.
The VMM has two system threads called MiModifiedPageWriter and MiMappedPageWriter. In fact, MiModifiedPageWriter just creates the MiMappedPageWriter thread and hands the rest of the work over to MiModifiedPageWriterWorker. These last two functions (MiModifiedPageWriterWorker and MiMappedPageWriter) are two infinite loops that together can be called the Modified Page Writer, because they implement all its functionality. The first one is responsible for gathering information about modified pages belonging to the page file (MiGatherPagefilePages) and about modified pages belonging to mapped files (MiGatherMappedPages). It also adjusts the frequency of the flushing operations, i.e. how often modified pages are written to disk. The second thread takes the information prepared by MiModifiedPageWriterWorker and performs the actual disk write operation (for mapped files).
The main part of MiModifiedPageWriterWorker is an infinite loop that waits on the MiMappedPagesTooOldEvent event. This event can be set in several circumstances and adjusts the frequency of flushing. To provide a fixed flushing interval, the VMM uses a timer object and a DPC object (MiModifiedPageWriterTimerDpc), i.e. the DPC handler MiModifiedPageWriterTimerDispatch is called every time the timer expires. Since this handler is executed at the elevated IRQL DISPATCH_LEVEL (2, the scheduler level), the system ensures its operation in privileged mode. The MmInitSystem function initializes this object during system startup, and when MiModifiedPageWriterWorker needs to gather dirty memory pages for the first time, it sets this timer. The timer is set for 3 seconds.
I forgot to mention that physical memory pages (frames) that were modified since the section was mapped are called Dirty. The CPU sets the Dirty bit in the PTE on the first write operation to the page. Once the VMM has processed this page in a certain way, it resets this bit. Without it, the VMM wouldn't be able to track the changes made by a thread to the mapped page and synchronize them with the file data on disk.
For the convenience of flushing file data and in order to reduce overhead, MiMappedPageWriter doesn't flush every dirty page separately. Instead, MiGatherMappedPages gathers into a packet (MMMOD_WRITER_MDL_ENTRY) a set of dirty frames that belong to the same section and are adjacent to the given dirty frame. The MMMOD_WRITER_MDL_ENTRY structure describes a set of dirty PFNs that should be written to disk. In fact, the VMM uses two MDL lists, one for the paging file and another for sections. The pool of these MDL items is allocated in the NtCreatePagingFile function, which is responsible for creating page files. The counterpart for memory-mapped files is MmMappedFileHeader.
As mentioned above, MiMappedPageWriter is responsible for initiating the disk write operation. Here's its pseudocode, in which the details are omitted.
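The idea of clustering adjacent dirty pages can be sketched with a toy helper. This is not the logic of MiGatherMappedPages itself, which works on PFN entries and has more constraints; the `TOY_WRITE_CLUSTER` type, the per-page dirty array and the `max_cluster` limit are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Describes a run of adjacent dirty pages to be written in one I/O. */
typedef struct {
    size_t first_page;
    size_t page_count;
} TOY_WRITE_CLUSTER;

/* Starting from a dirty page, grows the run backwards and then forwards
   over adjacent dirty pages, up to max_cluster pages in total. The dirty
   flags of the gathered pages are cleared, since the pages are now queued. */
TOY_WRITE_CLUSTER toy_gather_dirty_run(unsigned char *dirty, size_t total_pages,
                                       size_t start, size_t max_cluster)
{
    assert(start < total_pages && dirty[start] && max_cluster > 0);

    size_t first = start;
    size_t last = start;

    while (first > 0 && dirty[first - 1] && (last - first + 1) < max_cluster)
        first--;
    while (last + 1 < total_pages && dirty[last + 1] && (last - first + 1) < max_cluster)
        last++;

    for (size_t i = first; i <= last; i++)
        dirty[i] = 0;

    TOY_WRITE_CLUSTER cluster = { first, last - first + 1 };
    return cluster;
}
```

With the dirty map `{0,1,1,1,0,1,0,0}` and a start at page 2, the helper returns a cluster of three pages beginning at page 1, so one write request covers what would otherwise be three.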
Please take a look at the following diagram to understand the entire process.
However, a thread can force the VMM to write modified data to disk immediately. The Windows API provides applications with the FlushViewOfFile function for this purpose. The VMM's internal function MiFlushSectionInternal is responsible for flushing section data and calls IoAsynchronousPageWrite to perform the disk write operation.
Instead of conclusion
Thank you for your attention; I hope you enjoyed the blog post. Windows sections are quite a difficult topic, especially for beginners, because you should already have an idea of other Windows kernel subsystems to understand them properly.
If you have any comments or remarks, please let me know and feel free to contact me. I'm going to cover several other topics on Windows VMM internals, such as the PFN database, hyperspace and virtual address translation.