==Phrack Inc.== Volume 0x0c, Issue 0x41, Phile #0x08 of 0x0f |=---------------------=[ Mistifying the debugger, ]=--------------------=| |=---------------------=[ ultimate stealthness ]=--------------------=| |=-----------------------------------------------------------------------=| |=------------------------=[ halfdead@phear.org ]=-----------------------=| --[ Introduction Over the years, there have been a plethora of techniques and methods of hiding one's presence in a hacked system. Many of them were focused on directly tampering the system call table, others were modifying the interrupt handler, while others were operating at the VFS layer. But all of them were modifying the underlying operating system in a very visible manner, making them easily detected. In the article I will present a technique that is able to achieve ultimate stealthness in kernel rootkits, by using a common x86 feature, the debugging mechanism. Although it works on any IA-32 compatible platform, the following technique will be detailed for Linux operating system and I will show you how one can intercept the normal flow of execution without touching the "classical" hooking targets. In fact, this technique can be so good that no one will ever notice our presence. When we refer to "debugger" in this article, we actually mean the IA-32 debugging mechanism, which is only accessible from ring zero. Userland debuggers don't make use of this mechanism, only some kernel debuggers do. --[ The debugger "The IA-32 architecture provides extensive debugging facilities for use in debugging code and monitoring code execution and processor performance. These facilities are valuable for debugging applications software, system software, and multitasking operating systems." In order to make life easier for developers, Intel introduced a mechanism that was intented to manage the debugging process. This mechanism is handled by a set of special registers (called 'debugging registers, DR0..DR7) which allow the user to set hardware breakpoints on memory addresses. As soon as the execution flow hits an address marked with a breakpoint, it hands the control to the debug interrupt handler (INT 1), which calls the do_debug() function (defined in ../i386/kernel/traps.c) to take care of the actual situation that raised the exception. The debugging support is accessed through the debug registers (DB0 through DB7) and two model-specific registers (MSRs). For the purpose of this paper we will only focus on the debug registers. These registers hold the addresses of memory and I/O locations, called breakpoints. Breakpoints are user-selected locations in a program, a data-storage area in memory, or specific I/O ports where a programmer or system designer wishes to halt execution of a program and examine the state of the processor by invoking debugger software. A debug exception (#DB) is generated when a memory or I/O access is made to one of these breakpoint addresses. A breakpoint is specified for a particular form of memory or I/O access, such as a memory read and/or write operation or an I/O read and/or write operation. The debug registers support both instruction breakpoints and data breakpoint. The MSRs (which were introduced into the IA-32 architecture in the P6 family processors) monitor branches, interrupts, and exceptions and record the addresses of the last branch, interrupt or exception taken and the last branch taken before an interrupt or exception. --[ The debug registers There are 8 debug registers supported by the Intel processors, which control the debug operation of the processor. These registers can be written to and read using the move to or from debug register form of the MOV instruction. A debug register may be the source or destination operand for one of these instructions. The debug registers are privileged resources; a MOV instruction that accesses these registers can only be executed in real-address mode, in SMM, or in protected mode at a CPL of 0. An attempt to read or write the debug registers from any other privilege level generates a general protection exception. The primary function of the debug registers is to set up and monitor from 1 to 4 breakpoints, numbered 0 though 3. The debug mechanism allows us to manage the breakpoints through two special registers, DR6 and DR7, which I will describe in detail later on. For each breakpoint, the following information can be specified and/or detected with the debug registers: - The linear address where the breakpoint is to occur. - The length of the breakpoint location (1, 2, or 4 bytes). - The operation that must be performed at the address for a debug exception to be generated. - Whether the breakpoint is enabled. - Whether the breakpoint condition was present when the debug exception was generated. -------[ Debug address registers Each of the debug-address registers (DR0-DR3) holds the 32-bit linear address of a breakpoint. Breakpoint comparisons are made before physical address translation occurs. -------[ Debug registers DR4 and DR5 Debug registers DR4 and DR5 are reserved when debug extensions are enabled (the DE flag in control register CR4 is set), and attempts to reference these registers will raise an invalid-opcode exception. When the DE flag is not set, these registers are aliased to DR6 and DR7. ------[ Debug status register (DR6) This special register is used to report the debug conditions that existed at the time the last debug exception occured. The flags in this register show the following information: - B0..B3 (bits 0..3) indicate that a breakpoint condition was detected. These flags are set if the condition described for each breakpoint by the LENn, and R/Wn flags in debug control register DR7 is true. They are set even if the breakpoint is not enabled by the Ln and Gn flags in register DR7. - BD (bit 13) (debug register access detected) indicates that the next instruction in the instruction stream will access one of the debug registers (DR0..DR7). This flag is enabled when the general detect (GD) flag in debug control register DR7 is set. - BS (bit 14) (single step) indicates (when set) that the debug exception was triggered by the single-step execution mode. - BT (bit 15) (task switch) indicates (when set) that the debug exception resulted from a task switch where the debug trap flag in the TSS of the target task was set. The processor never clears the contents of DR6 register. ------[ Debug control register (DR7) The debug control register (DR7) enables or disables breakpoints and sets breakpoint conditions. Its flags and fields control the following things: - L0..L3 (bits 0, 2, 4, 6) (local breakpoint enable) enable (when set) the breakpoint condition for the associated breakpoint for the current task. When a breakpoint condition is detected and its associated Ln flag is set, a debug exception is generated. The processor automatically clears these flags on every task switch to avoid unwanted breakpoint conditions in the new task. - G0..G3 (bits 1, 3, 5, 7) (global breakpoint enable) enable (when set) the breakpoint condition for the associated breakpoint for all tasks. When a breakpoint condition is detected and its associated Gn flag is set, a debug exception is generated. The processor does not clear these flags on a task switch, allowing a breakpoint to be enabled for all tasks. - LE and GE (bits 8 and 9) (local and global exact breakpoint enable) cause the processor to detect the exact instruction that caused a data breakpoint condition. Not supported in P6 family processors. - GD (bit 13) (general detect enable) enables (when set) debug-register protection, which causes a debug exception to be generated prior to any MOV instruction that accesses a debug register. When such a condition is detected, the BD flag in debug status register DR6 is set prior to generating the exception. - R/W0..R/W3 (bits 16, 17, 20, 21, 24, 25, 28, and 29) (read/write) specifies the breakpoint condition for the corresponding breakpoint. For more information read the Intel manual. - LEN0..LEN3 (bits 18, 19, 22, 23, 26, 27, 30, and 31) (length) --[ The magic Ok, so we've learnt almost everything now about the IA-32 debugging mechanism. Where is the goodies you've promised?? Now we know a few important things: we can set a breakpoint on a memory address and as soon as execution flow hits our breakpoint, the execution is redirected to the debug handler (INT 1). Uhmm, so what if we replace the existing debug handler or one of the underlying functions with our own? As we can see from entry.S, ENTRY(debug) pushl $0 pushl $ SYMBOL_NAME(do_debug) jmp error_code the actual debug handler is a C function, do_debug() defined in traps.c. Yes, ok, I think we are able to patch the INT 1 handler and then call do_debug() on our own OR we could come up with our own do_debug() and expect to be called by the debug handler, so we rest assured that the IDT remains untouched. But what should our handler handle? Most obviously, we need to check a few parameters and then pass control to the actual operating system do_debug(). But what parameters should we monitor? Keep reading... ------[ Hijacking the sys_call_table[] Now you should have an idea how to hijack the syscall table making use onunnt on read/write/execution on targetted address in memory. This can be either INT 80 handler address or syscall table address, it matters less as the effect is the same, in the end. Therefore, each time the operating system is going for a syscall, it will wind up in our handler. We have two options here: A) hijacking the INT 80 handler directly in IDT or B) hijacking the actual address of sys_call_table[] in memory. Any of them is fit for our purposes, so we will aim for A. The following function will return the address of INT 80 handler. get_idt_entry: sidt idtr movl idtr+2, %ebx leal (%ebx, %eax, 8), %ebx movw (%ebx), %cx roll $16, %ecx movw 0x6(%ebx), %cx roll $16, %ecx movl %ecx, %eax ret Once we know the address, we can set up a breakpoint as follows: set_bpm: movl $0x80, %eax call get_idt_entry movl %eax, %dr0 xorl %eax, %eax orl $0x2080, %eax movl %eax, %dr7 ret As you can see, the set_bpm() function will load DR0 with memory address where INT 80 is located and, also, will set up the according flags in DR7, including the magic GD bit, which allows us to monitor WHO and WHY is accessing the debug registers. This bit is very important for us because it "causes a debug exception to be generated prior to any MOV instruction that accesses a debug register". Wow, do you mean...? Yeah, if SOMEONE is trying to read/write the debug registers, the control is passed to our handler BEFORE the instruction takes place. So, we know if someone, a debugger or some tool of the devil, is checking the debug registers, even before they know it. This gives us time to cover our tracks: we can undo everything and wait some time for danger to pass, we can simply skip the instructions affecting the debug registers, etc. The best thing to do is to show the system clean debug registers and after a short period of time, hook everything back to best suit our needs. The best aproach is to come up with a code emulator, analyzing the type of the instruction accessing debug registers, and based on that decide what action will follow: clean the debug registers and restore later or simply increase the instruction count so that the instruction is simply ignored. Anyway, this leaves an open discussion. ------[ The handler Now, we managed to redirect the flow of execution without patching anything in the syscall table or INT 80 handler. But still, what should our handler handle? For starter, in its most simplistic form, our handler needs to check the value of the %eax register, because at this point, it contains the desired syscall number, and based on that it should feed the OS with our hacked syscall. This is how a very simple handler should look like: asmlinkage void new_do_debug(struct pt_regs * regs, long error_code) { unsigned long condition; unsigned long mask = 0x2008; __asm__ __volatile__("movl %%db6,%0" : "=r" (condition)); if (condition & BD_FLAG) { /* someone is r/w the registers */ condition &= ~BD_FLAG; __asm__ __volatile__ ("movl %0, %%db6" : : "r" (condition)); regs->eip += 3; __asm__ __volatile__ ("movl %0, %%db7" : : "r" (mask)); } if (condition & DR_TRAP0) { if (regs->eax == __NR_time) sys_call_table[__NR_time] = hacked_time; if (regs->eflags & VM_MASK) { (*old_do_debug)(regs,error_code); __asm__ __volatile__ ("movl %0, %%db7" : : "r" (mask)); } condition &= ~DR_TRAP0; __asm__ __volatile__ ("movl %0, %%db6" : : "r" (condition)); __asm__ __volatile__ ("movl %0, %%db7" : : "r" (mask)); regs->eflags |= X86_EFLAGS_RF; } else { (*old_do_debug)(regs, error_code); __asm__ __volatile__ ("movl %0, %%db7" : : "r" (mask)); } return; } What are we doing here? First, we grab the values in the status register (DR6) and try to figure out what triggered our handler. If our execution comes as a result of the breakpoint we've placed, we compare the value in %eax register to the value of the syscall we decided to hijack, which was sys_time() in our case. In the example provided, due to the lack of space and time, we did a direct change of the sys_call_table[] but this is not something to worry about as, the hacked_time() is modifying the sys_call_table[] back to original in the instant it gets executed: asmlinkage long hacked_time(int *tloc) { sys_call_table[__NR_time] = original_time; printk("<1>WE changed it!!\n"); return original_time(tloc); } Ofcourse, there are other ways of doing it without touching the syscall table at all but take into consideration that the first thing the hacked_time() does is changing back the value in sys_call_table[], meaning that the actual change takes place for less than a microsecond so it shouldn't be a problem. A better method would be to analyze the parameters of the syscall, based on the syscall number, which at the time our handler takes place is the value in %eax register. We could feed the hacked parameters by simply filling the according registers. This method would create a "virtual" syscall table, so we don't need to touch the actual syscall table at all. So now we learnt how to set a breakpoint on a memory address, how to enable that breakpoint; we also learnt that we can hijack the normal execution flow without tampering the INT 80 handler nor the syscall table handler nor the syscall table itself. Yes, you can say it's a lovely technique, a bit of magic. But still, we modify the INT 1 handler, or at least, we patch the do_debug() function, so we're not that stealth. Just keep reading... ---[ Blindfold We learnt so many beautiful things by now, we take control of the system and no one detects a direct tampering of the kernel. We covered our tracks thanks to the GD/BD bits so, if someone is looking at the debugging registers we simply ignore their curiosity (regs->eip +=3). But what if someone wants to check all the IDT for integrity? Or what if a debugger or a similar tool needs to place its own handler on INT 1? Are we lost then? It sure looks like it.. But wait.. DR6 and DR7 come to rescue once more. What we need to do is the following: - set up your handler on INT 1 - set up the breakpoint to watch for INT 80 address - set a secondary breakpoint to watch on our handler's address Oh, wait! It can't be that simple. Yes, it is! Like this, we practically don't affect the kernel at all, for the unwanted eye. In our ideal handler, the code emulator checks the type of the instruction that attempts to access debug registers, wether is the breakpoint we put on INT 80 or INT 1 and act accordingly. We already explained what it should do for hijacking INT 80, let's talk now about INT 1. By placing a secondary breakpoint on INT 1 or do_debug() function, we make sure that we know apriori when someone attempts to read the only location in the kernel memory we modified. The best thing to do is to make that single address back to original. Like this, when some devilish tool attempts to check for our presence in the IDT too (i don't think there any tools doing that outhere, but that's simply because a whitehat would've never thought it's necessary), we let them see the untouched value. This is "deep cover" mode. But did we lose the control over the kernel now? Well, not really, we're still in control: we can "reinstall" our rootkit after a few nanoseconds, so they miss us every time they look at us. It's like blindfolding them. This technique is also helpful when dealing with a debugger (or similar tool) trying to place its own hook in INT 1 handler. Think about it: we detect the attempt and make everything back to normal, they place their hook, we hijack their hook as a normal INT 1 hijack and as soon as they check for their presence, for example, by checking the presence of the handler, we let them see themselves. It's like chaining hooks, or so. When I discovered that I was stunned. When I realised it really works I was amazed. This is the ultimate stealthness, the holygrail of hackers! ---[ Closing words This technique has been actively used in the underground for more than 8 years now. The beauty about it: it is, in fact, a basic IA-32 feature. They cannot defeat against it without removing the whole debug mechanism. I decided to make it public in phrack through a "scientific" paper *g* but it wasn't my choice: the technique leaked a while ago. I highly doubt that the person that leaked it knows exactly what his tool is actually capable of and what is actually doing, so I decided to help him and any other hacker in the world willing to learn and improve their skills. As you have seen, this is one very powerful technique, allowing one to achieve full stealthness on a target system. Being a fundamental processor feature, means it can be used on ANY operating system running on IA-32 and also, there is no way of detecting or protecting against it, even if it is not 0day anymore ;( ---[ Kudos halvar, twiz, reverser, sd and the rest of the digitalnerds