╔══════════════════════════════════════════════════════════════════════════════════╗
║ 10-04-2026 ║
║ Committing Crimes Against Readable Assembly Part 4 ║
║ ║
║ Magic syscalls ║
║ ║
╠══-----==[ Contents ]==---------------------------------------------------------══╣
║ ║
║ 1: systems, calls, and systemcalls ║
║ 2: ideas ║
║ a: direct bytecode ║
║ b: intentional faults ║
║ c: mmio/pmio ║
║ d: intentional faults revisited ║
║ 3: one small problem ║
║ 4: non syscall syscalls ║
║ 5: final touches and summary ║
║ 6: the program ║
║ ║
╚══════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════╗
╠══-----==[ 1 ]==----------------------------------------------------------------══╣
║ ║
║ We're back, it's been quite a while and it is time (for me) to suffer ║
║ ║
║ Last time we got a working binary that compared a value, and "output" a specific ║
║ value depending on the result. While it is true that there is still a lot of ║
║ tweaking to do to make that model work for the types of values we would want to ║
║ be comparing, for the moment let's leave that be - and work on the next ║
║ conceptual piece (I have arbitrarily decided) we need: syscalls ║
║ ║
║ At the moment, our binary is only a program by sheer definition. If we ran it ║
║ outside a debugger, or without a breakpoint, execution would just continue ║
║ careening into the rest of the ELF's empty page. In order to exit properly, or ║
║ do anything of note - we would need to invoke a syscall. But hang on a minute I ║
║ hear you exclaim, what's a syscall? ║
║ ║
║ From the 'ol Wikipedia: ║
╠══------------------------------------------------------------------------------══╣
║ "a system call (syscall) is the programmatic way in which a computer program ║
║ requests a service from the operating system" ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ As we have decided to try and produce code that runs within an operating ║
║ system (in our case, Linux) we have to play by its rules. That means asking ║
║ very politely when we require some functionality that is outside of our ║
║ program's control ║
║ ║
║ Such functionalities that we may want to be able to use include: exiting the ║
║ program, talking to the file system, displaying to the screen and many more ║
║ ║
║ On older OSs and embedded devices, sometimes these sorts of things were ║
║ accessible through MMIO (mapping device memory to virtual memory within the ║
║ program's scope) or PMIO (mapping device I/O to ports) ║
║ ║
║ Modern Linux though, instead of being able to access the memory of hardware ║
║ input (of, say, a keyboard) by doing: ║
║ ║
║ mov rax, [keyboard input address] ║
║ ║
║ Forces us to use these "syscalls" (here with the syscall instruction): ║
╠══------------------------------------------------------------------------------══╣
║ mov rax, arg1 # args for "keyboard" syscall ║
║ mov rdi, arg2 ║
║ mov rsi, arg3 ║
║ syscall ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Unfortunately I count syscall as an instruction, and since it's not spelt "xor" ║
║ we aren't allowed to use it. So how do we go about achieving the functionality ║
║ of system calls without ever using them? ║
║ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
╠══-----==[2:a]==----------------------------------------------------------------══╣
║ ║
║ Let's say that we want to write the following: ║
╠══------------------------------------------------------------------------------══╣
║ mov rax, 0x1 ║
║ mov rdi, 0x2 ║
║ mov rsi, 0x3 ║
║ syscall ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ This assembly has some corresponding bytecode. When compiled with GCC (objdump ║
║ -s -j .text syscall || hd), the snippet above becomes: ║
╠══------------------------------------------------------------------------------══╣
║ 401000 48 c7 c0 01 00 00 00 48 c7 c7 02 00 00 00 48 c7 H......H......H. ║
║ 401010 c6 03 00 00 00 0f 05 ....... ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ We can sort of see the instructions, and where they map onto our snippet: ║
╠══------------------------------------------------------------------------------══╣
║ mov rax, 0x1 -> 48 c7 c0 01 00 00 00 ║
║ mov rdi, 0x2 -> 48 c7 c7 02 00 00 00 ║
║ mov rsi, 0x3 -> 48 c7 c6 03 00 00 00 ║
║ syscall -> 0f 05 ║
╠══------------------------------------------------------------------------------══╣
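║ ║
║ As a sanity check on that mapping, here is a minimal Python sketch (my own ║
║ helper, not part of any toolchain) that hand-assembles mov r64, imm32. It ║
║ assumes the REX.W + C7 /0 form shown in the dump, and only covers the eight ║
║ low registers: ║
╠══------------------------------------------------------------------------------══╣
```python
import struct

# Register numbers used by the ModRM byte (low eight 64-bit registers only).
REG_NUM = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
           "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def encode_mov_r64_imm32(reg: str, imm: int) -> bytes:
    """Encode `mov reg, imm` as REX.W + C7 /0 + imm32 (sign-extended)."""
    rex_w = 0x48                    # REX prefix with W=1: 64-bit operand size
    opcode = 0xC7                   # mov r/m64, imm32
    modrm = 0xC0 | REG_NUM[reg]     # mod=11 (register), /0, rm=register number
    return bytes([rex_w, opcode, modrm]) + struct.pack("<i", imm)


for reg, imm in (("rax", 1), ("rdi", 2), ("rsi", 3)):
    print(f"mov {reg}, {imm:#x} -> {encode_mov_r64_imm32(reg, imm).hex()}")
```
╠══------------------------------------------------------------------------------══╣
║ The only byte that changes between the three movs (besides the immediate) is ║
║ the third one, ModRM, whose low three bits pick the destination register ║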
║ ║
║ Is there then, a way to write these bytes into memory somewhere, and then ║
║ execute them? that way we could do the writing with our xor instruction set, and ║
║ still invoke a syscall ║
║ ║
║ This would require a section of addresses somewhere that was readable, writeable ║
║ and executable by our program. You can do this explicitly by forcing your ║
║ compiler to link a segment with r...w...e permissions. This is generally ║
║ considered a terrible idea for safety reasons, so it could be perfect for us! ║
║ ║
║ However, given that the bytecodes will be read as opcodes + arguments, we are ║
║ essentially getting the CPU to execute non-xor instructions - even if no non-xor ║
║ instructions appear in our source asm code. The purist in me feels this ║
║ gives up a core component of the challenge - at that point we are ║
║ essentially just using xor to build a long list of numbers (bytecodes) which ║
║ isn't that impressive ║
║ ║
║ Let's shelve this idea for the moment then ║
║ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
╠══-----==[2:b]==----------------------------------------------------------------══╣
║ ║
║ Our previous program exited whenever it tried to access an address outside of ║
║ the binary's readable segments. In a way, SegFault-ing was our way to halt the ║
║ program's execution ║
║ ║
║ As exceptions halt execution - we can substitute the syscall for exit with a ║
║ purposeful exception invocation. This is a list of all the 64 bit mode ║
║ exceptions for the xor instruction: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ #SS(0) If a memory address referencing the SS segment is in a ║
║ non-canonical form ║
║ ║
║ #GP(0) If the memory address is in a non-canonical form ║
║ ║
║ #PF(fault-code) If a page fault occurs ║
║ ║
║ #AC(0) If alignment checking is enabled and an unaligned memory ║
║ reference is made while the current privilege level is 3 ║
║ ║
║ #UD If the LOCK prefix is used but the destination is not a memory ║
║ operand ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ I've decided to use the lock prefix without a memory operand for purposeful ║
║ fault invocation. Given the way our xor program flow works (see previous blogs) ║
║ we are likely to see SegFaults when things go wrong, so let's not confuse ║
║ ourselves by adding SegFaults when things go right. Instead this will display ║
║ the IllegalInstruction error in console. ║
║ ║
║ so this is our pseudo-"exit" syscall: ║
║ ║
║ lock xor rax, rbx ║
║ ║
║ Unfortunately for us, GCC knows this would result in a fault, and doesn't allow ║
║ you to compile this: ║
║ ║
║ Error: expecting lockable instruction after `lock' ║
║ ║
║ Instead, we need to directly compile from bytecode (eugh). We can include this ║
║ at the end of our files: ║
║ ║
║ .byte 0xF0, 0x48, 0x31, 0xD8 # bytecode for [lock xor rax, rbx] ║
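║ ║
║ To see why this particular byte sequence faults, here's a rough Python sketch ║
║ (names are mine, and it only understands this one four-byte shape) that pulls ║
║ the fields out of the bytes. The thing to look at is the mod field: 11 means ║
║ the destination is a register, which is exactly what LOCK refuses: ║
╠══------------------------------------------------------------------------------══╣
```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_lock_xor(code: bytes) -> dict:
    """Split [F0][REX.W][31][ModRM] into its fields (this exact shape only)."""
    lock, rex, opcode, modrm = code
    if lock != 0xF0 or rex != 0x48 or opcode != 0x31:
        raise ValueError("not a `lock xor r/m64, r64` in the expected form")
    mod = modrm >> 6            # 11 = register operand, anything else = memory
    reg = (modrm >> 3) & 7      # source register
    rm = modrm & 7              # destination (a register only when mod == 11)
    return {
        "src": REGS[reg],
        "dest": REGS[rm] if mod == 3 else "memory",
        "raises_ud": mod == 3,  # LOCK with a register destination -> #UD
    }


print(decode_lock_xor(bytes([0xF0, 0x48, 0x31, 0xD8])))
```
╠══------------------------------------------------------------------------------══╣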
║ ║
║ Ideally, this should show up as an xor instruction when disassembled. However, ║
║ when opening the executable in Binary Ninja (henceforth Binja), we can see the ║
║ following for the relevant .text section: ║
╠══------------------------------------------------------------------------------══╣
║ 00401000 f0 .. .. .. ?? ║
║ 00401001 .. 48 31 d8 H1. ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Interestingly it's decided to partition off the last three bytes, perhaps they ║
║ are all utf-8 values? Regardless, it's clearly got this "wrong" - despite having ║
║ the correct bytecode to reconstruct the instructions, Binja has decided that ║
║ this section is non-code ║
║ ║
║ It turns out that _start is not assumed to be a legit entry point to disassemble ║
║ from, as sneaky malware authors could abuse that assumption to mess with the ║
║ real code flow ║
║ ║
║ Hiding opcodes from disassemblers is always something interesting to play with, ║
║ so we might return to this another time, but for the time being - let's see if ║
║ we can get Binja to display the instructions ║
║ ║
║ By explicitly labelling _start as a: ║
║ ║
║ .type _start, @function ║
║ ║
║ We shall make sure Binja knows it's a function - checking the disassembly view, ║
║ we can see that it isn't fixed at all. Hmmm ║
║ ║
║ It seems that Binja knows x86 better than I, and notices that locking an xor ║
║ with two register operands is illegal, and so doesn't assume it is actual code. ║
║ Interestingly, this heuristic allows us to put any regular opcodes and arguments ║
║ afterwards, and Binja will also not mark these up as instructions - since the ║
║ CPU would have faulted previously ║
║ ║
║ As a tentative solution, I shall use the .byte chunk as a substitution for the ║
║ exit syscall, despite the disassembly issues ║
║ ║
║ As for trying to emulate other syscalls using fault behaviour, we aren't so ║
║ lucky. Basically all of them crash the program, just with slightly different ║
║ fault messages. However, the signal handler may well come in handy later on... ║
║ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
╠══-----==[2:c]==----------------------------------------------------------------══╣
║ ║
║ Perhaps, if we could setup an environment where the keyboard and screen control ║
║ that is usually handled by syscalls, was instead mapped to specific regions of ║
║ virtual memory - we could just use relative addressing to use both of those ║
║ features, just as some older systems used to. ║
║ ║
║ This runs counter to the whole compartmentalisation and security ethos of Linux ║
║ so this might be a headache. As such, the only ways to achieve this require, at ║
║ least, root access to a system ║
║ ║
║ One solution might be to create some sort of an "emulator" or VM. Kernel ║
║ programming - oh no ║
║ ║
║ While this seems viable, I'd rather exhaust other options before spending weeks ║
║ being aggrieved at my inability to code kernel modules ║
║ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
╠══-----==[2:d]==----------------------------------------------------------------══╣
║ ║
║ It turns out, to my great fortune, that you can create custom signal handlers ║
║ for your programs in Linux! They don't even need escalated privileges ║
║ ║
║ When our program raises a SigIll, the following happens: ║
╠══------------------------------------------------------------------------------══╣
║ the program raises an undefined instruction exception ║
║ | ║
║ ├─> CPU switches to kernel mode ║
║ | ║
║ ├─> kernel detects the exception is a SigIll ║
║ | ║
║ ├─> the program's signal handlers table is accessed, and if one is registered, ║
║ | it is pointed to ║
║ | ║
║ ├─> before calling the handler, the kernel saves the CPU registers in a ║
║ | structure called ucontext ║
║ | ║
║ ├─> the kernel sets up: ║
║ | rdi -> signal id ║
║ | rsi -> pointer to siginfo ║
║ | rdx -> pointer to ucontext ║
║ | ║
║ ├─> the handler is called, and the return value is stored in rax ║
║ | ║
║ └─> program resumes execution with the handler modified register values ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ I have no idea what the ucontext structure is actually like, so let's try and ║
║ setup a test program and inspect the registers to see how it's parsed ║
║ ║
║ A basic program that sets various registers to some useful values to confirm in ║
║ the debugger, then faults, will do just fine: ║
╠══------------------------------------------------------------------------------══╣
║ .intel_syntax noprefix ║
║ .global _start ║
║ ║
║ _start: ║
║ ║
║ mov rax, 0x1 # test values ║
║ mov rdi, 0x2 ║
║ mov rsi, 0x3 ║
║ mov rdx, 0x4 ║
║ mov r10, 0x5 ║
║ mov r8, 0x6 ║
║ mov r9, 0x7 ║
║ ║
║ mov r11, [0x0] ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ As well as a basic signal handler: ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ .intel_syntax noprefix ║
║ .global _sig_handler ║
║ ║
║ _sig_handler: ║
║ ║
║ mov r12, 0xdead ║
║ ret ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Actually getting a separate program to take another as a signal handler seems a ║
║ little tricky, so I'll include the _sig_handler inside the main .elf ║
║ ║
║ The syscall for registering a signal handler for a specific signal is sigaction, ║
║ which takes the following arguments: ║
║ ║
║ sigaction(signum, sigaction, sigaction_old) ║
║ ║
║ where the sigaction arguments are structs containing a pointer to the signal ║
║ handler, flags to modify behaviour and any signals to block while running. The ║
║ second of the two (sigaction_old) is just a pointer to the old struct. We don't ║
║ need to bother with that one, so that just = NULL ║
║ ║
║ Let's make a sigaction struct: ║
╠══------------------------------------------------------------------------------══╣
║ .section .data ║
║ .align 8 ║
║ _sigaction: ║
║ .quad _sig_handler # pointer to the handler ║
║ .quad 0 # no flags ║
║ .quad 0 # no restorer ║
║ .zero 128 # no blockers ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ And then reference it in a syscall to sigaction: ║
╠══------------------------------------------------------------------------------══╣
║ .intel_syntax noprefix ║
║ .global _start ║
║ ║
║ _start: ║
║ mov rax, 0xd # syscall for sigaction ║
║ mov rdi, 0xb # signum for SigSegV ║
║ lea rsi, [rip + _sigaction] # pointer to sigaction struct ║
║ xor rdx, rdx # pointer to old sigaction struct ║
║ syscall ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Once we smush all of that together, we get the following: ║
╠══------------------------------------------------------------------------------══╣
║ .intel_syntax noprefix ║
║ .global _start ║
║ .global _sig_handler ║
║ ║
║ .section .data ║
║ .align 8 ║
║ _sigaction: ║
║ .quad _sig_handler # pointer to the handler ║
║ .quad 0 ║
║ .quad 0 ║
║ .zero 128 # no blockers ║
║ ║
║ .section .text ║
║ _sig_handler: ║
║ mov r12, 0xdead # test value ║
║ loop: ║
║ jmp loop ║
║ ║
║ _start: ║
║ mov rax, 13 # syscall for sigaction ║
║ mov rdi, 11 # signum for SigSegV ║
║ lea rsi, [rip + _sigaction] # pointer to sigaction struct ║
║ xor rdx, rdx # pointer to old sigaction struct ║
║ mov r10, 128 ║
║ syscall ║
║ ║
║ mov rax, 0x1 # load test values ║
║ mov rdi, 0x2 ║
║ mov rsi, 0x3 ║
║ mov rdx, 0x4 ║
║ mov r10, 0x5 ║
║ mov r8, 0x6 ║
║ mov r9, 0x7 ║
║ ║
║ mov r11, [0] # cause SigSegV ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Compiling with gcc requires us to first make an object file: ║
║ ║
║ gcc -c -g -o testprogram.o testprogram.s ║
║ ║
║ And then link it afterwards: ║
║ ║
║ gcc -nostdlib -no-pie -o testprogram testprogram.o ║
║ ║
║ gdb time (make sure to handle SigSegV nostop pass!) :) ║
║ ║
║ Running our testprogram yields a SegFault - that's a good sign, alongside the ║
║ fact it compiled and linked as expected. This may be the smoothest one of these ║
║ blogposts has gone (so far..). ║
║ ║
║ If we hit a breakpoint on our _sig_handler, the rsp register should contain a ║
║ pointer to the top of the signal frame - where our ucontext info is contained. ║
║ Which it doesn't. I knew it was too good to be true. ║
║ ║
║ If we check the value in rax after we try to register a signal handler, we see ║
║ that the value is -22, which is the EINVAL return. This means our registration ║
║ is where we are failing, which is why the SegFault just terminates the program! ║
║ ║
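║ For reference, raw Linux syscalls report errors as a negative errno in rax. A ║
║ small Python sketch (the helper name is made up) for translating what GDB ║
║ shows, whether it prints rax as signed or unsigned: ║
╠══------------------------------------------------------------------------------══╣
```python
import errno


def decode_syscall_return(rax: int) -> str:
    """Interpret a raw Linux syscall return value taken from rax."""
    if rax >= 1 << 63:            # value was printed as an unsigned 64-bit number
        rax -= 1 << 64
    if -4095 <= rax < 0:          # the kernel's error range is -errno
        return errno.errorcode.get(-rax, f"errno {-rax}")
    return f"success ({rax})"


print(decode_syscall_return(-22))
```
╠══------------------------------------------------------------------------------══╣
║ ║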
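║ Turns out, I needed to be more explicit with my registration, instead of leaving ║
║ things blank and hoping. The raw syscall takes a fourth argument in r10: the ║
║ size of the signal mask, which the kernel expects to be 8 bytes on x86-64 and ║
║ rejects with EINVAL otherwise (I had passed 128). The mask in the struct shrinks ║
║ to a single quad to match. Plus I had to change the test values a little, so ║
║ that some duplicates that were showing up in GDB no longer appeared. There is ║
║ now also a "restorer" function, flagged in the struct with 0x04000000 ║
║ (SA_RESTORER). ║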
║ ║
║ This is the working test script, for creating and registering a signal handler: ║
╠══------------------------------------------------------------------------------══╣
║ .intel_syntax noprefix ║
║ .global _start ║
║ .global _sig_handler ║
║ ║
║ .section .data ║
║ .align 8 ║
║ _sigaction: ║
║ .quad _sig_handler # pointer to the handler ║
║ .quad 0x04000000 # flags: SA_RESTORER ║
║ .quad _restorer # pointer to the restorer ║
║ .quad 0 # empty signal mask (no blockers) ║
║ ║
║ .section .text ║
║ _sig_handler: ║
║ # your sighandler instructions of choice ║
║ ret ║
║ ║
║ _restorer: ║
║ mov rax, 15 # rt_sigreturn ║
║ syscall ║
║ ║
║ _start: ║
║ mov rax, 13 # syscall for sigaction ║
║ mov rdi, 11 # signum for SigSegV ║
║ lea rsi, [rip + _sigaction] # pointer to sigaction struct ║
║ xor rdx, rdx # pointer to old sigaction struct (NULL) ║
║ mov r10, 8 # size of the signal mask in bytes ║
║ syscall ║
║ ║
║ mov rax, 0x111 # testing values ║
║ mov rbx, 0x222 ║
║ mov rcx, 0x333 ║
║ mov rdx, 0x444 ║
║ mov rsi, 0x555 ║
║ mov rdi, 0x666 ║
║ mov r8, 0x777 ║
║ mov r9, 0x888 ║
║ mov r10, 0x999 ║
║ mov r11, 0xaaa ║
║ mov r12, 0xbbb ║
║ mov r13, 0xccc ║
║ mov r14, 0xddd ║
║ mov r15, 0xeee ║
║ ║
║ mov r11, [0] # cause SigSegV ║
╠══------------------------------------------------------------------------------══╣
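║ ║
║ To make the struct layout concrete, here's a small Python sketch that packs the ║
║ same four quads into the 32 bytes the kernel reads. The addresses are ║
║ hypothetical placeholders; SA_RESTORER is the name the Linux headers give to ║
║ the 0x04000000 flag: ║
╠══------------------------------------------------------------------------------══╣
```python
import struct

SA_RESTORER = 0x04000000   # tells the kernel the restorer field is valid


def pack_sigaction(handler: int, flags: int, restorer: int, mask: int) -> bytes:
    """Lay out the four 8-byte fields in the same order as the .quad lines."""
    return struct.pack("<QQQQ", handler, flags, restorer, mask)


# Hypothetical addresses, purely for illustration.
blob = pack_sigaction(0x401000, SA_RESTORER, 0x40100F, 0)
print(len(blob), blob.hex())
```
╠══------------------------------------------------------------------------------══╣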
║ ║
║ So - if we breakpoint at the signal handler, and inspect the first 32 QWORDs ║
║ on the stack, this should be where the ucontext is stored (x/32gx $rsp): ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ 0x7fffffffd8f8: 0x40100f |_restorer| 0x7 ║
║ 0x7fffffffd908: 0x0 0x0 ║
║ 0x7fffffffd918: 0x2 0x0 ║
║ 0x7fffffffd928: 0x777 0x888 ║
║ 0x7fffffffd938: 0x999 0xaaa ║
║ 0x7fffffffd948: 0xbbb 0xccc ║
║ 0x7fffffffd958: 0xddd 0xeee ║
║ 0x7fffffffd968: 0x666 0x555 ║
║ 0x7fffffffd978: 0x0 0x222 ║
║ 0x7fffffffd988: 0x444 0x111 ║
║ 0x7fffffffd998: 0x333 0x7fffffffdea0 ║
║ 0x7fffffffd9a8: 0x40109b |_start+131| 0x10246 ║
║ 0x7fffffffd9b8: 0x2b000000000033 0x4 ║
║ 0x7fffffffd9c8: 0xe 0x0 ║
║ 0x7fffffffd9d8: 0x0 0x7fffffffdac0 ║
║ 0x7fffffffd9e8: 0x0 0x0 ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Our "test" values appear a bit scattered around the place. By counting the ║
║ offsets from the stack pointer for each one, we can reconstruct what the ║
║ ucontext structure looks like: ║
║ ║
║ QWORD index from rsp (multiply by 8 for the byte offset) ---┐ ║
║ ↓ ║
║ ║
║ |006|007|008|009|010|011|012|013|014|015|016|017|018|019|020| ║
║ |r8 |r9 |r10|r11|r12|r13|r14|r15|rdi|rsi| |rbx|rdx|rax|rcx| ║
║ ║
║ ↑ ║
║ Corresponding stored registers --┘ ║
║ ║
║ (indexes 000-005 and 016 hold none of our test values) ║
║ ║
║ This allows us to pass information from the faulting program (our xor script) ║
║ directly to the signal handler. For instance, let's say we wanted to invoke any ║
║ given syscall - we could load the arguments for that into rax, rbx and rcx ║
║ and then cause a fault. The signal handler then could read the areas from the ║
║ ucontext structure corresponding to the saved registers (rsp+19*8 for rax, ║
║ rsp+17*8 for rbx and rsp+20*8 for rcx) and then execute the syscall. Importantly ║
║ it also allows our handler to communicate with the faulting script, as the ║
║ ucontext structure is used to restore registers when passing back execution ║
║ flow. ║
║ ║
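║ The same bookkeeping as a quick Python sketch: the list is the dump from ║
║ above, and the table holds the QWORD indexes read off it (helper names are ║
║ mine): ║
╠══------------------------------------------------------------------------------══╣
```python
# The 32 QWORDs from the x/32gx dump, in order, starting at rsp.
FRAME = [
    0x40100F, 0x7, 0x0, 0x0, 0x2, 0x0, 0x777, 0x888,
    0x999, 0xAAA, 0xBBB, 0xCCC, 0xDDD, 0xEEE, 0x666, 0x555,
    0x0, 0x222, 0x444, 0x111, 0x333, 0x7FFFFFFFDEA0, 0x40109B, 0x10246,
    0x2B000000000033, 0x4, 0xE, 0x0, 0x0, 0x7FFFFFFFDAC0, 0x0, 0x0,
]

# QWORD index (from rsp at handler entry) where each register was found.
QWORD_INDEX = {
    "r8": 6, "r9": 7, "r10": 8, "r11": 9, "r12": 10, "r13": 11,
    "r14": 12, "r15": 13, "rdi": 14, "rsi": 15,
    "rbx": 17, "rdx": 18, "rax": 19, "rcx": 20,
}


def byte_offset(reg: str) -> int:
    """Offset to use in `[rsp + offset]` inside the handler."""
    return QWORD_INDEX[reg] * 8


def saved(reg: str) -> int:
    """Value of `reg` as saved in the dumped signal frame."""
    return FRAME[QWORD_INDEX[reg]]


for reg in ("rax", "rbx", "rcx"):
    print(f"{reg}: [rsp + {byte_offset(reg)}] = {saved(reg):#x}")
```
╠══------------------------------------------------------------------------------══╣
║ ║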
║ Let's test this idea, by creating a signal handler that writes "0xdead" to ║
║ rsp+19*8 (rax) and inspecting rax once code flow returns to the faulting ║
║ program. We can do this by adding: ║
║ ║
║ mov qword ptr [rsp + 19*8], 0xdead ║
║ ║
║ To the signal handler code. ║
║ ║
║ Once in GDB, we set a breakpoint at the signal handler, and when it hits we can ║
║ step through the instructions until the flow returns to our faulting program. ║
║ At this point, inspecting the registers shows that rax contains 0xdead! ║
║ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
╠══-----==[ 3 ]==----------------------------------------------------------------══╣
║ ║
║ Now that I (partially) understand how the handler works, let's try to make one ║
║ that does our syscalls for us ║
║ ║
║ There is one big problem in our way, which is that the signal handler's job is ║
║ to actually handle the faults - we currently do no such thing. When our program ║
║ hits a fault, and the signal handler is done, we are just dumped back into our ║
║ program at the very same faulting instruction, where we fault all over again ║
║ ║
║ Fair warning, if you tell your GDB to ignore SegFaults, and register a ║
║ _sig_handler that doesn't do anything, it may spam error messages until it ║
║ crashes your system (it did to mine...) ║
║ ║
║ How does the kernel know where to put us back? Turns out, it uses ucontext ║
║ again, this time reading the saved instruction pointer (rip) when the restore ║
║ syscall runs, so we get plopped out at the same place in instruction flow ║
║ ║
║ After some searching, I found where rip is stored in ucontext (rsp + 22*8). ║
║ Thankfully, the usual write protections present to prevent us from directly ║
║ writing to rip are not present in the version saved to ucontext! ║
║ ║
║ In our example script, the faulting instruction: ║
║ ║
║ mov r11, [0] ║
║ ║
║ is encoded with the following bytecode: ║
║ ║
║ 0x4c 0x8b 0x1c 0x25 0x00 0x00 0x00 0x00 ║
║ ║
║ So, in theory, if we know that rip points to the start of this instruction, then ║
║ rip + 8 would point to the instruction directly afterwards, allowing the flow of ║
║ execution to continue past it ║
║ ║
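║ A quick Python sketch of that arithmetic, using the saved rip from the earlier ║
║ dump (0x40109b). The xor r11, [0] encoding is my own hand-assembly rather than ║
║ something taken from a disassembly, so treat it as an assumption; what matters ║
║ here is only that it has the same length: ║
╠══------------------------------------------------------------------------------══╣
```python
# mov r11, [0] as listed above.
MOV_R11_ABS0 = bytes([0x4C, 0x8B, 0x1C, 0x25, 0x00, 0x00, 0x00, 0x00])
# xor r11, [0]: same operands, opcode 0x33 instead of 0x8B (assumed encoding).
XOR_R11_ABS0 = bytes([0x4C, 0x33, 0x1C, 0x25, 0x00, 0x00, 0x00, 0x00])


def resume_address(saved_rip: int, insn: bytes) -> int:
    """Address of the instruction that follows the faulting one."""
    return saved_rip + len(insn)


print(hex(resume_address(0x40109B, MOV_R11_ABS0)))
```
╠══------------------------------------------------------------------------------══╣
║ ║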
║ Let's make a program to test this assumption: ║
╠══------------------------------------------------------------------------------══╣
║ .intel_syntax noprefix ║
║ .global _start ║
║ .global _sig_handler ║
║ ║
║ .section .data ║
║ .align 8 ║
║ _sigaction: ║
║ .quad _sig_handler # pointer to the handler ║
║ .quad 0x04000000 # flags: SA_RESTORER ║
║ .quad _restorer # pointer to the restorer ║
║ .quad 0 # empty signal mask (no blockers) ║
║ ║
║ .section .text ║
║ _sig_handler: ║
║ xor rax, rax # clear rax ║
║ add rax, [rsp + 22*8] # rax = stored rip ║
║ add rax, 0x8 # rax = stored rip+8 ║
║ mov [rsp + 22*8], rax # move rip+8 to stored rip ║
║ ret ║
║ ║
║ _restorer: ║
║ mov rax, 15 # rt_sigreturn ║
║ syscall ║
║ ║
║ _start: ║
║ mov rax, 13 # syscall for sigaction ║
║ mov rdi, 11 # signum for SigSegV ║
║ lea rsi, [rip + _sigaction] # pointer to sigaction struct ║
║ xor rdx, rdx # pointer to old sigaction struct ║
║ mov r10, 8 # size of the signal mask in bytes ║
║ syscall ║
║ ║
║ ║
║ mov r11, [0] # cause SigSegV ║
║ mov rax, 0xcaff # new landing instruction ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Once again, we break on _sig_handler and step through until we are put back into ║
║ our _start function, directly at our intended landing instruction. Bingo ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ Breakpoint 2, _sig_handler () at testing_controlflow.s:15 ║
║ 15 xor rax, rax # clear rax ║
║ (gdb) si ║
║ 16 add rax, [rsp + 22*8] # rax = stored rip ║
║ (gdb) si ║
║ 17 add rax, 0x8 # rax = stored rip+8 ║
║ (gdb) si ║
║ 18 mov [rsp + 22*8], rax # move rip+8 to stored rip ║
║ (gdb) si ║
║ 19 ret ║
║ (gdb) si ║
║ _restorer () at testing_controlflow.s:22 ║
║ 22 mov rax, 15 # rt_sigreturn ║
║ (gdb) si ║
║ 23 syscall ║
║ (gdb) si ║
║ _start () at testing_controlflow.s:35 ║
║ 35 mov rax, 0xcaff # new landing instruction ║
╠══------------------------------------------------------------------------------══╣
╠══════════════════════════════════════════════════════════════════════════════════╣
╠══-----==[ 4 ]==----------------------------------------------------------------══╣
║ ║
║ I think we have all the tools required to write a syscall-less script that ║
║ invokes one. As the final aim for this blogpost, I'll try to craft a PoC ║
║ ║
║ Let's start with a basic script idea: print to the screen and then exit. I shall ║
║ leave any conditional statements for later - as they will add a lot of visual ║
║ clutter without much logical complexity (beyond just combining the methods) ║
║ ║
║ For additional simplicity, I'll use an 8 letter word: TRIANGLE. This way it can ║
║ fit entirely within one register. The little-endian ASCII encoding for TRIANGLE ║
║ is: ║
║ ║
║ 0x454C474E41495254 ║
║ ║
║ We will be using the write syscall, and so need to populate the following ║
║ registers with the correct args: ║
╠══------------------------------------------------------------------------------══╣
║ rax = 1 (syscall for write) ║
║ rdi = 1 (we are writing to stdout) ║
║ rsi = char buffer ║
║ rdx = len of chars ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ And for the exit "syscall", we will just use the SigIll instruction from before ║
║ ║
║ Okay, this took way longer than I thought it would. Here I go again ║
║ underestimating the complexity of "small" tasks. Some fun excerpts include: ║
╠══------------------------------------------------------------------------------══╣
║ forgetting that xor cannot take imm 64 values - so only 4 characters max ║
║ ║
║ having to spread TRIANGLE over two registers instead ║
║ ║
║ issues with getting a pointer to our string using xor only ║
║ ║
║ trying to put the pointer logic inside the _sig_handler instead ║
║ ║
║ resorting to using the stack, and stack pointers ║
║ ║
║ forgetting that using the stack will offset the whole ucontext struct ║
║ ║
║ having to offset the ucontext relative addressing by 8 ║
║ ║
║ reordering the _sig_handler so I don't have to offset the addressing ║
║ ║
║ problems with the length arg, so offloading that to the _sig_handler too ║
║ ║
║ found out the issue was my addressing - so moved it back to _start ║
║ ║
║ cleaning up the stack to ensure the ucontext rip is found correctly ║
╠══------------------------------------------------------------------------------══╣
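║ ║
║ The imm32 limit from the first two excerpts is easy to check in Python. A ║
║ minimal sketch (helper name is mine) that splits a string into 4-byte ║
║ little-endian chunks, and refuses any chunk with bit 31 set, since xor ║
║ sign-extends its 32-bit immediate: ║
╠══------------------------------------------------------------------------------══╣
```python
def imm32_chunks(text: bytes) -> list:
    """Split text into 4-byte little-endian integers, one per xor immediate."""
    chunks = []
    for i in range(0, len(text), 4):
        value = int.from_bytes(text[i:i + 4], "little")
        # With bit 31 set, sign extension would fill the upper half with ones.
        if value >= 0x80000000:
            raise ValueError(f"chunk {value:#x} does not survive sign extension")
        chunks.append(value)
    return chunks


print(hex(int.from_bytes(b"TRIANGLE", "little")))
print([hex(chunk) for chunk in imm32_chunks(b"TRIANGLE")])
```
╠══------------------------------------------------------------------------------══╣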
║ ║
║ All that being said, I did manage to get a working program that I am happy with. ║
║ Looking at this stuff really brings to focus how much better more modern code ║
║ conveys dense information, assembly really just starts to resemble grains of ║
║ sand once you stare at it for too long. As such, I've included a "high" level ║
║ abstraction of the program below: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ .PREFIX_STUFF ║
║ ║
║ _SIGACTION_STRUCT: ║
║ struct stuff ║
║ ║
║ _SIGNAL_HANDLER: ║
║ ║
║ move saved rax -> rax ║
║ move saved rdi -> rdi ║
║ move saved rdx -> rdx ║
║ move saved r12 -> r12 ║
║ ║
║ reserve 8 bytes on the stack ║
║ put r12 onto the stack ║
║ move pointer to r12 -> rsi ║
║ ║
║ syscall ║
║ ║
║ clean up the stack ║
║ ║
║ move saved rip -> rax ║
║ add 8 to rax ║
║ move rax -> saved rip location ║
║ ║
║ ret ║
║ ║
║ _RESTORER_FUNCTION: ║
║ restorer syscall ║
║ ║
║ _START: ║
║ ║
║ setup signal handler ║
║ ║
║ store TRIA in r12 ║
║ move 1 -> rax # 1 = syscall for write ║
║ move 1 -> rdi # 1 = arg for "to console" ║
║ move len of TRIA -> rdx ║
║ ║
║ SegFault ║
║ ║
║ store NGLE in r12 # we reuse args here ║
║ ║
║ SegFault ║
║ ║
║ SigIll # illegal instruction exit ║
╠══------------------------------------------------------------------------------══╣
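║ ║
║ If the pseudo-assembly is still a bit grainy, here's the same flow as a toy ║
║ Python model. It is only a sketch: registers are a dict, each SegFault is a ║
║ direct call into the handler, and the write syscall appends to a buffer: ║
╠══------------------------------------------------------------------------------══╣
```python
import struct


def sig_handler(saved: dict, out: bytearray) -> None:
    """Model of _SIGNAL_HANDLER: do write(fd, &r12, rdx), then skip the fault."""
    if saved["rax"] == 1 and saved["rdi"] == 1:     # write syscall, to stdout
        chars = struct.pack("<Q", saved["r12"])     # r12 as it sits on the stack
        out += chars[: saved["rdx"]]                # rdx = number of bytes
    saved["rip"] += 8                               # step over the 8-byte fault


def run() -> bytes:
    """Model of _START, minus the handler registration and the final SigIll."""
    out = bytearray()
    regs = {"rax": 1, "rdi": 1, "rdx": 4, "r12": 0x41495254, "rip": 0}
    sig_handler(regs, out)      # first SegFault: TRIA
    regs["r12"] = 0x454C474E    # the other arguments are reused
    sig_handler(regs, out)      # second SegFault: NGLE
    return bytes(out)


print(run())
```
╠══------------------------------------------------------------------------------══╣
║ ║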
╠══════════════════════════════════════════════════════════════════════════════════╣
╠══-----==[ 5 ]==----------------------------------------------------------------══╣
║ ║
║ While this is entirely "functional" (for our purposes) there are still some ║
║ small tradeoffs this approach forces us to deal with. Although the main body of ║
║ _start is now free of non-xor instructions, we do have to include at least ║
║ three syscalls: one in the code that sets up the _sig_handler, one in the ║
║ _sig_handler itself and another in the _restorer. ║
║ ║
║ In writing the asm for this, I defaulted to my usual (and unforgivable) trait of ║
║ using mov instructions. Going through and replacing them was not too tricky, ║
║ only requiring me to remember that xor cannot take two memory operands, so ║
║ clearing or copying a memory location has to go through a register. ║
║ ║
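║ The xor-only replacements for mov boil down to two patterns, modelled here in ║
║ Python (a sketch with made-up helper names; registers and memory are just ║
║ dicts): ║
╠══------------------------------------------------------------------------------══╣
```python
MASK64 = (1 << 64) - 1


def xor_load(regs: dict, dest: str, value: int) -> None:
    """Model `xor dest, dest` then `xor dest, value`: dest ends up == value."""
    regs[dest] ^= regs[dest]            # anything xor itself is zero
    regs[dest] ^= value & MASK64        # zero xor value is value


def xor_store(regs: dict, mem: dict, addr: int, src: str,
              scratch: str = "r11") -> None:
    """Model clearing a memory slot through a scratch register, then storing."""
    xor_load(regs, scratch, mem[addr])  # xor r11, r11 ; xor r11, [addr]
    mem[addr] ^= regs[scratch]          # xor [addr], r11   -> [addr] = 0
    mem[addr] ^= regs[src]              # xor [addr], src   -> [addr] = src


regs = {"rax": 0x1234, "r11": 0xDEAD, "r12": 0x41495254}
mem = {0x1000: 0xFEEDFACE}
xor_load(regs, "rax", 15)
xor_store(regs, mem, 0x1000, "r12")
print(hex(regs["rax"]), hex(mem[0x1000]))
```
╠══------------------------------------------------------------------------------══╣
║ ║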
║ The final non-xor instruction count comes to 8 (which I am happy with): ║
╠══------------------------------------------------------------------------------══╣
║ 1x sub ║
║ 1x ret ║
║ 1x lea ║
║ 2x add ║
║ 3x syscall ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ The logical next step would be to combine this functionality with our ║
║ non-branching conditional logic from previous blogs to make a more interesting ║
║ program. I'm undecided on what that should be - a full calculator seems a little ║
║ tricky, but interesting, perhaps a simple text adventure? ║
║ ║
║ Feel free to ping me some ideas, though I reserve the right to refuse! ║
║ ║
╚══════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════╗
╠══-----==[ 6 ]==----------------------------------------------------------------══╣
║ ║
║ Of those who have been sufficiently interested to read thus far, there may be a ║
║ further subsection who would be curious to see the full code for this program, ║
║ so I'll put that here: ║
║ ║
║ (Please excuse my comments, I appear to not be able to follow a rubric, and ║
║ each time I program they turn out different, nonetheless hopefully they help ║
║ explain what each part does) ║
╠══------------------------------------------------------------------------------══╣
║ .intel_syntax noprefix ║
║ .global _start ║
║ .global _sig_handler ║
║ ║
║ .section .data ║
║ .align 8 ║
║ _sigaction: ║
║ .quad _sig_handler # pointer to the handler ║
║ .quad 0x04000000 # flags: SA_RESTORER ║
║ .quad _restorer # pointer to the restorer ║
║ .quad 0 # empty signal mask (no blockers) ║
║ ║
║ .section .text ║
║ _sig_handler: ║
║ ║
║ xor rax, rax ║
║ xor rax, [rsp + 19*8] # move saved rax to rax ║
║ xor rdi, rdi ║
║ xor rdi, [rsp + 14*8] # move saved rdi to rdi ║
║ xor rdx, rdx ║
║ xor rdx, [rsp + 18*8] # move saved rdx to rdx ║
║ xor r12, r12 ║
║ xor r12, [rsp + 10*8] # move saved r12 to r12 ║
║ ║
║ sub rsp, 8 # reserve some stack space for chars ║
║ ║
║ xor r11, r11 ║
║ xor r11, [rsp] ║
║ xor [rsp], r11 # zeroing out *rsp ║
║ ║
║ xor qword ptr [rsp], r12 # store the chars (r12) at [rsp] ║
║ xor rsi, rsi ║
║ xor rsi, rsp # rsi = pointer to the chars ║
║ # rsp moved by 8: ucontext offsets shift too ║
║ syscall # execute the syscall ║
║ ║
║ add rsp, 0x8 # clean stack ║
║ ║
║ xor rax, rax # clear rax ║
║ xor rax, [rsp + 22*8] # rax = stored rip ║
║ add rax, 0x8 # rax = stored rip+8 ║
║ ║
║ xor r11, r11 ║
║ xor r11, [rsp + 22*8] ║
║ xor [rsp + 22*8], r11 # zeroing out *[rsp + 22*8] ║
║ ║
║ xor [rsp + 22*8], rax # move rip+8 to stored rip ║
║ ║
║ ret ║
║ ║
║ _restorer: ║
║ xor rax, rax ║
║ xor rax, 15 # rt_sigreturn ║
║ syscall ║
║ ║
║ _start: ║
║ xor rax, rax ║
║ xor rax, 13 # syscall for sigaction ║
║ xor rdi, rdi ║
║ xor rdi, 11 # signum for SigSegV ║
║ lea rsi, [rip + _sigaction] # pointer to sigaction struct ║
║ xor rdx, rdx # pointer to old sigaction struct ║
║ xor r10, r10 ║
║ xor r10, 8 # size of the signal mask in bytes ║
║ syscall ║
║ ║
║ xor r12, r12 ║
║ xor r12, 0x41495254 # r12 = TRIA ║
║ xor rax, rax ║
║ xor rax, 0x1 # rax = syscall for write ║
║ xor rdi, rdi ║
║ xor rdi, 0x1 # rdi = output, stdout ║
║ # pointer to TRIA by sig_handler ║
║ xor rdx, rdx ║
║ xor rdx, 0x4 # len of TRIA (4 bytes) ║
║ ║
║ xor r11, [0] # cause SigSegV ║
║ ║
║ xor r12, r12 ║
║ xor r12, 0x454C474E # r12 = NGLE ║
║ ║
║ xor r11, [0] # cause SigSegV ║
║ ║
║ xor r12, r12 ║
║ xor r12, 0x0a # newline char ║
║ xor rdx, rdx ║
║ xor rdx, 0x1 # length arg = 1 ║
║ ║
║ xor r11, [0] # cause SigSegV ║
║ # reuse other register values ║
║ ║
║ .byte 0xF0, 0x48, 0x31, 0xD8 # illegal instruction exit ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Until next time, CWW out ║
╚══════════════════════════════════════════════════════════════════════════════════╝