┏━━┓
BACK
┗━━┛
╔══════════════════════════════════════════════════════════════════════════════════╗
║ Contents - 03-01-2026 committing crimes against readable assembly part 2, xor ║
║ edition ║
║ -------------------------------------------------------------------------------- ║
║ setting some ground-rules ║
║ xor info-dump ║
║ clearing registers ║
║ loading and moving values around registers ║
║ conditional flow ║
║ basic arithmetic ║
║ basic logic ║
╚══════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════╗
║ ║
║ So, where do we even begin? ║
║ ║
║ If we are to make a complier for regular assembly, that only outputs xor ║
║ instructions (that's the aim here btw) then we better be able to translate a ║
║ decent chunk of the assembly instruction set into xor equivalents ║
║ ║
║ Here is the list of assembly instructions, in order of usage in some popular ║
║ opensource programs (from here): ║
║ ║
║ mov # 35% ║
║ push # 10% ║
║ call # 6% ║
║ cmp # 5% ║
║ add # 4% ║
║ pop # 4% ║
║ lea # 4% ║
║ test # 3% ║
║ je # 3% ║
║ xor # 2% ║
║ jmp # 2% ║
║ jne # 2% ║
║ ret # 1% ║
║ inc # 1% ║
║ sub # 1% ║
║ fld # 1% ║
║ and # 1% ║
║ fstp # 1% ║
║ shl # 1% ║
║ or # 1% ║
║ others # 11% ║
║ ║
║ As a realistic aim, lets rule out some assembly instructions that I think might ║
║ be either pretty tricky, or just require abstraction after we handle the more ║
║ fundamental items: ║
║ ║
║ -> Floating point stuff (fld, fstp) I don't even know how these work tbh... ║
║ -> Anything that alters rip (ret, call, jmp, syscall, 0x80) ║
║ -> Stack addressing (push, pop) maybe later? ║
║ ║
║ This does mean we can't just pipe the output of your favourite complier into ║
║ our xor program. So we'd potentially be limited to my handwritten assembly ║
║ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ I do have to remind myself of the full range of options available to us from ║
║ the xor instruction - if you are familiar with this please feel free to skip ║
║ this section. ║
║ ║
║ The truth table for xor is as follows: ║
║ ║
║ A xor B = C ║
║ 1 1 -> 0 ║
║ 1 0 -> 1 ║
║ 0 1 -> 1 ║
║ 0 0 -> 0 ║
║ ║
║ xor also has the following useful properties: ║
║ ║
║ x xor a = a xor x (order of operations doesn’t matter) ║
║ a xor a = 0 ║
║ a xor x xor a = x (commutative - xoring by the same value twice cancels out) ║
║ x xor 0 = x ║
║ a xor (b xor c) = (a xor b) xor c (associative) ║
║ ║
║ Here is the blurb from the x86_64 intel instruction manual for xor: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ Performs a bitwise exclusive OR (xor) operation on the destination (first) and ║
║ source (second) operands and stores the result in the destination operand ║
║ location. The source operand can be an immediate, a register, or a memory ║
║ location; the destination operand can be a register or a memory location. ║
║ (However, two memory operands cannot be used in one instruction.) Each bit of ║
║ the result is 1 if the corresponding bits of the operands are different; each ║
║ bit is 0 if the corresponding bits are the same. This instruction can be used ║
║ with a LOCK prefix to allow the instruction to be executed atomically. ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ So that allows us to play with not only direct values, but registers and ║
║ intermediate values (-128 -> 127 as well) ║
║ ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ If we can work out the following sorts of instructions we ought to be well on ║
║ our way: ║
║ ║
║ -> 1 clearing a register ║
║ -> 2 loading and moving values around registers ║
║ -> 3 conditional flow ║
║ -> 4 basic arithmetic ║
║ -> 5 basic logic ║
║ ║
╠══-----==[ 1 ]==----------------------------------------------------------------══╣
║ ║
║ Our first objective is pretty easy, if a number x is xored with itself, it ║
║ always yields 0 ║
║ ║
║ So a "clear" instruction like: ║
║ ║
║ mov rax, 0x00 ║
║ ║
║ Becomes: ║
║ ║
║ xor rax, rax ║
║ ║
╠══-----==[ 2 ]==----------------------------------------------------------------══╣
║ ║
║ Loading values into registers is where we have our first couple of decisions to ║
║ make ║
║ ║
║ Firstly, do we want all relevant program information be handled internally by ║
║ the xor instructions? ║
║ ║
║ And secondly (when done externally), how pure do we want to get with "only" ║
║ using xor? ║
║ ║
║ Say the answer to the first question is "yes - all logic should be handled in ║
║ code as xor", then loading 64 into rax could look like: ║
║ ║
║ xor rax, rax ║
║ xor rax, 0x40 ║
║ ║
║ The issue with this is twofold, though it's nothing to do with it not being xor ║
║ ║
║ Firstly - this way of loading values into registers is more efficient that a mov ║
║ instruction, consequently then, this is a common way to see values loaded due to ║
║ compiler optimisation. Decompilers and seasoned reverse engineers alike can ║
║ recognise this pretty easily, and we can't have that ║
║ ║
║ It's also revealing with regard to our hardcoded values. If our "64" had some ║
║ inherent association (say it was a prime number used in crypto, or the address ║
║ of a C2 server) then simply seeing that value can sometimes be enough to infer ║
║ intent about a binary ║
║ ║
║ This leads us to the intuitive solution to handle the hardcoded values outside ║
║ our code (in our compiler) where we could choose a random number n, such that ║
║ 64 xor n = a: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ xor rax, rax # clear ║
║ xor rax, a # rax = [64||n] ║
║ xor rax, n # rax = [64||n||n] = 64 ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ This hides our 64 value, as it is xor-ed outside our logic. ║
║ ║
║ We could even designate a register as a "scratch register" - load a somewhat ║
║ random value into it - and use that as our "n" value. so loading two values, 64 ║
║ and 100 into memory could look like (sch = our scratch registers): ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ INIT: ║
║ xor sch, sch # clear ║
║ xor sch, n # load our "random" number into scratch ║
║ ║
║ RETRIEVING HARDCODED VALUES: ║
║ xor rax, rax # clear ║
║ xor rax, [sch||64] # bracketed piece is calculated externally ║
║ xor rax, sch # rax = [64||n||n] = 64 ║
║ ║
║ xor sch, rax # scratch: [n||64] ║
║ ║
║ xor rbx, rbx # clear ║
║ xor rbx, [sch||100] # bracketed piece is calculated externally ║
║ xor rbx, sch # rbx now contains 100 ║
║ ║
║ xor sch, rbx # scratch: [n||64||100] ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ I am kind of sad about this as a solution, as it ruins some of the "purity" of ║
║ genuinely trying to do everything using xor, given that we are relying on doing ║
║ some non-xor calculations externally in our compiler (especially random numbers) ║
║ ║
║ Perhaps we could do this the other way round, where our scratch register ║
║ is an xored list of all of our constants, (say 64, 100, 80) and we retrieve the ║
║ value we want by xoring the list with all the constants we don’t want: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ INIT: ║
║ xor sch, sch ║
║ xor sch, [64||100||80] # calculated externally, as before ║
║ ║
║ RETRIEVING HARDCODED VALUES: ║
║ xor rax, rax ║
║ xor rax, sch ║
║ xor rax, [100||80] # rax now contains 64 ║
║ ║
║ xor rbx, rbx ║
║ xor rbx, sch ║
║ xor rbx, [64||80] # rcx now contains 100 ║
║ ║
║ xor rcx, rcx ║
║ xor rcx, sch ║
║ xor rcx, [100||64] # rcx now contains 80 ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ For the sake of obfuscation, I can live with this - at least the computational ║
║ work being done by our compiler is just xor. ║
║ ║
╠══-----==[ 3 ]==----------------------------------------------------------------══╣
║ ║
║ Here we run into our first (and pretty thorny issue), how do we manage a ║
║ conditional statement? ║
║ ║
║ Lets clarify here, I don't just mean any conditional - I mean a general purpose ║
║ conditional. Of course, xor is "conditional" in its own way: ║
║ ║
║ For values of (a and b) between (0 -> 1): ║
║ ║
║ a xor b = 1 or 0 ║
║ ║
║ There are "conditions" (a and b values) of a xor b that yield 1, and conditions ║
║ that yield 0. Our issue is that there are multiple conditions that satisfy this ║
║ ║
║ We want just one ║
║ ║
║ Lets try making a "detect if two registers are equal" function ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ # let rax be the first register we want to compare ║
║ ║
║ # let rcx be the second register we want to compare ║
║ ║
║ # let inv be a register filled with 1s (invert register) ║
║ ║
║ xor sc1, sc1 ║
║ xor sc1, rax # copy rax into sc1 ║
║ ║
║ xor sc2, sc2 ║
║ xor sc2, rcx # copy rcx into sc2 ║
║ ║
║ xor rcx, inv # inverts the bits of rcx ║
║ ║
║ xor rax, rcx # xors the inverted rcx with rax ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ This would yield all 1s in rax, if rax and rcx were equal - and something else ║
║ if not. unfortunately this still isn't great, as the "failure" condition isn't ║
║ reliable (any combination of 1s and at least one 0) ║
║ ║
║ Turns out we can do some trickery with addressing, as if two registers contain ║
║ the same value then operating on the values contained at those addresses can ║
║ act as a test for whether they are the same or not: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ # we have registers rdx and rcx (which contain some values) ║
║ ║
║ xor [rdx], 0x00 # memory at rdx now contains 0 ║
║ ║
║ xor [rcx], 0x01 # memory at rcx now contains 1 ║
║ ║
║ xor rax, rax ║
║ xor rax, [rdx] # move the memory at our first write into rax ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ The trick here is that if rdx and rcx are the same - then the 0 was overwritten ║
║ by a 1, and if not - it remained a 0. This stores the value 1 in rax if both of ║
║ our registers were the same, and 0 if they were different ║
║ ║
║ Amusingly, I started to do a bunch of other stuff that relied on the above being ║
║ correct, without actually going back and checking my homework - so that'll teach ║
║ me... Thankfully the principle still holds, and conditionals are possible ║
║ ║
║ Lets fix it now, the problem is that when we: ║
║ ║
║ xor [rdx], 0x00 ║
║ ║
║ This actually does nothing, as a xor 0 = a. Okay, eaxy fix then - just xor [rdx] ║
║ with itself to make sure the memory address located at the value pointed to by ║
║ rdx is 0. However, the intel manual has something to say about this: ║
║ ║
║ "Two memory operands cannot be used in one instruction" ║
║ ║
║ So we can't do the intuitive: ║
║ ║
║ xor [rdx], [rdx] ║
║ ║
║ Instead we have to "realise" [rdx] to a scratch register, and then do it from ║
║ there: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ xor sch, sch ║
║ xor sch, [rdx] # loads the memory at address rdx into sch ║
║ ║
║ xor [rdx], sch # zeros out the memory at rdx ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Which gives us the actually functional comparison for if two values are equal, ║
║ where rax == 1 if rdx and rcx are equal, and rax == 0 if they are not: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ # we have registers rdx and rcx (which contain some values) ║
║ ║
║ xor sch, sch ║
║ xor sch, [rdx] ║
║ xor [rdx], sch # memory at rdx now contains 0 ║
║ ║
║ xor sch, sch ║
║ xor sch, [rcx] ║
║ xor [rcx], sch # memory at rcx now contains 0 ║
║ xor [rcx], 0x1 # memory at rcx now contains 1 ║
║ ║
║ xor rax, rax ║
║ xor rax, [rdx] # move the memory at our first write into rax ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ We can rephrase this for comparison to any "n" aswell: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ xor sch, sch ║
║ xor sch, [rbx] ║
║ xor [rbx], sch ║
║ xor [rbx], 0x00 # memory at rbx now contains 0 ║
║ ║
║ xor sch, sch ║
║ xor sch, [n] ║
║ xor [n], sch ║
║ xor [n], 0x01 # memory at n now contains 1 ║
║ ║
║ xor rax, rax ║
║ xor rax, [rbx] # if rbx == n, rax == 1 else 0 ║
╠══------------------------------------------------------------------------------══╣
║ ║
║ Here we are checking the memory at the address pointed to by [n] which I imagine ║
║ would often crash my program (especially for n == 0 checks) - that’s a problem ║
║ for future CWW to work out :) ║
║ ║
║ Theoretically though, this works as a comparison statement, but how do we "do" ║
║ something based on rax == 1 or 0? ║
║ ║
║ We can place our values to "choose" from into two spots in memory (a and b) we ║
║ then can use our 1 or 0 value to select between them: ║
║ ║
╠══------------------------------------------------------------------------------══╣
║ VALUE A: ║
║ xor rcx, rcx ║
║ xor rcx, a ║
║ ║
║ VALUE B: ║
║ xor rdx, rdx ║
║ xor rdx, b ║
║ ║
║ COMPARISON OF RBX TO N: ║
║ xor sch, sch ║
║ xor sch, [rbx] ║
║ xor [rbx], sch ║
║ xor [rbx], 0x00 # memory at rbx now contains 0 ║
║ ║
║ xor sch, sch ║
║ xor sch, [n] ║
║ xor [n], sch ║
║ xor [n], 0x01 # memory at n now contains 1 ║
║ ║
║ xor rax, rax ║
║ xor rax, [rbx] # if rbx == n, rax == 1 else 0 ║
║ ║
║ LOOKUP INDEXED FROM X ONWARDS: ║
║ xor sch, sch ║
║ xor sch, [X] ║
║ xor [X], sch # memory at X now contains 0 ║
║ ║
║ xor sch, sch ║
║ xor sch, [X+1] ║
║ xor [X+1], sch # memory at X+1 now contains 0 ║
║ ║
║ xor [X], a # memory at X now contains a ║
║ ║
║ xor [X+1], b # memory at X+1 now contains b ║
║ ║
║ add X, rax # ummmm is that an add instruction... ║
║ ║
║ xor rax, rax ║
║ xor rax, [X] # rax contains a or b ║
╠══------------------------------------------------------------------------------══╣
║ ║
╠══-----==[ 4 ]==----------------------------------------------------------------══╣
║ ║
║ Can you see it too? there is a single add instruction in our comparison block ║
║ which kinda ruins the whole xor only thing... ║
║ ║
║ This is a thorny issue it seems, cause we are trying to make conditionals - but ║
║ we (here I mean just me) can't without a single add instruction, but most ways ║
║ of adding require conditionals, or other logic operations :( ║
║ ║
║ Lets take a look at the logic for addition of single binary digits: ║
║ ║
║ number 1 number 2 output ║
║ 01 01 -> 10 ║
║ 00 01 -> 01 ║
║ 01 00 -> 01 ║
║ 00 00 -> 00 ║
║ ║
║ Hey, that’s familiar - the second column is just the xor truth table from before.║
║ We can't just replace addition with xor though, as when the 0ths places both are ║
║ 1s, then the 1ths place becomes a 1. xor is addition mod(2) - or addition of ║
║ GF(2) ║
║ ║
║ In a binary adder circuit this is done through a combination of and/or logic ║
║ gates (this video on computing from first logic principles is great). So can we ║
║ create and/or logic operations using just xor? ║
║ ║
╠══-----==[ 5 ]==----------------------------------------------------------------══╣
║ ║
║ We can make a not logical operation by reusing our inversion trick from the ║
║ first failed comparison example. This will probably be useful in future ║
║ operations so we will assign a register to this value (inv = invert register): ║
║ ║
║ xor inv, inv ║
║ xor inv, 0xffffffffffffffff ║
║ ║
║ After many days of trying to work out what on earth I'm doing in boolean algebra ║
║ it turns out (I think?) that a combination of xor/not cannot produce or/and ║
║ :( ║
║ ║
║ The hope was that as xor has two alternative forms: ║
║ ║
║ -> disjunctive -> (a & ¬b) | (¬a & b) ║
║ -> conjunctive -> (a | b) & (¬a | ¬b) ║
║ ║
║ And that we can swap or/and operations (DeMorgan strikes again): ║
║ ║
║ -> ¬(a & b) = ¬a|¬b ║
║ -> ¬(a & b) = ¬a | ¬b ║
║ ║
║ There might be some sort of equation manipulation we could do to get (a & b) on ║
║ one side, and just xor/not on the other, but it doesn't seem possible. ║
║ ║
║ My tentative reason for this assumption is that every combination of xor/not can ║
║ be expressed as the following (as repeated xor cancels): ║
║ ║
║ for some constants: x, y, z e{0,1}: ║
║ f(a, b) = x || (y & a) || (z & b) ║
║ ║
║ So we have three constants, that can be either 1 or 0 for f(a, b): ║
║ ║
║ xyz f(0,0) f(0,1) f(1,0) f(1,1) reduces to... ║
║ ║
║ 000 0 0 0 0 constant 0 ║
║ 100 1 1 1 1 constant 1 ║
║ 010 0 0 1 1 a ║
║ 001 0 1 0 1 b ║
║ 110 1 1 0 0 not a ║
║ 011 0 1 1 0 a || b ║
║ 101 1 0 1 0 not b ║
║ 111 1 0 0 1 not (a || b) ║
║ ║
║ There is no way to detect two 1s - so we can't make an and/or operation (even ║
║ though xor "contains" them through their alternative forms. Big sad. ║
║ ║
║ Despite currently being stuck, I'm pretty happy with the direction this is going ║
║ - we are starting to generate some pretty unpleasant looking assembly code, ║
║ though I worry I've bitten off a little more than one meagre student can handle. ║
║ Time will tell ║
║ ║
║ CWW out ║
║ ║
║ PS. I now know about the M/o/Vfuscator, but I'd prefer to see how far I can go ║
║ myself before seeing how they implemented solutions to the problems I'm ║
║ encountering. ║
╚══════════════════════════════════════════════════════════════════════════════════╝
┏━━┓
BACK
┗━━┛