CWW/blog/CCARA_Part

┏━━┓
BACK
┗━━┛
╔══════════════════════════════════════════════════════════════════════════════════╗
║ Contents - 03-01-2026 committing crimes against readable assembly part 2, xor    ║
║ edition                                                                          ║
║ -------------------------------------------------------------------------------- ║
║ setting some ground-rules                                                        ║
║ xor info-dump                                                                    ║
║ clearing registers                                                               ║
║ loading and moving values around registers                                       ║
║ conditional flow                                                                 ║
║ basic arithmetic                                                                 ║
║ basic logic                                                                      ║
╚══════════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════════╗
║                                                                                  ║
║ So, where do we even begin?                                                      ║
║                                                                                  ║
║ If we are to make a complier for regular assembly, that only outputs xor         ║
║ instructions (that's the aim here btw) then we better be able to translate a     ║
║ decent chunk of the assembly instruction set into xor equivalents                ║
║                                                                                  ║
║ Here is the list of assembly instructions, in order of usage in some popular     ║
║ opensource programs (from here):                                                 ║
║                                                                                  ║
║         mov                   # 35%                                              ║
║         push                  # 10%                                              ║
║         call                  # 6%                                               ║
║         cmp                   # 5%                                               ║
║         add                   # 4%                                               ║
║         pop                   # 4%                                               ║
║         lea                   # 4%                                               ║
║         test                  # 3%                                               ║
║         je                    # 3%                                               ║
║         xor                   # 2%                                               ║
║         jmp                   # 2%                                               ║
║         jne                   # 2%                                               ║
║         ret                   # 1%                                               ║
║         inc                   # 1%                                               ║
║         sub                   # 1%                                               ║
║         fld                   # 1%                                               ║
║         and                   # 1%                                               ║
║         fstp                  # 1%                                               ║
║         shl                   # 1%                                               ║
║         or                    # 1%                                               ║
║         others                # 11%                                              ║
║                                                                                  ║
║ As a realistic aim, lets rule out some assembly instructions that I think might  ║
║ be either pretty tricky, or just require abstraction after we handle the more    ║
║ fundamental items:                                                               ║
║                                                                                  ║
║ -> Floating point stuff (fld, fstp) I don't even know how these work tbh...      ║
║ -> Anything that alters rip (ret, call, jmp, syscall, 0x80)                      ║
║ -> Stack addressing (push, pop) maybe later?                                     ║
║                                                                                  ║
║ This does mean we can't just pipe the output of your favourite complier into     ║
║ our xor program. So we'd potentially be limited to my handwritten assembly       ║
║                                                                                  ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║                                                                                  ║
║ I do have to remind myself of the full range of options available to us from     ║
║ the xor instruction - if you are familiar with this please feel free to skip     ║
║ this section.                                                                    ║
║                                                                                  ║
║ The truth table for xor is as follows:                                           ║
║                                                                                  ║
║  A  xor  B  =   C                                                                ║
║  1       1  ->  0                                                                ║
║  1       0  ->  1                                                                ║
║  0       1  ->  1                                                                ║
║  0       0  ->  0                                                                ║
║                                                                                  ║
║ xor also has the following useful properties:                                    ║
║                                                                                  ║
║ x xor a = a xor x (order of operations doesn’t matter)                           ║
║ a xor a = 0                                                                      ║
║ a xor x xor a = x (commutative - xoring by the same value twice cancels out)     ║
║ x xor 0 = x                                                                      ║
║ a xor (b xor c) = (a xor b) xor c (associative)                                  ║
║                                                                                  ║
║ Here is the blurb from the x86_64 intel instruction manual for xor:              ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║ Performs a bitwise exclusive OR (xor) operation on the destination (first) and   ║
║ source (second) operands and stores the result in the destination operand        ║
║ location. The source operand can be an immediate, a register, or a memory        ║
║ location; the destination operand can be a register or a memory location.        ║
║ (However, two memory operands cannot be used in one instruction.) Each bit of    ║
║ the result is 1 if the corresponding bits of the operands are different; each    ║
║ bit is 0 if the corresponding bits are the same. This instruction can be used    ║
║ with a LOCK prefix to allow the instruction to be executed atomically.           ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ So that allows us to play with not only direct values, but registers and         ║
║ intermediate values (-128 -> 127 as well)                                        ║
║                                                                                  ║
╠══════════════════════════════════════════════════════════════════════════════════╣
║                                                                                  ║
║ If we can work out the following sorts of instructions we ought to be well on    ║
║ our way:                                                                         ║
║                                                                                  ║
║ -> 1 clearing a register                                                         ║
║ -> 2 loading and moving values around registers                                  ║
║ -> 3 conditional flow                                                            ║
║ -> 4 basic arithmetic                                                            ║
║ -> 5 basic logic                                                                 ║
║                                                                                  ║
╠══-----==[ 1 ]==----------------------------------------------------------------══╣
║                                                                                  ║	
║ Our first objective is pretty easy, if a number x is xored with itself, it       ║
║ always yields 0                                                                  ║
║                                                                                  ║
║ So a "clear" instruction like:                                                   ║
║                                                                                  ║
║       mov rax, 0x00                                                              ║
║                                                                                  ║
║ Becomes:                                                                         ║
║                                                                                  ║
║       xor rax, rax                                                               ║
║                                                                                  ║
╠══-----==[ 2 ]==----------------------------------------------------------------══╣
║                                                                                  ║
║ Loading values into registers is where we have our first couple of decisions to  ║
║ make                                                                             ║
║                                                                                  ║
║ Firstly, do we want all relevant program information be handled internally by    ║
║ the xor instructions?                                                            ║
║                                                                                  ║
║ And secondly (when done externally), how pure do we want to get with "only"      ║
║ using xor?                                                                       ║
║                                                                                  ║
║ Say the answer to the first question is "yes - all logic should be handled in    ║
║ code as xor", then loading 64 into rax could look like:                          ║
║                                                                                  ║
║       xor rax, rax                                                               ║
║       xor rax, 0x40                                                              ║
║                                                                                  ║
║ The issue with this is twofold, though it's nothing to do with it not being xor  ║
║                                                                                  ║
║ Firstly - this way of loading values into registers is more efficient that a mov ║
║ instruction, consequently then, this is a common way to see values loaded due to ║
║ compiler optimisation. Decompilers and seasoned reverse engineers alike can      ║
║ recognise this pretty easily, and we can't have that                             ║
║                                                                                  ║
║ It's also revealing with regard to our hardcoded values. If our "64" had some    ║
║ inherent association (say it was a prime number used in crypto, or the address   ║
║ of a C2 server) then simply seeing that value can sometimes be enough to infer   ║
║ intent about a binary                                                            ║
║                                                                                  ║
║ This leads us to the intuitive solution to handle the hardcoded values outside   ║
║ our code (in our compiler) where we could choose a random number n, such that    ║
║ 64 xor n = a:                                                                    ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║       xor rax, rax            # clear                                            ║
║       xor rax, a              # rax = [64||n]                                    ║
║       xor rax, n              # rax = [64||n||n] = 64                            ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ This hides our 64 value, as it is xor-ed outside our logic.                      ║
║                                                                                  ║
║ We could even designate a register as a "scratch register" - load a somewhat     ║
║ random value into it - and use that as our "n" value. so loading two values, 64  ║
║ and 100 into memory could look like (sch = our scratch registers):               ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║ INIT:                                                                            ║
║       xor sch, sch            # clear                                            ║
║       xor sch, n              # load our "random" number into scratch            ║
║                                                                                  ║
║ RETRIEVING HARDCODED VALUES:                                                     ║
║       xor rax, rax            # clear                                            ║
║       xor rax, [sch||64]      # bracketed piece is calculated externally         ║
║       xor rax, sch            # rax = [64||n||n] = 64                            ║
║                                                                                  ║
║       xor sch, rax            # scratch: [n||64]                                 ║
║                                                                                  ║
║       xor rbx, rbx            # clear                                            ║
║       xor rbx, [sch||100]     # bracketed piece is calculated externally         ║
║       xor rbx, sch            # rbx now contains 100                             ║
║                                                                                  ║
║       xor sch, rbx            # scratch: [n||64||100]                            ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ I am kind of sad about this as a solution, as it ruins some of the "purity" of   ║
║ genuinely trying to do everything using xor, given that we are relying on doing  ║
║ some non-xor calculations externally in our compiler (especially random numbers) ║
║                                                                                  ║
║ Perhaps we could do this the other way round, where our scratch register         ║
║ is an xored list of all of our constants, (say 64, 100, 80) and we retrieve the  ║
║ value we want by xoring the list with all the constants we don’t want:           ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║ INIT:                                                                            ║
║       xor sch, sch                                                               ║
║       xor sch, [64||100||80]  # calculated externally, as before                 ║
║                                                                                  ║
║ RETRIEVING HARDCODED VALUES:                                                     ║
║       xor rax, rax                                                               ║
║       xor rax, sch                                                               ║
║       xor rax, [100||80]      # rax now contains 64                              ║
║                                                                                  ║
║       xor rbx, rbx                                                               ║
║       xor rbx, sch                                                               ║
║       xor rbx, [64||80]       # rcx now contains 100                             ║
║                                                                                  ║
║       xor rcx, rcx                                                               ║
║       xor rcx, sch                                                               ║
║       xor rcx, [100||64]      # rcx now contains 80                              ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ For the sake of obfuscation, I can live with this - at least the computational   ║
║ work being done by our compiler is just xor.                                     ║
║                                                                                  ║
╠══-----==[ 3 ]==----------------------------------------------------------------══╣
║                                                                                  ║
║ Here we run into our first (and pretty thorny issue), how do we manage a         ║
║ conditional statement?                                                           ║
║                                                                                  ║
║ Lets clarify here, I don't just mean any conditional - I mean a general purpose  ║
║ conditional. Of course, xor is "conditional" in its own way:                     ║
║                                                                                  ║
║ For values of (a and b) between (0 -> 1):                                        ║
║                                                                                  ║
║       a xor b = 1 or 0                                                           ║
║                                                                                  ║
║ There are "conditions" (a and b values) of a xor b that yield 1, and conditions  ║
║ that yield 0. Our issue is that there are multiple conditions that satisfy this  ║
║                                                                                  ║
║ We want just one                                                                 ║
║                                                                                  ║
║ Lets try making a "detect if two registers are equal" function                   ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║       # let rax be the first register we want to compare                         ║
║                                                                                  ║
║       # let rcx be the second register we want to compare                        ║
║                                                                                  ║
║       # let inv be a register filled with 1s (invert register)                   ║
║                                                                                  ║
║       xor sc1, sc1                                                               ║
║       xor sc1, rax            # copy rax into sc1                                ║
║                                                                                  ║
║       xor sc2, sc2                                                               ║
║       xor sc2, rcx            # copy rcx into sc2                                ║
║                                                                                  ║
║       xor rcx, inv            # inverts the bits of rcx                          ║
║                                                                                  ║
║       xor rax, rcx            # xors the inverted rcx with rax                   ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ This would yield all 1s in rax, if rax and rcx were equal - and something else   ║
║ if not. unfortunately this still isn't great, as the "failure" condition isn't   ║
║ reliable (any combination of 1s and at least one 0)                              ║
║                                                                                  ║
║ Turns out we can do some trickery with addressing, as if two registers contain   ║
║ the same value then operating on the values contained at those addresses can     ║
║ act as a test for whether they are the same or not:                              ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║       # we have registers rdx and rcx (which contain some values)                ║
║                                                                                  ║
║       xor [rdx], 0x00         # memory at rdx now contains 0                     ║
║                                                                                  ║
║       xor [rcx], 0x01         # memory at rcx now contains 1                     ║
║                                                                                  ║
║       xor rax, rax                                                               ║
║       xor rax, [rdx]          # move the memory at our first write into rax      ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ The trick here is that if rdx and rcx are the same - then the 0 was overwritten  ║
║ by a 1, and if not - it remained a 0. This stores the value 1 in rax if both of  ║
║ our registers were the same, and 0 if they were different                        ║
║                                                                                  ║
║ Amusingly, I started to do a bunch of other stuff that relied on the above being ║
║ correct, without actually going back and checking my homework - so that'll teach ║
║ me... Thankfully the principle still holds, and conditionals are possible        ║
║                                                                                  ║
║ Lets fix it now, the problem is that when we:                                    ║
║                                                                                  ║
║       xor [rdx], 0x00                                                            ║
║                                                                                  ║
║ This actually does nothing, as a xor 0 = a. Okay, eaxy fix then - just xor [rdx] ║
║ with itself to make sure the memory address located at the value pointed to by   ║
║ rdx is 0. However, the intel manual has something to say about this:             ║
║                                                                                  ║
║ "Two memory operands cannot be used in one instruction"                          ║
║                                                                                  ║
║ So we can't do the intuitive:                                                    ║
║                                                                                  ║
║       xor [rdx], [rdx]                                                           ║
║                                                                                  ║
║ Instead we have to "realise" [rdx]  to a scratch register, and then do it from   ║
║ there:                                                                           ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║       xor sch, sch                                                               ║
║       xor sch, [rdx]          # loads the memory at address rdx into sch         ║
║                                                                                  ║
║       xor [rdx], sch          # zeros out the memory at rdx                      ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ Which gives us the actually functional comparison for if two values are equal,   ║
║ where rax == 1 if rdx and rcx are equal, and rax == 0 if they are not:           ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║       # we have registers rdx and rcx (which contain some values)                ║
║                                                                                  ║
║       xor sch, sch                                                               ║
║       xor sch, [rdx]                                                             ║
║       xor [rdx], sch          # memory at rdx now contains 0                     ║
║                                                                                  ║
║       xor sch, sch                                                               ║
║       xor sch, [rcx]                                                             ║
║       xor [rcx], sch          # memory at rcx now contains 0                     ║
║       xor [rcx], 0x1          # memory at rcx now contains 1                     ║
║                                                                                  ║
║       xor rax, rax                                                               ║
║       xor rax, [rdx]          # move the memory at our first write into rax      ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ We can rephrase this for comparison to any "n" aswell:                           ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║       xor sch, sch                                                               ║
║       xor sch, [rbx]                                                             ║
║       xor [rbx], sch                                                             ║
║       xor [rbx], 0x00          # memory at rbx now contains 0                    ║
║                                                                                  ║
║       xor sch, sch                                                               ║
║       xor sch, [n]                                                               ║
║       xor [n], sch                                                               ║
║       xor [n], 0x01            # memory at n now contains 1                      ║
║                                                                                  ║
║       xor rax, rax                                                               ║
║       xor rax, [rbx]           # if rbx == n, rax == 1 else 0                    ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
║ Here we are checking the memory at the address pointed to by [n] which I imagine ║
║ would often crash my program (especially for n == 0 checks) - that’s a problem   ║
║ for future CWW to work out :)                                                    ║
║                                                                                  ║
║ Theoretically though, this works as a comparison statement, but how do we "do"   ║
║ something based on rax == 1 or 0?                                                ║
║                                                                                  ║
║ We can place our values to "choose" from into two spots in memory (a and b) we   ║
║ then can use our 1 or 0 value to select between them:                            ║
║                                                                                  ║
╠══------------------------------------------------------------------------------══╣
║ VALUE A:                                                                         ║
║       xor rcx, rcx                                                               ║
║       xor rcx, a                                                                 ║
║                                                                                  ║
║ VALUE B:                                                                         ║
║       xor rdx, rdx                                                               ║
║       xor rdx, b                                                                 ║
║                                                                                  ║
║ COMPARISON OF RBX TO N:                                                          ║
║       xor sch, sch                                                               ║
║       xor sch, [rbx]                                                             ║
║       xor [rbx], sch                                                             ║
║       xor [rbx], 0x00         # memory at rbx now contains 0                     ║
║                                                                                  ║
║       xor sch, sch                                                               ║
║       xor sch, [n]                                                               ║
║       xor [n], sch                                                               ║
║       xor [n], 0x01           # memory at n now contains 1                       ║
║                                                                                  ║
║       xor rax, rax                                                               ║
║       xor rax, [rbx]          # if rbx == n, rax == 1 else 0                     ║
║                                                                                  ║
║ LOOKUP INDEXED FROM X ONWARDS:                                                   ║
║       xor sch, sch                                                               ║
║       xor sch, [X]                                                               ║
║       xor [X], sch            # memory at X now contains 0                       ║
║                                                                                  ║
║       xor sch, sch                                                               ║
║       xor sch, [X+1]                                                             ║
║       xor [X+1], sch          # memory at X+1 now contains 0                     ║
║                                                                                  ║
║       xor [X], a              # memory at X now contains a                       ║
║                                                                                  ║
║       xor [X+1], b            # memory at X+1 now contains b                     ║
║                                                                                  ║
║       add X, rax              # ummmm is that an add instruction...              ║
║                                                                                  ║
║       xor rax, rax                                                               ║
║       xor rax, [X]            # rax contains a or b                              ║
╠══------------------------------------------------------------------------------══╣
║                                                                                  ║
╠══-----==[ 4 ]==----------------------------------------------------------------══╣
║                                                                                  ║
║ Can you see it too? there is a single add instruction in our comparison block    ║
║ which kinda ruins the whole xor only thing...                                    ║
║                                                                                  ║
║ This is a thorny issue it seems, cause we are trying to make conditionals - but  ║
║ we (here I mean just me) can't without a single add instruction, but most ways   ║
║ of adding require conditionals, or other logic operations :(                     ║
║                                                                                  ║
║ Lets take a look at the logic for addition of single binary digits:              ║
║                                                                                  ║
║ number 1   number 2   output                                                     ║
║   01         01   ->    10                                                       ║
║   00         01   ->    01                                                       ║
║   01         00   ->    01                                                       ║
║   00         00   ->    00                                                       ║
║                                                                                  ║
║ Hey, that’s familiar - the second column is just the xor truth table from before.║
║ We can't just replace addition with xor though, as when the 0ths places both are ║
║ 1s, then the  1ths place becomes a  1. xor is addition mod(2) - or addition of   ║
║ GF(2)                                                                            ║
║                                                                                  ║
║ In a binary adder circuit this is done through a combination of and/or logic     ║
║ gates (this video on computing from first logic principles is great). So can we  ║
║ create and/or logic operations using just xor?                                   ║
║                                                                                  ║
╠══-----==[ 5 ]==----------------------------------------------------------------══╣
║                                                                                  ║
║ We can make a not logical operation by reusing our inversion trick from the      ║
║ first failed comparison example. This will probably be useful in future          ║
║ operations so we will assign a register to this value (inv = invert register):   ║
║                                                                                  ║
║       xor inv, inv                                                               ║
║       xor inv, 0xffffffffffffffff                                                ║
║                                                                                  ║
║ After many days of trying to work out what on earth I'm doing in boolean algebra ║
║ it turns out (I think?) that a combination of xor/not cannot produce or/and      ║
║ :(                                                                               ║
║                                                                                  ║
║ The hope was that as xor has two alternative forms:                              ║
║                                                                                  ║
║ -> disjunctive -> (a & ¬b) | (¬a & b)                                            ║
║ -> conjunctive -> (a | b) & (¬a | ¬b)                                            ║
║                                                                                  ║
║ And that we can swap or/and operations (DeMorgan strikes again):                 ║
║                                                                                  ║
║ -> ¬(a & b) = ¬a|¬b                                                              ║
║ -> ¬(a & b) = ¬a | ¬b                                                            ║
║                                                                                  ║
║ There might be some sort of equation manipulation we could do to get (a & b) on  ║
║ one side, and just xor/not on the other, but it doesn't seem possible.           ║
║                                                                                  ║
║ My tentative reason for this assumption is that every combination of xor/not can ║
║ be expressed as the following (as repeated xor cancels):                         ║
║                                                                                  ║
║       for some constants: x, y, z e{0,1}:                                        ║
║       f(a, b) = x || (y & a) || (z & b)                                          ║
║                                                                                  ║
║ So we have three constants, that can be either 1 or 0 for f(a, b):               ║
║                                                                                  ║
║ xyz     f(0,0)     f(0,1)     f(1,0)     f(1,1)     reduces to...                ║
║                                                                                  ║
║ 000     0          0          0          0          constant 0                   ║
║ 100     1          1          1          1          constant 1                   ║
║ 010     0          0          1          1          a                            ║
║ 001     0          1          0          1          b                            ║
║ 110     1          1          0          0          not a                        ║
║ 011     0          1          1          0          a || b                       ║
║ 101     1          0          1          0          not b                        ║
║ 111     1          0          0          1          not (a || b)                 ║
║                                                                                  ║
║ There is no way to detect two 1s - so we can't make an and/or operation (even    ║
║ though xor "contains" them through their alternative forms. Big sad.             ║
║                                                                                  ║
║ Despite currently being stuck, I'm pretty happy with the direction this is going ║
║  - we are starting to generate some pretty unpleasant looking assembly code,     ║
║ though I worry I've bitten off a little more than one meagre student can handle. ║
║ Time will tell                                                                   ║
║                                                                                  ║
║ CWW out                                                                          ║
║                                                                                  ║
║ PS. I now know about the M/o/Vfuscator, but I'd prefer to see how far I can go   ║
║ myself before seeing how they implemented solutions to the problems I'm          ║
║ encountering.                                                                    ║
╚══════════════════════════════════════════════════════════════════════════════════╝
┏━━┓
BACK
┗━━┛