TL;DR🔗
- Simple binary patching and experimentation can be done with
wxHexEditor
. - Erlang and Elixir are the only general-purpose programming languages that allow for binary pattern-matching, and should be used for functional and declarative binary editing.
- GNU poke is a domain-specific language for "poking at binaries". Different binary units (bits, bytes, uints, ints, structs) are supported natively and a genius mapping operator removes a lot of boilerplate when it comes to reading data into semantic structures and writing data out to memory or files.
There are different ways to structure data within a file.
In the age of web, the most prominent one is JSON, followed by middle-aged XML and configuration-oriented YAML.
Human-readable ways to structure data, however, are just one side of the spectrum.
The other side is pure binary data mapped to some data structures to be computer-readable.
Features like bit fields in C enable humans to define such mappings, but they also serve as a way to decode said structures.
Wiki example, when compiled and disassembled doesn't mention BoxProps
at all.
Only when we add a function like the one below is included in the source code, would compiler spare some instructions to figure out where to get desired bits from.
bool getShowBorder(BoxProps x) {
return x.show_border;
}
Packing bits into structures warrants a whole separate post, but it's worth mentioning that bitfields is a controversial feature of C99 and many C developers suggest not using it in favor of using bitmaps and manually written packing / unpacking functions. One of the reasons for it is that it's not defined how the compiler packs bitfields, which means in practice, that some compilers reorder under-octet bitfields for the sake of optimisations.
Traditionally, non-human readable data is explored using hexdump
or, less commonly xxd
to print as binary instead of hex.
Finally, there is od
which defaults to decimal representation.
But what if we need to edit it slightly for experimentation, error correction or binary carving?
There is no single traditional tool for binary editing, but there's a range of tools, collectively called hex editors.
Shortcomings of hex editors🔗
Sadly, most of readily available hex editors have two major shortcomings:
- They only work on bytes
- They rarely handle huge binary files in a responsive way
- Some of hex editors have stability issues, i.e. they crash a lot
The worst offenders for crashes are ghex and bless, with latter barely working at all. Which is funny, because these two editors are frequently suggested as the best out there.
Due to these circumstances, it's wise to stick to wxHexEditor
for very basic inspection/interaction and neglect hex editors altogether when it comes to advanced processing.
Hands on!🔗
Let's say we want to extract 7-bit ASCII from this file. To do this we need to take the last bit of every byte and shove it in the back of this binary. Here is a single-take attempt to do it with a hex editor:
Even if one knows what to do, it's less than trivial to not screw it up, especially while trying to save time on insertions of zeros. However, a hex editor is a great instrument to quickly validate that the method works and move on to more automated tools.
Binary processing with Erlang🔗
My favorite tool to do binary processing with is Erlang. It has a minimalist Prolog-like syntax, but most importantly, native support to work with binaries, bitstrings, including binary pattern matching (see "Examples" subsection).
To do with Erlang what we failed to do with hex editor, we will write the following program, following our hex editor attempt:
send_guest_bits_home(
<<A:7, G:1, Rest/binary>>, % (E)
{Hosts, Guests} % (A)
) ->
send_guest_bits_home(
Rest,
{ <<Hosts/bitstring, 0:1, A:7>> % (B)
, case Guests of % (F)
<<_/binary>> ->
<<Guests/binary, 0:1, G:1>>; % (C)
<<_/bitstring>> ->
<<Guests/bitstring, G:1>> % (D)
end });
We call seven-bit clusters "Hosts", and the appended bits, that we need to send to the back "Guests".
Here we use standard recursion with accumulator, which in our case is a tuple of Hosts
and Guests
.
We bind current values of accumulator in (A)
and update them in (B)
and (C)
/(D)
.
In (E)
, we match first byte of challenge with variables A
(7 bits), and G
(1 bit), and the rest of challenge gets matched with variable Rest
.
As you can see, in this context X:N
syntax means "match N
bits of some binary value with variable X
".
As opposed to a similar syntax you can see in (B)
and (C)
, where syntax 0:1
means "span decimal value 0
over 1
bits".
We specify that Rest
variable is binary
, not bitstring
, to make sure it's byte-aligned.
If it was bitstring
, it would mean that it may be any amount of bits large, even seven or one.
Hosts
update happens in (B)
is that we take seven bits, pad it with a single 0
on the left and shove into the end of the first element of accumulator.
Case statement in (F)
checks if Guests
accumulator variable is byte-aligned (its bit size is divisible by eight) or not.
If it is, next iteration of Guests
accumulator will be padded with a 0-bit, as seen in (C)
.
If it isn't we just append the "guest bit" to Guests
accumulator verbatim, as seen in (D)
.
send_guest_bits_home(
<<>>, % (A)
{Hosts, Guests}
) ->
<<Hosts/bitstring, Guests/bitstring>>.
Finally, in (A)
, when the remaining set of bits is empty, we return concatenation of the two accumulator variables.
solve() ->
{ok, Chal} =
file:read_file(
"LastBitOfEveryByteIsAGuest.ASCII"
),
Out =
send_guest_bits_home(
Chal, {<<>>, <<>>}
),
file:write_file(
"LastBitOfEveryByteIsAGuest.txt",
Out
).
When we run it, we get the hidden message!
$ erl
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] ...
Eshell V10.6.4 (abort with ^G)
1> c(solve).
solve.erl:2: Warning: export_all flag enabled ...
{ok,solve}
2> solve:solve().
ok
3>
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
(v)ersion (k)ill (D)b-tables (d)istribution
a
Thu Mar 04 01:27:26:530089000 sweater@conflagrate ~/gnu/poke-1.0/build
$ cat LastBitOfEveryByteIsAGuest.txt
doma{w4rm357_w3lcom3___t0___th15___bl0g}
At the time of publishing of this post I thought that there is no way to express binary literals in Erlang, like in Python 0b101 == 5
.
However, as Reddit user Gwaerondor pointed out, it's possible with Radix#
-syntax: 2#101 == 5
.
GNU poke🔗
In 2017, thirty one years after Erlang's initial prototype, a tool whose sole purpose is working with structured binary data was conceived. It's called GNU poke, and its v1 got released just several days ago. Let's see how to use it for the same transformation:
type I = struct {
uint<7> host;
uint<1> guest;
};
One of the biggest powers of GNU poke is that you can define a structure and map it to some data in one swift motion with "map" operator @
:
$ ./poke/poke LastBitOfEveryByteIsAGuest.ASCII
* snip *
(poke) I[] @ 0#B
[I {host=(uint<7>) 100,guest=(uint<1>) 1},
...,
I {host=(uint<7>) 95,guest=(uint<1>) 1}]
I[] @ 0#B
means "map the contents of currently loaded IO space onto array of I
s".
Now let's see how to write the solve script.
fun solve = (I[] xs) bit: {
var fh =
open("hosts.out", IOS_F_READ | IOS_F_WRITE); /* (D) */
var fg =
open("guests.out", IOS_F_READ | IOS_F_WRITE); /* (E) */
var bitsWrote = 0;
var bytesWrote = 0;
for (i in xs) {
if (bitsWrote % 8 == 0) {
bit[] @ fg : bitsWrote#b =
[(0 as bit), i.guest]; /* (B) */
bitsWrote += 2;
} else {
bit[] @ fg : bitsWrote#b =
[i.guest]; /* (C) */
bitsWrote += 1;
}
byte[] @ fh : bytesWrote#B =
[ (0 as bit):::i.host ]; /* (A) */
bytesWrote += 1;
}
close(fh);
close(fg);
return 0;
}
An interesting part of this function is (A)
, it uses binary concatenation operator :::
. It can concatenate heterogenous binary data types like a bit and a 7-bit blob.
Then, (B)
and (C)
write out guest bits, following the same logic as in Erlang implementation.
This is a pretty dense construction though, so let's read it together.
bit[] @ fg : bitsWrote#b
reads "map an array of bits to fg
with a bit offset of bitsWrote
".
Now on the right hand side there is a value that will get mapped, which in (B)
is two bits: 0
-padding and the guest bit and in (C)
just the guest bit.
Now the sadder bit is that (D)
and (E)
: these files need to exist and contain enough data to map the outputs of the program, otherwise when ran, one'll get EOF exception.
Here's how to run this solution:
Fri Mar 05 23:10:22:029970500 sweater@conflagrate ~/gnu/poke-1.0/build
$ echo -n "00000000000000000000000000000000000000000000000" > hosts.out
Fri Mar 05 23:10:36:841098700 sweater@conflagrate ~/gnu/poke-1.0/build
$ echo -n "00000000000000000000000000000000000000000000000" > guests.out
Fri Mar 05 23:10:40:652833300 sweater@conflagrate ~/gnu/poke-1.0/build
$ ./poke/poke LastBitOfEveryByteIsAGuest.ASCII
_____
---' __\_______
______) GNU poke 1.0
__)
__)
---._______)
Copyright (C) 2019-2021 The poke authors.
License GPLv3+: GNU GPL version 3 or later ...
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Powered by Jitter 0.9.258.
Perpetrated by Jose E. Marchesi.
For help, type ".help".
Type ".exit" to leave the program.
(poke) .load solve.poke
(poke) var xs = I[] @ 0#B
(poke) solve(xs)
(uint<1>) 0
(poke)
Fri Mar 05 23:11:25:337643300 sweater@conflagrate ~/gnu/poke-1.0/build
$ cat hosts.out
doma{w4rm357_w3lcom3___t0___th15___000000000000
Fri Mar 05 23:11:30:513672000 sweater@conflagrate ~/gnu/poke-1.0/build
$ cat guests.out
bl0g}000000000000000000000000000000000000000000
Picking the right tool at the right time🔗
In conclusion, it seems like it's wise to use
- hex editors allow for quick manual exploration,
- erlang allows to follow functional style better,
- GNU poke is imperative and procedure-oriented, but gives
- more flexibility in literal representations
- allows for less boilerplate due to genius mapping operator
Of course, you can use all three at different stages of your work with binaries.
Good luck with binary exploration and happy hacking!