Mooosl

Category: Pwn Points: 177 (18 solves) Author: ? Description: check my new baby toy! (Ubuntu 21.04: apt install musl)

I did not solve this challenge during the CTF. After reading about some techniques others applied (mentioned below) I finished my solution two days later.

I am going to try to go through this challenge with as much detail as possible. Perhaps a painful amount of detail. If you just want to see the exploitation technique, you probably want to scroll down to Part 5.

Part 1. Obtain and Run the Challenge

They told us to apt install musl, so we can google “apt musl” to find out this is installing the “musl” libc implementation. This also explains the challenge name.

Okay, so they gave us two files:

libc.so (presumably, this is the same libc that you get by doing apt install musl)

mooosl

$ file mooosl
mooosl: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped

Ok, so it’s a 64-bit x86 executable, it’s dynamically linked, and will use /lib/ld-musl-x86_64.so.1 as the linker. (Luckily, apt install musl also installed the linker in that exact location.)

I’m also going to run checksec to see if it was compiled with all the security mitigations or not.

$ checksec mooosl
[*] '/home/andrew/moosl/mooosl'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

OK - so all protections enabled. This is DEFCON Quals, after all.

To check my sanity, I want to make sure the libc that is getting loaded when the process runs is the same as the libc that they gave us.

So I ran gdb ./mooosl and then typed run and then pressed CTRL-C to interrupt the program, and then info proc mappings to get a list of memory mappings:

gef> info proc mappings
process 7409
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x555555554000     0x555555555000     0x1000        0x0 /home/andrew/moosl/mooosl
      0x555555555000     0x555555556000     0x1000     0x1000 /home/andrew/moosl/mooosl
      0x555555556000     0x555555557000     0x1000     0x2000 /home/andrew/moosl/mooosl
      0x555555557000     0x555555558000     0x1000     0x2000 /home/andrew/moosl/mooosl
      0x555555558000     0x555555559000     0x1000     0x3000 /home/andrew/moosl/mooosl
      0x555555559000     0x555555561000     0x8000        0x0 [heap]
      0x555555561000     0x555555562000     0x1000        0x0 [heap]
      0x555555562000     0x555555563000     0x1000        0x0 [heap]
      0x7ffff7f41000     0x7ffff7f45000     0x4000        0x0 [vvar]
      0x7ffff7f45000     0x7ffff7f47000     0x2000        0x0 [vdso]
      0x7ffff7f47000     0x7ffff7f5c000    0x15000        0x0 /usr/lib/x86_64-linux-musl/libc.so
      0x7ffff7f5c000     0x7ffff7fc3000    0x67000    0x15000 /usr/lib/x86_64-linux-musl/libc.so
      0x7ffff7fc3000     0x7ffff7ffa000    0x37000    0x7c000 /usr/lib/x86_64-linux-musl/libc.so
      0x7ffff7ffa000     0x7ffff7ffb000     0x1000    0xb2000 /usr/lib/x86_64-linux-musl/libc.so
      0x7ffff7ffb000     0x7ffff7ffc000     0x1000    0xb3000 /usr/lib/x86_64-linux-musl/libc.so

Okay, so it’s getting libc from /usr/lib/x86_64-linux-musl/libc.so.

$ md5sum libc.so
0cb0503eb60496782d21cf9b26d61c01  libc.so

$ md5sum /lib/x86_64-linux-musl/libc.so
40dcd7cec2b87e3f65a6b96ec6d383d3  /lib/x86_64-linux-musl/libc.so

Uh oh, the libc I got with apt install musl is not the same as the libc they gave us. This is probably because I’m on Ubuntu 20.10 and not Ubuntu 21.04. We want to have the exact same environment as the remote server, otherwise our exploit may work locally but not remotely. I tried a couple of tricks (LD_PRELOAD, LD_LIBRARY_PATH) to get it to load the libc out of the current directory instead, but I was unsuccessful. Since this is the only thing using musl libc on this system and I’m lazy, I just copied the libc they gave us over the one from apt:

cp libc.so /usr/lib/x86_64-linux-musl/libc.so

OK great, we can now run the program and are confident we have the correct libc version.

Part 2. Understand the Challenge

We are greeted with a prompt which gives us a number of options.

$ ./mooosl 
1: store
2: query
3: delete
4: exit
option: 

Those of us who are not strangers to pwn know this reeks of a heap challenge. (Why? ‘store’ => malloc, ‘delete’ => free, ‘query’ => print out the contents of an allocation).

I tried a few inputs, and made some observations:

$ ./mooosl 
1: store
2: query
3: delete
4: exit
option: 1
key size: 10
key content: ffffff
value size: 60
value content: FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
ok
1: store
2: query
3: delete
4: exit
option: invalid
$ FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF: command not found

It seems to be some kind of key-value store. A hash table? what happens on collisions??
It asks for size of the key and value, and if you input fewer bytes, it stops on newline
If you input too may bytes it doesn’t read them in (it tried to read my Fs as the next “option:” input instead)

$ ./mooosl 
1: store
2: query
3: delete
4: exit
option: 1
key size: 2
key content: f
value size: 10
value content: lololol
ok
1: store
2: query
3: delete
4: exit
option: 2
key size: 2
key content: f
0x7:6c6f6c6f6c6f6c
ok
1: store
2: query
3: delete
4: exit
option: 1
key size: 2
key content: f
value size: 10
value content: yyyyyyy
ok
1: store
2: query
3: delete
4: exit
option: 2
key size: 2
key content: f
0x7:79797979797979
ok
1: store
2: query
3: delete
4: exit
option: 4
bye

It prints my input in hex (0x:), size was the # of bytes actually read, not the number I entered
I stored two things for the same key. When I did query, I got the second one I stored.

That’s about all I cared to explore dynamically for the moment – let’s pop it into Ghidra.

Part 3. Find the bug

Find main. It looks like this:

The first call prints the menu. The second call (FUN_0010128f) calls some function and then does atoi on it (atoi reads a string and returns the decimal integer that was in it), so it’s probably reading the option selected by the user.

Ghidra has the wrong function signature for FUN_0010128f; we notice this because the return value of atoi is unused and when main calls this function it uses its return value. Edit the function signature to correct the return type, for more accurate decompilation.

With this knowledge, we know that iVar1 contains the option entered by the user. So we can label the different branches accordingly.

Let’s dive into do_store.

The first thing store does is calloc 0x30 bytes (calloc is just malloc, but it zeroes the region before you get it) - the pointer goes in puVar2. Then, we see three function calls and a bunch of operations relative to puVar2. Whenever we see a bunch of specific array subscripts on a pointer, we can suspect that this is actually a pointer to a structure, not an array of 8-byte values.

We know the structure is 0x30 bytes, and every member to be 8 bytes (there is no wacky math on puVar2, and it’s never cast to any other type). For now, we don’t know the names of the members. There’s a cool Ghidra feature that will auto-generate this structure for you (normally, i’ve just done it in the ‘Data Type Manager’ window).

Cool, lets try to figure out what each function does now. The first function, once you fix the argument type, looks like this:

We see that it prompts for a key size, we calloc that size, then we set the value of param_1->field_0x0 to the address of the new chunk we just allocated. Next, this function calls FUN_001011e9. I am omitting this for brevity, but it just reads one character at a time from the user into arg1, until either \n is reached or we’ve written len (arg2) characters. It returns the number of chars read. (Note, it does not null-terminate the string, but it won’t matter because this value isn’t treated anywhere as a string). I’m naming this function read_until_newline.

Going back out to do_store, we can label that first call as a call to get_key_from_user.

Notice that the argument to get_key_from_user is just puVar2 – this is a pointer to the structure. So we know that get_key_from_user is going to set puVar2->field_0x0 to the pointer it gets from its calloc call… that was where the key was stored! So we can rename field_0x0 to key_buf.

The return value of get_key_from_user is the return value of read_until_newline (the number of characters read). So field_0x10 is the key_len.

The next function that is called is FUN_00101364. We’ll notice a very similar pattern here, except we are reading in the value instead of the key. So field_0x8 is the value_buf. And field_0x18 is the value_len. We’ve filled in over half the struct!

So far, we haven’t seen much interesting. Just labelling struct members. That’s about to change with FUNC_001013db. Here’s how it gets called in do_store:

We need a important observation about how the return value of the mysterious FUNC_001013db gets used: It gets stored in field_0x20 and then they mask it with 0xfff, and use it as an index into a global DAT_00104040 (an array of pointers!). They pull the old pointer from &DAT_00104040 + uVar1 * 8. Then they store the new pointer in that same spot, and set field_0x28 to the old pointer. This is insertion into the beginning of a singly-linked list. FUNC_001013db is the hash function, whose lower 12 bits are used as an index to select which “bucket” the new node should be added to. Nodes are always inserted at the beginning of the list.

Here’s the hash function:

Those are some cool constants: 2021 and 0x13377331. So we know the first param is the key buffer and the second param is the key length. It iterates over every character in param_1 by incrementing local_14 as an index. For every character it computes:

val = (ulong)current_character + val * 0x13377331

with val = 0x7e5 (2021) as the starting condition, and it returns val. I played around with this equation a bit, it turns out its a linear diophantine equation and I tried to get SymPy to solve it but I wasn’t thrilled with those results.

Luckily, a brute force search over the space of all two-byte keys is very effective at finding collisions (keys whose hash value & 0xfff yields the same thing)

START = (0x7e5)
def brute_force_collision(desired_bucket):
    options = []
    for x in range(256):
        for y in range(256):
            cur_num = START
            cur_num = ((x + cur_num * 0x13377331))
            cur_num = ((y + cur_num * 0x13377331))
            if (cur_num & 0xfff) == desired_bucket:
                options.append(bytes([x, y]))
    return options

This will return a 2-tuple for every 2-byte key that hashes to the same hash table bucket.

So for bucket 0 (whose head pointer is at DAT_00104040), we can use any of these byte strings:

>>> brute_force_collision(0)
[b'\x02\xd9', b'\x07\xe4', b'\x0c\xef', b'\x11\xfa']

So far, there’s nothing wrong with the fact we might have a hash collision. Some keys will hash to the same thing (& 0xfff) so those nodes will be inserted into the same bucket. If we check the do_query function, we’ll see it actually does a memcmp (in the function at 0x1014fb) on the two keys to make sure it’s found the right node before printing it. There’s nothing wrong with this aspect of the implementation of chained hashing.

The bug

Let’s take a look at do_delete. Remember that do_store effectively did a linked list insert at the beginning of the list, so we expect do_delete to handle removing from a linked list.

I’ve already labelled one key function without telling you: get_node_by_key at 0x1014fb. This function (actually only takes two parameters) does a linked list traversal until it finds the node matching a particular key, and returns a pointer to the node if it finds it. I’ll leave that one for you to look at on your own. No issue there.

Let’s try to label a variable or two:

local_18, returned by get_node_by_key… from REing get_node_by_key we know it returns a pointer to a node in the list, depending on the key the user typed. Going to name node_to_delete.

Let’s look closely at what happens when get_node_by_key returns non-null (aka it found the node to delete).

It gets a pointer to the list head from the global (line 20), then it checks:

If the node we are trying to delete is the list head
OR the node we are trying to delete is NOT the list tail (->next is not null)

If either of these conditions is satisifed, it actually unlinks correctly: The loop finds the pointer to change (the previous node’s next pointer, or the pointer to the head of the list) and then changes it to new next node.

But what happens when neither of these conditions are satisfied? That is, the node we are deleting is not the first node, but is the last node. We skip the unlinking entirely, but still free the key buffer, value buffer, and the node itself.

This means that if we ‘delete’ the last node in a list of at least two elements, the previous node still references the node we just free’d.

This one bug leads to essentially two vulnerabilities:

Use-after-free: The node is still in the linked list. We can still use the ‘query’ function to print out its value.
Double free: We can still use the ‘delete’ function to free the same node (and it’s key and value buffers) again, since the node is still in the linked list.

Part 4: Experimenting with the musl libc heap

So, I’ve done a little glibc heap exploitation before, so I had a few broad assumptions about how the heap would work here:

If I make some allocations of size X (X < 4096-ish), and then I free one of them, and then I make some allocations of size X again, I expect to eventually get the chunk I free’d back. (This holds mostly true.)

Since I don’t know anything about the musl implementation of the heap, and I want to verify these assumptions, I created a little gdb script to log every allocation and free. We can pass this to gdb.attach in pwntools, or run gdb with -x.

# Output to gdb.txt
set logging on

# Define a function for handling calloc breakpoints
# (yes, it has to be a separate function)
define foo
printf "calloc"
print $rsi
finish
print $rax
c
end

b calloc
commands
foo
end

# Define a function for handling free breakpoints
define bar
printf "free"
print $rdi
c
end

b free
commands
bar
end

(You could also use LD_PRELOAD as mhackeroni did: https://github.com/Ikiga1/writeups/tree/master/DefConQuals2021/mooosl to get the same diagnostics without the struggles of gdb)

I wrote simple functions to do store, delete, and query. (See solve.py). store allows for a controlled key and value size. delete and query allow for controlled key size.

def store(key, value, keysize=None, valuesize=None):
    ...
def delete(key, keysize=None, error=False):
    ...
def query(key, keysize=None)
    ....

Let’s try a simple experiment to see what we can do with the use-after-free. I will be using the colliding keys found in Part 3 above.

store(b'\x02\xd9', b'yeet', keysize=0x8, valuesize=0x20)
store(b'\x07\xe4', b'yeeticus', keysize=0x8, valuesize=0x20)
delete(b'\x02\xd9') This will delete the last node in the list, but it will not be unlinked.
store(b'\x0c\xef', b'yeeticus maximus', keysize=0x8, valuesize=0x20) Store always inserts at the beginning of the list.
Repeat 4 two more times, to see when we get the deleted node back

Here’s the sequence of calloc and free that we see:

# Store 1: allocate node (always 0x30)
calloc$1 = 0x30
$2 = 0x7f5071cbdc60

# Store 1: allocate key (controlled size)
calloc$3 = 0x8
$4 = 0x559ad37fcc60

# Store 1: allocate value (controlled size)
calloc$5 = 0x20
$6 = 0x7f5071cbd870

# Store 2: allocate node (always 0x30)
calloc$7 = 0x30
$8 = 0x7f5071cbdca0

# Store 2: allocate key (controlled size)
calloc$9 = 0x8
$10 = 0x559ad37fcc70

# Store 2: allocate value (controlled size)
calloc$11 = 0x20
$12 = 0x7f5071cbd8a0

# Delete: allocate key (controlled size)
calloc$13 = 0x3
$14 = 0x559ad37fcc80

# Free key for 1st store (we know via order of free's we see in Ghidra)
free$15 = 0x559ad37fcc60

# Free value for 1st store
free$16 = 0x7f5071cbd870

# Free node for 1st store
free$17 = 0x7f5071cbdc60

# Remove key from above delete
free$18 = 0x559ad37fcc80

# Store 3
calloc$19 = 0x30
$20 = 0x7f5071cbdce0

calloc$21 = 0x8
$22 = 0x559ad37fcc90

calloc$23 = 0x20
$24 = 0x7f5071cbd8d0

# Store 4
calloc$25 = 0x30
$26 = 0x7f5071cbdd20

calloc$27 = 0x8
$28 = 0x559ad37fcca0

calloc$29 = 0x20
$30 = 0x7f5071cbd900

# Store 5
calloc$31 = 0x30
$32 = 0x7f5071cbdd60

calloc$33 = 0x8
$34 = 0x559ad37fccb0

calloc$35 = 0x20
$36 = 0x7f5071cbd930

A couple of things to notice here:

calloc is returning a mix of heap pointers and libc pointers. Checking /proc//maps, we find the two segments that the chunks are coming from:

  ...
  559ad52d0000-559ad52d1000 ---p 00000000 00:00 0                          [heap]
  559ad52d1000-559ad52d2000 rw-p 00000000 00:00 0                          [heap]
  7f5071c06000-7f5071c1b000 r--p 00000000 08:05 1837434                    /usr/lib/x86_64-linux-musl/libc.so
  7f5071c1b000-7f5071c82000 r-xp 00015000 08:05 1837434                    /usr/lib/x86_64-linux-musl/libc.so
  7f5071c82000-7f5071cb9000 r--p 0007c000 08:05 1837434                    /usr/lib/x86_64-linux-musl/libc.so
  7f5071cb9000-7f5071cba000 r--p 000b2000 08:05 1837434                    /usr/lib/x86_64-linux-musl/libc.so
  7f5071cba000-7f5071cbb000 rw-p 000b3000 08:05 1837434                    /usr/lib/x86_64-linux-musl/libc.so
  7f5071cbb000-7f5071cbe000 rw-p 00000000 00:00 0 
  7ffc65c3f000-7ffc65c60000 rw-p 00000000 00:00 0                          [stack]
  ...

So, some of the pointers returned by malloc are in rw pages loaded immediately after libc (but not file-mapped), so we can assume this is the .bss segment. (You can confirm this when you load libc.so into Ghidra, which we might want later). The other chunks are in the [heap] segment as you’d expect.

We never got any of our free’d pointers back! We need to ~~be more creative~~ try different patterns of allocations if we are going to try to re-allocate the free-d chunk. (or maybe we just need to make more allocations before we get the same chunk back)

At this point, it seems like it’d be good to develop a better understanding of the musl-libc heap.

how does the musl-libc heap work

I found the current implementation here: https://github.com/ifduyue/musl/tree/master/src/malloc/mallocng

I didn’t need to fully understand how this allocator works in order to solve this challenge, but I needed to understand how the chunk metadata works and where it is stored. The code is somewhat dense and not well commented, but we can infer a lot from one function in particular: get_meta:

This function gets called very early in free(), and is passed the argument to free (the start of user data).

Going line-by-line:

Line 131: The user data must be 16-byte aligned
Line 132: offset is a unsigned two-byte value stored immediately before the user data
Line 133: get_slot_index is:
```
  p[-3] & 31;
```
so we know index is a unsigned 1-byte value (since p is ptr to unsigned char). It is stored three bytes before the user data and is being implicitly cast to int. & 31 means the only the lower 5 bits are used.
Line 134: There is a byte four-bytes before the user data that causes some additional checks if non-zero. We’ll assume these won’t matter for now.
Line 139: IMPORTANT: Chunks are stored contiguously in ‘groups’, which share a struct group at the ‘base’ of the group. The ‘offset’ (stored immediately before the user data) specifies how far (in UNIT-sized units, UNIT is defined as 16) away the base of the group is.

Line 140: The struct group has a pointer to a struct meta. Let’s check out the struct definitions real quick:

  struct group {
      struct meta *meta;
      unsigned char active_idx:5;
      char pad[UNIT - sizeof(struct meta *) - 1];
      unsigned char storage[];
  };

  struct meta {
      struct meta *prev, *next;
      struct group *mem;
      volatile int avail_mask, freed_mask;
      uintptr_t last_idx:5;
      uintptr_t freeable:1;
      uintptr_t sizeclass:6;
      uintptr_t maplen:8*sizeof(uintptr_t)-12;
  };

So groups seemed to be further organized into ‘metas’ which serve as a linked-list of groups. Each ‘meta’ has a few intresting attributes we’ll see soon.

Line 141: The struct group points to a struct meta through ->meta, and the meta points back to the group via ->mem. These pointers must match up.
Line 142: The chunk’s index must not exceed the meta’s last_index.
Line 143 - 144: avail_mask and free_mask are ints which are used as ‘bitmaps’ where bit i represents whether the chunk with index i is available, freed, or both. For a chunk to be freeable, it should be neither available nor freeed.
Line 145: Every page (aligned 4096 byte chunk) that contains a struct meta must have a struct meta_area at the beginning of the page.
```
  struct meta_area {
      uint64_t check;
      struct meta_area *next;
      int nslots;
      struct meta slots[];
  };
```
Line 146: ctx is a global (you can just grep for it to find this out), with a secret member which much apparently match the check of every meta_area.
Line 147-152: The size class should be either 0-47 or 63. If we look elsewhere we see that mmap-ed chunks (for allocations of size > MMAP_THRESHOLD) are assigned size 63. For size class 0-47, it uses the size class as an index into some global array of sizes. It verifies that the chunk is where it’s supposed to be: offset (which was in units of size UNIT) should be between the size class * index and size class * (index + 1). So apparently the size classes are in UNIT units.
Line 153: This isn’t important, let’s try to avoid this check

From this, we can make a crappy diagram of the musl libc heap (high mem is down):

Chunk:

|
|----------- <-- 'struct group' (p - 16*off - 16)
| meta (8) - ptr to meta          
| active_idx (1)
|
|------------------
| **********************
| ** more chunks here **
| **********************
|------------------ <-- (p - 8), chunk start
| ???
| ???  (4)
| ???
| ???
|------------------
| ??? (1)
|------------------
| index (1)
|------------------
|
| offset (2)
|
|------------------ <-- p, returned from malloc
| user data
|
| ...
|
|
|----------- <-- Chnk end (p + 16*size_class - 8)
|                       
|*********************
|** more chunks here**
|*********************
|

(notice, i did not bother to reverse the members that I did not need)

Meta / Meta Area:

|------ meta area at 0xXXXXXX000
| check (8) - should match ctx.secret
| next (8)
| nslots (4)
| ...pad... (4?)
|------
|
| ... some more struct metas here
|
|
|------ meta at 0xXXXXXXYYY
| prev (8)
| next (8)
| mem (8) - ptr to group
| avail_mask (4)
| freed_mask (4)
| extras (8)
|-------

We will explore these structures again in Part 6.

A summary of differences between glibc and musl’s new mallocng implementation:

glibc	musl (mallocng)
Manages linked lists of chunks	Manages linked lists of ‘metas’, which are groups of contiguous chunks
Fwd/back stored in user data section of freed chunks	Nothing is stored in user data of free’d chunks
Chunk size is stored with chunk	Chunks appear in groups of the same size, they share a ‘meta’ pointer at the beginning of the group; the meta contains the size

Part 5: Coming up with an exploitation strategy

Arbitrary Read

First, we realize that we can use the ‘query’ functionality to print out the contents at the free’d node’s value pointer. The node, the key buffer, and the value buffer are all used after they are free’d.

 Chunk 1
|---------|                  Chunk 2
| key_buf |--------------->|---------|
|---------|     Chunk 3    |         |
| val_buf |-->|---------|  |_ _ _ _ _|
|---------|   |         |  |   ...   |
| key_len |   |_ _ _ _ _|
|---------|   |   ...   |
| val_len |
|---------|
| hash_val|
|---------|
| next    |
|---------|

We originally controlled the data in Chunk 3, but it’s free’d now. If we can make chunk 3 get allocated again as a node, then we it will be filled with all the data a node normally has. The first 8 bytes in chunk 3 will be filled with a pointer to a buffer for the new node’s key! We can use this to get both a libc leak and a heap leak, because we observed earlier that some pointers returned by calloc are in libc’s .bss, and some are on the heap as expected. The value is printed as hex byte-by-byte, for however long val_len specifies, so null bytes will not pose an issue.

To get chunk 3 allocated again as a node, we need to ask for a key length of 0x30, which the same as the allocation size for the node. This strategy only works if we can get Chunk 3 allocated as a node before Chunk 1 gets allocated again – if Chunk 1 gets allocated, it will get zeroed by calloc and we will no longer have a pointer into Chunk 3 that we can print from. Luckily, we noticed earlier that the heap is not LIFO (last-in-first-out) at all. With some trial and error we are able to achieve multiple allocations of Chunk 3 as a node while Chunk 1 stays intact.

Ok, so we can leak a heap pointer and a libc pointer, but that’s not an arbitrary read advertised by the header of this section. For reasons I have not yet described, we want to leak the ‘secret’ member of the malloc_context structure, which resides in libc’s .bss (You can find this by looking at the globals referenced by malloc/free in libc using Ghidra; the secret is at 001b4ad0 for me). To read this value, we need to re-allocate Chunk 1 as a ‘value’ and then overwrite the contents of the node so that the address we want to read (of the malloc_context secret) is in place of val_buf.

For this, it is convenient to have your chunks in bucket 0x7e5 (the bucket things fall into when key length = 0) so we can zero those members out and still be able to locate our chunk.

 Chunk 1
|------------------------------|
| key_buf=0                    |
|------------------------------|
| val_buf=0x1b4ad0 + libc_base |
|------------------------------|
| key_len=0                    |
|------------------------------|
| val_len=0x8                  |
|------------------------------|
| hash_val=0x7e5               |
|------------------------------|
| next=0                       |
|------------------------------|

Then we can just ‘query’ for our chunk (w/ key length 0) and we it will print the value from val_buf, which is the secret value we want.

Arbitrary Write?

Often with glibc heap problems, the goal is to corrupt the heap so that malloc returns a controlled address (e.g. via tcache posioning), giving you a write-what-where primative. Then, a typical target is __free_hook in glibc, which gets called with every pointer that gets free’d - you overwrite this to call system and then you just free some data you control to call system with your data.

We have a few problems:

Making malloc return a controlled pointer is hard, because chunks are not linked together - rather, struct metas (metadata for a group of contiguous, uniform chunks) are linked together. Getting malloc to return a controlled pointer in musl-libc would require constructing a fake struct group (pointing to a valid struct meta) and a fake chunk somewhere near where you want to write, and then somehow freeing a valid chunk in that group so that malloc would return chunks from that group in the future. I worked toward this approach, and there might be a way to make it work, but I couldn’t.
There is no __free_hook in musl. How can we use an arbitrary write to get code execution?

Regarding turning arbitrary write into code execution, I found the answer in (the translation of) someone’s personal chinese blog. The technique is called FSOP (File-stream oriented programming), and I had never heard of it before. There is a global FILE structure that has some function pointers, so if we get arbitrary write then we can change one of these pointers. The pointers get called upon exit() with a pointer to the file structure. So we need to write /bin/sh at the beginning of the structure, and overwrite the write pointer to call system.

However, I struggled to figure out how to turn the double-free and UAF into an arbitrary write. I had a libc and heap leak, and I knew I could turn an arbitrary write into code execution, but I didn’t figure it out in time. One thing I had thought to try was that maybe mmap-ed chunks were treated differently, and that maybe I could construct a fake mmap-ed chunk – this didn’t work, since I didn’t have anywhere I could construct the fake chunk that was close enough to libc’s .data (where I want to write). After the competition was over some people were posting solve scripts, and I took a peek at a few. I noticed they were constructing a fake struct meta and were using the fwd and bck pointers for their arbitrary write. I then understood what I needed to do.

Semi-arbitrary write

Normally when you want to remove a node from a linked list, you’ll do the following:

cur->next->prev = cur->prev;
cur->prev->next = cur->next;

next chain: A -> B -> C becomes A -> C prev chain: A <- B <- C becomes A <- C

If we can control cur->next and cur->prev, we can gain an semi-arbitrary write (both cur->next and cur->prev must point somewhere writable). This is a ‘unlink attack’ and glibc has, over the years, gained protections against this by checking that cur->next->prev == cur and cur->prev->next == cur.

Musl libc has some code in the dequeue function to unlink a struct meta… and it doesn’t have integrity checks! (… maybe someone should suggest this to them?). The dequeue function gets called in free, if that free caused a struct meta to become entirely free.

How do we trigger this? We can construct a fake chunk, complete with a fake struct group pointing to a fake struct meta, with a fake struct meta_area at the beginning of the page which containeed the struct meta. To summarize the conditions for our fake structures:

The fake chunk needs to have a fake struct group at (p - 16*off - 16).
The fake struct group needs to point to a fake struct meta
The fake struct meta needs to have mem pointing back to the fake struct group. Recall the struct definition:
```
  struct meta {
      struct meta *prev, *next;
      struct group *mem;
      volatile int avail_mask, freed_mask;
      uintptr_t last_idx:5;
      uintptr_t freeable:1;
      uintptr_t sizeclass:6;
      uintptr_t maplen:8*sizeof(uintptr_t)-12;
  };
```
We will put the address we want to write to in next and the value we want to write there as prev. It will also change prev->next (bytes at offset 8-15 of prev) to next, which will be discussed later.

I will explain the values I used for the rest of the fields in the annotated solve script.
The beginning of the page containing the struct meta needs to have a struct meta_area
```
  struct meta_area {
      uint64_t check;
      struct meta_area *next;
      int nslots;
      struct meta slots[];
  };
```
It turns out that all that gets checked here is the check value, and we were able to leak the required value using the arbitrary read above.

After we’ve created our heap structures, we will just free our fake chunk; this will trigger a dequeue which will do the unlinking of our fake struct metas. We can free arbitrary addresses using the same technique we used to get our arbitrary read.

Part 6: Writing the exploit

To summarize the plan:

Leak libc and heap pointers by getting a “value” allocation re-used as a “node”.
Leak the heap secret from libc by getting a “node” allocation re-allocated as a “value”, which lets us overwrite the entire node, including key and value pointers in the node w/ arbitrary addresses.
Construct multiple fake heap structures, and then free our fake heap chunk. We can free arbitrary pointers using the same technique: overwrite the value pointer in a node to the address we want, then free the node (again).
We should set up the fake structures so that we… *wait a minute, we never figured out what we would do with our arbitrary write…*

Target

I’ve left out the final part of the plan. I wrote all the code for the above steps, and then realized the constraint that I mentioned above: both the “what” and the “where” must be writable locations (because we unlink both forward and backward lists). We know we want to overwrite the write pointer in the stdout global structure, and write /bin/sh to the start of the structure. But system isn’t a writable location (code is read-only) and neither is b'/bin/sh. What now? I messed around with this for a day, then gave up and looked at how another team did it. I thought that maybe I could overwrite the stdout global to an fake struct FILE, but this global is in a read-only section. I learned my lesson: DON’T IGNORE NON-FUNCTION XREFS.

There is ANOTHER reference to the stdout FILE structure. And wha-do-ya-know, it’s in the writable .data. I have no idea why it’s in a writable segment, given that the structure doesn’t move, but OK.

So we want to set up our arbitrary write so that we overwrite this pointer to a file structure with a pointer to a fake file structure with the read function pointer pointing to system instead, and the first 8 bytes of the structure containing /bin/sh. This will work with our arbitrary write, because both the location we want to write to and the value we are going to write there can both be writable (rather than trying to write the address of system somewhere).

I didn’t develop this whole exploit at once, but I thought it made more sense to explain the whole thing, and leave the annotated code to explain each part.

So at this point, check out solve.py. There are a few additional implementation details there but otherwise everything should have been explained in detail in this write-up