Smasher2 was the follow-up to one of the best, hardest boxes. Things are about to get difficult. :)
The start of this box required very basic enumeration:
1. nmap to discover ports 53 and 80
2. dirb / rustbuster / gobuster on port 80 to discover /backup
3. dig AXFR smasher2.htb @smasher2.htb to discover wonderfulsessionmanager.smasher2.htb
/backup was protected by Basic HTTP authentication, and there weren’t any hints or leads on a username; basic fuzzing of wonderfulsessionmanager.smasher2.htb didn’t bring up any leads either. So, I kicked off hydra against /backup with a username of admin and a password list of rockyou.txt, and waited*. A while. After an hour or two, the password came back: clarabibi.
*Note: I wouldn’t recommend bruteforcing HTTP authentication with such a huge password list, especially without a known username. This part was weird, and the box creator had to lead people to this point.
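As a rough sketch of what a tool like hydra sends for each guess, HTTP Basic authentication is just a base64-encoded `username:password` pair in an `Authorization` header. The helper names below are hypothetical, and the short wordlist is only a stand-in for rockyou.txt; no request is actually sent.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def candidate_headers(username, wordlist):
    """Yield (password, header) pairs, one per candidate password."""
    for password in wordlist:
        yield password, basic_auth_header(username, password)


if __name__ == "__main__":
    # Tiny stand-in wordlist; only "clarabibi" comes from the write-up.
    for pw, header in candidate_headers("admin", ["password", "letmein", "clarabibi"]):
        print(pw, "->", header)
```

Each header would be attached to a GET for /backup, and a non-401 response would mark the hit. With a list the size of rockyou.txt that is millions of requests, which is why the step took hours.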
Once authenticated, I had access to two files: auth.py and ses.so. After looking at the files, it appeared that this is what was powering wonderfulsessionmanager.smasher2.htb. Unfortunately, auth.py had its credentials removed, so some reversing was in order.
auth.py was a very basic Flask application that had two endpoints. The first was a simple authentication API which returned an API token on a successful auth. The other allowed the execution of bash jobs with a valid API token. There were no obvious bugs or exploitable issues with the Flask app itself, so I investigated the ses.so file. This was a CPython extension which was imported by auth.py. It provided the actual authentication for auth.py.
I decided to start with some basic static analysis. With the release of GHIDRA, this has become much easier for those of us without tens of thousands of dollars to spend on IDA Pro. GHIDRA is a free reverse engineering tool from the NSA with a pretty powerful disassembler / decompiler.
After loading the file in GHIDRA, I started by finding the most interesting looking items in the Functions window. In this case, those were the two functions called from the Flask app: SessionManager__init and SessionManager_check_login. SessionManager__init turned out to be innocuous, so my focus fell upon SessionManager_check_login. To get a more thorough understanding of the function, I renamed functions and variables as I discovered their use. Here’s what GHIDRA looked like after my first pass:
Doing this made spotting potential issues much easier. There were a few methods called within SessionManager_check_login that I explored: set_blocked, get_login_count, set_unblocked, set_login_count, can_login, set_last_login, get_internal_pwd, and get_internal_user. These turned out to be basic getter or setter methods, but get_internal_pwd had a small, but powerful bug:
Can you spot the difference? No? That’s the point – get_internal_pwd was actually returning the username. This made bruteforcing the account possible, as one only had to guess the correct username. This solution was used by the initial solvers of the box; however, this was an unintended bug, and I wanted to find the real solution.
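To make the consequence of that accessor bug concrete, here is a minimal Python model of it. This is not the decompiled C from ses.so; the class, the `guess_username` helper, and the credentials are all hypothetical and exist only to show why guessing the username alone is enough.

```python
class SessionManagerModel:
    """Toy Python model of the accessor bug; not the real ses.so code."""

    def __init__(self, user: str, password: str):
        self._user = user
        self._password = password

    def get_internal_user(self) -> str:
        return self._user

    def get_internal_pwd(self) -> str:
        # BUG (mirrors the write-up): returns the username, not the password.
        return self._user

    def check_login(self, username: str, password: str) -> bool:
        return (username == self.get_internal_user()
                and password == self.get_internal_pwd())


def guess_username(manager, candidates):
    """Try each candidate as both the username and the password."""
    for name in candidates:
        if manager.check_login(name, name):
            return name
    return None
```

With the bug in place the real password never matters: the search space collapses from (username, password) pairs to usernames alone.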
There wasn’t anything obvious sticking out, so it was time to fire up some dynamic analysis and fuzzing. I used auth.py and ses.so to run the application locally (I just needed to create a templates directory with index.html and login.html files to keep Flask happy).
The first idea I had was type confusion. Although this is a common route to exploit interpreted-language interop, ses.so had pretty robust type checking. Passing {"data": []}, {}, {"data":{}}, etc. all resulted in valid, safe exits. After poking around a little more, I found that {"data": {"username": [], "password": "X"}} caused a crash!
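The manual poking described above could be automated along these lines. This is only a sketch of a payload generator under my own assumptions: the `TYPE_SAMPLES` list and the function name are hypothetical, and sending the payloads to the locally running app is left out because the endpoint is not named here.

```python
import json

# One sample value per JSON type to substitute into each field.
TYPE_SAMPLES = [None, True, 0, 1.5, "X", [], {}]


def type_confusion_payloads():
    """Yield JSON bodies that put unexpected types in data, username and password."""
    # Missing key, then every type for the top-level "data" field.
    yield json.dumps({})
    for value in TYPE_SAMPLES:
        yield json.dumps({"data": value})
    # Every type for one inner field while the other stays a plain string.
    for value in TYPE_SAMPLES:
        yield json.dumps({"data": {"username": value, "password": "X"}})
        yield json.dumps({"data": {"username": "X", "password": value}})
```

Watching the server process for crashes or odd responses while replaying each body is what separates the safe exits from the interesting ones.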
Now that I was onto something, I attached gdb to see what was happening: gdb python $(ps aux | grep auth.py | grep -v grep | awk '{print $2}'). Once attached, I set breakpoints in the areas I was investigating, e.g. break *SessionManager_check_login+1110.
Unfortunately, after investigating more closely, I found that this was a dead end.
One can see the list that was passed as username was getting passed into PyString_AsString. The documentation for this method made it pretty clear what was happening: since username_raw was not a PyString, PyString_AsString was returning NULL, and strcmp failed since the first parameter, username, was NULL. Back to the drawing board.
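The same behaviour can be observed from a modern interpreter. The sketch below assumes CPython 3, where `PyBytes_AsString` is the counterpart of Python 2's `PyString_AsString`; the wrapper name is hypothetical. The C function returns NULL and sets a TypeError for a non-bytes argument, and `ctypes.pythonapi` checks the error indicator after the call and raises it.

```python
import ctypes

# PyBytes_AsString is the Python 3 counterpart of Python 2's PyString_AsString.
as_string = ctypes.pythonapi.PyBytes_AsString
as_string.argtypes = [ctypes.py_object]
as_string.restype = ctypes.c_char_p


def c_string_or_none(obj):
    """Return the C string for obj, or None where the C API would return NULL."""
    try:
        return as_string(obj)
    except TypeError:
        # The C function returned NULL with TypeError set; ctypes re-raised it.
        return None
```

A C extension that ignores that NULL and hands it straight to strcmp dereferences a null pointer, which matches the crash seen here: noisy, but not controllable.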
I didn’t want to completely give up on type confusion, so my next step was to look for potential code paths where the data dictionary’s type wasn’t checked, or was still used after it was checked. There were two scenarios where that happened:
The only problem was that data was not actually used in any of those code paths, other than being returned so it could be presented in the error message. But in investigating those two areas of the code, something very interesting popped out:
Notice that the line *data = *data + 1; is missing in the second code path. Further investigating showed that every other non-exception code path also had that line. So, what was it for? Time to dive into Python internals.
Python uses reference counting in addition to garbage collection. Reference counting is a way for the Python runtime to determine when it’s safe to deallocate objects. Once an object’s reference count hits 0, the object’s type deallocation function is invoked. These counts are increased by the Py_INCREF macro, and decreased by the Py_DECREF macro.
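Reference counting can be watched from pure Python as well. The following is a small illustration, assuming CPython (other interpreters may not use reference counts at all); `Tracked` and `refcount_demo` are names made up for the example.

```python
import sys
import weakref


class Tracked:
    """Plain object that supports weak references."""


def refcount_demo():
    obj = Tracked()
    # The absolute value is implementation-specific; only the differences matter.
    base = sys.getrefcount(obj)
    alias = obj                      # a new reference, analogous to Py_INCREF
    after_incref = sys.getrefcount(obj)
    del alias                        # reference dropped, analogous to Py_DECREF
    after_decref = sys.getrefcount(obj)

    freed = []
    # The callback runs when obj is deallocated; finalize holds no strong reference.
    weakref.finalize(obj, freed.append, "deallocated")
    del obj                          # last reference gone, so the count reaches 0
    return base, after_incref, after_decref, freed
```

In CPython the finalizer fires as soon as the last reference disappears, which is the same immediate deallocation that a stray Py_DECREF triggers from C.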
In this case, *data = *data + 1; was Py_INCREF, which increased the number of references to the data dictionary. The developer did this because data was put into the return_list variable, and so shouldn’t be deallocated. However, in the other code path there was no Py_INCREF! And afterwards, the code hit this:
It was Py_DECREF. It decreased the number of references (since this function was returning and no longer needed to reference that variable), and if the number of references was 0, it would call (**(code **)(data[1] + 0x30))(data);
We know data was a dictionary, which is PyDict_Type in CPython. By looking at the source, I could see that 0x30 into the type structure was the deallocator, dict_dealloc. And that’s exactly what would happen if the execution went down this code path. Since there was no Py_INCREF when Py_DECREF would be called, the reference count would hit 0, and Python would deallocate that object.
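The offset arithmetic behind that lookup can be checked with ctypes. The structures below are my own partial reconstruction of the 64-bit CPython 2.7 headers (release build), with fixed-width fields so the result does not depend on the host. It also assumes GHIDRA typed data as a pointer to 8-byte integers, so data[0] is ob_refcnt and data[1] is ob_type.

```python
import ctypes


class ObjectHead64(ctypes.Structure):
    """PyObject header on a 64-bit release build."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_int64),    # data[0]: what *data = *data + 1 bumps
        ("ob_type", ctypes.c_uint64),     # data[1]: pointer to the type object
    ]


class TypeObjectHead64(ctypes.Structure):
    """First fields of PyTypeObject, up to and including tp_dealloc."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_int64),    # PyObject_VAR_HEAD
        ("ob_type", ctypes.c_uint64),
        ("ob_size", ctypes.c_int64),
        ("tp_name", ctypes.c_uint64),
        ("tp_basicsize", ctypes.c_int64),
        ("tp_itemsize", ctypes.c_int64),
        ("tp_dealloc", ctypes.c_uint64),
    ]


def field_at(struct, offset):
    """Return the name of the field that starts at the given byte offset."""
    for name, _ in struct._fields_:
        if getattr(struct, name).offset == offset:
            return name
    return None


if __name__ == "__main__":
    print(hex(TypeObjectHead64.tp_dealloc.offset), field_at(TypeObjectHead64, 0x30))
```

Six 8-byte fields precede tp_dealloc, so it sits at 6 * 8 = 0x30, which lines up with the call through data[1] + 0x30.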
Using this knowledge, I built a quick POC:
and I successfully crashed the application:
I now had some sort of heap-based SIGSEGV. The dictionary that Python expected to be there had been free’d / deallocated, which caused a crash. In order to exploit this, I had to dive deeper.
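The unbalanced reference count can be modelled in a few lines. This is a toy model of the behaviour as described in this write-up, not the decompiled function; `RefCounted` and `check_login_model` are hypothetical names, and the starting count of 1 stands for the reference held by the caller.

```python
class RefCounted:
    """Toy stand-in for a PyObject with a manual reference count."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1          # the caller's own reference
        self.freed = False

    def incref(self):
        self.refcnt += 1

    def decref(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True    # models the type's deallocator being invoked


def check_login_model(data, take_buggy_path):
    """Put data in the result list, then drop one reference before returning."""
    result = [data]
    if not take_buggy_path:
        data.incref()            # correct paths account for the list's reference
    data.decref()                # runs on every path before returning
    return result
```

On the buggy path the object is marked freed while the returned list still points at it, which is the dangling pointer the rest of the exploit builds on.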
A couple important facts on Python:
1. Everything in Python lives on a private heap, managed internally by Python.
2. Python’s memory manager has different segments for each type of object. So integers, lists, dictionaries, etc. are all managed differently, but together, within the private heap.
3. These separate areas are managed like fast bins: last in, first out.
4. This can be seen by reviewing the source code for the various object types: if there is space on the free_list, it adds the pointer to the deallocated object to the free_list for that type of object.
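The last-in, first-out behaviour in points 3 and 4 can be sketched as a small allocator model. The addresses and class name are hypothetical; the cap of 80 is the value of PyList_MAXFREELIST in CPython 2.7's listobject.c.

```python
MAXFREELIST = 80  # PyList_MAXFREELIST in CPython 2.7's listobject.c


class ListAllocatorModel:
    """Toy model of a per-type free list that recycles object slots."""

    def __init__(self):
        self.free_list = []
        self.next_address = 0x1000   # hypothetical starting address

    def alloc(self):
        # Reuse the most recently freed slot first (last in, first out).
        if self.free_list:
            return self.free_list.pop()
        address = self.next_address
        self.next_address += 0x40    # hypothetical slot size
        return address

    def dealloc(self, address):
        # Cache the slot only while the free list has room.
        if len(self.free_list) < MAXFREELIST:
            self.free_list.append(address)
```

The property that matters for the exploit is that the very next allocation of the same type lands on the slot that was just released.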
Since this code path did not check the “type” of "data" passed in the JSON payload, and it dealloc’d that “type” of object from memory (meaning that pointer would now be pointing to a different object of the same “type” as "data"), AND ses.so returned that pointer to auth.py, which printed it to the user, I knew I was on to something. By passing in various types of "data", I should be able to read the last item of that “type” created in Python’s heap.
After a bit of trial and error with different object types, I was successful with {"data": []}:
resulted in
The last list loaded onto Python’s heap before the list I sent was the one used to initiate the SessionManager, which contained the API key!
I verified this in gdb by stepping through the deallocation code. I set a breakpoint on the deallocation call: break *SessionManager_check_login+1281, and continued the program. In another terminal, I kicked off the exploit: python exploit.py, and stepped into the list_dealloc function in gdb.
gdb makes it really easy to debug Python objects. One can cast objects to PyObject* to view their data and members, print local variable names like op, and more.
list_dealloc had a local variable op, which is the PyListObject* that was being deallocated:
PyListObject inherits from PyObject, which is found implemented here. PyObject has more members “above” ob_refcnt, via the define _PyObject_HEAD_EXTRA. This define adds pointers to support a doubly-linked list of all the live heap objects for that “type” of PyObject.
I took a look at the start of list_dealloc:
and found the logic to retrieve the forward and backward pointers for the doubly-linked list, which could be verified by inspecting the data:
It then updated the next and prev pointers for the items on that list, effectively “cutting out” the object that was being free’d, and zeroed out the next pointer (if anyone can enlighten me on what the 0xfffffffffffffffe checks and movs are, I’d be fascinated to know):
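As a rough illustration, the "cutting out" step can be modelled in Python. One plausible reading of the constant, which is an assumption on my part and not something confirmed by the disassembly shown here, is that 0xfffffffffffffffe is -2 as a signed 64-bit value, matching `_PyGC_REFS_UNTRACKED` in CPython 2.7's objimpl.h, which `_PyObject_GC_UNTRACK` stores in the object's GC header before unlinking it. The `Node` class and helper names are hypothetical.

```python
UNTRACKED = -2  # assumed to correspond to _PyGC_REFS_UNTRACKED in CPython 2.7


class Node:
    """Toy element of a circular doubly-linked list with a sentinel head."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self
        self.refs = 0


def append(head, node):
    """Link node in just before the sentinel, i.e. at the tail."""
    node.prev = head.prev
    node.next = head
    head.prev.next = node
    head.prev = node


def unlink(node):
    """Mark the node untracked, splice its neighbours together, clear next."""
    node.refs = UNTRACKED
    node.prev.next = node.next
    node.next.prev = node.prev
    node.next = None


def as_signed64(value):
    """Interpret an unsigned 64-bit integer as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value
```

Under that reading, the check against the constant would be a sanity test that the object is still tracked, and the mov would be the store that marks it untracked.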
Next, it verified that the list was empty and continued to the free_list check (if the list wasn’t empty, it would have run through the list and deallocated each item):
Since numfree was less than PyList_MAXFREELIST, Python added the pointer op to the free_list: free_list[numfree++] = op;
As for the specifics of why the list that’s passed to SessionManager_init needs to be re-allocated (and thus re-uses the pointer that we just freed), I’m not 100% sure. If someone is more familiar with Python internals, I would love to know. But it can clearly be seen if one sets watchpoints on that memory in gdb:
Step through until PyList_New completes the allocation, plus assignment.
With the API token, it was trivial to run a bash job and get an initial shell, and user.txt.
The final part of Smasher2 was a fun kernel module exploit, but this post is long enough, so check out Snowscan’s write up for the rest!