Cherokee - network fuzzing with AFL

2021-10-16

tl;dr: Cherokee webserver fuzzed with AFLplusplus. One not impressive crash discovered. Lessons learned from network fuzzing, optimization, custom mutators and coverage checking with AFL.

Network fuzzing

Cherokee is a web server that I previously analyzed here using more basic approach. This time I wanted to utilize AFLplusplus and preexisting solutions to try and learn new things.

Initial setup

Cloning the official repository and building.

git clone https://github.com/cherokee/webserver webserver

./autogen.sh --prefix=$(pwd)/usr --sysconfdir=$(pwd)/etc --localstatedir=$(pwd)/var
CC=afl-cc make install

Configuration

As in the previous attempt, the web server has been configured to utilize all handlers available to cover as much code as possible.

Payload delivery

AFL can deliver payloads to its target using either stdin or file read from local drive which is not enough for fuzzing network application like web servers. To solve this problem there are following possibillities:

Modify the application’s code to read data from stdin or file instead of network socket
Modify the application’s code to implement a client that will connect to the server and deliver payloads (read from stdin or file), described here
Override network functions (socket, listen, accept etc.) and synchronize them with stdin. This kind of solution is implemented by preeny’s desock and doesn’t require modifying the source code.

I decided to use to use Preeny as it seemed to be the fastet approach and possible to re-use in other applications.

$ git clone https://github.com/zardus/preeny/
$ cd preeny
$ make

Test cases

For testcases I decided to use manually crafted HTTP requests to cover following scenarios: simple GET request, POST request with body and GET request with as many HTTP headers as possible . Each of them was replicated to point to every existing handler. In total it gave 54 requests.

According to the AFL’s documentation it is recommended to keep number of test cases as small as possible (the same applies to their size). In the repository there’s a tool designed to minimize the corpus of test cases. In this specific case it reduced number of requests to 11.

$ AFL_PRELOAD=/root/preeny/x86_64-linux-gnu/desock.so afl-cmin -i testcases -o testcases-min webserver/usr/sbin/cherokee-worker -C cherokee.conf 
corpus minimization tool for afl++ (awk version)

[*] Testing the target binary...
[+] OK, 954 tuples recorded.
[*] Obtaining traces for 54 input files in 'testcases'.
    Processing 54 files (forkserver mode)...
[*] Processing traces for input files in 'testcases'.
    Processing file 54/54
    Processing tuple 1310/1310 with count 54...
[+] Found 1310 unique tuples across 54 files.
[+] Narrowed down to 11 files, saved in 'testcases-min'.

System configuration

To start fuzzing it is required to configure pattern for core dump files which are used to detect crashes (or at least it’s recommended to configure because still it can be ommited with one of the supported env variables)

However, AFL provides a tool that configures the system and gives more configuration advices.

# afl-system-config 
This reconfigures the system to have a better fuzzing performance.
WARNING: this reduces the security of the system!

Settings applied.

It is recommended to boot the kernel with lots of security off - if you are running a machine that is in a secured network - so set this:
  /etc/default/grub:GRUB_CMDLINE_LINUX_DEFAULT="ibpb=off ibrs=off kpti=0 l1tf=off mds=off mitigations=off no_stf_barrier noibpb noibrs nopcid nopti nospec_store_bypass_disable nospectre_v1 nospectre_v2 pcid=off pti=off spec_store_bypass_disable=off spectre_v2=off stf_barrier=off srbds=off noexec=off noexec32=off tsx=on tsx_async_abort=off arm64.nopauth audit=0 hardened_usercopy=off ssbd=force-off"

If you run fuzzing instances in docker, run them with "--security-opt seccomp=unconfined" for more speed

Start

Having previous steps finished everything was ready to start fuzzing.

$ AFL_PRELOAD=/root/preeny/x86_64-linux-gnu/desock.so afl-fuzz -i testcases-min -o output webserver/usr/sbin/cherokee-worker -C cherokee.conf

Speed that I achieved was about 150 executions per second. I knew that there’s a lot of space for improvements so I decided to implement them step by step and compare results. Tips and recommendations related to performance are defined in the documentation:

Optimization

static binary

The first thing to do was to re-build the project as statically linked and disable unnecessary features (pthread and ssl support).

$ ./configure --prefix=$(pwd)/usr --sysconfdir=$(pwd)/etc --localstatedir=$(pwd)/var --disable-pthread --enable-static-module=all --enable-static --enable-shared=no --with-libssl=no
CC=afl-cc make install

This change itself gave boost to 250 executions per second.

Note: I executed fuzzing on a virtual machine, so generally numbers are low. However, in this case I cared more about relative change rather than absolute numbers. On more powerful machines I expect numbers to be higher and differences more visible.

Persistent mode

AFL offers persistent mode which a few features may significantly increase performance. To enable the mode some changes in the target’s source code are required.

https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.persistent_mode.md

Deferred initialization

Starting with deferred initialization, I had to spot the best place for AFL to clone the process. In the documentation there are rules which should be followed to not break the application. From the performance perspective cloning should happen as late as possible but before the application starts reading input and initializes resources directly related to processing input (eg. network sockets).

Just for the beginning, I tried to use a place right before server initialiation process happens.

cherokee/main_worker.c

   __AFL_INIT();

   ret = common_server_initialization (srv);
   if (ret < ret_ok) {
       exit (EXIT_ERROR_FATAL);
   }


After rebuilding and starting the fuzzer I noticed boost to 270 executions per second.

Deferred initialization

At this point it was noticable that deferred initialiation helps, so I wanted to improve this part. Going through the implementation of common_server_initialization I spotted an interesting place. Most of the work done by the initialization method (changing user and working directory, initializing logger, building strings with server name) could happen before cloning. The only two actions that were really necessary to happen after cloning were: port binding and thread initialization (but in fact the application was working on a single thread). So I split this function and put AFL’s macro between them.

Thanks to this modification I reached about 700 executions per second.

diff --git a/cherokee/main_worker.c b/cherokee/main_worker.c
index 8af97978..caffaad1 100644
--- a/cherokee/main_worker.c
+++ b/cherokee/main_worker.c
@@ -255,7 +255,13 @@ common_server_initialization (cherokee_server_t *srv)
        ret = cherokee_server_initialize (srv);
        if (ret != ret_ok) return ret_error;
 
-       cherokee_server_unlock_threads (srv);
+
+       __AFL_INIT(); 
+       
+       ret = cherokee_server_initialize_2 (srv);
+       if (ret != ret_ok) return ret_error;
+
+       // cherokee_server_unlock_threads (srv);
        return ret_ok;
 }

diff --git a/cherokee/server.c b/cherokee/server.c
index 64faccdd..24aa2508 100644
--- a/cherokee/server.c
+++ b/cherokee/server.c
@@ -951,17 +951,6 @@ cherokee_server_initialize (cherokee_server_t *srv)
                return ret_error;
        }
 
-       /* Initialize the incoming sockets
-        */
-       list_for_each (i, &srv->listeners) {
-               ret = cherokee_bind_init_port (BIND(i),
-                                              srv->listen_queue,
-                                              srv->ipv6,
-                                              srv->server_token);
-               if (ret != ret_ok)
-                       return ret;
-       }
-
        /* Verify the thread number and force it within sane limits.
         * See also subsequent fdlimit_per_threads.
         */
@@ -1048,18 +1037,32 @@ cherokee_server_initialize (cherokee_server_t *srv)
        if (ret != ret_ok)
                return ret_error;
 
+       return ret_ok;
+}
+
+ret_t
+cherokee_server_initialize_2 (cherokee_server_t *srv)
+{
+       ret_t            ret;
+       cherokee_list_t *i, *tmp;
+
+       /* Initialize the incoming sockets
+        */
+       list_for_each (i, &srv->listeners) {
+               ret = cherokee_bind_init_port (BIND(i),
+                                              srv->listen_queue,
+                                              srv->ipv6,
+                                              srv->server_token);
+               if (ret != ret_ok)
+                       return ret;
+       }
+
        /* Create the threads
         */
        ret = initialize_server_threads (srv);
        if (unlikely(ret < ret_ok))
                return ret;
 
-       /* Print the server banner
-        */
-       ret = print_banner (srv);
-       if (ret != ret_ok)
-               return ret;
-
        return ret_ok;
 }

AFL Loop and shared memory

There are two more features in the persistent mode, however in that case I was not able to use them. AFL loop allows to repeat execution of the specific fragment in the code. Usage and potential problems are similar to deferred initialization. The problem I was facing here was related to preeny. Preeny’s desock library exits the program when a socket is closed, so the application does not even reach the end of the loop.

For the shared memory approach problem was also related to Preeny. As handling stdin is done in the library itself it was not possible to utilize shared memory without messing with the library’s code.

I expect that it would be possible to modify Preeny to work with those two features but I didn’t want to spend to much time on it. However, it may be worth a try in case of other targets.

tmpfs

The last thing to try was usage of tmpfs. The reasoning behind this is that AFL reads and creates a lot of files, so avoiding hitting the disk should positively influence the performance. The only problem with this approach is that power loss may destroy our findings. So an in-between solution is to use AFL_TMPDIR env variable and point to tmpfs directory. However, at this moment I didn’t care about power loss, so I was fine with the more risky approach.

# swapoff -a
# mount -t tmpfs -o size=500M tmpfs /root/fuzz

I moved all necessary files (compiled application, testcases, output directory) to created /root/fuzz directory.

Finally, I achieved about 800 executions per second.

Mutators and coverage

AFL offers API to write custom mutators. They can be used to generate different input cases and increase coverage. AFL is provided with a few custom mutators which I decided to use.

radamsa

Radamsa originally was developed as a test case generator. In the AFL’s repository Radamsa’s code was adapted and turned into a custom mutator.

Usage of this custom mutator does not required any additional changes.

$ cd AFLplusplus/custom_mutators/radamsa
$ make

$ AFL_CUSTOM_MUTATOR_ONLY=1 AFL_CUSTOM_MUTATOR_LIBRARY=/root/AFLplusplus/custom_mutators/radamsa/radamsa-mutator.so AFL_PRELOAD=/root/preeny/x86_64-linux-gnu/desock.so afl-fuzz -i testcases-min -o output webserver/usr/sbin/cherokee-worker -C cherokee.conf

honggfuzz

This custom mutator is based on the Honggfuzz fuzzer. It can be used in the same way as the previous one.

$ cd AFLplusplus/custom_mutators/hongfuzz
$ make

$ AFL_CUSTOM_MUTATOR_ONLY=1 AFL_CUSTOM_MUTATOR_LIBRARY=/root/AFLplusplus/custom_mutators/honggfuzz/honggfuzz-mutator.so AFL_PRELOAD=/root/preeny/x86_64-linux-gnu/desock.so afl-fuzz -i testcases-min -o output webserver/usr/sbin/cherokee-worker -C cherokee.conf

Grammar mutator

This one is a little bit more complicated to use. This one is designed to fuzz structured inputs. To define a structure a special definition file must be created. Hopefully, there was an example file for HTTP protocol, so I just adapted it to my needs (mostly to hit all defined handlers).

More detailed description how to use this mutator can be found here.

$ cd AFLplusplus/custom_mutators/grammar_mutator
$ ./build_grammar_mutator.sh
$ cd grammar_mutator

--- grammars/http.json	2021-09-30 11:37:09.463484532 -0400
+++ grammars/myhttp.json	2021-09-30 11:42:04.605828187 -0400
@@ -3,7 +3,7 @@
 	
 	"<START_LINE>": [["<METHOD>", " ", "<URI>", " ", "<VERSION>"]],
 	
-	"<METHOD>": [["GET"], ["HEAD"], ["POST"], ["PUT"], ["DELETE"], ["CONNECT"], ["OPTIONS"], ["TRACE"], ["PATCH"], ["ACL"], ["BASELINE-CONTROL"], ["BIND"], ["CHECKIN"], ["CHECKOUT"], ["COPY"], ["LABEL"], ["LINK"], ["LOCK"], ["MERGE"], ["MKACTIVITY"], ["MKCALENDAR"], ["MKCOL"], ["MKREDIRECTREF"], ["MKWORKSPACE"], ["MOVE"], ["ORDERPATCH"], ["PRI"], ["PROPFIND"], ["PROPPATCH"], ["REBIND"], ["REPORT"], ["SEARCH"], ["UNBIND"], ["UNCHECKOUT"], ["UNLINK"], ["UNLOCK"], ["UPDATE"], ["UPDATEREDIRECTREF"], ["VERSION-CONTROL"]],
+	"<METHOD>": [["GET"], ["HEAD"], ["POST"], ["OPTIONS"], ["CONNECT"], ["TRACE"], ["COPY"]],
 	
 	"<URI>": [["<SCHEME>" , ":", "<HIER>", "<QUERY>", "<FRAGMENT>"]],
 	
@@ -13,10 +13,14 @@
 	
 	"<AUTHORITY>": [["<USERINFO>", "<HOST>"]],
 
-	"<PATH>": [["/", "<DIR>"]],
+	"<PATH>": [["/", "<TESTDIR>"]],
+
+	"<TESTDIR>": [["test", "<TESTNUM>", "<TESTDIRTWO>"]],
+
+	"<TESTNUM>": [["1"],["2"],["3"],["4"],["5"],["6"],["7"],["8"],["9"],["10"],["11"],["12"],["13"],["14"],["15"],["16"],["17"],["18"],["19"],["20"],["21"],["22"]],
+
+	"<TESTDIRTWO>": [[], ["/", "<CHAR>", "<TESTDIRTWO>"]],
 
-	"<DIR>": [[], ["<CHAR>", "/", "<DIR>"]],
-	
 	"<USERINFO>": [[], ["<CHAR>", ":", "<CHAR>", "@"]],
 	
 	"<HOST>": [["127.0.0.1:8080"]],

$ make GRAMMAR_FILE=grammars/myhttp.json

$ ./grammar_generator-myhttp 100 1000 /root/fuzz/seeds ./trees
cp -r trees out/grammar-fuzzer
$ AFL_CUSTOM_MUTATOR_ONLY=1 AFL_CUSTOM_MUTATOR_LIBRARY=/root/AFLplusplus/custom_mutators/grammar_mutator/grammar_mutator/libgrammarmutator-myhttp.so AFL_PRELOAD=/root/preeny/x86_64-linux-gnu/desock.so afl-fuzz -m 128 -i seeds -o output/grammar-fuzzer webserver/usr/sbin/cherokee-worker -C cherokee.conf

Sanitizers

Target projects can be additionally compiled with sanitizers (ASAN, MSAN, UBSAN etc.). To do this it is enough to define a variable AFL_USE_ASAN=1 (or another appropriate) and rebuild the project.

According to the documentation, ASAN is not recommended on 64-bit systems without preparations. On Linux cgroups can be used to limit available memory.

# apt install cgroup-tools
# swapoff -a
# /root/AFLplusplus/utils/asan_cgroups/limit_memory.sh -m 50 -u user1 AFL_PRELOAD=/root/preeny/x86_64-linux-gnu/desock.so afl-fuzz -m none -i - -o ./output -- webserver/usr/sbin/cherokee-worker -C cherokee-hongg.conf

Also, fuzzing with ASAN may have huge (negative) influence on the performance.

In this case, speed dropped to 7 executions per seconds so I decided to not focus too much on this approach.

Coverage

afl-cov is a tool responsible for generating coverage reports. To use it, two things are required: output generated by AFL and the project built with the -fprofile-arcs -ftest-coverage compiler flags.

$ CFLAGS="-fprofile-arcs -ftest-coverage" ./configure --prefix=$(pwd)/usr --sysconfdir=$(pwd)/etc --localstatedir=$(pwd)/var

A coverage report can be generated using the following command (the coverage-cmd parameter takes a command which was used for running the target during the fuzzing process):

$ afl-cov -d output --coverage-cmd "LD_PRELOAD=/root/preeny/x86_64-linux-gnu/desock.so /root/fuzz/webserver-gcov/usr/sbin/cherokee-worker -C /root/fuzz/cherokee.conf" --code-dir /root/fuzz/webserver-gcov

I generated reports for all outputs:

hongfuzz - lines: 33.2%, functions: 41.1%
standard mutator - lines: 32.1%, functions: 40%
radamsa - lines: 30.1%, functions: 39.5%
grammar mutator - lines: 27.2%, functions: 37.1%

The best results were generated for hongfuzz, however, generally all mutators resulted in quite similar numbers.

Next steps

Generated reports showed that there should be room for improvement, so I prepared new test cases hoping to increase coverage. I did it in two ways, by:

configuring more handlers
enabling admininistration panel.

Enabling more handlers was pretty straightforward as it only required selecting different options. However, the administration panel required more preparations.

It is started from another binary and a unique password is generated and presented in a terminal. This password is used for digest authentication, thus two small changes in code were necessary to pass the authentication phase during the fuzzing. The first one was responsible for generating always the same password, and the second one for generating always the same nonce. With these changes, it became possible to replay requests between instances of the application.

diff --git a/cherokee/main_admin.c b/cherokee/main_admin.c
index eda501fd..8524082c 100644
--- a/cherokee/main_admin.c
+++ b/cherokee/main_admin.c
@@ -123,7 +123,7 @@ generate_password (cherokee_buffer_t *buf)
 
        for (i=0; i<PASSWORD_LEN; i++) {
                n = cherokee_random()%(sizeof(ALPHA_NUM)-1);
-               cherokee_buffer_add_char (buf, ALPHA_NUM[n]);
+               cherokee_buffer_add_char (buf, ALPHA_NUM[i]);
        }

diff --git a/cherokee/nonce.c b/cherokee/nonce.c
index ed880680..753cb78d 100644
--- a/cherokee/nonce.c
+++ b/cherokee/nonce.c
@@ -150,9 +150,9 @@ cherokee_nonce_table_generate (cherokee_nonce_table_t *nonces,
 
        /* Generate nonce string
         */
-       cherokee_buffer_add_ullong16(nonce, (cullong_t) cherokee_bogonow_now);
-       cherokee_buffer_add_ulong16 (nonce, (culong_t) rand());
-       cherokee_buffer_add_ulong16 (nonce, (culong_t) POINTER_TO_INT(conn));
+       cherokee_buffer_add_ullong16(nonce, (cullong_t) 0);
+       cherokee_buffer_add_ulong16 (nonce, (culong_t) 0);
+       cherokee_buffer_add_ulong16 (nonce, (culong_t) 0);

Final results

Further fuzzing revealed one bug in a non-default error handler. A crash happens when invalid (in terms of HTTP protocol) request in sent to the server. More details are available in the bug submission.

https://github.com/cherokee/webserver/issues/1270

Conclusion

Despite the fact that findings were not spectacular, I went through a process of setting up fuzzing for a network-based application, which is not natively supported by AFL. I applied different techniques to effectively speed up the fuzzing process, used different mutators and compared their effectiveness by generating coverage reports. All lessons learned from the process may be useful for next targets.