Early Concolic Testing of Embedded Binaries with Virtual Prototypes: A RISC-V Case Study

Vladimir Herdt1  Daniel Große1,2  Hoang M. Le1  Rolf Drechsler1,2

1Institute of Computer Science, University of Bremen, 28359 Bremen, Germany
2Cyber-Physical Systems, DFKI GmbH, 28359 Bremen, Germany
{vherdt, grosse, hle, drechsle}@informatik.uni-bremen.de

ABSTRACT

Extensive testing of IoT SW is very important to prevent errors and security vulnerabilities. In the SW domain the automated concolic testing technique has been shown very effective.

In this paper we propose an approach for concolic testing of binaries targeting RISC-V systems with peripherals. Our approach works by integrating the Concolic Testing Engine (CTE) with the architecture specific Instruction Set Simulator (ISS) inside of a Virtual Prototype (VP). We provide a designated CTE-interface to integrate (SystemC-based) peripherals into the concolic testing by means of SW models. This combination enables a high simulation performance at binary level with comparatively little effort to integrate peripherals with concolic execution capabilities. Our approach has been effective in finding several buffer overflow related security vulnerabilities in the FreeRTOS TCP/IP stack.

1. INTRODUCTION

All predictions agree that the momentum already gained by the Internet-of-Things (IoTs) will not stop, quite the contrary, i.e., a paramount growth in both, the number of connected devices and the size of the potential business is expected. From the development perspective of IoT devices several challenges arise which often manifest in conflicting requirements: high performance, low power, correctness, remote accessibility, security and reliability, to just name a few. Among these, correctness and security are of fundamental significance. Hence, extensive testing of IoT SW is very important to prevent errors and security vulnerabilities.

In the SW domain the automated concolic testing technique has been shown very effective. Essentially, concolic testing successively explores new paths through the SW program by solving symbolic constraints, that are tracked alongside the concrete execution. This combination of concrete with symbolic execution enables an efficient exploration of a large set of different program paths.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.

DAC ’19, June 2-6, 2019, Las Vegas, NV, USA
© 2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-6725-7/19/06...$15.00
https://doi.org/10.1145/3316781.3317807

A full IoT solution consists of SW and HW. The traction coming from open source SW recently reached open source HW. In particular, RISC-V has started to become a game changer for IoT processors. RISC-V is an open and free Instruction Set Architecture (ISA) [23]. Since 2015 the RISC-V ISA standard is maintained by the non-profit RISC-V Foundation [3] which has more than 200 members aiming innovation. However, to the best of the authors knowledge, concolic testing of binaries targeting RISC-V systems with peripherals has not been considered yet.

Contribution: In this paper we propose such a concolic testing approach for binaries targeting RISC-V systems with peripherals. Our approach works by integrating the Concolic Testing Engine (CTE) with the architecture specific Instruction Set Simulator (ISS) inside of a Virtual Prototype (VP). We provide a designated CTE-interface to integrate additional peripherals through a SW-library that is linked with the RISC-V binary under test into a combined RISC-V binary. Our CTE-interface consists of a small set of interface functions, tailored for SystemC-based peripherals with TLM 2.0 communication [12,13]. Our approach enables a high simulation performance, by tightly integrating the concolic execution engine with the VPs ISS for the specific target ISA, and at the same time significantly reduces the implementation effort in adding concolic execution capabilities to each peripheral (since peripherals are executed as SW on top of the VP and hence inherit the concolic execution capabilities of the VP). Our experiments demonstrate the efficiency and applicability of our approach in analyzing real-world embedded applications. We found several buffer overflow related security vulnerabilities in the FreeRTOS TCP/IP stack.

Related Work: Most of existing work on concolic testing and symbolic execution has been focusing on testing non-embedded SW that does not or only rarely interacts with HW, KLEE [6] and SAGE [11] are among the pioneering work that has made the techniques really work for real-world SW. While KLEE operates on the LLVM intermediate representation and requires the source code, SAGE works directly on the x86 binary. Operating on the binary level is important to achieve accurate verification results, as binaries are code that will actually be deployed. Thus, despite being more difficult due to the complexity of lower-level constructs, concolic testing of binaries (mostly x86 and ARM to some extent) is increasingly being considered by subsequent work, e.g. S2E [8], Mayhem [7] or Angr [22]. More recently, concolic/symbolic testing has gained attention from the HW verification community, see e.g. [21] for RTL and [14,15,19] for SystemC VPs.

For embedded SW, these approaches are not sufficient due to the much stronger dependence of the SW on the underlying HW. This gave rise to a number of specialized HW/SW symbolic (concolic) co-validation approaches that are related to ours. They mainly differ on how the underlying HW is being integrated. [5,6] use virtual peripheral models manually extracted from QEMU, [20] on the other hand integrates HW Verilog

* This work was supported in part by the German Federal Ministry of Education and Research (BMBF) within the project CONFIRM under contract no. 16ES0565 and within the project SATiSFy under contract no. 16KIS0821K, by the University of Bremen’s Central Research Development Fund, and by the University of Bremen’s graduate school SyDe, funded by the German Excellence Initiative.
models. Recently, [17] proposed a new formalism called instruction level abstraction to formally model SW-visible behavior of HW. This enables more scalable symbolic exploration but it is unclear whether the abstraction can be fully automated. These approaches operate at the source-code level. FIE [10] is more similar to ours in the sense that it targets SW running on a specific HW platform. FIE brings a set of modifications to KLEE to resemble an MSP430-based execution environment and optimizations for more scalable concolic testing (e.g. memory smudging and state pruning). However, it still requires the SW source code and cannot handle inline assembly instructions that often can be found in low-level embedded SW. Therefore, Inception [9] introduces an assembly to LLVM-IR lifting approach. AVATAR [25] extends S2E to allow hybrid binary concolic testing with physical devices.

Note that while we describe our approach in some details, we do not claim the concolic testing technique to be our central contribution. Rather it is the combination on how the concolic execution engine is integrated with the VP with our method for integration of (SystemC-based) peripherals. By this means we provide a practical framework filling the gap of concolic testing for embedded RISC-V binaries that is not possible with any existing framework. Conceivably, Angr - operating on the VEX intermediate representation at binary level – can be extended to support RISC-V instructions. However, due to its focus on non-embedded SW, extending Angr to the same extent of capability of handling RISC-V binaries and peripherals would require a lot of effort. Furthermore, based on its promises, RISC-V deserves a tailored platform that is more lightweight and amenable to specific optimizations.

2. PRELIMINARIES

2.1 RISC-V

RISC-V is an open and free Instruction Set Architecture (ISA). The ISA consists of a mandatory base integer instruction set (denoted RV32I, RV64I or RV128I with corresponding register widths) and various optional extensions denoted as single letters, e.g. M (integer multiplication and division), C (compressed instructions), etc. Thus, RV32IMC denotes a 32 bit core with M and C extension. The instruction set is very compact, consisting of three parts:

- **Concurrent Programming**: The ISA defines a set of concurrent instructions that allow for parallel execution of threads.

- **Control Flow**: The ISA includes a rich set of control flow instructions, enabling complex branching and conditional execution.

- **Data Management**: The ISA provides a simple yet powerful memory model, facilitating efficient data management.

RISC-V is an open and free Instruction Set Architecture (ISA). The ISA consists of a mandatory base integer instruction set (denoted RV32I, RV64I or RV128I with corresponding register widths) and various optional extensions denoted as single letters, e.g. M (integer multiplication and division), C (compressed instructions), etc. Thus, RV32IMC denotes a 32 bit core with M and C extension. The instruction set is very compact, consisting of three parts:

- **Concurrent Programming**: The ISA defines a set of concurrent instructions that allow for parallel execution of threads.

- **Control Flow**: The ISA includes a rich set of control flow instructions, enabling complex branching and conditional execution.

- **Data Management**: The ISA provides a simple yet powerful memory model, facilitating efficient data management.

2.2 Concolic Testing

**Overview**: Concolic testing is a technique to successively explore new paths through the SW program. It uses concolic values in place of pure concrete ones. A concolic value is a pair with a concrete N and a symbolic part x written as (N,x). The concrete part is always available. We denote a concolic value to be symbolic, if the symbolic part x is also available. Otherwise (x = /) we call it concrete.

Concolic testing, input variables (or memory regions in general) are marked to be symbolic. Furthermore, a (symbolic) Execution Path Condition (EPC) and a list of (symbolic) Trace Conditions (TCs) are tracked. After each execution, an SMT solver checks each emitted TC for satisfiability. Each satisfiable TC represents a new testcase (input). An input assigns concrete values to symbolic variables (i.e. memory regions), thus preventing them being assigned random values and guiding exploration towards a specific path. Concolic testing starts by (concretely) executing a random path through the SW program (i.e. all variables marked symbolic are simply assigned random values). During SW execution the symbolic input constraints are propagated alongside the concrete execution. Let us consider an example operation combining two operands A and B. In case one operand A = (1,x) has a symbolic value and the other B = (2,/) has not

3. CONCOLIC TESTING ENGINE (CTE) FOR RISC-V EMBEDDED BINARIES

CTE enables concolic testing of binaries targeting RISC-V systems with peripherals. We start with an overview of CTE. Then, we present our approach on integrating peripherals into CTE in more details and finally show an example that illustrates how CTE interoperates with SW in combination with peripherals.

3.1 Overview

Fig. 1 shows an overview on the architecture of CTE. Essentially, CTE consists of three parts:

1) A SW-library that contains CTE-interface functions and a set of peripheral models. The CTE functions allow to declare and reason about symbolic variables and enable CTE to interface with peripheral models.

2) A VP that is operating with concolic, instead of concrete, data types. The VP essentially consists of an ISS that executes instructions one after another and a memory that stores code and data. Both, ISS and memory use concolic data types and propagate symbolic constraints during program execution. The ISS supports the RISC-V RV32IMC ISA. Additional peripherals are integrated into the simulation through the CTE SW-library.

3) A concolic exploration engine that is successively exploring new paths through the SW program (binary) by leveraging the VP and solving constraints on symbolic variables using an SMT solver.

3.1.1 Initialization and Exploration

CTE of SW works as follows: First, the SW is compiled together with the CTE SW-library into a RISC-V ELF (binary file). Variables, representing input data, are marked to be symbolic (using the CTE-interface functions). Then, the (concolic execution) VP is instantiated and the (compressed) RISC-V ELF is loaded into the VPs memory. The exploration engine then starts by assigning each symbolic variable a random value (empty input) and then successively generates new inputs (based on the observed constraints) to explore different paths through the SW program. In particular it will try to generate new inputs in case of a symbolic branch condition or assume or assert function. The VP is cloned

Figure 1: Concolic Testing Engine (CTE) architecture overview

B will be first converted to (2,2), where 2s is an SMT expression representing the constant value 2. Thus, for example A + B would then result in (3, x + 2s). In the following we often omit the S suffix in case the context is clear and write 2s simply as 2.

At each branch instruction with symbolic condition x = (c, x), the path condition EPC is extended and a trace condition TC is emitted, either: 1) TC = EPC ∧¬ a and then EPC := EPC ∧ a in case the branch is taken (c evaluates to true in the concrete execution), or 2) TC = EPC ∧ a and then EPC := EPC ∧¬ a, otherwise. For verification purposes, assume(s) adds x to the current EPC to prune irrelevant paths and emits a TC in case c is false. Assert(s) checks c and emits a TC := EPC ∧¬ a in case c is true. Then EPC is extended with x.

**Concretization**: Symbolic values (N,x) can be concretized, i.e. converted to concrete values (N/,), by adding the constraint N = x to the path condition EPC. This can be useful in case parts of the simulation only support concrete values. Optionally, trace conditions can be emitted to (try) generate different concrete values N (e.g. the minimum and maximum possible values would be good candidates).
For illustration of our peripheral modeling concepts, Fig. 2 shows a simple sensor peripheral. It provides three (memory mapped) 32 bit registers: `data` holds the most recent generated sensor data, `scaler` controls how fast new sensor data is generated, and `filter` is applied on the generated data. Register access is provided through the `transport` function, which roughly corresponds to a TLM `b_transport` function. New sensor data is periodically generated by the `update` function (Line 22). Therefore, the data register is overwritten with a new symbolic value. Line 10 and constrained to stay in the sensor range. An interrupt is triggered by calling the `PLIC` peripheral (a RISC-V specific Platform Level Interrupt Controller, implementation not shown) in Line 15. PLIC will prioritize incoming interrupts and eventually notify the ISS using the `CTE_notify` function. In case a symbolic argument is used, it will be concatenated. `CTE_return` returns execution control back to the SW program, which is interrupted in order to enter peripheral functions (transport and update). We present more details on this in the following (sub-)section. Finally, we provide the `CTE_get_cycles` function to obtain the current number of simulation cycles from the ISS to model the RISC-V specific `CLINT` (Core Local INTerrupt) peripheral. `CLINT` triggers periodic (configurable) timer interrupts and allows the SW to obtain the current simulation time.

### 3.2 Peripheral Modeling Concepts

For illustration of our peripheral modeling concepts, Fig. 2 shows a simple sensor peripheral. It provides three (memory mapped) 32 bit registers:

```c
1 static uint32_t scaler = 25;
2 static uint32_t filter = 0;
3 static uint32_t data = 0;
4 #define SCALE_REG_ADDR 0x00
5 #define FILTER_REG_ADDR 0x04
6 #define DATA_REG_ADDR 0x08
7
8 void update() {
9 // overwrite data with new concolic bytes
10 CTE_make_symbolic(&data, sizeof(data), "d");
11 CTE_assume(data >= MIN_SENSOR_VALUE && data <= MAX_SENSOR_VALUE);
12 data -= filter;
13 // PLIC receives interrupts, priorities them and calls a CTE-function to notify the VP
14 pllc_process_interrupt(2 /*IRQ_NUMBER*/);
15 // corresponds to a simple thread wait in SystemC
16 // or just a method process
17 CTE_notify(NULL, scaler*CYCLES_PER_MS);
18 CTE_return();
19 }
20
21 // corresponds to a simple TLM transaction
22 void transport(uint32_t addr, uint8_t *data,
23 uint32_t size, _Bool is_read) {
24 CTE_assert (size == 4); //only access whole reg.
25 uint32_t ptr = (uint32_t *)data;
26 uint32_t *reg = 0;
27 // pre-process actions
28 if (addr == SCALE_REG_ADDR) {
29 if (!is_read)
30 CTE_notify(ptr, *ptr*CYCLES_PER_MS);
31 reg = &scaler;
32 } else if (addr == DATA_REG_ADDR) {
33 reg = &data;
34 } else if (addr == FILTER_REG_ADDR) {
35 reg = &filter;
36 } else {
37 reg = &filter;
38 if (is_read) *ptr = *reg;
39 else *reg = *ptr;
40 }
41 // post-process actions
42 if (filter >= MIN_SENSOR_VALUE)
43 if (filter >= MIN_SENSOR_VALUE)
44 CTE_trigger_irq(CLINT, 10);
45 return (0 && "invalid addr");
46 CTE_return();
47
48 }
```

Each time before executing a new input to preserve the initial VP state and allow exploration of different paths. CTE continues until all inputs have been processed or a runtime check fails. CTE checks for SW assertion violations as well as some generic error sources: null pointer dereference, access (read or write) of an illegal memory address and invalid jump target (either not properly aligned or a jump to an illegal instruction). In our experimental evaluation we also check for overflows of heap allocated memory in FreeRTOS (more details will follow later).

### 3.1.2 Peripheral Integration

To implement CTE for embedded binaries (targeting a specific ISA) with peripherals, there are in general two fundamental choices with different trade-offs between execution performance and implementation effort: 1) Integrate concolic execution into every component of the VP, i.e. including every peripheral. This fully specialized solution requires significant effort (for each peripheral) but leads to the best performance. 2) Use a generic (x86) symbolic/concolic execution engine like S2E and then calls the real transport function. An optional delay parameter could also be passed in and returned back to the VP through the CTE-interface.
instruction, is matched against the address ranges and then routed accordingly. This step involves a global-to-local address translation. For example executing a RISC-V SW (store word) instruction with address 0x10000004 (Line 14 in Fig. 3), and assuming the sensor is mapped to e.g. the range (0x10000000, 0x1000ffff), then the access address is 0x10000004. 

Until the \texttt{sensor_has_data} is generated by the branch execution in Line 44, due to the memory access (Line 12) and an interrupt waiting loop, fetch the data register value (Line 20) and check the assertion in Line 21. The main SW will leave the interrupt waiting loop, fetch the data register value (Line 20) and check the assertion in Line 21. The main SW waits for an interrupt (Line 18). Thus, the \texttt{assume} function evaluates to false and this path ends.

Both trace conditions are satisfiable, resulting in e.g. the two new inputs \(I_1 = \{f_0 = 1, d_0 = 25\}\) and \(I_2 = \{f_0 = 23, d_0 = 45\}\) (obtained by using the SMT solver), respectively. It depends on the search strategy of the exploration engine, which input is considered next. For this example, we continue with \(I_2\). Until the \texttt{assume} function (Line 11) it follows the same path as the initial execution. However, different concrete values are assigned at the \texttt{CTE_make_symbolic} functions based on \(I_2\). Thus, this time the \texttt{assume} function evaluates to true and execution continues. Please note, no trace conditions have been generated until now to avoid re-exploration of the first execution path (and its descendants, i.e. \(I_1\)). Next, \texttt{filter} is applied to the generated data (Line 12) and an interrupt is triggered (Line 13). The main SW will leave the interrupt waiting loop, fetch the data register value (Line 20) and check the assertion in Line 21. The concrete part of the condition evaluates to true, the \texttt{assert} generates a trace condition \(T_{C1} = (f_0 \geq 16) \land (d_0 \leq 16)\) and this path ends. \(T_{C1}\) is satisfiable with e.g. \(I_3 = \{f_0 = 23, d_0 = 16\}\). With \(I_3\) the data register in Line 12 will underflow and thus the \texttt{assert} in Line 21 is violated. This is due to an incorrect filter value assigned in Line 45. It should use \texttt{minus one instead of plus one}. This example also shows that it is important to integrate the peripheral logic into the testing approach to avoid missing relevant behavior of the overall system.

4. EXPERIMENTS

We have implemented our proposed CTE for concolic testing of binaries targeting RISC-V systems with peripherals. As symbolic backend we use KLEE v1.4.0 with STP solver v2.3.1. We evaluate CTE in two steps: First, we evaluate the performance of our approach in Section 4.1. Then, we apply CTE to test the FreeRTOS TCP/IP stack. Our approach has been effective in finding various security vulnerabilities.

4.1 Performance Evaluation

We evaluate the performance of our CTE by comparing against: 1) a generic symbolic execution solution using S2E, and 2) a VP supporting RISC-V binaries [13]. The ISS in VP is similar to that in CTE but only supports concrete execution. Hence, VP uses DMI for memory access and native (C++ uint32_t) data types for instruction execution (instead of a custom concolic data type that propagates a concrete and symbolic value for each instruction). Furthermore, VP is using normal SystemC peripherals. For the S2E solution, we run VP simulating the RISC-V binary on top of S2E, thus allowing symbolic execution of RISC-V binaries with little implementation overhead (only a small interface layer is required to propagate the \texttt{make_symbolic}, \texttt{assume} and \texttt{assert} functions from the RISC-V binary through the VP into S2E).

Table 1 shows the results. All runtimes are provided in seconds. It is separated into 11 columns. The columns show, in order from left to right,
the benchmark name (column: Benchmark), number of instructions executed (column: #instr), number of lines of code (LOC) in C and RISC-V assembly (columns: C and ASM), the simulation time in seconds on the VP, S2E and our CTE (columns: VP, S2E and CTE), and relevant statistics for our CTE: factor of improvement (FoI) compared to S2E (column: FoI S2E), time in seconds spent with solver queries (column: stime), number of different paths explored (column: #paths) and number of solver queries (column: #queries).

Table 1 is separated into two halves. The upper half shows benchmarks that only operate on concrete values, while the lower half shows benchmarks including symbolic values (hence exploring multiple execution paths). For symbolic benchmarks, the reported number of instructions is combined over all paths. As benchmarks we use qsort from the newlib C library, a standard dhrystone implementation, sha512 performs a checksum computation, counter involves counting related constraints, fibonacci uses a recursive implementation (function call intensive) and freertos-sensor embeds a sensor application into FreeRTOS tasks. The /s name suffix denotes that the benchmark uses symbolic values.

Comparing our CTE approach with VP, it can be observed that the overhead of using concyclic execution is around a factor of 2.2x. The overhead of running the VP inside of S2E is between 21x and 49x compared to CTE on our benchmark set. The main reason is the additional interpretation overhead in the ISS component of the VP. With symbolic values, the overhead of S2E compared to CTE is even more pronounced. Speedups of multiple orders of magnitude can be observed. The (symbolic) state space representation is more heavy-weight, and relevant statistics for our CTE: factor of improvement (FoI) compared to S2E (column: FoI S2E), time in seconds spent with solver queries (column: stime), number of different paths explored (column: #paths) and number of solver queries (column: #queries).

Table 1: Experiment results – timeout (T.O.) set to 7200 seconds (2 hours)

<table>
<thead>
<tr>
<th>Benchmark</th>
<th>LOC C</th>
<th>LOC ASM</th>
<th>Sim. Time (sec.)</th>
<th>FoI S2E</th>
<th>CTE</th>
<th>#paths</th>
<th>#queries</th>
</tr>
</thead>
<tbody>
<tr>
<td>qsort</td>
<td>52,343,639</td>
<td>212</td>
<td>1126</td>
<td>1.83</td>
<td>122.53</td>
<td>3.87</td>
<td>31.6x</td>
</tr>
<tr>
<td>sha512</td>
<td>75,997,581</td>
<td>175</td>
<td>1132</td>
<td>2.71</td>
<td>136.90</td>
<td>4.62</td>
<td>29.6x</td>
</tr>
<tr>
<td>dhrystone</td>
<td>238,000,584</td>
<td>273</td>
<td>515</td>
<td>8.42</td>
<td>421.60</td>
<td>19.70</td>
<td>21.4x</td>
</tr>
<tr>
<td>freertos-sensor</td>
<td>21,348,342</td>
<td>217</td>
<td>1820</td>
<td>1.35</td>
<td>107.37</td>
<td>2.19</td>
<td>49.0x</td>
</tr>
<tr>
<td>counter/s</td>
<td>642,710</td>
<td>71</td>
<td>31</td>
<td>1.79</td>
<td>1264.21</td>
<td>41.74</td>
<td>30.3x</td>
</tr>
<tr>
<td>fibonacci/s</td>
<td>683,970</td>
<td>37</td>
<td>170</td>
<td>1.90</td>
<td>394.14</td>
<td>1.90</td>
<td>197.5x</td>
</tr>
<tr>
<td>qsort/s</td>
<td>58,648</td>
<td>219</td>
<td>1192</td>
<td>1.90</td>
<td>T.O</td>
<td>32.90</td>
<td>&gt;218x</td>
</tr>
<tr>
<td>freertos-sensor/s</td>
<td>301,697,668</td>
<td>241</td>
<td>18329</td>
<td>1.35</td>
<td>422.78</td>
<td>38.66</td>
<td>75,997,581</td>
</tr>
</tbody>
</table>

Essentially, to receive a packet, the driver waits for a network card interrupt. Then, it asks the network card for the number of received bytes (ReceiveSize function), allocates a buffer on the heap large enough to hold ReceiveSize bytes, and asks the network card to store the data in the buffer (ReceiveData function). Finally, the allocated buffer is delivered to the (platform independent) IP-task for further processing. We model the network card as a peripheral (as described Section 4.2). Inside the peripheral we store a packet buffer of 512 bytes with symbolic content. Furthermore, we define a symbolic integer variable N and add the assumption N ≤ 512 in the peripheral. The driver glue functions access our peripheral using memory mapped I/O. Our model simply ignores outgoing packets, returns N when asked for ReceiveSize and copies N bytes into the data pointer provided by ReceiveData.

In the (SW) main function, we create and initialize the FreeRTOS network (IP-task) and driver processing task and start the FreeRTOS scheduler (available from the FreeRTOS kernel). Then we create and bind a TCP socket in listening mode to ensure that the IP-task does not drop TCP packets (which is the case if no initialized TCP socket is available). Finally, we start the actual symbolic testing by calling the init function of our peripheral, which in turn will trigger an interrupt notifying the driver to receive and process a packet. Interrupts are processed using our PLIC peripheral. Timer interrupts for the FreeRTOS scheduler processing are generated using our CLINT peripheral.

Please note, the IP-task is non-terminating and will wait for new packets from the driver indefinitely. Thus, we added a switch to stop simulation after one packet has been processed, to allow CTE to explore other paths as well (alternatively we could bound the search depth in CTE).

4.2.2 Heap Buffer Overflow Detection

To check for heap buffer overflows, we provide wrappers for the FreeRTOS memory management functions pvPortMalloc (allocate memory) and vPortFree (free memory). We use the linker options -Wl,-wrap=pvPortMalloc -Wl,-wrap=vPortFree with GCC to automatically redirect all accesses to pvPortMalloc to our wrapper function wrap_pvPortMalloc. The original function is available as __real_pvPortMalloc (vPortFree handled in same way).

Fig. 5 shows our wrapper functions. The pvPortMalloc wrapper allocates a larger than requested memory block, by adding additional bytes (protected zones) before and after the requested memory block. These protected zones are registered in CTE (Line 5). CTE will monitor all load/store operations and trigger a simulation error in case of a write (or read) access inside of the zones. CTE generates trace conditions in case of a symbolic memory address. The vPortFree wrapper unregisters the two corresponding protected zones from CTE (Line 15), which does also check for double free and non-allocated blocks, and then calls the real vPortFree function of FreeRTOS.

4.2.3 Test Results

We run CTE until we find the first error. We then repeatedly fix the error and re-run CTE, until no more error is found. We have set 4 hours (14400 seconds) as time limit to terminate the analysis. Table 2 presents the errors that we have found. All errors are related to buffer overflows of heap allocated memory and can therefore lead to serious security vulnerabilities. Besides a description, Table 2 shows additional relevant information similar to Table 1.

### Table 1: Experiment results – timeout (T.O.) set to 7200 seconds (2 hours)

<table>
<thead>
<tr>
<th>Benchmark</th>
<th>LOC C</th>
<th>LOC ASM</th>
<th>Sim. Time (sec.)</th>
<th>FoI S2E</th>
<th>CTE</th>
<th>#paths</th>
<th>#queries</th>
</tr>
</thead>
<tbody>
<tr>
<td>qsort</td>
<td>52,343,639</td>
<td>212</td>
<td>1126</td>
<td>1.83</td>
<td>122.53</td>
<td>3.87</td>
<td>31.6x</td>
</tr>
<tr>
<td>sha512</td>
<td>75,997,581</td>
<td>175</td>
<td>1132</td>
<td>2.71</td>
<td>136.90</td>
<td>4.62</td>
<td>29.6x</td>
</tr>
<tr>
<td>dhrystone</td>
<td>238,000,584</td>
<td>273</td>
<td>515</td>
<td>8.42</td>
<td>421.60</td>
<td>19.70</td>
<td>21.4x</td>
</tr>
<tr>
<td>freertos-sensor</td>
<td>21,348,342</td>
<td>217</td>
<td>1820</td>
<td>1.35</td>
<td>107.37</td>
<td>2.19</td>
<td>49.0x</td>
</tr>
</tbody>
</table>

4.2.1 Test Setup

IP packets are processed in FreeRTOS inside the platform independent IP-task. The IP-task requires a driver to receive (send) packets from (to) the network card. The FreeRTOS documentation provides a generic porting guide [2] to create a new driver and connect it with the IP-task. Using the generic driver code, it is only necessary to implement three additional glue functions: 1) ReceiveSize, 2) ReceiveData, and 3) SendData.
It can be observed that our concolic analysis, despite currently being without sophisticated state-of-the-art search heuristics, is already very effective in finding errors. Some of the detected errors require very specific inputs that are hard to find without formal methods. Note that these errors/vulnerabilities were present within the FreeRTOS TCP/IP stack for a long time already. At the time of writing, it appears that other researchers might also have discovered these errors [1]. These have been fixed only very recently in the newest FreeRTOS Version (v10.1.0). Despite being assigned CVEs (detailed list in [1]), neither the errors nor how they have been found are disclosed.

5. CONCLUSION

In this paper we proposed an approach for concolic testing of binaries targeting RISC-V systems with peripherals. We embed the Concolic Testing Engine (CTE) into the core VP and integrate peripherals through a SW-library. This enables high execution performance by specializing the ISS for the target architecture and at the same time significantly reduces implementation effort in adding concolic execution capabilities to each peripheral. Our approach has been very effective in finding buffer overflow related errors in the FreeRTOS TCP/IP stack, which demonstrates the applicability and efficiency of our approach in analyzing real-world embedded applications. For future work we plan to: 1) Investigate using C++ peripheral models with a more comprehensive abstraction layer to avoid the current peripheral transformation step. 2) Integrate support for timers/interrupts with symbolic notification times to enable checking for timing/ordering related errors. 3) Evaluate different search heuristics to select inputs that increase code coverage more quickly.

6. REFERENCES