Evaluation methodology
Testbed
All the experiments were performed on the following setup:
Hardware
- Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz
- 1 socket, 8 hyper-threads, 4 physical cores
- CPU caches: L1d = 32KB, L1i = 32KB, L2 = 256KB, shared L3 = 8MB
- 64 GB of memory
Network
For the case-study experiments, we used two machines; the network bandwidth between them was 938 Mbits/sec, as measured by iperf.
Software infrastructure
- Kernel: 4.4.0
- GLibC: 2.21
- Binutils: 2.26.1
Compilers
- GCC:
- Version: 6.1.0
- Configuration flags:
--enable-languages=c,c++ --enable-libmpx --enable-multilib --with-system-zlib
- ICC:
- Version: 17.0.0
- Clang/LLVM (AddressSanitizer):
- Version: 3.8.0
- Configuration flags (LLVM):
-G "Unix Makefiles" -DCMAKE_BUILD_TYPE="Release" -DLLVM_TARGETS_TO_BUILD="X86"
- Clang/LLVM (SoftBound):
- Source
- Version: 3.4.0
- Configuration flags:
--enable-optimized --disable-bindings
Note that SoftBound provides both spatial and temporal protection, whereas MPX provides only spatial protection. This is SoftBound's default behavior, controlled by a special macro flag. We could not change this flag to enable spatial-only protection (the question concerns version 3.8.0 but also applies to 3.4.0).
- Clang/LLVM (SAFECode):
- Source
- Version: 3.2.0
- Configuration flags:
-G "Unix Makefiles" -DCMAKE_BUILD_TYPE="Release" -DLLVM_TARGETS_TO_BUILD="X86"
Measurement tools
We used the following tools for measurements:
- perf stat. Our main tool, used to measure all CPU-related parameters. The full list of events:
-e cycles,instructions,instructions:u,instructions:k
-e branch-instructions,branch-misses
-e dTLB-loads,dTLB-load-misses,dTLB-stores,dTLB-store-misses
-e L1-dcache-loads,L1-dcache-load-misses
-e L1-dcache-stores,L1-dcache-store-misses
-e LLC-loads,LLC-load-misses
-e LLC-store-misses,LLC-stores
To avoid introducing additional measurement error, we measured these parameters in parts, at most 8 parameters at a time.
- time. Since perf does not provide capabilities for measuring physical memory consumption of a process, we used time --verbose and collected maximum resident set size.
- Intel Pin. To gather MPX instruction statistics, we developed a Pin tool. Full code of our instrumentation can be found in the repository.
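The batching of perf events and the collection of the maximum resident set size can be sketched as follows. This is a minimal illustration, not the scripts used in the evaluation: the function names `perf_commands` and `max_rss_kbytes` are hypothetical, and splitting the events in list order is an assumption. The sketch relies only on the `perf stat -e` event syntax and on the "Maximum resident set size (kbytes)" line printed by GNU `time --verbose`.

```python
import re

# Events copied from the perf stat list above.
EVENTS = [
    "cycles", "instructions", "instructions:u", "instructions:k",
    "branch-instructions", "branch-misses",
    "dTLB-loads", "dTLB-load-misses", "dTLB-stores", "dTLB-store-misses",
    "L1-dcache-loads", "L1-dcache-load-misses",
    "L1-dcache-stores", "L1-dcache-store-misses",
    "LLC-loads", "LLC-load-misses",
    "LLC-store-misses", "LLC-stores",
]


def perf_commands(benchmark_cmd, events=EVENTS, batch_size=8):
    """Return one `perf stat` argv per batch of at most `batch_size` events.

    Assumption: events are split in list order; the original grouping
    is not specified in the text.
    """
    commands = []
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        commands.append(
            ["perf", "stat", "-e", ",".join(batch), "--"] + list(benchmark_cmd)
        )
    return commands


def max_rss_kbytes(time_verbose_output):
    """Extract the maximum resident set size from GNU `time --verbose` output.

    Returns None if the expected line is missing.
    """
    match = re.search(
        r"Maximum resident set size \(kbytes\):\s*(\d+)", time_verbose_output
    )
    return int(match.group(1)) if match else None
```

With the 18 events listed above, this yields three separate perf invocations per benchmark (8, 8, and 2 events), so each benchmark would be run once per batch.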
Benchmarks
We used three benchmark suites in our evaluation: PARSEC 3.0, Phoenix 2.0, and SPEC CPU 2006. To remove some previously found bugs, we applied a patch to the SPEC suite. In addition, during our work we found and fixed a set of bugs in the suites.
All the benchmarks were compiled against static versions of the libraries they depend on (except raytrace from PARSEC, which requires dynamic X11 libraries).
Build types
GCC implementation of MPX
Compiler flags:
-fcheck-pointer-bounds -mmpx
Linker flags:
-lmpx -lmpxwrappers
Environment variables:
CHKP_RT_BNDPRESERVE="0" # support of legacy code, i.e. libraries
CHKP_RT_MODE="stop"
CHKP_RT_VERBOSE="0"
CHKP_RT_PRINT_SUMMARY="0"
Subtypes:
- disabled bounds narrowing:
-fno-chkp-narrow-bounds
- protecting only memory writes, not reads:
-fno-chkp-check-read
ICC implementation of MPX
Compiler flags:
-check-pointers-mpx=rw
Linker flags:
-lmpx
Environment variables:
CHKP_RT_BNDPRESERVE="0" # support of legacy code, i.e. libraries
CHKP_RT_MODE="stop"
CHKP_RT_VERBOSE="0"
CHKP_RT_PRINT_SUMMARY="0"
Subtypes:
- disabled bounds narrowing:
-no-check-pointers-narrowing
- protecting only memory writes, not reads:
-check-pointers-mpx=write
# instead of
-check-pointers-mpx=rw
AddressSanitizer (both GCC and Clang)
Compiler flags:
-fsanitize=address
Environment variables:
ASAN_OPTIONS="verbosity=0:\
detect_leaks=false:\
print_summary=true:\
halt_on_error=true:\
poison_heap=true:\
alloc_dealloc_mismatch=0:\
new_delete_type_mismatch=0"
Subtype:
- protecting only memory writes, not reads:
--param asan-instrument-reads=0
SoftBound
Compiler flags:
-fsoftboundcets -flto -fno-vectorize
As mentioned in the SoftBound GitHub repo, “LLVM/clang-3.4 introduces vectorization instructions […], SoftBoundCETS still does not handle these instructions. If you see false violations, use -fno-vectorize in your flags to avoid memory safety violations”.
Linker flags:
-lm -lrt
SAFECode
Compiler flags:
-fmemsafety -g -fmemsafety-terminate -stack-protector=1
Experiments
Each program was executed 10 times, and the results were averaged using the arithmetic mean. The mean across the programs in a benchmark suite was calculated using the geometric mean. The geometric mean was also used to calculate the “final” mean across the three benchmark suites.
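The three-level averaging described above can be sketched as follows. This is an illustrative sketch only: the function names and the nested-dictionary layout are assumptions, and the measured quantity (for example, a normalized runtime) is left abstract.

```python
from statistics import fmean, geometric_mean


def suite_mean(runs_by_program):
    """Geometric mean, across programs, of each program's arithmetic mean.

    `runs_by_program` maps a program name to its list of per-run results
    (hypothetical layout).
    """
    per_program = [fmean(runs) for runs in runs_by_program.values()]
    return geometric_mean(per_program)


def overall_mean(runs_by_suite):
    """Geometric mean of the per-suite means (the "final" mean)."""
    return geometric_mean(
        [suite_mean(programs) for programs in runs_by_suite.values()]
    )
```

Note that the final value is a geometric mean of suite means, so each suite carries equal weight regardless of how many programs it contains.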
For Phoenix, each experiment was additionally preceded by a “dry run”: a run that was not recorded and served the sole purpose of loading the working set into the OS I/O cache. The goal of the dry run was to decrease the variance in the results, since all Phoenix benchmarks are small and a cold cache might have slowed them down drastically.
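A dry run followed by recorded repetitions could look like the sketch below. It is an assumption-laden illustration: `timed_runs` is a hypothetical helper, and wall-clock time stands in for the perf-based measurements used in the evaluation.

```python
import subprocess
import time


def timed_runs(cmd, repetitions=10, dry_run=True):
    """Run `cmd` `repetitions` times and return the wall-clock durations.

    If `dry_run` is set, one extra unrecorded run is executed first so that
    input files are already in the OS I/O cache when measurement starts.
    """
    if dry_run:
        # Warm-up run: its duration is deliberately discarded.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)

    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations
```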
We performed the following types of experiments:
- normal: experiments on a single thread (serialized) and with fixed input
- multithreaded: experiments on 2, 4, and 8 threads
- variable inputs: experiments with increasing input size (5 runs, each with an input twice as large as the previous one)
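The experiment types above can be expressed as a small parameter sketch. The names `THREAD_COUNTS` and `input_sizes` are hypothetical, and the base input size is a placeholder, since the actual sizes are not given here.

```python
# Thread counts per experiment type, as listed above.
THREAD_COUNTS = {
    "normal": [1],
    "multithreaded": [2, 4, 8],
}


def input_sizes(base_size, runs=5):
    """Return `runs` input sizes, each twice as large as the previous one."""
    return [base_size * 2 ** i for i in range(runs)]
```

For example, `input_sizes(100)` gives `[100, 200, 400, 800, 1600]`, so the largest input is 16 times the base size.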
The results were checked against the following criteria:
- the application compiled successfully
- the application ran successfully (with a zero exit code)
- the output is equal to the output of the non-protected application (if it is deterministic)
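The two run-time criteria can be checked with a sketch like the one below. `run_is_valid` is a hypothetical helper, and comparing standard output byte for byte is an assumption about what "output" means; the compilation criterion is expected to be checked by the build step and is not covered here.

```python
import subprocess


def run_is_valid(protected_cmd, native_cmd, deterministic=True):
    """Check a protected run against the run-time criteria listed above.

    The run must exit with code zero and, for deterministic programs,
    produce the same standard output as the non-protected build.
    """
    protected = subprocess.run(protected_cmd, capture_output=True)
    if protected.returncode != 0:
        return False
    if not deterministic:
        # Outputs of non-deterministic programs are not compared.
        return True
    native = subprocess.run(native_cmd, capture_output=True)
    return protected.stdout == native.stdout
```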
The values of the coefficient of variation (CV) are presented in the following table:
Experiment | Average CV, % | Maximum CV, % |
---|---|---|
Phoenix | 0.34 | 3.87 |
PARSEC | 0.28 | 3.75 |
SPEC | 0.41 | 3.96 |
All | 0.35 | 3.96 |
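
For reference, the coefficient of variation of a set of repeated measurements is the standard deviation divided by the mean, expressed here as a percentage. The sketch below assumes the sample standard deviation; whether the sample or the population form was used for the table is not stated.

```python
from statistics import fmean, stdev


def coefficient_of_variation(samples):
    """Return the CV of `samples` in percent (sample standard deviation)."""
    return 100.0 * stdev(samples) / fmean(samples)


def summarize_cv(samples_by_program):
    """Return (average CV, maximum CV) in percent across programs."""
    cvs = [coefficient_of_variation(s) for s in samples_by_program.values()]
    return fmean(cvs), max(cvs)
```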