Fuzz-testing Ontology's NeoVM Execution Engine

  • Two smart contract fragments were identified that cause panics in the NeoVM implementation in the Ontology blockchain code.

Introduction

Ontology is a “distributed trust collaboration platform”, a blockchain that supports identity management and smart contracts. Ontology smart contracts can be written in a variety of languages using an online IDE. The platform currently supports two virtual machine implementations: WASM (WebAssembly) and NeoVM, a virtual machine created for the NEO blockchain.

NeoVM is a stack-oriented virtual machine with single-byte opcodes. Unfortunately we could locate few details of its design, other than the opcode listing included in the source. The implementation in NEO is in C#; the version included in Ontology is a reimplementation. In the future we might use fuzzing to compare the behavior of the two implementations, but for this article we looked just at the implementation’s robustness.

Ontology’s implementation is written in Go, so we can demonstrate a fuzzing tool for Go, go-fuzz (https://github.com/dvyukov/go-fuzz). go-fuzz has a long list of bugs in its “trophy case”, and credits American Fuzzy Lop for the design of its fuzzing logic.

Building a test harness for fuzzing

Fortunately, Ontology already contains a unit test which does fuzzing. It generates random opcodes and attempts to execute them.
See https://github.com/ontio/ontology/blob/1e5cdc4ca0d099444eb7ea68b89534f1f353bfc4/smartcontract/test/panic_test.go#L37

Each run of the unit test executes 90 cases: 10 random strings at each length from 1 to 9 bytes. It is easy to adapt that to go-fuzz’s requirements.

Go’s dependency management and go-fuzz’s compiler make it much easier to write a test harness, even when a particular piece of code needs complicated dependencies in order to compile.

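// +build gofuzz

package test

import (
    "github.com/ontio/ontology/core/types"
    "github.com/ontio/ontology/smartcontract"
)

// Fuzz is the go-fuzz entry point: it treats data as NeoVM bytecode and runs it.
func Fuzz(data []byte) int {
    config := &smartcontract.Config{
        Time:   10,
        Height: 10,
        Tx:     &types.Transaction{},
    }

    sc := smartcontract.SmartContract{
        Config:  config,
        Gas:     100000,
        CacheDB: nil, // no backing store is attached
    }

    engine, err := sc.NewExecuteEngine(data)
    if err != nil {
        // The engine could not be constructed, so there is nothing to execute.
        return 0
    }

    // Inputs whose execution ends in an error are reported as interesting.
    if _, err = engine.Invoke(); err != nil {
        return 1
    }
    return 0
}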

go-fuzz requests that the fuzzing function return 1 for “interesting” test cases, such as examples that parse correctly, and 0 for “uninteresting” cases. If the file above is placed in the “test” subdirectory, then

go-fuzz-build github.com/ontio/ontology/smartcontract/test

builds a binary for fuzzing.

Fuzzing the virtual machine

Like AFL, go-fuzz needs a corpus of examples to bootstrap its search, and other unit tests provided suitable example strings of valid code. However, it does not accept a dictionary, so we could not give it hints about the valid tokens and interesting strings such as function names.

go-fuzz also launches multiple execution threads by default (a somewhat more cumbersome process with AFL). Its output looks like this:

2018/12/14 20:07:10 workers: 2, corpus: 1077 (17m29s ago), crashers: 5, restarts: 1/139, execs: 57731757 (836/sec), cover: 3168, uptime: 19h10m
2018/12/14 20:07:13 workers: 2, corpus: 1077 (17m32s ago), crashers: 5, restarts: 1/139, execs: 57731777 (836/sec), cover: 3168, uptime: 19h11m
2018/12/14 20:07:16 workers: 2, corpus: 1077 (17m35s ago), crashers: 5, restarts: 1/139, execs: 57731778 (836/sec), cover: 3168, uptime: 19h11m

The fuzzer found 1077 test cases, which cover about 3168 transitions between basic blocks in the code. Five of its test cases crash.

One of these accesses the CacheDB, which is nil. Earlier versions of the original unit test used a real backing store, but there have been interface changes made recently in this area, so it proved difficult to add a CacheDB instance. Using the real backing store was extremely heavyweight and reduced the execution speed to well below 100/second. go-fuzz reported this failure only once, but the high restart rate suggests we could have been more efficient by avoiding this code path.

Two of the others seem like legitimate failures, and are reproducible in a unit test.

Underflow bug

    code = []byte( "\x00\x00|v\x83\x83d\xfb\xff" )
           0x00 0x00 0x7c 0x76 0x83 0x83 0x64 0xfb 0xff
    PUSH0, PUSH0, SWAP, DUP, INVERT, INVERT, JMPIFNOT, invalid?, invalid?

This “contract” causes an underflow in the math library:

panic: underflow [recovered]
    panic: underflow

goroutine 5 [running]:
testing.tRunner.func1(0xc0000aa400)
    /usr/local/go/src/testing/testing.go:792 +0x387
panic(0x8086c0, 0x90e340)
    /usr/local/go/src/runtime/panic.go:513 +0x1b9
math/big.nat.sub(0xc000026450, 0x1, 0x5, 0xc000026450, 0x1, 0x5, 0xbd7500, 0x1, 0x1, 0xc00000cde0, ...)
    /usr/local/go/src/math/big/nat.go:142 +0x334
math/big.(*Int).Not(0xc00000cde0, 0xc00000cde0, 0x0)
    /usr/local/go/src/math/big/int.go:1120 +0x92
github.com/ontio/ontology/vm/neovm.opInvert(0xc0000c0b60, 0x0, 0x0, 0x851380)
    .../ontology/vm/neovm/func_bitwise.go:26 +0x65
github.com/ontio/ontology/vm/neovm.(*ExecutionEngine).ExecuteOp(0xc0000c0b60, 0x859500, 0x89ef11, 0x6)
    .../ontology/vm/neovm/execution_engine.go:119 +0x52
github.com/ontio/ontology/vm/neovm.(*ExecutionEngine).StepInto(0xc0000c0b60, 0x1, 0x1)
    .../ontology/vm/neovm/execution_engine.go:96 +0x2b
github.com/ontio/ontology/smartcontract/service/neovm.(*NeoVmService).Invoke(0xc0000aec80, 0xc000022770, 0x9, 0x10, 0x9103c0)
    .../ontology/smartcontract/service/neovm/neovm_service.go:243 +0x798
command-line-arguments.TestCrashes(0xc0000aa400)
    .../ontology/smartcontract/test/panic_test.go:85 +0x184
testing.tRunner(0xc0000aa400, 0x8bb558)
    /usr/local/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:878 +0x353
FAIL    command-line-arguments  0.008s

We must admit that we don’t fully understand the behavior here. INVERT of 0 results in a large negative number, and attempting to apply Not to that again results in underflow. But the three remaining bytes in the contract are necessary to exhibit the bug; the unit test does not panic without them.
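For comparison, here is a small sketch of what math/big does with a freshly constructed value. It calls Not in place, with the receiver also passed as the argument, as the stack trace shows opInvert doing. The helper invertTwice is ours and not part of Ontology. On a well-formed zero the two calls simply round-trip, which fits the observation that the first five bytes alone do not panic; the sketch does not reproduce or explain the crash.

```go
package main

import (
	"fmt"
	"math/big"
)

// invertTwice applies bitwise NOT in place twice to a fresh big.Int and
// returns the decimal value after each step.
func invertTwice(v int64) (once, twice string) {
	x := big.NewInt(v)
	x.Not(x) // same receiver and argument, as in the stack trace
	once = x.String()
	x.Not(x)
	twice = x.String()
	return once, twice
}

func main() {
	once, twice := invertTwice(0)
	fmt.Println(once, twice) // -1 0
}
```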

Slice Bound violation

    code = []byte( "_\x00s" )
           0x5f 0x00 0x73
    PUSH15, PUSH0, XTUCK

This contract causes an error accessing the VM’s stack:

panic: runtime error: slice bounds out of range [recovered]
    panic: runtime error: slice bounds out of range

goroutine 5 [running]:
testing.tRunner.func1(0xc0000aa400)
    /usr/local/go/src/testing/testing.go:792 +0x387
panic(0x8302e0, 0xc06cf0)
    /usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/ontio/ontology/vm/neovm.(*RandomAccessStack).Insert(...)
    .../ontology/vm/neovm/stack.go:51
github.com/ontio/ontology/vm/neovm.opXTuck(0xc0000c0b60, 0x0, 0x0, 0x851380)
    .../ontology/vm/neovm/func_stack.go:62 +0x2d8
github.com/ontio/ontology/vm/neovm.(*ExecutionEngine).ExecuteOp(0xc0000c0b60, 0x859500, 0x89ec5a, 0x5)
    .../ontology/vm/neovm/execution_engine.go:119 +0x52
github.com/ontio/ontology/vm/neovm.(*ExecutionEngine).StepInto(0xc0000c0b60, 0x1, 0x1)
    .../ontology/vm/neovm/execution_engine.go:96 +0x2b
github.com/ontio/ontology/smartcontract/service/neovm.(*NeoVmService).Invoke(0xc0000aebe0, 0xc000022748, 0x3, 0x8, 0x9103a0)
    .../ontology/smartcontract/service/neovm/neovm_service.go:243 +0x798
command-line-arguments.TestCrashes(0xc0000aa400)
    .../ontology/smartcontract/test/panic_test.go:85 +0x184
testing.tRunner(0xc0000aa400, 0x8bb550)
    /usr/local/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
    /usr/local/go/src/testing/testing.go:878 +0x353
FAIL    command-line-arguments  0.007s

This failure seems like a more straightforward lack of bounds-checking. The XTUCK opcode calls Insert on a 1-element stack (stored in r.e). Working back from the failing slice expression, the index argument must be 0, the value that PUSH0 left on top of the stack for XTUCK to consume:

    45:     l := len(r.e)
    46:     if index > l {
    47:         return
    48:     }
    49:     index = l - index
    50:     r.e = append(r.e, r.e[l-1])
    51:     copy(r.e[index+1:l], r.e[index:])
    52:     r.e[index] = t

(Original code: https://github.com/ontio/ontology/blob/8f56616ab7c5cdd2b43701835448aaedb8557366/vm/neovm/stack.go#L41)

After line 50, the stack is of size two; however, the first access in line 51 is to r.e[2:1], which provokes an out-of-bounds error because the start index is larger than the end index. That expression implies index is 1 after line 49, and therefore 0 on entry.

The bounds check on line 46 only rejects an index that is too large, so changing it to >= would not catch this case. It may be that an index of 0 needs to be rejected or handled separately.
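The arithmetic can be checked outside the VM. The sketch below copies the logic of the excerpt onto a plain []int and calls it on a one-element stack with index 0 and with index 1. The stack type and the tryInsert helper are hypothetical stand-ins written for this post, not Ontology code.

```go
package main

import "fmt"

// stack is a simplified stand-in for the VM's RandomAccessStack.
type stack struct{ e []int }

// insert mirrors lines 45-52 of the excerpt.
func (r *stack) insert(index int, t int) {
	l := len(r.e)
	if index > l {
		return
	}
	index = l - index
	r.e = append(r.e, r.e[l-1])
	copy(r.e[index+1:l], r.e[index:])
	r.e[index] = t
}

// tryInsert runs insert and reports whether it panicked.
func tryInsert(r *stack, index, t int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	r.insert(index, t)
	return false
}

func main() {
	fmt.Println(tryInsert(&stack{e: []int{15}}, 0, 15)) // true: r.e[2:1] is out of range
	fmt.Println(tryInsert(&stack{e: []int{15}}, 1, 15)) // false: r.e[1:1] is an empty slice
}
```

In this simplified model only index 0 reaches the bad slice expression, which matches the PUSH0 in the failing contract.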

Analysis

These bugs have the potential to be serious if the panic is not caught by the code that calls Invoke(). Spot-checking several call sites in the Ontology implementation suggested that panics are not recovered consistently, but we are unfamiliar with the overall structure of the code. Both may well be harmless.
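To make concrete what catching the panic would look like at a call site, here is a minimal sketch of a wrapper that turns a panic into an ordinary error. The name safeInvoke and the function-valued parameter are assumptions made for the example; we do not know how Ontology structures its callers.

```go
package main

import "fmt"

// safeInvoke runs invoke and converts any panic into an error, so that a
// malformed contract cannot take down the calling goroutine.
func safeInvoke(invoke func() (interface{}, error)) (result interface{}, err error) {
	defer func() {
		if r := recover(); r != nil {
			result = nil
			err = fmt.Errorf("vm panic: %v", r)
		}
	}()
	return invoke()
}

func main() {
	// A stand-in for engine.Invoke() that fails the way the XTUCK case does.
	bad := func() (interface{}, error) {
		s := []int{15}
		i := 2
		return s[i:1], nil // panics: slice bounds out of range
	}
	_, err := safeInvoke(bad)
	fmt.Println(err != nil) // true
}
```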

We were not able to understand the intended use of the XTUCK operation, or the reason why the double-INVERT caused a problem in some cases but not others. Fortunately, our limited understanding does not stand in the way of finding crashes when using an automated testing tool.

An individual run of the randomized unit test explores about 90 cases. Finding these two bugs took over 50,000,000 executions, though one was identified practically right away. Worse, the unit test exercises 1-token strings quite aggressively, but there are only 256 of them and none of them generates a failure by itself.

On the other hand, exhaustive search of the space of 9-byte programs would require 256^9 executions, an unimaginably large number. The fuzzer is able to focus its search on regions of this space which look promising. Unit test code which performs random testing can, and should, be converted to use fuzzing instead, to get greater coverage efficiently and ensure that the cost of writing the test is not wasted through ineffective use.



Ontology does not provide a security contact email, but it does participate in a bounty program run by Slowmist. (Unfortunately, their submission form will not accept submissions without an ETH address for awards.) We submitted a bug bounty claim on December 15th but received no update after the promised 10 working days. A follow-up email to [email protected] on January 1st went unanswered as of January 7th.

Fuzz.ai

Fuzz.ai is an early-stage startup dedicated to making software correctness tools easier to use. Fuzzers, model checkers, and property-based testing can make software more robust, expose security vulnerabilities, and speed development.
