Posted on by Achim D. Brucker, licensed under CC BY-ND 4.0.

Dijkstra's Aircraft: Testing vs. Verification‽

The there is a long and still ongoing battle between the verification community and the testing community about the right approach to showing the correctness of computer programs. Often, one side brings up the famous quote of Edsger W. Dijkstra: “Program testing can be used to show the presence of bugs, but never to show their absence!”

This quote is often used to manifest that verification is the holy grail of program correctness and testing is necessary evil as a full verification is often too expensive (even though, there are successful verification of, e.g., complete operating system kernels). But is this true?

Let’s due a small gedankenexperiment.

Which aircraft would you take for your next flight?

Let’s assume we have to brand new aircrafts that are ready for their first take of with commercial passengers, including **you*.

Aircraft A Aircraft B
Blue airplane Red airplane
  • Aircraft A is follows Dikstra’s approach, i.e., its correctness is formally verified, but it never completed a test flight.
  • Aircraft B is thoroughly tested and has completed over 1000 test flights successfully.

Which aircraft would you prefer for your next flight? Which one would Dijkstra? Looks like thorough testing can, indeed, provide some level of confidence of the absence of bugs. Does this mean we should give up the idea of a fully verified system? Of course not.

Testing and verification

Testing and verification have different properties:1

  • Verification, i.e., the mathematical proof that a computer program (or system) is correct (or secure) can show the the absence of certain bugs with respect to a formal semantics. This is important and recent advances in formal methods enable us to formally verify large systems.
  • Testing, i.e., the executing a computer program in a real environment (i.e., on real hardware, a real operating system, interacting with real back-end-systems) and validating its behavior allows us gain confidence in the correctness (or security) of the complete stack.

Thus, to show the correctness and security of computer programs, neither only testing nor only verification is the answer. The correct approach is testing and verification: use verification to show the correctness and security (and, hence, the absence of certain bugs) of the most critical (high risk) components and use testing to show that the overall system is correct (and secure).

This two-staged approach is necessary, as verification provides a higher level of trust into the correctness of a system but usually is based on a formal specification that makes certain assumptions that the environment (e.g, the real hardware) needs to fulfill). Testing can validate that those assumptions are actually met by the implementation (i.e., the real system).

Theorem prover-based testing: the best of both worlds

Combining testing and verification is particular powerful, if the formal specification serves two purposes: first, it forms the basis for a formal verification and, second, it forms the basis for generating test cases that then can be used for testing the real implementation. To realize this vision, we need a uniform formal verification and test-case generation system, i.e., a system that allows us to formally proof properties of a formal system model and generate test cases from the same (or a formally equivalent) specification. If you want to know more about this approach, you might want to check out our theorem prover-based testing system HOL-TestGen, which combines the formal verification powers of HOL-TestGen with symbolic test case generation.

  1. Of course, if we could test all possible inputs of a computer program, i.e., executing a complete test, the level of confidence would be equivalent to a formal proof. Due to the usual large number of possible inputs, such a thorough test is, usually, not feasible.