Software Testing

The first Ariane 5 launch took place on June 4, 1996. Due to a malfunction in the guidance system control software of the expendable launch vehicle (ELV), the spacecraft was destroyed by its automated self-destruct system just thirty-seven seconds into the mission. This spectacular incident was one of the most high-profile and most expensive software failures in history (although not the first, and by no means the last major project failure).

The inertial guidance system software from the Ariane 4 ELV had been adapted for re-use in Ariane 5, despite the fact that the flight trajectories of the two vehicles were significantly different. Although the updated software was re-designed to take account of the new flight parameters, critical stages of the pre-flight simulation and testing were never undertaken (mainly as a result of a desire to cut costs, and because the additional testing was thought to be unnecessary).

As a result, a critical software error was overlooked, and soon after take-off the booster nozzle deflectors received incorrect data from the guidance control system, sending the spacecraft along a flight path for which the aerodynamic loading would tear it apart. The failure condition was detected, the auto destruct sequence was triggered, and the launch vehicle was destroyed. For a full report of the subsequent investigation, see:

Ariane 5 Flight 501 self-destructs

Software development is probably one of the most complex of human endeavours. Software projects often do not go as planned, with approximately eighty percent failing to meet completion deadlines and costing more than expected, and less than fifty percent delivering all of the specified functionality. The missing features are often promised for delivery in a later version in order to placate the customer.

High profile disasters like the one described above (which cost several hundred million dollars even by conservative estimates) have served to bring the risks associated with large and complex software projects sharply into focus. Software development projects vary in size, complexity, and the nature of the system they are intended to implement.

A relatively small-scale application that has been designed and implemented by a single programmer is obviously going to require less time and effort to test than a huge corporate information system that requires the services of a large team of programmers and analysts over several months or even years. Generally speaking, therefore, the scale of the system to be tested will determine how much effort has to be expended on planning and implementing the test schedule.

The ability to ensure quality requires not only a rigorous approach to design and implementation, but an equally rigorous approach to testing. The most important aspect from the standpoint of quality is that the software should perform exactly as expected, and the only way to ensure that this is the case is to adequately test the software.

Testing should, in fact, be planned before any code is written, and the programmer should write their code with testing in mind. Performance testing is concerned with the responsiveness of the system, which in turn depends on the efficiency of the underlying code. It also depends, however, on the environment in which the system is running, and the number of users accessing the system during any given time period. A system might work fine for a single user, but how does it perform for twenty users, or a hundred users?
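As a rough illustration of the question posed above, the following sketch times a stand-in request handler for one, twenty and a hundred simulated concurrent users. The names handle_request and measure, and the simulated delay, are hypothetical and chosen only for this example; a real performance test would exercise the actual system in an environment that reflects the one in which it will be used.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(user_id):
    """Hypothetical stand-in for one user's request to the system."""
    start = time.perf_counter()
    time.sleep(0.01)  # simulated processing delay (an assumption)
    return time.perf_counter() - start


def measure(user_count):
    """Run one request per simulated user concurrently; return the response times."""
    with ThreadPoolExecutor(max_workers=user_count) as pool:
        return list(pool.map(handle_request, range(user_count)))


if __name__ == "__main__":
    for users in (1, 20, 100):
        times = measure(users)
        print(f"{users:>3} users: slowest response {max(times):.3f} s")
```

Because the stand-in handler only sleeps, the figures it produces say nothing about a real system; the point is simply that the same measurement is repeated as the number of users grows.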

The nature of the test environment will depend on the scale of the application and the environment in which it will be used. An information system for a large corporation may warrant the use of a number of dedicated test machines, allowing tests to be carried out under a range of conditions, and for a variety of hardware configurations. As far as possible, the configuration of test hardware (and the operating system that is used) should accurately reflect the configuration that will be used in the target environment.

What is software testing?

Software testing is the process of running a program to find errors, and to determine whether or not it is fit for its intended purpose. A software process is similar to a physical process in that inputs are received and outputs produced. The difference is in the many (sometimes bizarre) ways in which software can fail.

Detecting all of the possible ways in which software can fail is often impossible. Unlike physical systems, bugs in software systems are purely down to design errors. They are not caused by defects in materials, nor are they introduced during the manufacturing process. Bugs occur in all software projects of any size, not because the programmers are incompetent, but because of the sheer complexity of such a project.

Because software logic can exhibit such complexity, testing boundary values alone is not sufficient to prove that a program will behave as expected in all circumstances, and far more comprehensive test data must be provided. On the other hand, it is not feasible to test a program of any size under all possible conditions that it might encounter. Even exhaustively testing a simple program to add together two 32-bit integers would take a very long time.
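To put a rough figure on the two-integer example, the sketch below counts the possible input combinations and estimates the time needed to try them all. The rate of one billion tests per second is purely an assumption made for the sake of the calculation.

```python
# Each operand is a 32-bit integer, so there are 2**32 possible values for each.
VALUES_PER_OPERAND = 2 ** 32
TOTAL_CASES = VALUES_PER_OPERAND ** 2  # every pair of operands: 2**64 cases

TESTS_PER_SECOND = 1_000_000_000  # assumed rate, chosen only for illustration
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years = TOTAL_CASES / TESTS_PER_SECOND / SECONDS_PER_YEAR

print(f"{TOTAL_CASES:,} test cases")
print(f"about {years:.0f} years at {TESTS_PER_SECOND:,} tests per second")
```

Under that assumption, the estimate comes to well over five hundred years for a program that does nothing more than add two numbers.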

Realistically, the test data should include typical values that will require all possible branches of execution to occur at least once. It should include a range of atypical (but valid) data that will test the program's ability to deal with unusual circumstances, and erroneous, incomplete or missing data that will trigger the program's error routines.
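The three kinds of test data described above might be organised along the following lines. This is only a sketch: the function classify_mark and its grade boundaries are hypothetical, invented here to give the test data something to work on.

```python
def classify_mark(mark):
    """Hypothetical unit under test: grade an exam mark between 0 and 100."""
    if not isinstance(mark, int) or isinstance(mark, bool):
        raise ValueError("mark must be a whole number")
    if mark < 0 or mark > 100:
        raise ValueError("mark must be between 0 and 100")
    if mark >= 70:
        return "distinction"
    if mark >= 40:
        return "pass"
    return "fail"


# Typical values, chosen so that every grade branch is taken at least once.
TYPICAL = [(25, "fail"), (55, "pass"), (85, "distinction")]

# Atypical but valid values: the extremes and the grade boundaries.
ATYPICAL = [(0, "fail"), (39, "fail"), (40, "pass"), (69, "pass"),
            (70, "distinction"), (100, "distinction")]

# Erroneous or missing data that should trigger the error routine.
ERRONEOUS = [-1, 101, "fifty", 12.5, None]


def run_checks():
    """Return the number of checks that passed; raise on the first failure."""
    passed = 0
    for mark, expected in TYPICAL + ATYPICAL:
        assert classify_mark(mark) == expected, (mark, expected)
        passed += 1
    for bad in ERRONEOUS:
        try:
            classify_mark(bad)
        except ValueError:
            passed += 1  # the error routine rejected the bad value
        else:
            raise AssertionError(f"{bad!r} was accepted")
    return passed


if __name__ == "__main__":
    print(run_checks(), "checks passed")
```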

In more complex projects, testing may result in changes to the coding of a program to cure an existing problem, which can inadvertently introduce new problems. In this scenario, the only way to be sure of the program's correctness is to begin testing again from scratch. The level of complexity that can be managed in terms of software production is often limited by the effectiveness of the techniques that are used to test and debug programs.

The results of software errors can in some cases be disastrous, as we have seen. Bugs in critical software systems have caused aircraft to crash, and halted trading on the stock market. The reliability of software can literally be a matter of life and death, so it must perform as expected under the conditions specified.

The main purpose of testing is to find and fix problems before the software is deployed. Depending on the nature of the software project, testing may also be used to determine other factors, such as the usability of the software, how well it is documented, and how readily it can adapt to future requirements.

A general definition of dependable software is that it has a high probability of fault-free operation, and does not fail in unexpected or catastrophic ways. Simply performing a large number of tests does not guarantee the reliability of a program. It shows only that the software works correctly under the test conditions. On the other hand, just one failed test is enough to prove that the software does not work correctly.

Given that a negative test is one that is intended to break the software, the goal is to produce a program that has sufficient exception handling capabilities to survive a reasonable number of negative tests. Because testing software often requires significant time and effort, the testability of a piece of software is an important consideration in the design stage of a project.

Black box testing

Black box testing (sometimes called functional testing) is a software testing technique in which the tester has no knowledge of the internal workings of the software system or subsystem under test. The tester is aware only of the input data to be used, and what the expected outcomes should be. He or she never sees the programming code, and does not even need to understand how the program works in any detail.
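A black box test can be sketched as a list of inputs paired with expected outcomes, applied to a unit whose code the tester never reads. In the example below, Python's built-in sorted function stands in for the system under test, and the only thing assumed to be known about it is the specification "return the items in ascending order". The name run_black_box_tests is hypothetical.

```python
# The unit under test is treated as an opaque callable: the tester knows only
# the specification "return the items in ascending order".
unit_under_test = sorted  # built-in used as a stand-in for the real system

# Each case pairs an input with the outcome the specification requires.
CASES = [
    ([3, 1, 2], [1, 2, 3]),        # typical input
    ([], []),                      # empty input
    ([5], [5]),                    # single item
    ([2, 2, 1], [1, 2, 2]),        # duplicates
    ([-1, 0, -3], [-3, -1, 0]),    # negative values
]


def run_black_box_tests(unit):
    """Return a list of (input, expected, actual) for every failing case."""
    failures = []
    for given, expected in CASES:
        actual = unit(list(given))
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


if __name__ == "__main__":
    print("failures:", run_black_box_tests(unit_under_test))
```

Nothing in the test cases depends on how the sorting is carried out, so the same cases could be applied unchanged to any other implementation of the same specification.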

This type of testing precludes the possibility of bias affecting the tester's evaluation of the software, since they are not involved in the software design process. It also means that testing is carried out very much from a user's perspective, and the tester is not required to be familiar with a particular programming language (or any programming language at all, for that matter). On the down side, although test cases can be designed as soon as the software specification is complete, they can be difficult to design.

The time available for black box testing is usually limited, and it is not generally possible to test every possible path through the program. Black box testing can be applied to all levels of testing from unit testing right up to system testing. It is most frequently used for the higher levels of testing, however. As the size of the "box" becomes bigger, the complexity of the internal mechanisms increases rapidly, making black box testing the only practical option.

White box testing

White box testing (also referred to as glass box testing) is a software testing technique in which the test data is carefully selected in order to test, as fully as possible, every possible execution path through the software unit (or subsystem) under test. A detailed knowledge of the algorithms used to implement the unit is essential in order to ensure that the tester understands the program logic involved, and can ascertain that every possible scenario is examined. The tester must be thoroughly familiar with the relevant programming language, and be able to determine whether or not the test results match those required by the specification.

This type of testing relies quite heavily on the availability of adequate documentation, and the use of clearly-written and well commented program code. It is also a fairly resource-intensive process, and is most frequently used for low level testing (typically unit and integration testing). With this type of testing, all of the execution paths through the unit (or between units if integration testing is being undertaken) will be tested. The test cases are typically designed to carry out control flow testing, data flow testing, and branch testing.
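By way of illustration, the sketch below shows how a tester who can read the code might choose test data to cover every execution path. The function delivery_charge and its charging rules are hypothetical. It contains two independent decisions, which gives four possible paths, and one test input is chosen for each.

```python
from itertools import product


def delivery_charge(order_total, express):
    """Hypothetical unit under test with two independent decisions."""
    charge = 5.0
    if order_total >= 50:   # decision 1: free standard delivery
        charge = 0.0
    if express:             # decision 2: express surcharge
        charge += 10.0
    return charge


# Reading the code shows two decisions, hence four execution paths.
# One order total is chosen for each outcome of decision 1.
TOTALS = {False: 20, True: 50}

# Expected result for each (decision 1 taken, decision 2 taken) combination.
EXPECTED = {
    (False, False): 5.0,
    (False, True): 15.0,
    (True, False): 0.0,
    (True, True): 10.0,
}


def covered_paths():
    """Exercise every combination of decision outcomes; return the paths tested."""
    paths = []
    for d1, d2 in product((False, True), repeat=2):
        result = delivery_charge(TOTALS[d1], d2)
        assert result == EXPECTED[(d1, d2)], (d1, d2, result)
        paths.append((d1, d2))
    return paths
```

The number of paths doubles with each additional independent decision, which is one reason why this kind of testing is usually confined to small units.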

Unit testing

Unit testing is a test-as-you-go approach in which each significant block of code (i.e. a subroutine, function, event-handler or method) is thoroughly tested before writing any more code that relies on it. Typically, the test will involve single-stepping through the code line by line to determine whether or not it does what it is supposed to do. A unit is the smallest piece of code that can be tested, and should not rely on other code units unless they too have been thoroughly tested (although it is reasonably safe to assume that the library functions provided by an established programming language have already been extensively tested). The aim is to determine whether or not the code has been correctly implemented.

When carrying out unit testing, it is sometimes necessary to comment out statements to prevent the execution of external code – you just need to remember to uncomment the affected statements once testing is complete. Often, a program component consists of several units of code that work together to carry out a specific task. The component may still be considered small enough to qualify as a unit for testing purposes, but care should be taken to ensure that each line of code is executed at least once during testing, although not necessarily in the same test. For example, if a function includes an If..Then..Else statement, a test scenario should be devised for each path of execution.
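The If..Then..Else case can be sketched using Python's standard unittest module, with one test devised for each path of execution. The function absolute_difference is a hypothetical unit invented for this example.

```python
import unittest


def absolute_difference(a, b):
    """Hypothetical unit under test containing an if/else statement."""
    if a >= b:
        return a - b
    else:
        return b - a


class AbsoluteDifferenceTest(unittest.TestCase):
    def test_if_branch(self):
        # a >= b, so the "if" path is executed
        self.assertEqual(absolute_difference(7, 3), 4)

    def test_else_branch(self):
        # a < b, so the "else" path is executed
        self.assertEqual(absolute_difference(3, 7), 4)
```

If the code is saved in a file, the tests can be run from the command line with python -m unittest followed by the file name. Between them, the two tests execute every line of the function, although neither does so on its own.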

Integration and system testing

The aim of integration testing is to ensure that interacting subsystems interface correctly with one another, and that the introduction of one or more subsystems into the system does not have an adverse effect on existing functionality. An integration test covers the testing of interface points between subsystems, and is performed only after unit testing has been completed for all of the units that comprise the subsystem being tested.
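The following sketch shows the difference between testing two units separately and testing the interface point between them. All of the names are hypothetical. The assumption is that one unit produces a price in whole pence and the other expects to receive one, so the integration test checks that the value passed across the interface really is in the agreed form.

```python
def parse_price(text):
    """Unit A (hypothetical): convert text such as '12.50' into whole pence."""
    pounds, _, pence = text.partition(".")
    return int(pounds) * 100 + int(pence.ljust(2, "0")[:2])


def apply_discount(pence, percent):
    """Unit B (hypothetical): reduce a price in pence by a whole percentage."""
    return pence - (pence * percent) // 100


def discounted_price(text, percent):
    """Subsystem that passes Unit A's output into Unit B."""
    return apply_discount(parse_price(text), percent)


def test_units():
    # Each unit is checked in isolation first.
    assert parse_price("12.50") == 1250
    assert apply_discount(1000, 10) == 900


def test_integration():
    # The interface point: Unit B must receive pence, which is what Unit A returns.
    assert discounted_price("12.50", 10) == 1125
    assert discounted_price("3", 0) == 300
```

Had Unit A returned pounds rather than pence, both units could still have passed their own tests, and only the integration test would have exposed the mismatch.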

System testing is carried out after integration testing is complete, and is concerned with the entire application. The emphasis is on ensuring that all parts of the system work correctly, and in accordance with the customer requirements laid down in the specifications. The testing conducted could include correct program configuration and initialisation of program variables, error handling and recovery, correct behaviour of the user interface, and the availability of help screens.

A system test should be conducted once the application is in the form in which the end user will see it (i.e. minus any test code or debugging messages), and should be as complete as possible. Testing should also include the installation of the program. After all, if you are developing commercial software, the setup program is the first thing the user sees.

Acceptance testing

Acceptance testing (sometimes called user acceptance testing) takes place when a version of the application that has already been tested is tried out by a number of users who have been trained in its correct use. The users are expected to identify appropriate test data and enter it into the system themselves. The data used should be real data, and whatever processing the system is expected to perform in actual use should be tested, and the results carefully checked.

The performance of critical parts of the system (i.e. speed of operation, user response times, fault tolerance and the ability to handle the expected volume of data) is also examined. Once users have a reasonable degree of confidence in the system, the project can be signed off and the system rolled out. If the system is a large and complex one, it may be better to carry out acceptance testing and roll-out in a number of stages to minimise the operational impact on users and support staff.

Regression testing and code reviews

Regression testing is the re-testing of program elements that have already been tested once, following changes to the source code. The purpose is to make sure that these components still work as specified, and that the changes made have not introduced new "bugs" to the program.
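A regression test run might be sketched as follows. The function average is a hypothetical unit that is assumed to have been revised so that an empty list no longer causes a division by zero. The tests that passed before the change are re-run unaltered, alongside a new test for the revised behaviour.

```python
def average(values):
    """Hypothetical unit, revised so that an empty list returns 0.0
    instead of raising ZeroDivisionError."""
    if not values:
        return 0.0
    return sum(values) / len(values)


# Tests that passed before the change; they are re-run unchanged.
EXISTING_TESTS = [
    (([2, 4, 6],), 4.0),
    (([5],), 5.0),
    (([-1, 1],), 0.0),
]

# Test added with the change, kept in the suite from now on.
NEW_TESTS = [
    (([],), 0.0),
]


def run_regression_suite():
    """Re-run every recorded test; return the cases whose result has changed."""
    regressions = []
    for args, expected in EXISTING_TESTS + NEW_TESTS:
        if average(*args) != expected:
            regressions.append((args, expected))
    return regressions
```

An empty list returned by run_regression_suite indicates that none of the recorded tests has been broken by the change.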

A code review is a process in which the program's code is evaluated with respect to its efficiency (are algorithms, for example, written as efficiently as possible?), and to ensure that it meets any applicable in-house coding standards. This will hopefully result in fully optimised code, and reduce the incidence of bad coding practice. File handling procedures that do not clean up after themselves by closing the affected files, for example, should be flagged up and corrected. The inclusion of adequate commenting is another factor that should be checked.
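The file handling point can be illustrated with a short sketch in Python. Both functions are hypothetical. The first is the kind of code a review should flag, and the second is one possible correction, using a with block so that the file is closed even if an error occurs while it is being read.

```python
import os
import tempfile


def count_lines_leaky(path):
    """Would be flagged in review: the file is never explicitly closed."""
    handle = open(path)
    return len(handle.readlines())


def count_lines(path):
    """Corrected version: the 'with' block closes the file on exit,
    even if reading raises an exception."""
    with open(path) as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # Try the corrected function on a small temporary file.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.txt")
        with open(path, "w") as handle:
            handle.write("one\ntwo\nthree\n")
        print(count_lines(path))
```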

Test schedules

The test schedule for a software development project will be the result of a planning process in which the overall test requirements are determined and broken down into phases. The usual approach is to carry out unit testing at an early stage, followed by integration testing, then system testing, and finally acceptance testing. Depending on the size of the project, and the number and size of the subsystems within the scope of the project, testing at different levels may occur at the same time.

High-level planning will normally occur during the early stages of a software development project, although detailed planning of some of the lower-level testing may occur much later. The timing of each phase of testing will depend on the scheduled completion dates for each stage of software development, since testing can obviously only take place once there is something to test.

Unit testing will probably be ongoing throughout the implementation stage of the project, as the various low level software components will be completed at different times. Similarly, integration testing will be undertaken for each group of software components as and when all of the units involved in a specific set of interactions are complete.

The test schedule will identify as accurately as possible the start and end dates for each stage of software testing. The test schedule planning process will take as its inputs the requirements specification, including a full system description and detailed functional specifications. The outputs will be a set of criteria for each phase of software testing, details of the software and hardware environment in which each phase of testing is to be carried out, and a detailed plan for each phase of testing that identifies what testing activities are to be carried out, and when.

For each test plan created, suitable test cases and test data must be identified, together with the expected outcomes. If test results are not as expected, the reasons for this should be investigated and the code amended as necessary. Once problems have been corrected, the tests should be run again (including any regression testing felt to be necessary to detect additional errors introduced by the revised program code). Once all tests have been satisfactorily completed, the test results should be recorded and signed off by the project manager or an appropriate supervisor.
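The relationship between test cases, test data, expected outcomes and recorded results might be sketched as shown below. The function is_leap_year, the case identifiers and the record layout are all hypothetical, and a real project would follow whatever documentation format its own test plan lays down.

```python
def is_leap_year(year):
    """Hypothetical unit under test."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Each test case: identifier, test data, expected outcome.
TEST_CASES = [
    ("TC1", 2024, True),
    ("TC2", 2023, False),
    ("TC3", 1900, False),
    ("TC4", 2000, True),
]


def run_test_plan(unit, cases):
    """Run every case and record the actual outcome beside the expected one."""
    records = []
    for case_id, data, expected in cases:
        actual = unit(data)
        records.append({
            "case": case_id,
            "data": data,
            "expected": expected,
            "actual": actual,
            "status": "pass" if actual == expected else "FAIL",
        })
    return records


if __name__ == "__main__":
    for record in run_test_plan(is_leap_year, TEST_CASES):
        print(record)
```

Any record marked FAIL would be investigated, the code amended, and the whole set of cases run again before the results are signed off.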

Test plans

A test plan focuses on how the system will be tested during a particular phase of testing. It specifies what part of the system is being tested, together with a detailed description of the test cases to be applied. The test data for each test case will be identified, together with the expected outcome of each test. The various test phases are listed below, together with a description of the type of activity and test data that should be included in the test plans devised for each phase.

Software testing is the subject of an international standard (ISO/IEC/IEEE 29119). The standard comprises five parts, which we have briefly summarised below. The first three parts were published in 2013, with parts four and five being published in 2015 and 2016 respectively.

  1. Test definitions and concepts - introduces the vocabulary on which the standard is based and provides examples of its application; this part of the standard essentially establishes the terms of reference for applying the remaining parts of the standard.
  2. Test processes - establishes a generic model for software testing processes that can be used by organisations carrying out software testing at various organisational levels, and with different software development lifecycle models.
  3. Test documentation - includes templates and examples of the kind of test documentation that should be produced during the testing process.
  4. Test techniques - provides standard definitions of software test design techniques for use in test design and implementation.
  5. Keyword-driven testing - an approach to specifying software tests geared towards the creation of automated testing based on keywords.

Further details can be found in the text of the standard itself.