The Psychology of Quality and More
CHAPTER 10 : Programming Usage
10.4 Error handling
Error handling is a part of defensive programming that is broad enough to deserve a section of its own. It starts at the point where an error has been discovered, and addresses the question, "What do I do now?"
The most important thing to standardize on in error handling is that all errors should be handled. "It'll never happen!" is a common excuse for not checking for and handling errors, as is, "I couldn't do much about it anyway." In fact, it is seldom true that nothing can be done - handling the error just takes more effort.
It is also important to standardize on the method of handling errors, as a consistent approach results in code that is both more reliable and more maintainable.
There are two basic ways errors appear. Called functions may return an error, whether they are local program functions or external system calls. Otherwise the error must be detected from other code conditions (such as variables going out of range), which may stem from expected or unexpected causes.
10.4.1 Trapping errors
When unrecoverable errors occur, there are two tasks that usually must be done: cleaning up any partially completed items and getting out! One way to do this is to use an error variable which is tested in all loops:
while ( (pFreeBlock != pLastBlock) && (ErrNo == FB_NO_ERROR) )
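As a fuller illustration of the loop above, the following is a minimal sketch only. The FreeBlock type, the FB_BAD_BLOCK code, the SumFreeBlocks function and the rule that a zero-sized block is an error are all assumptions made for the example, not part of the original text.

```c
#include <stddef.h>

/* Hypothetical error codes and block type, for illustration only. */
enum { FB_NO_ERROR = 0, FB_BAD_BLOCK = 1 };

typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *pNext;
} FreeBlock;

/* Sums block sizes from pFreeBlock up to (not including) pLastBlock.
   A missing or zero-sized block is treated as an error (assumed rule). */
int SumFreeBlocks(FreeBlock *pFreeBlock, FreeBlock *pLastBlock, size_t *pTotal)
{
    int ErrNo = FB_NO_ERROR;
    size_t total = 0;

    while ((pFreeBlock != pLastBlock) && (ErrNo == FB_NO_ERROR)) {
        if (pFreeBlock == NULL || pFreeBlock->size == 0) {
            ErrNo = FB_BAD_BLOCK;          /* the loop test ends the loop */
        } else {
            total += pFreeBlock->size;
            pFreeBlock = pFreeBlock->pNext;
        }
    }

    if (ErrNo == FB_NO_ERROR) {
        *pTotal = total;                   /* only written on success */
    }
    return ErrNo;                          /* the single return point */
}
```

Every loop and every following step has to test ErrNo, which is where the extra nesting and comparisons come from.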
This results in a single return point, but the loop is complicated by extra nesting and comparisons. An alternative is to immediately pass control back to the function caller:
while ( pFreeBlock != pLastBlock )
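A minimal sketch of the immediate-return style might look like the following. The FreeBlock type, the error codes, the CheckFreeBlocks function and the scratch buffer (which stands in for any resource that needs releasing) are assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical error codes and block type, for illustration only. */
enum { FB_NO_ERROR = 0, FB_BAD_BLOCK = 1, FB_NO_MEMORY = 2 };

typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *pNext;
} FreeBlock;

/* Counts the blocks from pFreeBlock up to (not including) pLastBlock. */
int CheckFreeBlocks(const FreeBlock *pFreeBlock, const FreeBlock *pLastBlock,
                    size_t *pCount)
{
    size_t count = 0;
    char *pScratch = malloc(64);       /* stands in for any resource */

    if (pScratch == NULL) {
        return FB_NO_MEMORY;           /* nothing to clean up yet */
    }

    while (pFreeBlock != pLastBlock) {
        if (pFreeBlock == NULL || pFreeBlock->size == 0) {
            free(pScratch);            /* clean-up repeated before each return */
            return FB_BAD_BLOCK;
        }
        count++;
        pFreeBlock = pFreeBlock->pNext;
    }

    free(pScratch);                    /* and again on the normal path */
    *pCount = count;
    return FB_NO_ERROR;
}
```

The loop condition is simpler, but the free() call now appears at every exit and must be kept in step as the function changes.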
This simplifies the nesting and conditions, but it requires any clean-up code to be executed before each return. If the clean-up code is significant, it will make the main code less readable. This can be simplified by using goto to create a simple exception handler, cleaning up and handling errors at the end of the function:
while ( pFreeBlock != pLastBlock )
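The goto approach can be sketched as below. As before, the FreeBlock type, the error codes, the LargestFreeBlock function and the scratch buffer are assumptions for illustration; only the pattern of jumping to one clean-up label is the point.

```c
#include <stdlib.h>

/* Hypothetical error codes and block type, for illustration only. */
enum { FB_NO_ERROR = 0, FB_BAD_BLOCK = 1, FB_NO_MEMORY = 2 };

typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *pNext;
} FreeBlock;

/* Finds the largest block from pFreeBlock up to (not including) pLastBlock. */
int LargestFreeBlock(const FreeBlock *pFreeBlock, const FreeBlock *pLastBlock,
                     size_t *pLargest)
{
    int ErrNo = FB_NO_ERROR;
    size_t largest = 0;
    char *pScratch = malloc(64);       /* stands in for any resource */

    if (pScratch == NULL) {
        ErrNo = FB_NO_MEMORY;
        goto CleanUp;
    }

    while (pFreeBlock != pLastBlock) {
        if (pFreeBlock == NULL || pFreeBlock->size == 0) {
            ErrNo = FB_BAD_BLOCK;
            goto CleanUp;              /* leave at once; clean-up is below */
        }
        if (pFreeBlock->size > largest) {
            largest = pFreeBlock->size;
        }
        pFreeBlock = pFreeBlock->pNext;
    }
    *pLargest = largest;

CleanUp:
    free(pScratch);                    /* free(NULL) is harmless */
    return ErrNo;
}
```

The clean-up code is written once, and both the success path and the error paths fall through it.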
A further technique is to use setjmp and longjmp library functions in conjunction with a program-wide exception handler to enable common errors to be handled in one place.
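A minimal sketch of the setjmp/longjmp technique follows. The names g_ErrorContext, Raise, Halve and RunGuarded, and the APP_ error codes, are hypothetical; a real program-wide handler would also have to release any resources acquired between the setjmp and the longjmp.

```c
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical program-wide error codes. */
enum { APP_NO_ERROR = 0, APP_BAD_INPUT = 1, APP_UNKNOWN = 2 };

static jmp_buf g_ErrorContext;         /* the program-wide recovery point */

/* Jumps back to the recovery point. ErrNo must be non-zero. */
static void Raise(int ErrNo)
{
    longjmp(g_ErrorContext, ErrNo);
}

/* A low-level function that cannot sensibly handle its own error. */
static int Halve(int value)
{
    if (value % 2 != 0) {
        Raise(APP_BAD_INPUT);          /* does not return */
    }
    return value / 2;
}

/* Runs the work under the handler; all raised errors arrive here. */
int RunGuarded(int input, int *pResult)
{
    switch (setjmp(g_ErrorContext)) {
    case 0:                            /* direct return from setjmp */
        *pResult = Halve(Halve(input));
        return APP_NO_ERROR;
    case APP_BAD_INPUT:                /* arrived via longjmp */
        fprintf(stderr, "Bad input: %d\n", input);
        return APP_BAD_INPUT;
    default:
        return APP_UNKNOWN;
    }
}
```

The intermediate calls need no error checks of their own, which is the attraction; the cost is that any clean-up between the two points is skipped unless the handler does it.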
10.4.2 Error identification
Errors are commonly returned as coded integers. There are assorted schemes for what these mean and errors are variously indicated by negative, zero or positive numbers. Although the inconsistency of libraries cannot be changed, it makes sense to use a standard method of identifying errors within a program.
For simple pass/fail situations, zero, which also indicates 'false', may be used to flag failure; a simple enum scheme can be useful for error values (see 9.10.5).
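The two schemes might be sketched as follows. The ErrorCode values and the IsValidPercent and SetPercent functions are invented for the example and are not taken from section 9.10.5.

```c
#include <stddef.h>

/* Hypothetical enum of error values; zero means no error. */
typedef enum {
    ERR_NONE = 0,
    ERR_BAD_ARGUMENT,
    ERR_OUT_OF_RANGE
} ErrorCode;

/* Pass/fail style: returns zero ('false') on failure, non-zero on success. */
int IsValidPercent(int value)
{
    return (value >= 0) && (value <= 100);
}

/* Enum style: the return value says which error occurred. */
ErrorCode SetPercent(int value, int *pPercent)
{
    if (pPercent == NULL) {
        return ERR_BAD_ARGUMENT;
    }
    if (!IsValidPercent(value)) {
        return ERR_OUT_OF_RANGE;
    }
    *pPercent = value;
    return ERR_NONE;
}
```

Note that the two conventions treat zero in opposite ways (failure in the first, no error in the second), so each function's comment should state which it uses.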
Some system error codes will be found in the global variable errno. It is hazardous to leave them there 'for later', as they may be overwritten. It is better to handle them immediately or to copy them to a private error variable for later use.
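A minimal sketch of copying errno immediately, using the standard strtol function, is shown below. The ParseLong wrapper and its use of -1 for "no digits found" are assumptions made for the example.

```c
#include <errno.h>
#include <stdlib.h>

/* Converts pText to a long.
   Returns 0 on success, -1 if no digits were found (an assumed
   convention), or the saved errno value (such as ERANGE) otherwise. */
int ParseLong(const char *pText, long *pValue)
{
    char *pEnd;
    int SavedErrno;

    errno = 0;                         /* strtol only sets errno on failure */
    *pValue = strtol(pText, &pEnd, 10);
    SavedErrno = errno;                /* copy before any other library call */

    if (pEnd == pText) {
        return -1;                     /* nothing was converted */
    }
    if (SavedErrno != 0) {
        return SavedErrno;             /* e.g. ERANGE on overflow */
    }
    return 0;
}
```

Later calls, including the ones used to report the error, are free to change errno without losing the original cause.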
In a program, keeping a central store of error numbers/messages enables errors to be uniquely identified. Subsystem errors may be identified by allocated number ranges.
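One possible shape for such a central store is sketched below. The error numbers, the ranges of 100 per subsystem, and the ErrorMessage and ErrorSubsystem functions are all assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical error numbers, with one range of 100 per subsystem. */
enum {
    ERR_NONE           = 0,
    ERR_FILE_NOT_FOUND = 100,          /* 100-199: file subsystem */
    ERR_FILE_READ      = 101,
    ERR_MEM_EXHAUSTED  = 200           /* 200-299: memory subsystem */
};

typedef struct {
    int Number;
    const char *pMessage;
} ErrorEntry;

/* The single place where numbers and messages are paired. */
static const ErrorEntry ErrorTable[] = {
    { ERR_NONE,           "No error" },
    { ERR_FILE_NOT_FOUND, "File not found" },
    { ERR_FILE_READ,      "File read failed" },
    { ERR_MEM_EXHAUSTED,  "Out of memory" }
};

/* Returns the message for an error number, or a default if unlisted. */
const char *ErrorMessage(int Number)
{
    size_t i;

    for (i = 0; i < sizeof ErrorTable / sizeof ErrorTable[0]; i++) {
        if (ErrorTable[i].Number == Number) {
            return ErrorTable[i].pMessage;
        }
    }
    return "Unknown error";
}

/* Returns the base of the range that a non-negative error number is in. */
int ErrorSubsystem(int Number)
{
    return (Number / 100) * 100;
}
```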
In complex situations, passing back a single error number can be inadequate, particularly if the error has a non-unique number and has been passed back through several functions. A standard error structure, typically 'owned' by the outer calling function, may be used here.
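The original structure is not reproduced here, so the following is a sketch of what one might look like. The ErrorInfo fields and the SetError, ReadRecord and LoadRecords functions are hypothetical.

```c
#include <stdio.h>

/* Hypothetical standard error structure, owned by the outer caller. */
typedef struct {
    int Number;                        /* error number */
    const char *pFunction;             /* where the error was first detected */
    char Detail[80];                   /* free-text diagnostic detail */
} ErrorInfo;

/* Fills in the structure and returns zero, so callers can 'return SetError(...)'. */
static int SetError(ErrorInfo *pErr, int Number, const char *pFunction,
                    const char *pDetail)
{
    pErr->Number = Number;
    pErr->pFunction = pFunction;
    snprintf(pErr->Detail, sizeof pErr->Detail, "%s", pDetail);
    return 0;                          /* zero flags failure */
}

/* Inner function: detects the error and records the details. */
int ReadRecord(int RecordId, ErrorInfo *pErr)
{
    if (RecordId < 0) {
        return SetError(pErr, 101, "ReadRecord", "negative record id");
    }
    return 1;
}

/* Intermediate function: passes the structure through unchanged. */
int LoadRecords(int FirstId, int Count, ErrorInfo *pErr)
{
    int i;

    for (i = 0; i < Count; i++) {
        if (!ReadRecord(FirstId + i, pErr)) {
            return 0;                  /* details are already in *pErr */
        }
    }
    return 1;
}
```

The outer function declares one ErrorInfo and passes its address down; however many levels the failure travels back through, the original number, location and detail survive.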
This requires error structures to be passed around everywhere, which, although inconvenient, may be preferable to the alternative of using global variables.
At some stage the error must be handled rather than passed back to a higher function. This should be at a consistent point, where a higher function could add no value to the handling process. At this point, control may be passed to an error handler function. This simplifies the function in which the error is handled, and maintains functional cohesiveness.
There will be two types of error: those that enable the program to continue, and those that require the program to be terminated.
If the program can continue, which requires that the integrity of all data is certain, then a consistent diagnostic error message may be given (see 10.5.1) before continuing.
If the program cannot continue, then it should shut down gracefully, preserving as much user data as possible, and giving as much diagnostic information as possible, to help the cause of the problem to be discovered and fixed.
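The two cases can be sketched as a single handler function. The Severity type, the HandleError and ErrorCount functions and the SaveUserData hook are assumptions for the example; the message format is not the one defined in 10.5.1.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical severity levels matching the two types of error. */
typedef enum { SEV_CONTINUE, SEV_FATAL } Severity;

static int g_ErrorCount = 0;

/* Hypothetical hook: preserve whatever user data can still be saved. */
static void SaveUserData(void)
{
    /* application-specific */
}

/* Central handler: reports every error consistently, and shuts down
   gracefully if the program cannot continue. */
void HandleError(Severity Level, int Number, const char *pWhere,
                 const char *pMessage)
{
    g_ErrorCount++;
    fprintf(stderr, "Error %d in %s: %s\n", Number, pWhere, pMessage);

    if (Level == SEV_FATAL) {
        SaveUserData();                /* preserve data before exiting */
        exit(EXIT_FAILURE);
    }
}

/* Number of errors reported so far. */
int ErrorCount(void)
{
    return g_ErrorCount;
}
```

A call such as HandleError(SEV_CONTINUE, 101, "ReadRecord", "record skipped") reports and returns, while the same call with SEV_FATAL saves what it can and terminates.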