The Psychology of Quality and More
CHAPTER 7 : File Layout
7.6 Layout of Code files
Code files (i.e. those which include function definitions) may include all types of item. The primary item, however, is the function, and the other items exist to support it.
A standard layout makes things easier to find in the file. A typical layout might be:
File header comment
File identification string
External data declarations
External function declarations
Public data definition
Public function declarations
Private replacement items
Private data definition
Private function declarations
Public function definitions
Private function definitions (sorted alphabetically)
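The layout above might look like the following skeleton. It is a sketch only: the module name report.c, the SCCS-style identification string and every identifier are hypothetical. The #include line is placed just after the identification string, and the two extern declarations stand for items defined in another module, so they are declared but not used here.

```c
/*
 * File header comment
 * File:    report.c (hypothetical module)
 * Purpose: keep a count of report lines and the most recent line.
 */

/* File identification string (SCCS-style format assumed) */
static const char file_id[] = "@(#)report.c 1.1";

#include <stdio.h>                        /* snprintf */

/* External data declarations (hypothetical; defined in another module) */
extern int app_verbosity;

/* External function declarations (hypothetical; defined in another module) */
extern void app_log(const char *message);

/* Public data definition */
int report_line_count = 0;

/* Public function declarations */
int report_add_line(const char *text);
const char *report_last_line(void);

/* Private replacement items */
#define MAX_LINE_LEN 80

/* Private data definition */
static char last_line[MAX_LINE_LEN + 1];

/* Private function declarations */
static void store_line(const char *text);

/* Public function definitions */
int report_add_line(const char *text)
{
    store_line(text);
    return ++report_line_count;
}

const char *report_last_line(void)
{
    return last_line;
}

/* Private function definitions (sorted alphabetically) */
static void store_line(const char *text)
{
    /* snprintf truncates to the buffer size, so long lines cannot overflow */
    snprintf(last_line, sizeof last_line, "%s", text);
}
```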
This uses the principles of external before internal, and public before private (see 7.2.2). Function declarations are put after data and other declarations because these may form the context for the function, for example where prototypes use types or constants declared earlier.
7.6.1 Include files
Inclusion of other header files in code files is a convenient way of simplifying the file contents and of re-using information without having to retype it in each file where it is needed.
It is normal to group the #include statements near the start of the file, typically immediately after the file header comment, as the included items may be needed anywhere below.
The #include statements may be sorted by the level of the included files (see 7.4.1):
/* system header files */
Note that angle brackets '<...>' are used for system files, to make the preprocessor use the system search path to find them, and that quote marks are used for the program's own header files, which most preprocessors look for first in the directory of the including file before falling back to the system search path. This is another way of highlighting the difference between system and program header files.
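A grouped and sorted set of #include statements might look like the following fragment. It assumes three levels (system, program-wide and module headers); the levels actually defined in 7.4.1 may differ, and the program header names are hypothetical, so the fragment will not compile on its own.

```c
/* system header files */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* program-wide header files (hypothetical names) */
#include "prog_errors.h"
#include "prog_types.h"

/* header file for this module (hypothetical name) */
#include "report.h"
```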
Problems can occur if the program uses a complex directory structure and #include statements give the full pathnames of header files:
This makes the directory structure difficult to change without also changing an unknown number of source files. An alternative is to avoid absolute pathnames but to allow pathnames that are relative to a base compilation point:
This still forces a partial directory structure. The third alternative is to set the system search and program builder paths to check all relevant directories, and not to include directory paths in any file references:
This is the most portable method, although the build system must be able to find the correct header file (typically by using the compiler's -I flag).
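The three alternatives can be compared side by side in the fragment below. The directory names and report.h are hypothetical, and the cc command line is one typical way of supplying a search path, so this is an illustration rather than a buildable file.

```c
/* 1. Absolute pathname: ties every source file to one directory tree */
#include "/home/proj/src/include/report.h"

/* 2. Relative to a base compilation point (here, the project root) */
#include "src/include/report.h"

/* 3. File name only: the build supplies the directories, e.g.
 *        cc -Isrc/include -c report.c
 */
#include "report.h"
```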
7.6.2 External items
Declarations for functions and data which are defined outside the current file (extern's plus any associated #define's, etc.) may be made in functions where they are used, or together at the top of the file (see 7.3.1). In either case, maintenance is difficult, as changing the original definition of the item will also require changes to all of the external references to it in this file.
An alternative, which simplifies this problem, is to put the external reference in a #include'd header file. The header file is closely associated with the file where the items it contains are defined, thus enabling easier maintenance. This method, however, is not without its disadvantages, as the immediate explicitness which local declaration gives is lost. The header file is also likely to contain other items, probably from the same functional area as the external item, whose scope is now extended into the including file.
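The header-file approach can be sketched as below. A single listing cannot show three files, so a header, its defining file and a client file are concatenated here; the names counter.h, counter.c, client.c, counter_total, counter_add and client_run are all hypothetical. One benefit worth noting: because the defining file includes its own header, the compiler checks the extern declarations against the definitions, so a change to one that is not matched in the other is reported at compile time.

```c
/* ---- counter.h: external declarations for the counter module ---- */
extern int counter_total;              /* defined in counter.c */
extern void counter_add(int amount);   /* defined in counter.c */

/* ---- counter.c: #include's counter.h, then defines the items ---- */
int counter_total = 0;

void counter_add(int amount)
{
    counter_total += amount;
}

/* ---- client.c: #include's counter.h instead of writing its own externs ---- */
int client_run(void)
{
    counter_add(2);
    counter_add(3);
    return counter_total;
}
```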
7.6.3 Replacement items
Internally used typedef's, #define's, enum's, struct's and union's may serve different purposes, and their position in the file depends on that purpose.
Where they are used in data declarations, they must be declared beforehand. Declaring them close to the data definition uses functional grouping, which makes their intended usage clear:
#define MTHS_IN_YEAR 12
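Extending the MTHS_IN_YEAR example, the sketch below keeps each replacement item next to the data it sizes or types. The table, enum and function names (mth_totals, ledger_state, add_to_month, year_total) are hypothetical.

```c
/* Replacement item grouped with the table it sizes */
#define MTHS_IN_YEAR 12
static long mth_totals[MTHS_IN_YEAR];

/* Enum grouped with the only data item that uses it */
enum ledger_state { LEDGER_OPEN, LEDGER_CLOSED };
static enum ledger_state ledger_status = LEDGER_OPEN;

/* Add an amount to a month (1..12); returns 0 on success, -1 otherwise. */
int add_to_month(int month, long amount)
{
    if (month < 1 || month > MTHS_IN_YEAR || ledger_status != LEDGER_OPEN)
        return -1;
    mth_totals[month - 1] += amount;
    return 0;
}

/* Sum of all monthly totals. */
long year_total(void)
{
    long sum = 0;

    for (int i = 0; i < MTHS_IN_YEAR; i++)
        sum += mth_totals[i];
    return sum;
}
```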
Where they are used by more than one function, the obvious place to declare them is at the top of the file.
Where they are used in one function only, they can be declared in that function, thus minimizing scope. Note, however, that although typedef's, enum's, struct's and union's declared inside a function have block scope, the scope of a #define runs from its point of definition to the end of the file, wherever it appears. This 'leakage' can be contained by using #undef's at the end of the function, although this appears clumsy and makes the code more difficult to maintain.
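The difference in scope can be seen in the small sketch below; digit_count, DIGIT_BASE and MIN_DIGITS are hypothetical names. The enum constant disappears at the closing brace automatically, whereas the #define would stay visible to every later function in the file without the #undef.

```c
/* Returns the number of decimal digits in a value. */
static int digit_count(unsigned long value)
{
#define DIGIT_BASE 10UL            /* preprocessor: visible to the end of the file */
    enum { MIN_DIGITS = 1 };       /* block scope: visible only in this function */
    int digits = MIN_DIGITS;

    while (value >= DIGIT_BASE) {
        value /= DIGIT_BASE;
        digits++;
    }
    return digits;
#undef DIGIT_BASE                  /* end the macro's scope to stop it leaking */
}
```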
7.6.4 Private/Public data items
A data item that is used by more than one function could be defined at a point between functions, just before its first use, thus minimizing its visual scope. This would, however, make it difficult to find when reading the other functions that access it.
The simple approach for data items which must be defined outside functions is to always put them at the top of the file, where they can most easily be found.
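A sketch of this approach, using a hypothetical fixed-size queue: all the file-level data sits at the top, public before private, and static keeps the private items local to the file. The queue names and capacity are assumptions, and queue_pop uses -1 as an "empty" result, so items are assumed to be non-negative.

```c
/* Public data definition: other files may read it via an extern declaration */
int queue_length = 0;

/* Private replacement item and data: static limits them to this file */
#define QUEUE_SIZE 8
static int queue_items[QUEUE_SIZE];
static int queue_head = 0;

/* Add an item; returns 0 on success, -1 if the queue is full. */
int queue_push(int item)
{
    if (queue_length == QUEUE_SIZE)
        return -1;
    queue_items[(queue_head + queue_length) % QUEUE_SIZE] = item;
    queue_length++;
    return 0;
}

/* Remove and return the oldest item, or -1 if the queue is empty. */
int queue_pop(void)
{
    int item;

    if (queue_length == 0)
        return -1;
    item = queue_items[queue_head];
    queue_head = (queue_head + 1) % QUEUE_SIZE;
    queue_length--;
    return item;
}
```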
7.6.5 Private function declarations
The scope of a function declaration made at file level runs from the point where it is declared to the end of the file. In 'original' C and in C89/C90, a function called without a prior declaration is implicitly declared as returning int; C99 removed implicit declarations, and modern compilers warn about or reject such calls. Requiring that all functions be declared before they are called is therefore both explicit and portable. It ensures that the return type is checked and, if ANSI prototypes are used, that the argument types are checked as well.
Functions may be declared at the start of each function that calls them. However, this adds to the complexity of those functions (see 7.3.1). Putting the declarations near the start of the file forms an effective 'contents' list of the file, especially if they are sorted into the same order as the function definitions below them.
A further option is to put all function declarations into a #include file. This reduces the size of the file, but can make it less manageable, as a change in the source code file may require a corresponding change in the header file. It is also an unnecessary use of a #include file, as its contents are used in only one file.
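The sketch below shows why the declaration matters; average, report_average and the scores are hypothetical. Without the prototype at the top, a pre-C99 compiler would implicitly declare average() as returning int at the call site, and the double it actually returns would be misread. With the prototype, the return type and both argument types are checked.

```c
/* Private function declaration (prototype), before any call */
static double average(const int *values, int count);

/* Public function: calls average() before its definition appears below */
double report_average(void)
{
    static const int scores[] = { 70, 80, 90 };
    return average(scores, 3);
}

/* Private function definition */
static double average(const int *values, int count)
{
    long sum = 0;

    for (int i = 0; i < count; i++)
        sum += values[i];
    return count > 0 ? (double)sum / count : 0.0;
}
```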
Note that function prototypes are one of the biggest differences between 'original' C and ANSI C; the styles should not, of course, be mixed.
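The two styles are compared in the sketch below, using a hypothetical function scale. The 'original' C form is kept inside a comment because C23 no longer accepts old-style definitions. With only the old-style declaration in scope, a call such as scale(7, 3) would pass an int where a long is expected, with no conversion; the prototype makes the compiler convert the argument.

```c
/*
 * 'Original' (K&R) C style, for comparison only:
 *
 *     long scale();                 declaration: no argument types
 *
 *     long scale(value, factor)     definition: types listed separately
 *     long value;
 *     int factor;
 *     {
 *         return value * factor;
 *     }
 */

/* ANSI C style: the prototype carries the argument types */
long scale(long value, int factor);

long scale(long value, int factor)
{
    return value * factor;
}
```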
Functions at this level of discussion are treated as atomic items (see chapter 4 for layout inside functions). Several sorting schemes could be used to decide which function to put where (see 7.2.2). A simple scheme is to use a two-level ordering:
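Assuming the two levels are those used in the layout at the start of 7.6 (public functions before private ones, then alphabetical order within each group), a file might be ordered as in this sketch. The counting functions are hypothetical; the prototypes at the top follow the same order as the private definitions.

```c
#include <ctype.h>

/* Private function declarations, in the same order as their definitions */
static int count_matching(const char *text, int (*test)(int));
static int is_vowel(int c);

/* ---- Public function definitions (first level), alphabetical ---- */
int count_digits(const char *text)
{
    return count_matching(text, isdigit);
}

int count_spaces(const char *text)
{
    return count_matching(text, isspace);
}

int count_vowels(const char *text)
{
    return count_matching(text, is_vowel);
}

/* ---- Private function definitions (second level), alphabetical ---- */
static int count_matching(const char *text, int (*test)(int))
{
    int count = 0;

    for (; *text != '\0'; text++)
        if (test((unsigned char)*text))  /* ctype functions expect unsigned char values */
            count++;
    return count;
}

static int is_vowel(int c)
{
    switch (tolower(c)) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}
```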
The size of a function is constrained by the function that it performs, its overall complexity and the amount of text that the brain can absorb at one time. This is physically constrained by the screen and page size: turning a page has a serious effect on short-term memory, which can no longer be refreshed by a quick flick of the eyes. Ideally, the function will fit within one page (typically about 60 lines). In practice, this is often too limiting, especially if a header comment of any size is used. A more practical limit is two pages, or around 100 lines. It is also worth noting that if we apply the rule of seven (plus or minus two) to give n chunks of n lines, we get a limit of between 25 (5 × 5) and 81 (9 × 9) lines.
When functions grow beyond the chosen maximum number of lines, a logical split should be found. Sometimes this is not possible, for example with long switch statements; sometimes a split would obscure the logic rather than clarify it. In the majority of cases, however, C functions will fit comfortably within this limit.
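As an illustration of a logical split, the sketch below divides a hypothetical day-of-year calculation at its natural seams (parse, validate, compute), so the public function reads as the sequence of steps. All names are assumptions, and the date format is assumed to be "YYYY-MM-DD".

```c
#include <stdio.h>

/* Private function declarations */
static int days_in_month(int year, int month);
static int is_leap(int year);
static int parse_date(const char *text, int *year, int *month, int *day);
static int valid_date(int year, int month, int day);

/* Public: day number within the year, or -1 on bad input */
int day_of_year(const char *text)
{
    int year, month, day, total = 0;

    if (!parse_date(text, &year, &month, &day) || !valid_date(year, month, day))
        return -1;
    for (int m = 1; m < month; m++)
        total += days_in_month(year, m);
    return total + day;
}

/* Private function definitions (sorted alphabetically) */
static int days_in_month(int year, int month)
{
    static const int days[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    return (month == 2 && is_leap(year)) ? 29 : days[month - 1];
}

static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

static int parse_date(const char *text, int *year, int *month, int *day)
{
    return sscanf(text, "%d-%d-%d", year, month, day) == 3;
}

static int valid_date(int year, int month, int day)
{
    /* month is range-checked before days_in_month indexes its table */
    return month >= 1 && month <= 12 && day >= 1 && day <= days_in_month(year, month);
}
```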
And the big