CHAPTER 7 : File Layout
7.3 Considerations for File Layout
Laying out a file means deciding where to position a number of basic items: comments, preprocessor commands, data structure and type descriptions, and the declarations and definitions of functions and data.
Before considering the specifics of file layout, we can consider some basic approaches for organizing files:
There are several alternative collation methods that may be considered when deciding how to order items within a source code file:
Context then definition
Before an item is defined, it is often necessary to set up a context that describes every item the definition requires:
#define SCRN_WIDTH 80 /* this is the
The context for such a data definition is simple; the context for a function may be significantly more complex. A context may include:
There are two basic approaches for laying out context and definition:
(a) Put the context close to the definition
This includes putting typedefs near the declarations that use them, and external function declarations inside function definitions, in order to minimize scope and make the defined item self-contained. This improves the visual scope, if not always the logical scope. It also helps when files are to be divided.
However, the context cannot always be completely contained so easily, particularly where the context is shared with other defined items - repeating context items at each definition can cause redeclaration problems, and can add noticeably to the number of lines in functions and files, increasing visual complexity.
(b) Put the context for all definitions at the start of the file
This philosophy places less emphasis on the context, regarding it more as 'noise' which is not that important for understanding the definition. It allows the context for all items to be read first, in one go, helping the reader to understand the constraints on the later definitions. Any questions of context may be simply answered by looking to the start of the file. In a functionally cohesive file, the context should also be cohesive, and should fit naturally together.
However, it is less clear which context items are used by each function. If the file is to be divided in future, which is quite possible with an evolving system, it is also harder to split cleanly.
In practice, a balance between these philosophies may be used, putting common context items earlier, and effectively private items near their usage. Note how similar this is to the philosophy of hierarchically dividing files and functions, where common 'library' functions are grouped separately from the private functions.
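The balance described above might be sketched as follows. This is only an illustration: SCRN_WIDTH is taken from the earlier example, while CursorPos, MARGIN, AdvanceCursor() and CentreColumn() are hypothetical names invented here. The shared context sits at the start of the file, and the context used by a single function sits next to that function.

```c
/* Shared context: used by several functions, so it sits at the start of the file. */
#define SCRN_WIDTH 80

/* Hypothetical shared type, invented for this sketch. */
typedef struct {
    int row;
    int col;
} CursorPos;

/* Advance the cursor one column, wrapping to the next row at the screen edge. */
void AdvanceCursor(CursorPos *pos)
{
    pos->col++;
    if (pos->col >= SCRN_WIDTH) {
        pos->col = 0;
        pos->row++;
    }
}

/* Effectively private context: only CentreColumn() uses MARGIN, so it is kept beside it. */
#define MARGIN 2

/* Return the starting column that centres text of the given length between the margins. */
int CentreColumn(int textLength)
{
    int usable = SCRN_WIDTH - 2 * MARGIN;

    if (textLength >= usable)
        return MARGIN;
    return MARGIN + (usable - textLength) / 2;
}
```

If CentreColumn() were later moved to another file, MARGIN would travel with it, whereas SCRN_WIDTH would need to be shared, probably through a header.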
External then Internal
The reader of the file is usually more interested in the 'meat' of this file, so declarations for external items can be considered as 'noise', best grouped together in one place where the reader can check whether a given item is external. Internal items often make use of external items, thus the external items should be declared earlier, rather than later.
Public then Private
For items that are defined in this file, the reader is more likely to be interested in those which are used externally to the file. Thus it makes sense to group public and private items separately and to put the public items nearer the start of the file.
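The two orderings above, external then internal and public then private, could be combined along these lines. The function names IsBlankName() and TrimmedLength() are hypothetical, chosen only for this sketch. Note that putting the public function first means the private function has to be declared before it is used.

```c
/* External items: declared first, all in one place. */
#include <stddef.h>
#include <string.h>

/* Declaration of a private function, needed because its definition comes later. */
static size_t TrimmedLength(const char *text);

/* ---- Public items: used from outside this file ---- */

/* Return non-zero if the name is empty or consists only of spaces. */
int IsBlankName(const char *name)
{
    return TrimmedLength(name) == 0;
}

/* ---- Private items: used only within this file ---- */

/* Return the length of the text, ignoring any trailing spaces. */
static size_t TrimmedLength(const char *text)
{
    size_t len = strlen(text);

    while (len > 0 && text[len - 1] == ' ')
        len--;
    return len;
}
```

The cost of this ordering is the extra declaration near the top of the file; the benefit is that a reader looking for the file's interface finds it before the supporting detail.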
Items which are of a similar functional group sit logically together. For example, functions to open and close a file, or data definitions and their possible values. However, when there are a large number of small groups, this does not help the reader find a specific item. Thus, if the 'OpenFile()' function is in the middle of a small group of file-operation functions, which are in the middle of a group of database functions, then that function can be quite difficult for the reader to find!
Items can be grouped by type, so that all ints are put together, and so on. This may be a convenient form of logical grouping, but where the variables are not part of a close-knit group, it can also be confusing.
When there are a large number of items which have no strong reason for grouping in any other way, sorting them alphabetically by name puts them into an order in which it is easy at least to find them.
This can be confusing if this technique is not understood, or if the reader is expecting some other grouping (e.g. functional). Thus an 'OpenWindow()' function may end up next to an 'OutputComment()' function for no reason that is apparent to the reader.
7.3.2 File header comment
Putting the file header comment as the very first item in the file is a good idea, as this will be the first item that the reader will meet, and it will guarantee that the maximum part of it will be on the first page of a printout of the file.
The file header comment introduces the file to the reader, helping him to understand its purpose and its place in the structure of the entire program. Its contents are discussed in detail in chapter 4.
7.3.3 Use of page breaks
It is very useful to be able to see all of (or at least most of) the item that is being viewed on one page of a printout. This can be helped by using page breaks at the start of each major function or section. It also makes those major items easier to find.
Unfortunately, there is no standard way of forcing a form feed in C, and methods such as inserting an ASCII 'FF' (Form Feed, usually available with Ctrl-L) can be non-portable, although including this inside a comment can help:
/* <Ctrl-L> This is now on a new page */
A good printing utility will enable page breaks and page headings to be set by using keywords in comment blocks.
Like most controls, page breaks should be used sensibly. Over-use can result in few lines of code per page and very thick listings.
7.3.4 File width
How wide should a 'C' program be? To many people, the answer of '80 columns' is simple and obvious - they have no way of making it more. However, editors with 'sideways scroll' are widely available, and large-screened workstations may be used, enabling lines of code many characters wider to be used. Also, many printers can print 132 (or more) characters per line.
However, viewing long code lines on a system which cannot display more than 80 characters can result in the end of the line being 'lost' or 'wrapped'; either way, the readability quotient of the code plummets. Even if a sideways-scrollable editor is used, it is easy to miss the fact that the end of the line is off the edge of the screen, and consequently to misunderstand the code.
The simple solution is to limit code width to the minimum that a future reader of the code might be able to use, most commonly 80 columns, although some editors monopolize screen columns for line numbers and continuation characters, in which case less may be better. An objection to restricting columns is that, particularly when code is deeply nested, there is not enough space left in the remaining columns to lay the code out tidily. A response is that deep nesting significantly adds to the complexity of the code, and that deeply nested code is a prime candidate for breaking out into a sub-function.
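As an illustration of breaking out a sub-function, the sketch below counts the non-zero cells of a grid. The names CountRowNonZero() and CountGridNonZero() are hypothetical. Written as a single function, the test on each cell would sit three levels deep; with the inner loop broken out, neither function nests more than two levels, which leaves more columns free for the code itself.

```c
#include <stddef.h>

/* Sub-function broken out of the loop body: counts the non-zero cells in one row. */
static int CountRowNonZero(const int *row, size_t cols)
{
    size_t c;
    int count = 0;

    for (c = 0; c < cols; c++) {
        if (row[c] != 0)
            count++;
    }
    return count;
}

/* Counts the non-zero cells in a rows x cols grid that is stored row by row. */
int CountGridNonZero(const int *grid, size_t rows, size_t cols)
{
    size_t r;
    int total = 0;

    for (r = 0; r < rows; r++)
        total += CountRowNonZero(grid + r * cols, cols);
    return total;
}
```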
7.3.5 Tab settings
Deciding on standard tab settings is another consideration of layout which may affect the portability of the layout.
Many editors and terminals on which C is used (typically MS-DOS and Unix) have a default tab-setting of 8 characters across the screen. It may be convenient to change the tab-settings (e.g. to 4 characters), but doing so may confuse someone at a later date who has not made (or cannot make) a similar change, in which case lines that fitted within the 80 character screen width will no longer do so. Note that the print system must also know how many characters there are in one tab stop.
If you have a clever editor (e.g. 'emacs'), you can program this to substitute the appropriate numbers of tabs and spaces to make it appear as if the actual tab settings have changed (when, of course, they have not).
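The effect of the tab setting on line width can be checked mechanically. The following is a minimal sketch, assuming that a tab advances to the next multiple of the tab stop; DisplayWidth() is a hypothetical helper, not part of any standard library.

```c
#include <stddef.h>

/*
 * Return the number of columns a line occupies when each tab advances
 * to the next multiple of tabStop. Counting stops at a newline or at
 * the end of the string. A tabStop of zero treats a tab as one column.
 */
size_t DisplayWidth(const char *line, size_t tabStop)
{
    size_t col = 0;

    for (; *line != '\0' && *line != '\n'; line++) {
        if (*line == '\t' && tabStop > 0)
            col += tabStop - (col % tabStop);
        else
            col++;
    }
    return col;
}
```

For example, a line that starts with three tabs followed by 60 characters of code occupies 72 columns with a tab setting of 4, but 84 columns with a tab setting of 8, and so no longer fits an 80 column screen.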
7.3.6 Code identification
It is useful to be able to uniquely identify both the source code and the object code. This may be done by putting identifying data definitions in the code, typically at the start of each file. This may take the form of a file identifier, plus a copyright statement:
/* getname.c */
This practice can be very helpful during diagnosis and debugging as well as in litigation!
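One possible form for such identifying data definitions is sketched below. The variable names, the version number and the copyright wording are all hypothetical, invented for this illustration. The strings are given external linkage here on the assumption that this makes the compiler less likely to discard them, so that they can still be found in the object code.

```c
/* getname.c */

/* Hypothetical file identifier: the name and version shown are illustrative only. */
const char GetnameFileIdent[] = "getname.c version 1.0";

/* Hypothetical copyright statement: the wording is illustrative only. */
const char GetnameCopyright[] = "Copyright (c) Example Company. All rights reserved.";
```

Prefixing each name with the file name is one way to avoid clashes when every file in a program defines its own identification strings.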