notes on page makeup using latex

introduction

LaTeX automates many aspects of book production (numbering, cross-referencing, construction of the table of contents, indexing), but two aspects of page makeup require manual intervention: line breaks and page breaks (under the default parameter values in LaTeX, for normal size pages). If you are typesetting material to be published, you need to understand how to handle these two items.

Unless they regularly write books, users of TeX and LaTeX rarely have to deal seriously with page makeup, and so are generally unfamiliar with the techniques available. These notes contain no secrets: all the points are discussed in The TeXbook (Donald Knuth, 1984), LaTeX: A Document Preparation System (Leslie Lamport, 1986), and The LaTeX Companion (Michel Goossens, Frank Mittelbach, and Alexander Samarin, 1994). My only contribution is to collect the points, which are scattered among the three books, in a single document. (I am not aware of another document that does the same job.)

line breaks

No doubt you have encountered TeX's warning "Overfull \hbox ..." and have noticed lines sticking out into the right margin of your document. TeX formats a paragraph at a time, choosing the line breaks to make the paragraph look as good as possible, taking into account, for example, how much the words on each line are spread out or squeezed together, how much the tightness changes between consecutive lines, and how many consecutive lines end in hyphens. It does so by associating a "badness" with each collection of line breaks, and choosing the collection that minimizes this badness. (See pages 27–30 and Chapter 14 of The TeXbook for the fascinating details.) When the badness of the best collection of line breaks that TeX can find exceeds a user-defined limit, TeX gives up and allows some lines to flow into the right margin, writing warnings to the log file. (TeX allows you to print a black rectangle at the end of such lines, so that you can spot them more easily; overflows of less than a point are hard to see. However, the default LaTeX parameters do not print such a rectangle.)
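The black rectangle mentioned above is controlled by TeX's \overfullrule parameter, which LaTeX sets to 0pt. The following is a minimal sketch for draft runs only; the 5pt width is an illustrative choice, and the book class is assumed simply as an example.

```latex
\documentclass{book}
% Draft-only setting: mark every overfull line with a black rule in
% the right margin, so that even small overflows are easy to spot.
% Setting the rule back to 0pt (the LaTeX default) hides it again.
\setlength{\overfullrule}{5pt}
% The "draft" option of the standard classes has the same effect on
% the rule: \documentclass[draft]{book}
\begin{document}
Text of the book.
\end{document}
```

Remember to remove the setting (or the draft option) before producing the final version.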

You need to eliminate all overfull \hboxes in your final version. Here are five techniques you can use.

  1. You can tell TeX to relax its standards for the paragraph, by putting \sloppy before the first word of the paragraph. You'll want also to put \fussy before the first word of the next paragraph, to return to high standards. The \sloppy declaration tells TeX (a) to allow very (though not arbitrarily) bad line spacing, (b) to ignore lines overfull by less than half a point (less than two-tenths of a millimeter), and (c) to add an extra 3 em of stretchability to each line. (An em is a font-dependent unit of horizontal distance, roughly equal to the width of the letter "m".) A paragraph set under the \sloppy declaration is unlikely to produce an overfull \hbox, but it may produce an underfull one—a line that has large spaces between the words. If so, you'll probably want to solve the problem using one of the other techniques. Even if you don't get an underfull \hbox, you should check the appearance of the paragraph to see if it is consistent with the rest of the page; you may decide that you need to work on it more.
  2. TeX determines where words can be hyphenated, but doesn't find all possible hyphenation points, especially in proper names. Adding a discretionary hyphen (type \-) in a word may allow the paragraph to be set without any overfull \hbox. (Because TeX analyzes an entire paragraph when choosing line breaks, the hyphen that solves your problem may be needed not in the overfull line, but in some other line in the paragraph.)
  3. If you have followed Knuth's advice on pages 91–93 of The TeXbook, you have included many ties in your text. (If you haven't followed his advice, you will have many bad line breaks.) You may be able to solve your problem by eliminating one of them.
  4. If none of the previous three techniques solves your problem, you have few reasonable options left. One that I use in this case for \hboxes that are overfull only slightly (by 1 or 2 points) is simply to tighten up the line in question by adding explicit \kerns. Say my line is overfull by 1 point, and there are 11 spaces on the line, including one after a period. Counting the space after the period twice, because it is wider than the others, I divide 1 point by 12, to get 0.083 points, put \kern-0.166pt after the space following the period, and \kern-0.083pt after every other space. (If you ever edit the paragraph subsequently, you'll of course want to remove the \kerns.) I regard the use of this technique as "cheating", but I also find the result quite acceptable.
  5. If all else fails ... edit the paragraph. You may regard the use of this "technique" as an unacceptable concession that form trumps content. While I am somewhat sympathetic to this view, here are two defenses. First, not paying attention to the form (for example, leaving the line as it is, extending into the right margin, or allowing a wildly spaced line) is distracting; it makes the text hard to read, just as poor writing does. Second, a significant fraction of the sentences you write can surely be improved, so that editing a paragraph may well mean making it clearer or more graceful.
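The first four techniques can be sketched as follows. The sample text, the proper name, and the word boundaries in the last example are made up for illustration; only the commands themselves come from the discussion above.

```latex
\documentclass{article}
\begin{document}

% 1. Relax TeX's standards for one paragraph, then restore them.
\sloppy
A paragraph that refuses to fit at the normal tolerance \ldots

\fussy
The next paragraph is set to the usual high standards again.

% 2. Discretionary hyphens in a word that TeX cannot hyphenate
%    well by itself (hypothetical proper name).
\ldots as was shown by Schwarz\-schild\-ovich in \ldots

% 3. A tie (~) forbids a line break at that point; replacing it
%    with an ordinary space gives TeX one more legal breakpoint.
\ldots see Chapter~14 \ldots % with the tie
\ldots see Chapter 14 \ldots % with the tie removed

% 4. A line that is 1pt overfull, with 11 spaces, one of them
%    after a period: 1pt/12 = 0.083pt per ordinary space and
%    twice that after the period. Only the first few spaces of
%    the line are shown.
\ldots first sentence. \kern-0.166pt Second \kern-0.083pt
sentence \kern-0.083pt continues \kern-0.083pt here \ldots

\end{document}
```

The kerns in the last example sum to 10 x 0.083pt + 0.166pt = 0.996pt over the whole line, which is close enough to the 1pt overflow.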

(Incidentally, while I have not investigated the matter carefully, I believe that LaTeX does not specify correctly the stretchability of the item labels in description environments, greatly increasing the chance of an overfull \hbox in the first line of a description \item. The problem is especially serious if the item label is almost one line long.)

page breaks

Your publisher probably wants you to adhere to rules along the following lines.

  1. All lines, including those between paragraphs, must be exactly the same distance apart. (This rule will probably be expressed by saying that the font size should be something like "10 on 13", which means that the font size should be 10pt and the \baselineskip 13pt.)
  2. Most pages should have exactly the same length, say x.
  3. Every pair of facing pages (even–odd) must have the same length.
  4. Some pairs of pages can be longer or shorter than x by up to one line.
  5. No consecutive pairs of pages can differ in length by more than one line.
  6. The last line on any page must not be the first line of a paragraph, and the first line on any page must not be the last line of a paragraph. (In typesetting jargon, your pages may contain no orphans or widows. Knuth calls an orphan a "club line".) A single line at the top of a page immediately followed by a displayed equation is probably permissible.

If any of your pages consists only of text unbroken by itemizations, theorems, displayed equations, or other items that allow for vertical stretch, the normal page length x must of course be a multiple of the \baselineskip; otherwise rules 1 and 2 are incompatible.
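In LaTeX terms the condition is slightly more specific: the baseline of the first line sits \topskip below the top of the text block, so a page of n lines needs a \textheight of \topskip plus n-1 times the \baselineskip. Here is a sketch for a hypothetical design of 40 lines of 10-on-13 type; it assumes that the 10-on-13 type itself is set up elsewhere (for example, by your publisher's style file) and that an engine with e-TeX's \dimexpr is in use, as is the case in current LaTeX distributions.

```latex
\documentclass{book}
% Hypothetical design: 40 lines per normal page, baselines 13pt apart.
% First baseline at \topskip, then 39 further steps of 13pt each.
\setlength{\topskip}{10pt}
\setlength{\textheight}{\dimexpr\topskip+13pt*39\relax}
% Result: 10pt + 507pt = 517pt
\begin{document}
Text of the book.
\end{document}
```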

TeX contains parameters that help you satisfy rule 6: \clubpenalty and \widowpenalty. LaTeX (and Plain TeX) sets these fairly small, definitely not large enough to rule out orphans and widows in normal book pages. If your publisher prohibits orphans and widows, you may want to increase them; you could try \clubpenalty=300 and \widowpenalty=300.
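For reference, Plain TeX's default for each of these parameters is 150, and a penalty of 10000 forbids the break outright. The sketch below shows the suggested settings in a preamble. The extra assignment to \@clubpenalty is my addition, not part of the suggestion above: LaTeX restores \clubpenalty from that internal register after section headings and list items, so without it the new value may be lost partway through a chapter.

```latex
\documentclass{book}
% Discourage (but do not forbid) orphans and widows.
\clubpenalty=300
\widowpenalty=300
% LaTeX resets \clubpenalty from \@clubpenalty after headings,
% so keep the internal copy in step with the new value.
\makeatletter
\@clubpenalty=\clubpenalty
\makeatother
\begin{document}
Text of the book.
\end{document}
```

Higher values make bad breaks rarer at the cost of more pages that TeX cannot fill properly, so expect to experiment.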

Rule 4 helps you satisfy rule 6, but it is easy to construct an example in which all six rules cannot be satisfied. To do so, write several pages of text, unbroken by any section headings or other items with vertical stretch. Specify x to be an odd number of lines. Suppose that the bottom line of the first page—assume it is an even page—is an orphan, and the top of the second page is a widow. You can eliminate both the orphan and the widow by making the first page one line longer or one line shorter than normal. (The next paragraph describes how to do so.) But whichever route you take, you may now have an orphan at the bottom of the next page, or a widow at the top of the following page. So to satisfy all six rules you need to employ some tricks.

To make a page one line longer than normal, put \enlargethispage{\baselineskip} on the page. (The LaTeX Companion says, on page 100, that this command has to go between paragraphs. I haven't verified that this claim is correct. If it is and your page contains no paragraph break, you're in trouble.) Similarly, \enlargethispage{-\baselineskip} causes a page to be one line short, and the argument of \enlargethispage can be any other vertical distance. (You can define macros \def\longbookpage{\enlargethispage{\baselineskip}} and \def\shortbookpage{\enlargethispage{-\baselineskip}} to save some typing.) Rule 3 requires that every odd page have the same length as the even page that precedes it, so you will put the \enlargethispage commands in your book in pairs. (Almost certainly a macro \enlargethispageandthenext could be written, but as far as I know none exists.)
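A sketch of the macros in use on a pair of facing pages follows. It uses \newcommand rather than \def, which is only a stylistic choice, and the sample text is a placeholder; where the commands actually need to go depends on where your page breaks fall.

```latex
\documentclass{book}
% One line longer or shorter than the normal page length.
\newcommand{\longbookpage}{\enlargethispage{\baselineskip}}
\newcommand{\shortbookpage}{\enlargethispage{-\baselineskip}}
\begin{document}
A paragraph that falls on the even (left-hand) page.

\longbookpage % between paragraphs, somewhere on the even page

More paragraphs, running over onto the facing odd page \ldots

\longbookpage % again, between paragraphs on the odd page

A paragraph that falls on the odd (right-hand) page.
\end{document}
```

After any later change to the text, check that each command still lands on the page it was meant for.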

Unless you're very lucky, the technique of changing the length of pages by up to one line will not allow you to satisfy your publisher's affection for uniform pages free of widows and orphans. To eliminate the remaining bad breaks you'll need to change the number of lines on some pages by using tricks like the following.

  1. TeX presents you with the best set of line breaks for each paragraph. For any given paragraph, many other sets of line breaks may be acceptable; some of these alternative sets may make the paragraph longer or shorter than does the optimal set. You can force TeX to search among sets of line breaks that make the paragraph n lines longer than does the optimal set by declaring \looseness=n before the paragraph. (The declaration applies to one paragraph only: TeX resets \looseness to 0 at the end of every paragraph.) If n > 0, then under this setting TeX presents you with a set of line breaks for which the paragraph is k lines longer than the optimal one for some k ≤ n, where no other set of breaks has lower badness among all those that generate a paragraph of that length, and no acceptable set of breaks produces a paragraph that is more than k (but at most n) lines longer. If n < 0, then symmetrically TeX finds the best set of breaks that produces a paragraph as close as possible to |n| lines shorter than normal.

    In my experience this technique is of limited use. The only paragraphs of mine that I'm able to lengthen are those in which the last line naturally extends almost to the right margin, and I'm able to lengthen them by at most one line. Similarly, the only ones I'm able to shorten are those in which the last line consists of a single word. Perhaps your paragraphs are more malleable than mine; they will be if they are long and full of short words.

  2. TeX looks ahead when choosing line breaks—it analyzes a paragraph at a time—but doesn't do so when choosing page breaks. I'd like it to analyze a chapter at a time, but it proceeds myopically. (Presumably the reason for this limitation lies at least in part with the limited power of computers at the time TeX was developed.) This myopia means that sometimes you need to force a page break. Consider the following situation. On the top of page 10 you have a large unbreakable block—for example, a section head followed immediately by a subsection head—and however much (within your limits) you lengthen the preceding pages, you cannot fit this block at the bottom of page 9. At the same time, page 9 is not full—there is not enough stretch available on it to fill it up, but page 8 has a lot of stretch. When choosing the break between pages 8 and 9, TeX does not take into account that you'd like to move lines to page 9 to fill it up, even though there is enough stretch to do so. (Rather, it chooses the page 8/9 break simply to make page 8 look as good as possible.) If there is enough stretch available on page 8 to move a line to page 9, then you can put an explicit \pagebreak on the last line you want on page 8. This command does not terminate the page immediately, but rather instructs TeX to break the page at the end of the current line (which remains properly justified). Similarly, you can force TeX to squeeze an extra line onto the page (by reducing the space around displayed equations and section heads, for example) by putting \nopagebreak at the appropriate point.
  3. There's not much more you can do to manipulate your page breaks without either explicitly changing the vertical spacing in your text (see point 5) or changing the content of your pages. One change in content is very easy to make: the vertical scaling of your figures. You want similar figures to have the same scaling, but apart from that the scaling you choose is pretty arbitrary within some fairly broad limits. Thus you can change it quite a bit, significantly changing your page breaks with very little change in effective content. If your figures are written using the PSTricks macros, changing the vertical scaling is trivial: you simply declare \psset{yunit=x} where x is the appropriate length. (If your default yunit is 1mm, you can try 1.1mm or 0.9mm, for example. For some figures, of course, you will want to change the xunit if you change the yunit.)
  4. Another change in content can move your page breaks: editorial changes in your text. (My comments in point 5 of my discussion of line breaks apply here.) If you want to make a paragraph longer by one line, you'll want to target those with last lines that almost extend to the right margin, and if you want to make a paragraph shorter by one line, you'll want to target those with a single word on the last line. Note an interesting consequence of TeX's digesting of a paragraph at a time: adding a word to a paragraph may make it shorter (I have seen examples).
  5. As a last resort, you can add explicit vertical space (\vspace{x}), overriding the specifications of your document and thus introducing an inconsistency (which, however, may be very hard to detect). Just don't tell your production editor that I suggested your doing so.
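The commands mentioned in this list can be sketched together as follows. The text, the figure, and all the numerical values are placeholders chosen for illustration. The figure example assumes the pstricks package, which needs the latex-plus-dvips route or XeLaTeX rather than plain pdfLaTeX.

```latex
\documentclass{book}
\usepackage{pstricks}
\begin{document}

% 1. Ask TeX to try to set the next paragraph one line shorter.
\looseness=-1
A long paragraph whose last line holds a single word \ldots

% 2. Break the page after the line containing this point; the
%    line itself is still properly justified.
\ldots the last words wanted on this page\pagebreak{} and the
text that should open the next page \ldots

% 3. Shrink a figure vertically by 10 percent
%    (hypothetical default units: 1mm in both directions).
\psset{xunit=1mm,yunit=0.9mm}
\begin{pspicture}(0,0)(40,30)
  \psline(0,0)(40,30)
\end{pspicture}

% 5. Last resort: a small amount of explicit vertical space
%    between two paragraphs.
\vspace{2pt}

The following paragraph \ldots
\end{document}
```

Each of these is a local override, so it is worth marking them with comments in the source; they usually need to be revisited whenever the text changes.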