The present section is not about PGF or TikZ, but about general guidelines and principles concerning the creation of graphics for scientific presentations, papers, and books.
The guidelines in this section come from different sources. Many of them are just what I would like to claim is “common sense,” some reflect my personal experience (though, hopefully, not my personal preferences), and some come from books (the bibliography is still missing, sorry) on graphic design and typography. The most influential sources are the brilliant books by Edward Tufte. While I do not agree with everything written in these books, many of Tufte’s arguments are so convincing that I decided to repeat them in the following guidelines.
The first thing you should ask yourself when someone presents a bunch of guidelines is: Should I really follow these guidelines? This is an important question, because there are good reasons not to follow general guidelines.
Guidelines were almost always set up to address a specific situation. If you are not in this situation, following a guideline can do more harm than good.
When you are aware of a rule and when you decide that breaking the rule has a desirable effect, break the rule.
This guideline is total nonsense. An (arguably) sensible guideline is “parameters must be declared alphabetically” so that parameters are easier to find. Another (arguably) sensible guideline is “parameters must be declared in decreasing order of size” so that fewer byte-alignment cache misses occur when the stack is accessed. The guideline the company used maximized cache misses and resulted in a more or less random ordering so that programmers constantly had to look up the parameter ordering.
So, before you apply a guideline or choose not to apply it, ask yourself these questions:
When you create a paper with numerous graphics, the time needed to create these graphics becomes an important factor. How much time should you plan for the creation of graphics?
As a general rule, assume that a graphic will need as much time to create as would a text of the same length. For example, when I write a paper, I need about one hour per page for the first draft. Later, I need between two and four hours per page for revisions. Thus, I expect to need about half an hour for the creation of a first draft of a half-page graphic. Later on, I expect another one to two hours before the final graphic is finished.
In many publications, even in good journals, the authors and editors have obviously invested a lot of time in the text, but seem to have spent about five minutes to create all of the graphics. Graphics often seem to have been added as an “afterthought” or look like a screen shot of whatever the authors’ statistical software shows them. As will be argued later on, the graphics that programs like GNUPLOT produce by default are of poor quality.
Creating informative graphics that help the reader and that fit together with the main text is a difficult, lengthy process.
When you write a (scientific) paper, you will most likely follow this pattern: You have some results/ideas that you would like to report. The creation of the paper will typically start with compiling a rough outline. Then, the different sections are filled with text to create a first draft. This draft is then revised repeatedly until, often after substantial revision, a final paper results. In a good journal paper there will typically not be a single sentence that has survived unmodified from the first draft.
Creating a graphic follows the same pattern:
Graphics can be placed at different places in a text. They can either be inlined, meaning they are somewhere “in the middle of the text,” or they can be placed in standalone “figures.” Since printers (the people) like to have their pages “filled” (both for aesthetic and economic reasons), standalone figures may traditionally be placed on pages in the document far removed from the main text that refers to them. LaTeX and TeX tend to encourage this “drifting away” of graphics for technical reasons.
When a graphic is inlined, it will more or less automatically be linked with the main text in the sense that the labels of the graphic will be implicitly explained by the surrounding text. Also, the main text will typically make it clear what the graphic is about and what is shown.
Quite differently, a standalone figure will often be viewed at a time when the main text that this graphic belongs to either has not yet been read or has been read some time ago. For this reason, you should follow the following guidelines when creating standalone figures:
For example, suppose a graphic shows an example of the different stages of a quicksort algorithm. Then the figure’s caption should, at the very least, inform the reader that “The figure shows the different stages of the quicksort algorithm introduced on page xyz.” and not just “Quicksort algorithm.”
The main argument against abbreviations is that “a period is too valuable to waste it on an abbreviation.” The idea is that a period will make the reader assume that the sentence ends after “Fig” and it takes a “conscious backtracking” to realize that the sentence did not end after all.
The argument in favor of abbreviations is that they save space.
Personally, I am not really convinced by either argument. On the one hand, I have not yet seen any hard evidence that abbreviations slow readers down. On the other hand, abbreviating all “Figure” by “Fig.” is most unlikely to save even a single line in most documents.
I avoid abbreviations.
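The caption advice above can be sketched in LaTeX as follows. This is only an illustration: the label names `sec:quicksort` and `fig:quicksort-stages` and the placeholder box standing in for the graphic are assumptions, not part of the original text.

```latex
\documentclass{article}
\begin{document}

\section{Quicksort}\label{sec:quicksort} % hypothetical label for the main text

The stages of the algorithm are shown in Figure~\ref{fig:quicksort-stages}.

\begin{figure}
  \centering
  \fbox{graphic goes here} % placeholder for the actual graphic
  % The caption says what is shown and where the main text can be found,
  % instead of just ``Quicksort algorithm.''
  \caption{The figure shows the different stages of the quicksort
    algorithm introduced on page~\pageref{sec:quicksort}.}
  \label{fig:quicksort-stages}
\end{figure}

\end{document}
```

As usual with cross-references, the document has to be compiled twice before the page number and the figure number are resolved.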
Perhaps the most common “mistake” people make when creating graphics (remember that a “mistake” in design is always just “ignorance”) is to have a mismatch between the way their graphics look and the way their text looks.
It is quite common that authors use several different programs for creating the graphics of a paper. An author might produce some plots using GNUPLOT, a diagram using XFIG, and include an .eps graphic a coauthor contributed using some unknown program. All these graphics will, most likely, use different line widths, different fonts, and have different sizes. In addition, authors often use options like [height=5cm] when including graphics to scale them to some “nice size.”
If the same approach were taken to writing the main text, every section would be written in a different font at a different size. In some sections all theorems would be underlined, in another they would be printed all in uppercase letters, and in another in red. In addition, the margins would be different on each page.
Readers and editors would not tolerate a text if it were written in this fashion, but with graphics they often have to.
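To see why rescaling on inclusion causes such mismatches, consider a small worked example. The numbers are assumed purely for illustration: a graphic created at a natural height of 8cm with 10pt labels and 0.4pt lines, then included with [height=5cm]. Since the whole graphic is scaled uniformly, the labels and lines shrink by the same factor:

```latex
% Assumed values: natural height 8cm, 10pt labels, 0.4pt lines.
\[
  s = \frac{5\,\mathrm{cm}}{8\,\mathrm{cm}} = 0.625,
  \qquad
  0.625 \times 10\,\mathrm{pt} = 6.25\,\mathrm{pt}
  \quad\mbox{(labels)},
  \qquad
  0.625 \times 0.4\,\mathrm{pt} = 0.25\,\mathrm{pt}
  \quad\mbox{(lines)}.
\]
```

In this example the labels end up well below the size of 10pt main text, and the lines end up thinner than the 0.4pt used elsewhere.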
To create consistency between graphics and text, stick to the following guidelines:
This means that when generating graphics using an external program, create them “at the right size.”
The “line width” for normal text is the width of the stem of letters like T. For TeX, this is usually 0.4pt. However, some journals will not accept graphics with a normal line width below 0.5pt.
However, graphics may also use a logical intrinsic color coding. For example, no matter what colors you normally use, readers will generally read, say, the color green as “positive, go, ok” and red as “alert, warning, action.”
Creating consistency when using different graphic programs is almost impossible. For this reason, you should consider sticking to a single graphic program.
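As a minimal sketch of the line-width and font guidelines above, here is how they might look when the graphic is drawn with TikZ inside the document itself. The coordinates and the label text are arbitrary; 0.4pt is also what TikZ calls `thin`, its default line width.

```latex
\documentclass{article}
\usepackage{tikz}
\begin{document}

Some surrounding main text in the document font.

\begin{center}
\begin{tikzpicture}[line width=0.4pt] % same as the stem width of the text font
  \draw (0,0) rectangle (3,1);
  % Node text is typeset by TeX itself, so it uses the document's
  % font and size without any rescaling.
  \node at (1.5,0.5) {a label};
  % If a journal insists on at least 0.5pt, set that width explicitly:
  \draw[line width=0.5pt] (0,-0.5) -- (3,-0.5);
\end{tikzpicture}
\end{center}

\end{document}
```

Because the picture is created at its final size, no scaling option is needed when it is placed in the text.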
Almost all graphics will contain labels, that is, pieces of text that explain parts of the graphics. When placing labels, stick to the following guidelines:
One of the most frequent kinds of graphics, especially in scientific papers, is the plot. Plots come in a large variety, including simple line plots, parametric plots, three-dimensional plots, pie charts, and many more.
Unfortunately, plots are notoriously hard to get right. Partly, the default settings of programs like GNUPLOT or Excel are to blame for this since these programs make it very convenient to create bad plots.
The first question you should ask yourself when creating a plot is the following:
If the answer is “not really,” use a table.
A typical situation where a plot is unnecessary is when people present a few numbers in a bar diagram. Here is a real-life example: At the end of a seminar a lecturer asked the participants for feedback. Of the 50 participants, 30 returned the feedback form. According to the feedback, three participants considered the seminar “very good,” nine considered it “good,” ten “ok,” eight “bad,” and no one thought that the seminar was “very bad.”
A simple way of summing up this information is the following table:
| Rating given | Participants (out of 50) who gave this rating | Percentage |
|---|---|---|
| “very good” | 3 | 6% |
| “good” | 9 | 18% |
| “ok” | 10 | 20% |
| “bad” | 8 | 16% |
| “very bad” | 0 | 0% |
| none | 20 | 40% |
What the lecturer did was to visualize the data using a 3D bar diagram. It looked like this:
Both the table and the “plot” have about the same size. If your first thought is “the graphic looks nicer than the table,” try to answer the following questions based on the information in the table or in the graphic:
Sadly, the graphic does not allow us to answer a single one of these questions. The table answers all of them directly, except for the last one. In essence, the information density of the graphic is very nearly zero. The table has a much higher information density, despite the fact that it uses quite a lot of white space to present a few numbers.
Here is the list of things that went wrong with the 3D-bar diagram:
(In the real presentation that I saw, the text was rendered at a very low resolution with about 10 by 6 pixels per letter with wrong kerning, making the rotated text almost impossible to read.)
You might argue that in the example the exact numbers are not important for the graphic. The important thing is the “message,” which is that there are more “very good” and “good” ratings than “bad” and “very bad.” However, to convey this message either use a sentence that says so or use a graphic that conveys this message more clearly:
The above graphic has about the same information density as the table (about the same size and the same numbers are shown). In addition, one can directly “see” that there are more good or very good ratings than bad ones. One can also “see” that the number of people who gave no rating at all is not negligible, which is quite common for feedback forms.
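A plain bar graphic for the feedback data might be produced along the following lines. This is a minimal TikZ sketch, not the code of the graphic discussed above; the scale of 2mm per participant, the gray shade, and the row spacing are assumptions.

```latex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}[x=2mm, y=5mm]
  % One row per rating: label / number of participants (out of 50).
  \foreach \rating/\n [count=\row] in
      {very good/3, good/9, ok/10, bad/8, very bad/0, none/20}
  {
    % Rating name to the left of the bar.
    \node[left] at (0,-\row) {\rating};
    % Flat bar whose length is proportional to the number of participants.
    \fill[black!60] (0,-\row-0.3) rectangle (\n,-\row+0.3);
    % The exact number is printed next to the bar, so nothing has to be guessed.
    \node[right] at (\n,-\row) {\n};
  }
\end{tikzpicture}
\end{document}
```

The bars are flat and share a common baseline, so their lengths can be compared directly, and the printed numbers keep the information that the table provides.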
Charts are not always a good idea. Let us look at an example that I redrew from a pie chart in Die Zeit, June 4th, 2005:
This graphic has been redrawn in TikZ, but the original looks very similar.
At first sight, the graphic looks “nice and informative,” but there are a lot of things that went wrong:
In the last case, the different sizes are only partly due to distortion. The designer(s) of the original graphic have also made the “Wind” slice too small, even taking distortion into account. (Just compare the size of “Wind” to “Regenerative” in general.)
Coal as an energy source is split up into two slices: one for “Steinkohle” and one for “Braunkohle” (two different kinds of coal). When you add them up, you see that the whole lower half of the pie chart is taken up by coal.
The two areas for the different kinds of coal are not visually linked at all. Rather, two different colors are used, and the labels are on different sides of the graphic. By comparison, “Regenerative” and “Wind” are very closely linked.
Edward Tufte calls graphics like the above “chart junk.”
Here are a few recommendations that may help you avoid producing chart junk:
Pick up your favorite fiction novel and have a look at a typical page. You will notice that the page is very uniform. Nothing is there to distract the reader while reading; no large headlines, no bold text, no large white areas. Indeed, even when the author does wish to emphasize something, this is done using italic letters. Such letters blend nicely with the main text--at a distance you will not be able to tell whether a page contains italic letters, but you would notice a single bold word immediately. The reason novels are typeset this way is the following paradigm: Avoid distractions.
Good typography (like good organization) is something you do not notice. The job of typography is to make reading the text, that is, “absorbing” its information content, as effortless as possible. For a novel, readers absorb the content by reading the text line-by-line, as if they were listening to someone telling the story. In this situation anything on the page that distracts the eye from going quickly and evenly from line to line will make the text harder to read.
Now, pick up your favorite weekly magazine or newspaper and have a look at a typical page. You will notice that there is quite a lot “going on” on the page. Fonts are used at different sizes and in different arrangements, the text is organized in narrow columns, typically interleaved with pictures. The reason magazines are typeset in this way is another paradigm: Steer attention.
Readers will not read a magazine like a novel. Instead of reading a magazine line-by-line, we use headlines and short abstracts to check whether we want to read a certain article or not. The job of typography is to steer our attention to these abstracts and headlines, first. Once we have decided that we want to read an article, however, we no longer tolerate distractions, which is why the main text of articles is typeset exactly the same way as a novel.
The two principles “avoid distractions” and “steer attention” also apply to graphics. When you design a graphic, you should eliminate everything that will “distract the eye.” At the same time, you should try to actively help the reader “through the graphic” by using fonts/colors/line widths to highlight different parts.
Here is a non-exhaustive list of things that can distract readers:
Even though the left grid comes first in our normal reading order, the right one is much more likely to be seen first: The white-to-black contrast is higher than the gray-to-white contrast. In addition, there are more “places” adding to the overall contrast in the right grid.
Things like grids and, more generally, help lines usually should not grab the attention of the readers and, hence, should be typeset with a low contrast to the background. Also, a loosely-spaced grid is less distracting than a very closely-spaced grid.
Do not use different dashing patterns to differentiate curves in plots. You lose data points this way and the eye is not particularly good at “grouping things according to a dashing pattern.” The eye is much better at grouping things according to colors.
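The points about grids and dashing can be illustrated with a small TikZ sketch. The curves, the colors, and the label texts are made up for this example; only the principle (a faint, loosely spaced grid and solid curves told apart by color) is taken from the text above.

```latex
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  % Help lines: very thin, light gray, loosely spaced,
  % so they have a low contrast to the background.
  \draw[step=1cm, black!15, very thin] (0,0) grid (4,3);
  % Both curves are solid; they are distinguished by color, not by dashing.
  \draw[blue] (0,0.5) .. controls (1.5,2.5) and (2.5,0.5) .. (4,2.5);
  \draw[red]  (0,2.5) .. controls (1.5,0.5) and (2.5,2.5) .. (4,0.5);
  % Labels placed directly at the curves, in the matching color.
  \node[blue, right] at (4,2.5) {first curve};
  \node[red, right]  at (4,0.5) {second curve};
\end{tikzpicture}
\end{document}
```

The grid is drawn first so that the curves lie on top of it.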