Scientific Design Choices using Graphical Display Options

Plotting a single variable should be fairly easy. The type of variable will influence the type
of graphic chosen. For instance, histograms or boxplots suit continuous variables,
while bar charts or pie charts are appropriate for categorical variables. In both cases other
choices are possible too. Whether the data should be transformed or aggregated will depend
on the distribution of the data and the goal of the graphic. Scaling and captioning should be
relatively straightforward, though they need to be chosen with care.
It is a different matter with multivariate graphics, where even displaying the joint distribution
of two categorical variables is not simple. The main decision to be taken for a multivariate
graphic is the form of display, though the choice of variables and their ordering are also
important. In general a dependent variable should be plotted last. In a scatterplot it is
traditional to plot the dependent variable on the vertical axis.

Choice of Graphical Form

There are bar charts, pie charts, histograms, dot plots, boxplots,
scatterplots, rose plots, mosaic plots and many other kinds of data display. The choice depends
on the type of data to be displayed (e.g. univariate continuous data cannot be displayed in a
pie chart and bivariate categorical data cannot be displayed in a boxplot) and on what is to be
shown (e.g. pie charts are good for displaying shares for a small number of categories and
boxplots are good for emphasizing outliers). A poor choice of graph type cannot be rectified by
other means, so it is important to get it right at the start. However, there is not always a unique
optimal choice and alternatives can be equally good or good in different ways, emphasizing
different aspects of the same data. Provided an appropriate form has been chosen, there are
many options to consider. Simply adopting the default of whatever computer software is being
used is unlikely to be wise.
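
The remark that boxplots are good for emphasizing outliers can be made concrete with a small sketch. The text does not say how outliers are identified, so the 1.5 x IQR fence rule used here is an assumption (it is the conventional boxplot rule), and the function name `boxplot_summary` and the parameter `k` are hypothetical. Quartiles can be computed in several slightly different ways, so the exact values may differ between software packages.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Compute the numbers a boxplot displays (illustrative sketch).

    Assumption: points further than k * IQR beyond the quartiles are
    drawn individually as outliers; k = 1.5 is the conventional choice.
    """
    data = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers extend to the most extreme points that lie inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary["outliers"])  # the single extreme value, 50, is flagged
```

A histogram of the same ten values would show the extreme value only as a distant, low bar, whereas the boxplot draws it as an individual point.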

Graphical Display Options
Scales

Defining the scale for the axis for a categorical variable is a matter of choosing an informative
ordering. This may depend on what the categories represent or on their relative sizes. For a
continuous variable it is more difficult. The endpoints, divisions and tick marks have to be
chosen. Initially it is surprising when apparently reliable software produces a really bad scale
for some variable. It seems obvious what the scale should have been. It is only when you start
trying to design your own algorithm for automatically determining scales that you discover
how difficult the task is.
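
For the categorical case, the two ordering strategies mentioned above (by what the categories represent, or by their relative sizes) might be sketched as follows. The function name `order_categories` and its `levels` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter

def order_categories(values, levels=None):
    """Return (category, count) pairs in the order they would be plotted.

    If `levels` is given, it is taken as the meaningful order of the
    categories (for example, low < medium < high). Otherwise categories
    are ordered by decreasing frequency, with ties broken alphabetically.
    """
    counts = Counter(values)
    if levels is not None:
        return [(level, counts.get(level, 0)) for level in levels]
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ratings = ["low", "high", "medium", "low", "medium", "low"]
print(order_categories(ratings))
# [('low', 3), ('medium', 2), ('high', 1)]
print(order_categories(ratings, levels=["low", "medium", "high"]))
# [('low', 3), ('medium', 2), ('high', 1)]
```

In this small example the two orderings happen to coincide; in general they need not, and which is more informative depends on the purpose of the graphic.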
In The Grammar of Graphics, Wilkinson puts forward some plausible properties that ‘nice’ scales
should possess and suggests a possible algorithm. The properties (simplicity, granularity and
coverage, with the bonus of being called ‘really nice’ if zero is included) are good but the
algorithm is easy to outwit. This is not to say that it is a weak algorithm. What is needed is a
method which gives acceptable results for as high a percentage of the time as possible, and
the user must also check the resulting scale and be prepared to amend it for his or her data.
Difficult cases for scaling algorithms arise when data cross natural boundaries, e.g., data with
a range of 4 to 95 would be easy to scale, whereas data with a range of 4 to 101 would be
more awkward.
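
The boundary effect can be seen with a simple scaling routine. The sketch below is not Wilkinson's algorithm; it is a widely used "nice numbers" heuristic that restricts the tick step to 1, 2 or 5 times a power of ten. The function names and the default of roughly five ticks are assumptions made for illustration.

```python
import math

def nice_number(x, round_it):
    """Return a 'nice' number (1, 2, 5 or 10 times a power of ten) near x.

    With round_it=True the nearest nice number is chosen; otherwise the
    smallest nice number that is not less than x is returned.
    """
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # lies in [1, 10)
    if round_it:
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent

def nice_scale(lo, hi, target_ticks=5):
    """Return tick positions that cover [lo, hi] using a nice step size."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    span = nice_number(hi - lo, round_it=False)
    step = nice_number(span / (target_ticks - 1), round_it=True)
    start = math.floor(lo / step) * step
    end = math.ceil(hi / step) * step
    n_steps = int(round((end - start) / step))
    return [start + i * step for i in range(n_steps + 1)]

print(nice_scale(4, 95))   # [0, 20, 40, 60, 80, 100]
print(nice_scale(4, 101))  # [0, 20, 40, 60, 80, 100, 120]
```

With this heuristic, extending the data range from 95 to 101 forces the axis to run to 120, leaving most of the last interval empty. A user might reasonably prefer to override the result, which is why the scale produced by any such algorithm should be checked.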
