More on data visualization

More details on data visualization. Know-how and -why of aesthetics. Basic themes and extensions. Guides: axes, labels, scales, breaks, and legends. Statistical transformations. Text and annotations.

Published

April 19th, 2026

1 Prerequisites

We examine some aspects of ggplot2 in more detail.

library(ggplot2)

We will use the eu_ict and ict data frames in our visualizations.

eu_ict <- readRDS("data/eu_ict.rds")
ict <- readRDS("data/ict.rds")

2 Programming digression: argument passing

In the visualization overview topic, we have followed the convention of explicitly specifying the arguments of functions.

This is not a necessity.

2.1 Argument order and named arguments

There are three ways to pass arguments to functions in R.

By exact matching.
By partial matching.
By position.

2.2 Exact matching

By exact matching.

We specify the argument name and the value.
The names of the arguments are documented in the help files of the functions.
Documentation is available with ?function_name.

From ?ggplot, we see that the signature of ggplot() is:

Usage:

     ggplot(data = NULL, mapping = aes(), ..., 
            environment = parent.frame())

ggplot(
  data = eu_ict,
  mapping = aes(
    x = output, 
    y = ict_percentage
  )
) +
  geom_point()

2.3 Partial matching

By partial matching.

We specify only part of the argument name, along with the value.
We can provide the minimum number of initial characters (or more) that uniquely identify the argument.
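
To see the uniqueness requirement in isolation, here is a minimal sketch using a hypothetical function, describe(), which is not part of ggplot2 or of the course material. Two of its arguments share the prefix "ver", so abbreviating to that prefix is ambiguous.

```r
# Hypothetical function, defined only for this illustration.
describe <- function(value, verbose = FALSE, version = 1) {
  paste0("value=", value, ", verbose=", verbose, ", version=", version)
}

# "val" uniquely identifies "value"; "verb" uniquely identifies "verbose".
describe(val = 10)
describe(10, verb = TRUE)

# "ver" is a prefix of both "verbose" and "version", so R raises an error.
ambiguous <- tryCatch(
  describe(10, ver = TRUE),
  error = function(e) conditionMessage(e)
)
ambiguous
```

The error disappears as soon as enough characters are given to single out one argument.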

ggplot(
  d = eu_ict,
  m = aes(
    x = output, 
    y = ict_percentage
  )
) +
  geom_point()

ggplot(
  d = eu_ict,
  map = aes(
    x = output, 
    y = ict_percentage
  )
) +
  geom_point()

2.4 Positional arguments

By position

We specify only values.
Arguments are matched by their position in the function signature.
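
Named and positional arguments can be mixed in one call. The sketch below uses a hypothetical function, power(), to illustrate that R first matches the named arguments and then fills the remaining ones by position.

```r
# Hypothetical function, defined only for this illustration.
power <- function(base, exponent) base^exponent

by_position <- power(2, 3)                   # base = 2, exponent = 3
by_name     <- power(exponent = 3, base = 2) # order is irrelevant when named
mixed       <- power(exponent = 2, 3)        # 3 fills the remaining slot: base

c(by_position, by_name, mixed)
```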

Usage:

     ggplot(data = NULL, mapping = aes(), ..., 
            environment = parent.frame())

ggplot(
  eu_ict,
  aes(x = output, y = ict_percentage)
) +
  geom_point()

The same applies to the aes() function.

Usage:

     aes(x, y, ...)

ggplot(
  eu_ict,
  aes(output, ict_percentage)
) +
  geom_point()

2.5 Why so many ways?

Each approach has pros and cons.

Exact matching is quite verbose.

But it is self-documenting and less error-prone.
It is good practice to use it when readability is important. For example:
- In scripts shared with others who are not familiar with the functions used.
- When calling functions that you do not use frequently and whose code you might need to revisit after a long time.

Positional matching is concise.

It is a good practice to use it for commonly used functions where the risk of confusion is low. E.g.,
```
ifelse(x > 10, "large", "small")
```
instead of
```
ifelse(test = x > 10, yes = "large", no = "small")
```
Using it in R’s command line for experimentation can be easier.
But it can make reading code less self-contained.

3 Aesthetics mappings

In the visualization overview topic, we used aes() to add or modify the appearance of plot elements.

3.1 Defining aesthetics

Aesthetics can be defined at various levels when creating a plot.

Globally in the ggplot() call.

ggplot(
  eu_ict,
  aes(
    output, ict_percentage, 
    color = income
  )
) +
  geom_point()

Locally at the level of each layer.

ggplot(eu_ict) +
  geom_point(
    aes(
      output, ict_percentage, 
      color = income
    )
  )

Mixed at the ggplot() and layer level.

How does this work?
Why does it work in this way?

ggplot(
  eu_ict,
  aes(output, ict_percentage)
) +
  geom_point(aes(color = income))

3.2 Defining aesthetics: how

How does this work?

Global assignments in ggplot() affect all layers in the plot.
Local assignments in geom_*() only apply to that layer.
Local assignments take precedence over (override) global assignments.

ggplot(
  eu_ict,
  aes(
    output, ict_percentage, 
    color = income
  )
) +
  geom_point(aes(color = EU))
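
The precedence rule can also be checked programmatically. The following sketch uses a small made-up data frame, toy, as a stand-in for eu_ict, and inspects the data that ggplot2 computes for each layer with layer_data().

```r
library(ggplot2)

# Toy data standing in for eu_ict (values are made up for illustration).
toy <- data.frame(
  output = c(10, 20, 30, 40),
  ict_percentage = c(2, 3, 5, 4),
  income = c("low", "low", "high", "high"),
  EU = c("EU", "NON-EU", "EU", "NON-EU")
)

p <- ggplot(toy, aes(output, ict_percentage, color = income)) +
  geom_point(aes(color = EU)) + # local mapping overrides the global color
  geom_line()                   # no local mapping: inherits color = income

# Colors used by the points follow EU (rows 1 and 3 match),
# not income (rows 1 and 2 would match).
point_colours <- layer_data(p, 1)$colour
point_colours
```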

3.3 Defining aesthetics: why

Why does it work in this way?

We can specify the aesthetics of certain layers with attributes that we do not want to apply globally.

ggplot(
  eu_ict, 
  aes(output, ict_percentage)
) +
  geom_point(aes(color = income)) +
  geom_smooth(method = "lm")

`geom_smooth()` using formula = 'y ~ x'

ggplot(
  eu_ict, 
  aes(
    output, ict_percentage, 
    color = income
  )
) +
  geom_point() +
  geom_smooth(method = "lm")

`geom_smooth()` using formula = 'y ~ x'

We also have the flexibility to extend the aesthetics of distinct geometric objects differently.

ggplot(
  eu_ict, aes(output, ict_percentage)
) +
  geom_point(
    aes(color = income, shape = EU)
  ) +
  geom_smooth(
    method = "lm", aes(color = EU)
  )

`geom_smooth()` using formula = 'y ~ x'

4 Theming

Aesthetics modify the appearance of the plot’s data elements.

How can we modify the appearance of the plot’s non-data elements (e.g., axes, background, grid)?
The ggplot2 package provides a set of eight basic themes.

4.1 Basic themes

We can apply themes to the plot by adding them with the + operator.

ggplot(
  eu_ict,
  aes(output, ict_percentage)
) +
  geom_point(aes(color = income, shape = EU)) +
  geom_smooth(
    aes(linetype = EU, group = EU),
    se = FALSE
  ) +
  theme_classic()

ggplot(
  eu_ict,
  aes(output, ict_percentage)
) +
  geom_point(aes(color = income, shape = EU)) +
  geom_smooth(
    aes(linetype = EU, group = EU),
    se = FALSE
  ) +
  theme_bw()

Themes only affect the non-data elements of the plot.

To modify the appearance of the data elements, we still need to use aesthetics.

ggplot(
  eu_ict,
  aes(output, ict_percentage, color = income, shape = EU)
) +
  geom_point() +
  geom_smooth(
    aes(linetype = EU, group = EU),
    se = FALSE,
    color = "black"
  ) +
  theme_bw() +
  scale_color_grey()

Here, we have combined the theme_bw() with scale_color_grey() to modify the appearance of the data elements.
In addition, we have explicitly specified the color of the geom_smooth() object to be black.

4.2 Additional themes

If the basic themes do not meet the stylistic requirements that you want, or are asked, to follow, the ggthemes package is worth a look.

The ggthemes package provides additional themes that might match the desired style.

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  geom_smooth(aes(linetype = EU, group = EU), se = FALSE) +
  ggthemes::theme_excel_new()

5 Guides

Guides are reference lines, grids, or markers that assist in interpreting the geometric objects of the visualization.

Axes and legends are the two guides that are most commonly modified in a visualization to facilitate communication.

5.1 Axes

Axes are the typically horizontal and vertical lines that specify the coordinate system of the plot area.

Axes have breaks (ticks) and labels.
- Breaks are the marked points of an axis.
- Labels are the text accompanying the breaks and provide interpretation context for the axis.

5.2 Labels

We have seen in the visualization overview topic how to modify the axes labels of a visualization via the labs() function.

ict |>
  dplyr::filter(grepl("Euro", geo)) |>
  ggplot(aes(year, ict_percentage)) +
  geom_line() +
  labs(x = "Year", y = "ICT employment %")

How can we modify the breaks of an axis?

How can we modify the labels of these breaks?

Somewhat counterintuitively, the breaks and labels of an axis are not modified through labs().

This is because ggplot2 does some heavy lifting for us when drawing the axes of a plot.
Recall that we have used the same calling interface for creating plots with continuous and discrete axes variables (e.g., geom_point and geom_bar).

5.3 Scales

In the background, ggplot2 automatically adjusts the axes based on the type of the variable we provide.

It does so by using the scale_*() family of functions.
Scales are instructions controlling how certain aesthetic mappings are translated into visual properties.
For example, a continuous scale maps the values of an aesthetic to a continuous axis range.

In ggplot2, continuous variables mapped to the x-axis (e.g., in geom_point() objects) are automatically assigned to the continuous scale scale_x_continuous().

Discrete variables mapped to the x-axis (e.g., in geom_bar() objects) are automatically assigned to the discrete scale scale_x_discrete().
We can modify the default behavior and the appearance of axes by explicitly calling the scale_*() functions.
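
As a rough illustration, the sketch below adds the default position scales explicitly and then inspects which scale each plot ends up with via layer_scales(). The data frame toy is made up and only stands in for eu_ict.

```r
library(ggplot2)

# Toy data standing in for eu_ict (made-up values).
toy <- data.frame(
  output = c(10, 20, 30, 40),
  income = c("low", "middle", "middle", "high")
)

# Continuous x variable: spelling out the scale that ggplot2 picks by default.
p_cont <- ggplot(toy, aes(output, output)) +
  geom_point() +
  scale_x_continuous()

# Discrete x variable: spelling out the scale that ggplot2 picks by default.
p_disc <- ggplot(toy, aes(income)) +
  geom_bar() +
  scale_x_discrete()

# Inspect the class of the x scale used by each plot.
class(layer_scales(p_cont)$x)
class(layer_scales(p_disc)$x)
```

Once the scale is written out, its arguments (such as breaks and labels) become available for customization.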

5.4 Breaks

ict |>
  dplyr::filter(grepl("Euro", geo)) |>
  ggplot(aes(year, ict_percentage)) +
  geom_line() +
  labs(x = "Year", y = "ICT employment %")

ict |>
  dplyr::filter(grepl("Euro", geo)) |>
  ggplot(aes(year, ict_percentage)) +
  geom_line() +
  labs(x = "Year", y = "ICT employment %") +
  scale_x_continuous(
    breaks = c(2004, 2014, 2023)
  )

We can directly pass the breaks we want on a continuous axis using the breaks argument.

For instance, if we want to have all the years as breaks, we can pass the year column of the ict data frame.

ict |>
  dplyr::filter(grepl("Euro", geo)) |>
  ggplot(aes(year, ict_percentage)) +
  geom_line() +
  labs(x = "Year", y = "ICT employment %") +
  scale_x_continuous(breaks = ict$year)

In addition, if we want to modify the labels of the breaks, we can use the labels argument of scale_x_continuous().

Suppose, for example, that instead of having the years on the x-axis, we want to have labels formatted as Year YYYY, where YYYY is the year.

5.5 Breaks and their labels

ict |>
  dplyr::filter(grepl("Euro", geo)) |>
  ggplot(aes(year, ict_percentage)) +
  geom_line() +
  scale_x_continuous(
    breaks = seq(2004, 2023, 2),
    labels = paste("Year", seq(2004, 2023, 2))
  )
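
The labels argument also accepts a function that receives the breaks and returns their labels, which avoids writing the sequence twice. A minimal sketch, assuming a hypothetical helper year_label() and a made-up data frame toy in place of the filtered ict data:

```r
library(ggplot2)

# Hypothetical helper: receives the vector of breaks, returns their labels.
year_label <- function(breaks) paste("Year", breaks)

# Toy series standing in for the filtered ict data (made-up values).
toy <- data.frame(
  year = 2004:2023,
  ict_percentage = seq(2.5, 4.4, length.out = 20)
)

p <- ggplot(toy, aes(year, ict_percentage)) +
  geom_line() +
  scale_x_continuous(
    breaks = seq(2004, 2023, 2),
    labels = year_label # applied to the breaks when the plot is drawn
  )
p
```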

5.6 Programming digression: creating sequences

We have used the seq() function to create the breaks and labels of the x-axis.

The seq() function creates sequences of numbers.
There are a few ways to create sequences in R.

The legacy way of creating sequences is to use the : operator.

The : operator is used with infix notation.
It takes two arguments, from and to, and creates a sequence of integers from from to to.

10:20

 [1] 10 11 12 13 14 15 16 17 18 19 20

The : operator has a few disadvantages.

First, it only works with a step of 1, or -1 if from is larger than to.
Second, it can be error-prone when combined with arithmetic operations.

1:3 * 2

[1] 2 4 6

1:(3 * 2)

[1] 1 2 3 4 5 6

A safer and more flexible way to create sequences is to use the seq() function.

The seq() function can create sequences with an arbitrary step size.

seq(1, 3, by = 0.5)

[1] 1.0 1.5 2.0 2.5 3.0

Or it can create sequences between two numbers with a specific length.

seq(1, 3, length.out = 10)

 [1] 1.000000 1.222222 1.444444 1.666667 1.888889 2.111111 2.333333 2.555556
 [9] 2.777778 3.000000

Further, compared to the : operator, there is less risk of confusion when combining seq() with arithmetic operations.

seq(1, 3) * 2

[1] 2 4 6

seq(1, 3 * 2)

[1] 1 2 3 4 5 6

There are two very useful siblings of seq(), named seq_along() and seq_len().

The seq_along() function creates a sequence of integers from 1 to the length of the input vector.

v1 <- c("a", "b", "c")
seq_along(v1)

[1] 1 2 3

This is useful when we want to enumerate the elements of a vector.

For a non-empty vector, it gives the same result as seq(length(v1)):

v1 <- c("a", "b", "c")
seq(length(v1))

[1] 1 2 3
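
One reason to prefer seq_along() shows up with empty vectors. In the sketch below, empty is an illustrative zero-length vector: seq(length(empty)) counts down from 1 to 0, while seq_along(empty) returns an empty sequence.

```r
empty <- character(0)

# length(empty) is 0, so this is seq(0), which counts down from 1 to 0.
via_seq <- seq(length(empty))

# seq_along() returns a zero-length integer vector instead.
via_seq_along <- seq_along(empty)

via_seq
via_seq_along
```

A loop over via_seq would run twice with invalid indices, whereas a loop over via_seq_along would not run at all.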

The seq_len() function creates a sequence of integers from 1 to the input number.

seq_len(5)

[1] 1 2 3 4 5

It gives the same result as seq(1, 5).

5.7 Rotating breaks’ labels

When plotting high-frequency time-series data, the labels of the breaks on the horizontal axis can get crowded.

One common way to address this issue is to rotate the labels.
Rotating the labels does not affect the breaks or the labels themselves, only their orientation.
We can rotate the breaks’ labels using the theme() function.

The theme() function has (a lot of) options for modifying the plot’s theming.

We can modify the appearance of the axes using the axis.text.x and axis.text.y arguments.
The element_text() function is used to modify the appearance of the labels’ text.
Rotating the labels is done by setting the angle argument to the desired angle (in degrees).
The vjust and hjust arguments control the vertical and horizontal justification of the text.

ict |>
  dplyr::filter(grepl("Euro", geo)) |>
  ggplot(aes(year, ict_percentage)) +
  geom_line() +
  scale_x_continuous(breaks = ict$year, labels = ict$year) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5))

5.8 Legends

Another useful option exposed by theme() is the legend.position argument.

ict |>
  dplyr::filter(geo %in% sample(unique(ict$geo), size = 5)) |>
  ggplot(aes(year, ict_percentage, color = geo)) +
  geom_line() +
  theme(legend.position = "top")

The legend.position argument can take the following values:
- "none": no legend is displayed.
- "left", "right", "top", "bottom": the legend is displayed on the left, right, top, or bottom of the plot area.
- "inside": the legend is displayed inside the plot area.
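
When the legend is placed inside the plot area, its location is given separately. The sketch below assumes ggplot2 version 3.5.0 or later, where the legend.position.inside theme argument takes relative coordinates between 0 and 1. The data frame toy is made up for illustration.

```r
library(ggplot2)

# Toy data standing in for a subset of ict (made-up values).
toy <- data.frame(
  year = rep(2020:2022, times = 2),
  ict_percentage = c(3.0, 3.2, 3.5, 4.0, 4.1, 4.3),
  geo = rep(c("A", "B"), each = 3)
)

p <- ggplot(toy, aes(year, ict_percentage, color = geo)) +
  geom_line() +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.95, 0.05), # near the bottom-right corner
    legend.justification = c(1, 0)          # anchor the legend's bottom-right
  )
p
```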

Besides customization via theme(), legends can be modified using the guides() function.

The guides() function offers more fine-grained control over the appearance of the legend.
For example, we can modify the number of rows or columns of the legend.
Or we can override the size and shape of the legend markers.

ict |>
  dplyr::filter(
    geo %in% sample(unique(ict$geo), size = 8)
  ) |>
  ggplot(
    aes(year, ict_percentage, color = geo)
  ) +
  geom_line() + 
  theme(legend.position = "top") +
  guides(
    color = guide_legend(
      title = "Country",
      nrow = 2,
      override.aes = list(linewidth = 4)
    )
  )

6 Statistical transformations

How are data mapped to geometric objects?

When creating a bar chart, we pass one column to geom_bar(), and the function automatically calculates the height of the bars.
When creating a density plot, we pass one column to geom_density(), and the function automatically calculates the density of the data.

6.1 Behind every geometric object

This pattern is common in ggplot2.
The passed data is transformed into a new form that is used to create the plot.

We examine some more details of the statistical transformations taking place behind the scenes when creating geometric objects.

6.2 The statistic behind `geom_bar()`

In the visualization overview topic, we created a bar chart of the income variable of eu_ict.

eu_ict |>
  ggplot(aes(income)) +
  geom_bar()

Where is the count variable of the vertical axis coming from?

We have never defined count as an aesthetic.
Even stranger, count is not among the columns of the eu_ict dataset.

names(eu_ict)

[1] "geo"            "EU"             "ict_percentage" "income"        
[5] "output"

Examining the documentation of geom_bar(), we observe that there is a stat argument that defaults to count.

Usage:

 geom_bar(
   mapping = NULL,
   data = NULL,
   stat = "count",
   position = "stack",
   ...,
   just = 0.5,
   width = NULL,
   na.rm = FALSE,
   orientation = NA,
   show.legend = NA,
   inherit.aes = TRUE
 )

Behind the scenes, geom_bar() calculates the number of times each value of income is found in the data.

eu_ict |>
  dplyr::count(income)

# A tibble: 3 × 2
  income     n
  <fct>  <int>
1 low        3
2 middle    17
3 high      12

And then uses the new variable to set the heights.

We can manually replicate the calculation and instruct geom_bar() not to perform any further transformation.

Instructing a geom_* function not to apply any statistical transformation to the input data is done by passing stat = "identity" to the function.

eu_ict |>
  dplyr::count(income) |>
  dplyr::rename(count = n) |>
  ggplot(aes(income, count)) +
  geom_bar(stat = "identity")
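
For pre-computed heights, ggplot2 also offers geom_col(), which draws bars without any statistical transformation. The sketch below uses a made-up data frame, toy, in place of eu_ict, and base R's table() in place of dplyr::count(). It compares the bar heights that the two approaches produce.

```r
library(ggplot2)

# Toy data standing in for eu_ict (made-up values).
toy <- data.frame(
  income = factor(
    c("low", "middle", "middle", "high", "high", "high"),
    levels = c("low", "middle", "high")
  )
)

# Pre-computed counts, one row per income level.
counts <- as.data.frame(table(income = toy$income), responseName = "count")

p_bar <- ggplot(toy, aes(income)) + geom_bar()           # counts computed by the stat
p_col <- ggplot(counts, aes(income, count)) + geom_col() # counts taken as given

# Compare the bar heights computed for the two plots.
same_heights <- all.equal(layer_data(p_bar)$y, layer_data(p_col)$y)
same_heights
```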

6.3 The statistics behind `geom_smooth()`

Other geom_* functions calculate different statistics by default.

For instance, geom_smooth() calculates fitted values, standard errors, and confidence intervals.

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  stat_smooth(method = "lm")

`geom_smooth()` using formula = 'y ~ x'

6.4 Programming digression: formulas

How can we replicate the statistics of geom_smooth()?

The "lm" part of the method argument stands for linear model.
Linear models are statistical models having linear relationships between the dependent and independent variables.
For example, classic linear regressions are linear models.

In R, we have a neat way to define statistical models using formulas.

Using formulas with ggplot2 and statistical functions allows us to focus on relationships in the data and leave the details of the statistical calculations to the functions.

A basic formula in R has two main parts:
- The left-hand side (LHS) of the formula is the dependent variable.
- The right-hand side (RHS) of the formula is the independent variable(s).
- The two parts are separated by a tilde ~ symbol.

For example:

ict_percentage ~ output

ict_percentage ~ output

Had we had more than one independent variable, we would have written:

ict_percentage ~ output + ind_var2 + ind_var3

ict_percentage ~ output + ind_var2 + ind_var3

Note that the formula uses the variables ind_var2 and ind_var3, which are neither defined nor present in any of our datasets.
And R does not complain about it.

This is because the formula does not actually calculate anything.

It is an unevaluated expression that explains the logic of the model.
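
A short sketch of what this means in practice: the formula can be stored and inspected without its variables existing, and R looks for them only when a function such as lm() tries to use them. The small data frame below is made up for illustration, and the example assumes no object named ind_var2 exists in the session.

```r
f <- ict_percentage ~ output + ind_var2 + ind_var3

# The formula is an ordinary object that we can inspect.
class(f)
all.vars(f)

# The variables are looked up only when the model is fitted.
toy <- data.frame(ict_percentage = c(3, 4, 5), output = c(10, 20, 30))
fit_message <- tryCatch(
  lm(f, data = toy),
  error = function(e) conditionMessage(e)
)
fit_message
```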

Linear regressions

We can fit a linear model using formulas with the lm() function.

fit <- lm(ict_percentage ~ output, eu_ict)

The first argument of lm() is the formula we want to estimate.
The second argument is the dataset.
The lm() function automatically searches for the formula variables in the dataset and fits the model.

Symbolically, we estimated the model,

\[ y_{i} = \beta_{0} + \beta_{1} x_{i} + \varepsilon_{i}, \]
where
- \(y_{i}\) is the ict_percentage,
- \(x_{i}\) is the output,
- \(\beta_{0}\) is the intercept,
- \(\beta_{1}\) is the slope, and
- \(\varepsilon_{i}\) is the error term.

How can we extract the predicted values from the model?

\[ \hat{y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1} x_{i} \]

pred_y <- predict(fit)

How can we extract the confidence intervals?

\[ \mathrm{CI}(\hat{y}_{i}) = [\hat{y}_{i} - t_{\alpha/2} \times \sigma(\hat{y}_{i}), \hat{y}_{i} + t_{\alpha/2} \times \sigma(\hat{y}_{i})] \]

where

\[ \sigma(\hat{y}_{i}) = \hat\sigma \sqrt{\frac{1}{n} + \frac{(x_{i} - \bar{x})^{2}}{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}} \]

pred_y <- predict(fit, interval = "confidence")

When passing interval = "confidence", the predict() function returns a matrix with three columns:
- fit: the predicted values,
- lwr: the lower bound of the confidence interval, and
- upr: the upper bound of the confidence interval.
We can examine the first few rows with head().

head(pred_y)

       fit      lwr      upr
1 5.277672 4.814736 5.740609
2 5.251859 4.792837 5.710880
3 3.871531 3.201614 4.541448
4 6.498425 5.648143 7.348707
5 4.865592 4.427362 5.303822
6 4.368092 3.852660 4.883525
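
As a cross-check, the interval formula for a simple linear regression can be computed by hand and compared with the output of predict(). The sketch below uses R's built-in cars dataset as a stand-in for eu_ict, and assumes the default 95% confidence level of predict().

```r
# Built-in dataset used as a stand-in: dist plays the role of y, speed of x.
fit_cars <- lm(dist ~ speed, data = cars)

x <- cars$speed
n <- length(x)
y_hat <- fitted(fit_cars)

# Residual standard error and the standard error of each fitted value.
sigma_hat <- summary(fit_cars)$sigma
se_fit <- sigma_hat * sqrt(1 / n + (x - mean(x))^2 / sum((x - mean(x))^2))

# Critical value of the t distribution with n - 2 degrees of freedom.
t_crit <- qt(0.975, df = n - 2)

manual <- cbind(
  fit = y_hat,
  lwr = y_hat - t_crit * se_fit,
  upr = y_hat + t_crit * se_fit
)

# Compare with the matrix returned by predict().
matches <- all.equal(
  manual,
  predict(fit_cars, interval = "confidence"),
  check.attributes = FALSE
)
matches
```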

We can now replicate the statistics of geom_smooth(), and silence its message about the formula by passing formula = y ~ x explicitly.

fit <- lm(ict_percentage ~ output, eu_ict)
pred_y <- predict(fit, interval = "confidence")
eu_ict |>
  dplyr::mutate(
    pred = pred_y[, "fit"],
    ymin = pred_y[, "lwr"],
    ymax = pred_y[, "upr"]
  ) |>
  ggplot(aes(output)) +
  geom_line(aes(y = pred), color = "blue", linewidth = 1) +
  geom_ribbon(
    aes(ymin = ymin, ymax = ymax),
    fill = "darkgray",
    alpha = 0.5
  )

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  stat_smooth(method = "lm", formula = y ~ x)

7 Text and annotations

A common pattern in data science visualizations is the use of annotations and text.

Text and annotations are commonly used to:
- Provide context
- Highlight specific data points
- Explain the data
- Add captions

7.1 Captions

Captions can be effortlessly added to a figure with labs().

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = EU)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(caption = "Data Source: Eurostat")

7.2 Labels with Formulas

On some occasions, data scientists want to include the underlying mathematical details of their work in a visualization.

For example, suppose we are working with the function \[ f(x) = \sin(2x) + e^{-x/10} \cdot \cos(x) \]

We can create a plot of \(f\) with geom_line().

data <- data.frame(x = seq(0, 50, .01))
ggplot(data, aes(x = x)) +
  geom_line(aes(y = sin(2*x) + exp(-x/10) * cos(x)), color = "red")

Notice that ggplot2 writes the formula expression as a string in the vertical axis label.

However, the formatting used is rather unusual for human readers.

For instance, multiplication is denoted with *, while in mathematical typography it is usually omitted.

We can use quote() in labs() to instruct ggplot2 to render the expression in a more human-customary way.

data <- data.frame(x = seq(0, 50, .01))
ggplot(data, aes(x = x)) +
  geom_line(aes(y = sin(2*x) + exp(-x/10) * cos(x)), color = "red") +
  labs(x = quote(x), y = quote(sin(2 * x) + exp(-x/10) * cos(x)))
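
For intuition about what quote() hands over to labs(), the sketch below shows that it returns the expression itself rather than its value. The object name e is illustrative.

```r
# quote() captures the expression without evaluating it,
# so x does not need to exist at this point.
e <- quote(sin(2 * x))
class(e)

# The captured expression can be evaluated later, once x is supplied.
value <- eval(e, list(x = pi / 4))
value
```

Because the label is an expression rather than a string, it can be typeset in mathematical notation.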

7.3 Annotations

Besides captions and labels, we can add text and markers directly into the main body of the plot.

These types of additions to a plot are called annotations.
An annotation is an additional piece of information that is added to a plot and facilitates the interpretation of its data elements.
In ggplot2, annotations and text can be added with annotate() and geom_text().

7.4 Using `geom_text()`: Example 1

We want to textually highlight the non-EU countries in the eu_ict’s scatter plot.

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  geom_point()

We start with the usual geom_point() plot.

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  geom_point() +
  geom_text(aes(label = geo))

We pass the label = geo aesthetic to geom_text() to create a text object using country names.

However, this creates a text object for all data points.

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  geom_point() +
  geom_text(
    data = eu_ict |> 
      dplyr::filter(EU == "NON-EU"),
    aes(label = geo)
  )

We override the data argument of geom_text() to filter only the non-EU countries.

This looks more like what we want to achieve.
Still, the text and the points are overlapping.

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  geom_point() +
  geom_text(
    data = eu_ict |> 
      dplyr::filter(EU == "NON-EU"),
    aes(label = geo),
    hjust = "left",
    vjust = "top"
  )

We use hjust = "left" and vjust = "top" so that the top-left corner of the text is anchored at the data point.

Better, but some extra spacing could improve the aesthetics.

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  geom_point() +
  geom_text(
    data = eu_ict |> 
      dplyr::filter(EU == "NON-EU"),
    aes(label = geo),
    hjust = "left",
    vjust = "top",
    nudge_y = -0.1
  )

We use nudge_y = -0.1 to move the text slightly below its data point.

eu_ict |>
  ggplot(aes(output, ict_percentage)) +
  geom_point() +
  geom_text(
    data = eu_ict |> 
      dplyr::filter(EU == "NON-EU"),
    aes(label = geo),
    hjust = "left",
    vjust = "top",
    nudge_y = -0.1,
    size = 4
  )

Finally, we can adjust the text size with the size argument.

7.5 Using `geom_text()`: Example 2

We want to textually highlight regression lines per group.

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income)) +
  geom_smooth(
    aes(group = EU), method = "lm", se = FALSE, formula = y ~ x
  )

Suppose we want to add the group labels at the end of each regression line.
We can pick the maximum output value per group and use it as the x aesthetic in geom_text().

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income)) +
  geom_smooth(
    aes(group = EU), method = "lm", se = FALSE, formula = y ~ x
  ) +
  geom_text(
    data = eu_ict |>
      dplyr::group_by(EU) |>
      dplyr::slice_max(output, n = 1)
  )

We use slice_max() to pick the maximum output value per group.
And override the data argument of geom_text() to use the sliced data.

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income)) +
  geom_smooth(
    aes(group = EU), method = "lm", se = FALSE, formula = y ~ x
  ) +
  geom_text(
    data = eu_ict |>
      dplyr::group_by(EU) |>
      dplyr::slice_max(output, n = 1),
    aes(output, ict_percentage, label = EU)
  )

We pass the aesthetics we want to use in the mapping argument of geom_text().

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income)) +
  geom_smooth(
    aes(group = EU), method = "lm", se = FALSE, formula = y ~ x
  ) +
  geom_text(
    data = eu_ict |>
      dplyr::group_by(EU) |>
      dplyr::slice_max(output, n = 1),
    aes(output, ict_percentage, label = EU),
    hjust = "left",
    vjust = "bottom",
    nudge_x = 1000
  )

And fine-tune the appearance of the text with hjust, vjust, and nudge_x.

7.6 Annotating

Another way to add text to a plot is with annotate().

In contrast to geom_text(), which is a geometric object, annotate() does not act on data points.
This means that annotate() does not require a data argument.
And it is more useful for adding small, data-independent elements to a plot.

7.7 Using `annotate()` for text

We want to add a label next to the richest EU country.

We start once more with a geom_point() scatter plot of the eu_ict’s income data.

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU))

ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(geom = "label")

And add an annotate() of geometric type label.

On its own, this gives an error if executed.
We need to provide the label text for the annotation.
And specify the position for the label using the x and y arguments.

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(geom = "label")

We can do some preliminary data transformations to find the richest EU country.

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output,
    y = richest$ict_percentage
  )

And use the calculated richest data to pass more information to annotate().

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output,
    y = richest$ict_percentage,
    hjust = "right"
  )

We can adjust the horizontal alignment of the label's text with the hjust argument.
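To see what hjust does, the sketch below places three labels at the same x position with different alignments. It uses an empty plot with hypothetical coordinates rather than eu_ict. With hjust = "right", the right edge of the label sits at x, so the label extends to the left of the point. Numeric values are also accepted, where 0 corresponds to left, 0.5 to center, and 1 to right.

```r
library(ggplot2)

p <- ggplot() +
  # reference points at x = 5 (hypothetical coordinates)
  annotate(geom = "point", x = 5, y = 1:3) +
  annotate(geom = "label", label = "left", x = 5, y = 1, hjust = "left") +
  annotate(geom = "label", label = "center", x = 5, y = 2, hjust = "center") +
  annotate(geom = "label", label = "right", x = 5, y = 3, hjust = "right") +
  xlim(0, 10)
p
```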

The annotate() function does not have nudge_* arguments (why?).

We can directly adjust the x and y positions to move the label around.

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output - 1000,
    y = richest$ict_percentage,
    hjust = "right"
  )

7.8 Using `annotate()` for segments

Using annotate() is very useful for adding segments and arrows to a plot.

The calling interface is mostly similar to annotate() for text.
Instead of geom = "label", we use geom = "segment" to create annotations with segments and arrows.

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output - 8000,
    y = richest$ict_percentage,
    hjust = "right"
  )

We nudge the label of the richest country a bit more to the left.

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output - 8000,
    y = richest$ict_percentage,
    hjust = "right"
  ) +
  annotate(geom = "segment")

And add a new annotate() layer with geom = "segment" to create a segment.

Specifying a segment on a plane is equivalent to specifying two points.

The annotate() function expects two points to draw the segment.
The points are specified by the x, y, xend, and yend arguments.

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output - 8000,
    y = richest$ict_percentage,
    hjust = "right"
  ) +
  annotate(
    geom = "segment",
    x = richest$output - 8000,
    xend = richest$output - 500,
    y = richest$ict_percentage,
    yend = richest$ict_percentage
  )

This draws a segment connecting the two points, but without an arrowhead.

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output - 8000,
    y = richest$ict_percentage,
    hjust = "right"
  ) +
  annotate(
    geom = "segment",
    x = richest$output - 8000,
    xend = richest$output - 500,
    y = richest$ict_percentage,
    yend = richest$ict_percentage,
    arrow = arrow(type = "closed", length = unit(0.3, "cm"))
  )

We can add an arrowhead to the segment using the arrow argument.
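A stripped-down sketch of the same idea on an empty plot, using hypothetical coordinates instead of eu_ict. The arrowhead is drawn at the (xend, yend) end of the segment, which is the default of arrow() (ends = "last"), so the segment should start at the label and end near the point.

```r
library(ggplot2)

p <- ggplot() +
  # the point the arrow should refer to (hypothetical coordinates)
  annotate(geom = "point", x = 10, y = 1) +
  # the segment runs from (x, y) to (xend, yend); the arrowhead is at the end
  annotate(
    geom = "segment",
    x = 2, xend = 9.5,
    y = 1, yend = 1,
    arrow = arrow(type = "closed", length = unit(0.3, "cm"))
  ) +
  xlim(0, 11)
p
```

Stopping the segment slightly before the point (here at 9.5 instead of 10) keeps the arrowhead from covering the point marker.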

richest <- eu_ict |>
  dplyr::filter(EU == "EU") |>
  dplyr::slice_max(output, n = 1)
ggplot(eu_ict, aes(output, ict_percentage)) +
  geom_point(aes(color = income, shape = EU)) +
  annotate(
    geom = "label",
    label = richest$geo,
    x = richest$output - 8000,
    y = richest$ict_percentage,
    hjust = "right"
  ) +
  annotate(
    geom = "segment",
    x = richest$output - 8000,
    xend = richest$output - 500,
    y = richest$ict_percentage,
    yend = richest$ict_percentage,
    arrow = arrow(type = "closed", length = unit(0.3, "cm")),
    color = "red"
  )

Finally, the color of the annotation can be adjusted with the color argument.
