An introduction to importing data

Comma-separated vales format. Importing data from the disk or the web. Relative and absolute paths. An overview of other data file formats.

Published

April 19th, 2026

1 Back at the sources

In the transformation overview topic, we have imported the original Eurostat data without dwelling too much on the details.

We used the readr package to load data from CSV files.
In this topic, we will take a bit closer look at the readr package and its functions.

1.1 A look at the CSV files

A very common way for sharing statistical data is via the Comma Separated Values (CSV) file format.

CSV data has a rectangular shape.
More often than not, they come with a header row specifying the column names.
Values are separated by a comma.
Alphanumeric values are usually, but not always, enclosed in double quotes.

A look at the CSV files

We can examine the first few lines of the estat_sdg_08_10_en.csv as an example.

DATAFLOW,LAST UPDATE,freq,unit,na_item,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2000,1700,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2001,1850,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2002,1940,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2003,2060,

A look at the CSV files

We can examine the first few lines of the estat_sdg_08_10_en.csv as an example.

DATAFLOW,LAST UPDATE,freq,unit,na_item,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2000,1700,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2001,1850,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2002,1940,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2003,2060,

We observe that the estat_sdg_08_10_en.csv file has a header row, from which the initial column names of the transformation overview topic’s data frame were inferred.

A look at the CSV files

We can examine the first few lines of the estat_sdg_08_10_en.csv as an example.

DATAFLOW,LAST UPDATE,freq,unit,na_item,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2000,1700,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2001,1850,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2002,1940,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2003,2060,

Values in each row are separated by commas.

A look at the CSV files

We can examine the first few lines of the estat_sdg_08_10_en.csv as an example.

DATAFLOW,LAST UPDATE,freq,unit,na_item,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2000,1700,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2001,1850,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2002,1940,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2003,2060,

The values of some columns are not enclosed in double quotes.

A look at the CSV files

We can examine the first few lines of the estat_sdg_08_10_en.csv as an example.

DATAFLOW,LAST UPDATE,freq,unit,na_item,geo,TIME_PERIOD,OBS_VALUE,OBS_FLAG
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2000,1700,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2001,1850,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2002,1940,
ESTAT:SDG_08_10(1.0),20/12/24 11:00:00,Annual,"Chain linked volumes (2010), euro per capita",Gross domestic product at market prices,Albania,2003,2060,

While values in other columns are enclosed in double quotes.

Observe that the values of these columns contain commas in their text.
We enclose such values in double quotes to avoid confusion with the column separator of the CSV file.

2 Importing CSV files

We can easily import CSV-formatted data into R using the read_csv() function from the readr package.

library(readr)

2.1 Importing data with `read_csv()`

Calling read_csv() requires specifying the location of the CSV file with the file argument.

read_csv(file = "data/estat_sdg_08_10_en.csv")

In this case, we indicate to read_csv() that the file is stored locally on our computer, in the data directory.
We can specify the file location as an absolute path, as a relative path, or as a web address.
In this case, we used a relative path.

2.2 Relative paths

Calling read_csv() requires specifying the location of the CSV file with the file argument.

read_csv(file = "data/estat_sdg_08_10_en.csv")

Relative paths are resolved based on the current working directory.

For instance, if the current working directory is C:/Users/user, then the relative path we supplied points to the file:

C:/Users/user/data/estat_sdg_08_10_en.csv

2.3 Absolute paths

Calling read_csv() requires specifying the location of the CSV file with the file argument.

read_csv(file = "/home/user/data/estat_sdg_08_10_en.csv")

Absolute paths are resolved based on the root directory of the file system.

If the file argument we supply starts with a /, then read_csv interprets it as an absolute path.
The file argument points to the exact location of the file in the file system.

2.4 Web addresses

Calling read_csv() requires specifying the location of the CSV file with the file argument.

read_csv(
  file = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/sdg_08_10/1.0"
)

Web addresses refer to files stored on remote servers.

If the file argument we supply starts with a protocol (e.g., http:// or https://), then read_csv interprets it as a web address.
The file is downloaded from the web and imported into R.

2.5 Reading CSV data

Calling read_csv() returns a tibble (data frame) with the contents of the CSV file.

We can assign the result to a variable to store the data frame in memory.
Assigning objects to variable names in R is done with the assignment operator: <-.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv")

Reading CSV data

Calling read_csv() returns a tibble (data frame) with the contents of the CSV file.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv")

We can read the last expression from right to left as “the result of reading the CSV file is assigned to the variable sdg.”

Reading CSV data

By assigning the data to the sdg variable, we do not need to read the CSV file again to access the data.

sdg

# A tibble: 1,845 × 9
   DATAFLOW        `LAST UPDATE` freq  unit  na_item geo   TIME_PERIOD OBS_VALUE
   <chr>           <chr>         <chr> <chr> <chr>   <chr>       <dbl>     <dbl>
 1 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2000      1700
 2 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2001      1850
 3 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2002      1940
 4 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2003      2060
 5 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2004      2180
 6 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2005      2310
 7 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2006      2460
 8 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2007      2630
 9 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2008      2850
10 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2009      2960
# ℹ 1,835 more rows
# ℹ 1 more variable: OBS_FLAG <chr>

read_csv(file = "data/estat_sdg_08_10_en.csv")

# A tibble: 1,845 × 9
   DATAFLOW        `LAST UPDATE` freq  unit  na_item geo   TIME_PERIOD OBS_VALUE
   <chr>           <chr>         <chr> <chr> <chr>   <chr>       <dbl>     <dbl>
 1 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2000      1700
 2 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2001      1850
 3 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2002      1940
 4 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2003      2060
 5 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2004      2180
 6 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2005      2310
 7 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2006      2460
 8 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2007      2630
 9 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2008      2850
10 ESTAT:SDG_08_1… 20/12/24 11:… Annu… Chai… Gross … Alba…        2009      2960
# ℹ 1,835 more rows
# ℹ 1 more variable: OBS_FLAG <chr>

2.6 A look at `read_csv()` messages

Calling read_csv() also prints some information.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv")

Rows: 1845 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): DATAFLOW, LAST UPDATE, freq, unit, na_item, geo, OBS_FLAG
dbl (2): TIME_PERIOD, OBS_VALUE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

A look at `read_csv()` messages

Calling read_csv() also prints some information.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv")

Rows: 1845 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): DATAFLOW, LAST UPDATE, freq, unit, na_item, geo, OBS_FLAG
dbl (2): TIME_PERIOD, OBS_VALUE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We are informed that 1845 rows and 9 columns were imported.

A look at `read_csv()` messages

Calling read_csv() also prints some information.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv")

Rows: 1845 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): DATAFLOW, LAST UPDATE, freq, unit, na_item, geo, OBS_FLAG
dbl (2): TIME_PERIOD, OBS_VALUE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The delimiter used to separate values is a comma.

A look at `read_csv()` messages

Calling read_csv() also prints some information.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv")

Rows: 1845 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): DATAFLOW, LAST UPDATE, freq, unit, na_item, geo, OBS_FLAG
dbl (2): TIME_PERIOD, OBS_VALUE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Seven columns were parsed as character vectors (chr).
Two columns were parsed as double precision numbers (dbl). We can roughly consider them as real numbers.

A look at `read_csv()` messages

Calling read_csv() also prints some information.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv")

Rows: 1845 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): DATAFLOW, LAST UPDATE, freq, unit, na_item, geo, OBS_FLAG
dbl (2): TIME_PERIOD, OBS_VALUE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We can get more information about the data types using spec().
We can suppress this message with a couple of options.

A look at `read_csv()` messages

We can suppress this message with a couple of options.

sdg <- read_csv(file = "data/estat_sdg_08_10_en.csv", show_col_types = FALSE)

One way is to pass show_col_types = FALSE to read_csv().

Another way is to manually specify (some of the) column types.

A look at `read_csv()` messages

We can suppress this message with a couple of options.

sdg <- read_csv(
  file = "data/estat_sdg_08_10_en.csv",
  col_types = list(
    unit = col_factor(),
    TIME_PERIOD = col_integer(),
    `LAST UPDATE` = col_datetime(format = "%d/%m/%y %H:%M:%S")
  )
)

Column types are specified as a list in the col_types argument.
Each item in the list specifies a column name and a corresponding type.

A look at `read_csv()` messages

We can suppress this message with a couple of options.

sdg <- read_csv(
  file = "data/estat_sdg_08_10_en.csv",
  col_types = list(
    unit = col_factor(),
    TIME_PERIOD = col_integer(),
    `LAST UPDATE` = col_datetime(format = "%d/%m/%y %H:%M:%S")
  )
)

The unit column is parsed as a factor (categorical variable).

A look at `read_csv()` messages

We can suppress this message with a couple of options.

sdg <- read_csv(
  file = "data/estat_sdg_08_10_en.csv",
  col_types = list(
    unit = col_factor(),
    TIME_PERIOD = col_integer(),
    `LAST UPDATE` = col_datetime(format = "%d/%m/%y %H:%M:%S")
  )
)

The TIME_PERIOD column is parsed as an integer variable.

A look at `read_csv()` messages

We can suppress this message with a couple of options.

sdg <- read_csv(
  file = "data/estat_sdg_08_10_en.csv",
  col_types = list(
    unit = col_factor(),
    TIME_PERIOD = col_integer(),
    `LAST UPDATE` = col_datetime(format = "%d/%m/%y %H:%M:%S")
  )
)

The LAST UPDATE column is parsed as a date-time variable.

3 Importing files with other formats

The readr package provides a family of functions to import data from various file formats.
Their calling conventions are similar to read_csv().

3.1 Comma-separated data formats

Function	Data Format	Description
`read_csv()`	CSV	Reads comma-separated values
`read_csv2()`	CSV	Reads semicolon-separated (;) values
`read_tsv()`	TSV	Reads tab-separated values
`read_delim()`	Delimited Separated Values	Reads values separated by a custom delimiter

3.2 Other data formats

Function	Data Format	Description
`read_fwf()`	Fixed Width Format	Reads fixed-width formatted values
`read_table()`	Table	Special case of fixed-width format
`read_log()`	Log files	Reads log files

1 Back at the sources

1.1 A look at the CSV files

A look at the CSV files

A look at the CSV files

A look at the CSV files

A look at the CSV files

A look at the CSV files

2 Importing CSV files

2.1 Importing data with read_csv()

2.2 Relative paths

2.3 Absolute paths

2.4 Web addresses

2.5 Reading CSV data

Reading CSV data

Reading CSV data

2.6 A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

A look at read_csv() messages

3 Importing files with other formats

3.1 Comma-separated data formats

3.2 Other data formats

2.1 Importing data with `read_csv()`

2.6 A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages

A look at `read_csv()` messages