Text Update: 09/29, 2019 (JST)

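The tidyr package has been updated to 1.0.0, and the title in its package description has changed from "Easily Tidy Data with ‘spread()’ and ‘gather()’ Functions" to "Tidy Messy Data". As the new title suggests, the change with the biggest impact is probably that the spread and gather functions have been superseded: they still work, but they are no longer under active development. This post gives a brief introduction to their successors.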

Packages and Datasets

In addition to the standard packages of R version 3.6.1 (2019-07-05), this page uses the following additional packages.

Package	Version	Description
knitr	1.25	A General-Purpose Package for Dynamic Report Generation in R
tidyr	1.0.0	Tidy Messy Data
tidyselect	0.2.5	Select from a Set of Strings
tidyverse	1.2.1	Easily Install and Load the ‘Tidyverse’

　
This page also uses the following datasets.

Dataset	Package	Version	Description
anscombe	datasets	3.6.1	Anscombe’s Quartet of ‘Identical’ Simple Linear Regressions
iris	datasets	3.6.1	Edgar Anderson’s Iris Data
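
The outputs on this page print iris and anscombe as tibbles, although both are plain data frames in the datasets package. The setup chunk is not shown, so the following is only a minimal sketch of what it may have looked like, assuming the datasets were converted with tibble::as_tibble.

```r
# Setup sketch (an assumption; the original setup chunk is not shown).
library(tidyr)   # pivot_longer(), pivot_wider(), and the re-exported %>%

# Convert the base data frames to tibbles so they print like the outputs here.
iris     <- tibble::as_tibble(datasets::iris)
anscombe <- tibble::as_tibble(datasets::anscombe)

dim(iris)      # rows and columns of iris
dim(anscombe)  # rows and columns of anscombe
```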

tidyr

The tidyr package, together with dplyr, forms the core of the tidyverse family and is practically essential when working with tidy data. The four main changes in this release are as follows (quoted from tidyr 1.0.0, with emphasis added).

New pivot_longer() and pivot_wider() provide improved tools for reshaping, superceding spread() and gather(). The new functions are substantially more powerful, thanks to ideas from the data.table and cdata packages, and I’m confident that you’ll find them easier to use and remember than their predecessors.
New unnest_auto(), unnest_longer(), unnest_wider(), and hoist() provide new tools for rectangling, converting deeply nested lists into tidy data frames.
nest() and unnest() have been changed to match an emerging principle for the design of … interfaces. Four new functions (pack()/unpack(), and chop()/unchop()) reveal that nesting is the combination of two simpler steps.
New expand_grid(), a variant of base::expand.grid(). This is a useful function to know about, but also serves as a good reason to discuss the important role that vctrs plays behind the scenes. You shouldn’t ever have to learn about vctrs, but it brings improvements to consistency and performance.

　
The first item probably has the biggest impact: pivot_wider replaces spread, and pivot_longer replaces gather. Let's look at how to use the new functions in turn.

`pivot_longer` function

tidyr::pivot_longer is the replacement for tidyr::gather. We start with the iris dataset to show how it is used.

iris

## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # … with 140 more rows

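The old way

Leaving the species (the Species variable) aside, we gather the sepal (Sepal) and petal (Petal) measurements of the iris dataset into a single variable. With the older tidyr::gather function this is done as follows.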

iris %>% 
  tidyr::gather(key = "key", value = "value", -Species)

## # A tibble: 600 x 3
##    Species key          value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Length   4.9
##  3 setosa  Sepal.Length   4.7
##  4 setosa  Sepal.Length   4.6
##  5 setosa  Sepal.Length   5  
##  6 setosa  Sepal.Length   5.4
##  7 setosa  Sepal.Length   4.6
##  8 setosa  Sepal.Length   5  
##  9 setosa  Sepal.Length   4.4
## 10 setosa  Sepal.Length   4.9
## # … with 590 more rows

The new way

With the new tidyr::pivot_longer function, the call looks like this.

iris %>% 
  tidyr::pivot_longer(cols = -Species, names_to = "key", values_to = "value")

## # A tibble: 600 x 3
##    Species key          value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # … with 590 more rows

Old vs. new

Let's look at exactly what has changed.

iris %>% 
   tidyr::gather(key = "key", value = "value", -Species)

iris %>% 
   tidyr::pivot_longer(cols = -Species, names_to = "key", values_to = "value")

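The biggest change is how the variables to be gathered into long form are specified. In tidyr::gather they are listed at the end of the argument list (through `...`), whereas in tidyr::pivot_longer they are passed to the cols parameter. The row order of the result also differs, as the outputs above show: gather stacks the data column by column, while pivot_longer works through it row by row.
Besides listing variable names, cols accepts the select helpers defined in the tidyselect package. The select helpers are described later.

The parameters of tidyr::pivot_longer and tidyr::gather compare as follows.

pivot_longer	gather	Description
cols	...	Variables to gather
names_to	key	Name of the variable that receives the original variable names (optional)
values_to	value	Name of the variable that receives the values (optional)

For basic use like this, the two functions are specified in much the same way.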
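
To check that the two calls really carry the same information, the results can be compared after sorting away the difference in row order. This is a small sketch, not part of the original post; iris_tbl and sort_rows are names introduced here for illustration.

```r
library(tidyr)

# iris as a tibble (assumed setup, see the note on datasets above)
iris_tbl <- tibble::as_tibble(datasets::iris)

old <- iris_tbl %>%
  gather(key = "key", value = "value", -Species)
new <- iris_tbl %>%
  pivot_longer(cols = -Species, names_to = "key", values_to = "value")

# Sort both results the same way so that only the content is compared.
sort_rows <- function(df) df[order(df$Species, df$key, df$value), ]

same_content <- isTRUE(all.equal(
  as.data.frame(sort_rows(old)),
  as.data.frame(sort_rows(new)),
  check.attributes = FALSE
))
same_content
```

If same_content is TRUE, the two functions differ here only in the order of the rows.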

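Advanced example

Going one step further, we split the sepal and petal variable names into the part (part) and the measurement direction (measure) while gathering.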

iris %>% 
   tidyr::pivot_longer(cols = -Species, names_to = c("part", "measure"),
                       names_sep = "\\.", values_to = "value")

## # A tibble: 600 x 4
##    Species part  measure value
##    <fct>   <chr> <chr>   <dbl>
##  1 setosa  Sepal Length    5.1
##  2 setosa  Sepal Width     3.5
##  3 setosa  Petal Length    1.4
##  4 setosa  Petal Width     0.2
##  5 setosa  Sepal Length    4.9
##  6 setosa  Sepal Width     3  
##  7 setosa  Petal Length    1.4
##  8 setosa  Petal Width     0.2
##  9 setosa  Sepal Length    4.7
## 10 setosa  Sepal Width     3.2
## # … with 590 more rows

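The key points are that names_to lists as many target variable names as there are pieces, and that names_sep gives the separator (.) used to split the original variable names as a regular expression. When the names cannot be split on a separator, you can instead describe the split with a regular expression pattern.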

iris %>% 
   tidyr::pivot_longer(cols = -Species, names_to = c("parts", "measure"),
                       names_pattern = "(.+)\\.(.+)", values_to = "value")

## # A tibble: 600 x 4
##    Species parts measure value
##    <fct>   <chr> <chr>   <dbl>
##  1 setosa  Sepal Length    5.1
##  2 setosa  Sepal Width     3.5
##  3 setosa  Petal Length    1.4
##  4 setosa  Petal Width     0.2
##  5 setosa  Sepal Length    4.9
##  6 setosa  Sepal Width     3  
##  7 setosa  Petal Length    1.4
##  8 setosa  Petal Width     0.2
##  9 setosa  Sepal Length    4.7
## 10 setosa  Sepal Width     3.2
## # … with 590 more rows

For reference, doing the same with tidyr::gather requires help from tidyr::separate, as shown below.

iris %>% 
   tidyr::gather(key = "key", value = "value", -Species) %>% 
   tidyr::separate(col = key, into = c("parts", "measure"), sep = "\\.")

## # A tibble: 600 x 4
##    Species parts measure value
##    <fct>   <chr> <chr>   <dbl>
##  1 setosa  Sepal Length    5.1
##  2 setosa  Sepal Length    4.9
##  3 setosa  Sepal Length    4.7
##  4 setosa  Sepal Length    4.6
##  5 setosa  Sepal Length    5  
##  6 setosa  Sepal Length    5.4
##  7 setosa  Sepal Length    4.6
##  8 setosa  Sepal Length    5  
##  9 setosa  Sepal Length    4.4
## 10 setosa  Sepal Length    4.9
## # … with 590 more rows

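Here is an updated summary of the parameters seen so far.

pivot_longer	gather	Description
cols	...	Variables to gather
names_to	key	Name of the variable that receives the original variable names (optional)
values_to	value	Name of the variable that receives the values (optional)
names_sep	NA	Separator for splitting variable names, given as a regular expression
names_pattern	NA	Pattern for splitting variable names, given as a regular expression

names_sep and names_pattern are mutually exclusive.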
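
As a small illustration of choosing one or the other: according to the tidyr documentation, names_sep takes the same specification as separate(), so it can also be a numeric position instead of a regular expression. The sketch below uses a hypothetical toy tibble (toy, with columns a1 and a2) and compares splitting by position with splitting by pattern.

```r
library(tidyr)

# Hypothetical toy data; the column names contain no separator character.
toy <- tibble::tibble(id = 1:2, a1 = c(10, 20), a2 = c(30, 40))

# Split each name after its first character (numeric names_sep).
by_position <- toy %>%
  pivot_longer(cols = -id, names_to = c("letter", "number"),
               names_sep = 1, values_to = "value")

# Split each name with two capture groups (names_pattern).
by_pattern <- toy %>%
  pivot_longer(cols = -id, names_to = c("letter", "number"),
               names_pattern = "(.)(.)", values_to = "value")

same_result <- isTRUE(all.equal(as.data.frame(by_position),
                                as.data.frame(by_pattern)))
same_result
```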

Advanced example (part 2)

Next we reshape Anscombe's quartet (anscombe) into long form consisting of x, y and the number (group).

anscombe

## # A tibble: 11 x 8
##       x1    x2    x3    x4    y1    y2    y3    y4
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1    10    10    10     8  8.04  9.14  7.46  6.58
##  2     8     8     8     8  6.95  8.14  6.77  5.76
##  3    13    13    13     8  7.58  8.74 12.7   7.71
##  4     9     9     9     8  8.81  8.77  7.11  8.84
##  5    11    11    11     8  8.33  9.26  7.81  8.47
##  6    14    14    14     8  9.96  8.1   8.84  7.04
##  7     6     6     6     8  7.24  6.13  6.08  5.25
##  8     4     4     4    19  4.26  3.1   5.39 12.5 
##  9    12    12    12     8 10.8   9.13  8.15  5.56
## 10     7     7     7     8  4.82  7.26  6.42  7.91
## 11     5     5     5     8  5.68  4.74  5.73  6.89

　
This case needs a slightly unusual specification.

# values_to is omitted: it is ignored when names_to contains ".value"
anscombe %>% 
   tidyr::pivot_longer(cols = tidyselect::everything(),
                       names_to = c(".value", "group"), names_pattern = "(.)(.)")

## # A tibble: 44 x 3
##    group     x     y
##    <chr> <dbl> <dbl>
##  1 1        10  8.04
##  2 2        10  9.14
##  3 3        10  7.46
##  4 4         8  6.58
##  5 1         8  6.95
##  6 2         8  8.14
##  7 3         8  6.77
##  8 4         8  5.76
##  9 1        13  7.58
## 10 2        13  8.74
## # … with 34 more rows

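.value in names_to is a variable name with a special meaning.
Following names_pattern, the variable names x1 to y4 are split into x plus a number and y plus a number. As in the iris example, the x and y pieces are assigned to the first entry of names_to, which is .value.
However, they are not gathered into a variable called .value; they become the variable names x and y. This is the special handling that .value triggers.
The numeric pieces, in turn, are gathered into the group variable named in names_to.

Reversing the order in names_to makes this easier to see.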

# values_to is omitted: it is ignored when names_to contains ".value"
anscombe %>% 
   tidyr::pivot_longer(cols = tidyselect::everything(),
                       names_to = c("group", ".value"), names_pattern = "(.)(.)")

## # A tibble: 22 x 5
##    group   `1`   `2`   `3`   `4`
##    <chr> <dbl> <dbl> <dbl> <dbl>
##  1 x     10    10    10     8   
##  2 y      8.04  9.14  7.46  6.58
##  3 x      8     8     8     8   
##  4 y      6.95  8.14  6.77  5.76
##  5 x     13    13    13     8   
##  6 y      7.58  8.74 12.7   7.71
##  7 x      9     9     9     8   
##  8 y      8.81  8.77  7.11  8.84
##  9 x     11    11    11     8   
## 10 y      8.33  9.26  7.81  8.47
## # … with 12 more rows
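
The effect of .value is easier to follow on a very small table. The following sketch uses a hypothetical toy tibble (toy) with the same naming scheme as anscombe; it is an illustration only, not part of the original post.

```r
library(tidyr)

# Hypothetical toy data named like anscombe: x1, x2, y1, y2.
toy <- tibble::tibble(x1 = c(1, 2), x2 = c(3, 4),
                      y1 = c(10, 20), y2 = c(30, 40))

long <- toy %>%
  pivot_longer(cols = everything(),
               names_to = c(".value", "group"),
               names_pattern = "(.)(.)")

# The first capture group (x / y) became column names, not a column;
# the second capture group (1 / 2) became the values of `group`.
names(long)
long
```

Each original row contributes one output row per group, with the x and y values of that group side by side.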

　
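For reference, no single call to the older functions can do this; it takes a more involved pipeline combining tidyr::gather, tidyr::separate and tidyr::spread.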

anscombe %>%
  tibble::rowid_to_column("id") %>%
  tidyr::gather(key = "key", value = "value", -id) %>%
  tidyr::separate(key, c("measure", "group"), 1) %>%
  tidyr::spread(measure, value) %>% 
  dplyr::select(-id) %>% 
  dplyr::arrange(group)

## # A tibble: 44 x 3
##    group     x     y
##    <chr> <dbl> <dbl>
##  1 1        10  8.04
##  2 1         8  6.95
##  3 1        13  7.58
##  4 1         9  8.81
##  5 1        11  8.33
##  6 1        14  9.96
##  7 1         6  7.24
##  8 1         4  4.26
##  9 1        12 10.8 
## 10 1         7  4.82
## # … with 34 more rows

`pivot_wider` function

tidyr::pivot_wider is the replacement for tidyr::spread. We start with the following long-form version of the iris dataset.

l_iris

## # A tibble: 600 x 4
##       id Species key          value
##    <int> <fct>   <chr>        <dbl>
##  1     1 setosa  Sepal.Length   5.1
##  2     1 setosa  Sepal.Width    3.5
##  3     1 setosa  Petal.Length   1.4
##  4     1 setosa  Petal.Width    0.2
##  5     2 setosa  Sepal.Length   4.9
##  6     2 setosa  Sepal.Width    3  
##  7     2 setosa  Petal.Length   1.4
##  8     2 setosa  Petal.Width    0.2
##  9     3 setosa  Sepal.Length   4.7
## 10     3 setosa  Sepal.Width    3.2
## # … with 590 more rows

The key point is that, to go back to wide form, the data needs identifying information (here, id) that distinguishes the individual observations (instances, rows).
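
The code that created l_iris is not shown on this page. One way to build a table of the same shape, assuming the id is simply the row number of iris, might be:

```r
library(tidyr)

# Assumed construction of l_iris: add a row id, then pivot to long form.
l_iris <- tibble::as_tibble(datasets::iris) %>%
  tibble::rowid_to_column("id") %>%
  pivot_longer(cols = -c(id, Species), names_to = "key", values_to = "value")

dim(l_iris)
```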

The old way

We spread the variables held in the key variable of l_iris back into wide form. With tidyr::spread this is done as follows.

l_iris %>% 
   tidyr::spread(key = key, value = value)

## # A tibble: 150 x 6
##       id Species Petal.Length Petal.Width Sepal.Length Sepal.Width
##    <int> <fct>          <dbl>       <dbl>        <dbl>       <dbl>
##  1     1 setosa           1.4         0.2          5.1         3.5
##  2     2 setosa           1.4         0.2          4.9         3  
##  3     3 setosa           1.3         0.2          4.7         3.2
##  4     4 setosa           1.5         0.2          4.6         3.1
##  5     5 setosa           1.4         0.2          5           3.6
##  6     6 setosa           1.7         0.4          5.4         3.9
##  7     7 setosa           1.4         0.3          4.6         3.4
##  8     8 setosa           1.5         0.2          5           3.4
##  9     9 setosa           1.4         0.2          4.4         2.9
## 10    10 setosa           1.5         0.1          4.9         3.1
## # … with 140 more rows

The new way

With the new tidyr::pivot_wider function it looks like this.

l_iris %>% 
   tidyr::pivot_wider(id_cols = c(id, Species),
                      names_from = key, values_from = value)

## # A tibble: 150 x 6
##       id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
##    <int> <fct>          <dbl>       <dbl>        <dbl>       <dbl>
##  1     1 setosa           5.1         3.5          1.4         0.2
##  2     2 setosa           4.9         3            1.4         0.2
##  3     3 setosa           4.7         3.2          1.3         0.2
##  4     4 setosa           4.6         3.1          1.5         0.2
##  5     5 setosa           5           3.6          1.4         0.2
##  6     6 setosa           5.4         3.9          1.7         0.4
##  7     7 setosa           4.6         3.4          1.4         0.3
##  8     8 setosa           5           3.4          1.5         0.2
##  9     9 setosa           4.4         2.9          1.4         0.2
## 10    10 setosa           4.9         3.1          1.5         0.1
## # … with 140 more rows

Old vs. new

Let's look at exactly what has changed.

l_iris %>% 
   tidyr::spread(key = key, value = value)

l_iris %>% 
   tidyr::pivot_wider(id_cols = c(id, Species),
                      names_from = key, values_from = value)

　
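The most visible change is that the identifying variables and the variables that are not spread can be named explicitly through id_cols. tidyr::spread takes only key and value and treats every other column as an identifier. tidyr::pivot_wider does the same by default (id_cols defaults to all columns not named in names_from and values_from), so id_cols is only strictly needed when a subset of those columns should identify the rows.

The parameters of tidyr::pivot_wider and tidyr::spread compare as follows.

pivot_wider	spread	Description
id_cols	NA	Variables that identify each instance (row)
names_from	key	Variable holding the names of the new variables
values_from	value	Variable holding the values to spread

The parameter names have changed, but the basic idea is the same.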
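
The default of id_cols can be checked directly. The sketch below rebuilds l_iris under the assumption that id is the row number of iris, then compares the call with and without id_cols; explicit and implicit are names introduced here for illustration.

```r
library(tidyr)

# Assumed construction of l_iris (not shown in the original post).
l_iris <- tibble::as_tibble(datasets::iris) %>%
  tibble::rowid_to_column("id") %>%
  pivot_longer(cols = -c(id, Species), names_to = "key", values_to = "value")

# id_cols given explicitly, as in the example above
explicit <- l_iris %>%
  pivot_wider(id_cols = c(id, Species), names_from = key, values_from = value)

# id_cols left at its default: every column not used in names_from / values_from
implicit <- l_iris %>%
  pivot_wider(names_from = key, values_from = value)

same_result <- isTRUE(all.equal(as.data.frame(explicit),
                                as.data.frame(implicit)))
same_result
```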

Advanced example

We turn a long-form version of Anscombe's quartet into wide form. As with l_iris, identifying information (id) that uniquely identifies each instance (row) is required.

l_anscombe

## # A tibble: 44 x 4
##       id group     x     y
##    <int> <chr> <dbl> <dbl>
##  1     1 1        10  8.04
##  2     1 2        10  9.14
##  3     1 3        10  7.46
##  4     1 4         8  6.58
##  5     2 1         8  6.95
##  6     2 2         8  8.14
##  7     2 3         8  6.77
##  8     2 4         8  5.76
##  9     3 1        13  7.58
## 10     3 2        13  8.74
## # … with 34 more rows

l_anscombe %>% 
   tidyr::pivot_wider(id_cols = id,
                      names_from = group, values_from = c(x, y), names_sep = "")

## # A tibble: 11 x 9
##       id    x1    x2    x3    x4    y1    y2    y3    y4
##    <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1    10    10    10     8  8.04  9.14  7.46  6.58
##  2     2     8     8     8     8  6.95  8.14  6.77  5.76
##  3     3    13    13    13     8  7.58  8.74 12.7   7.71
##  4     4     9     9     9     8  8.81  8.77  7.11  8.84
##  5     5    11    11    11     8  8.33  9.26  7.81  8.47
##  6     6    14    14    14     8  9.96  8.1   8.84  7.04
##  7     7     6     6     6     8  7.24  6.13  6.08  5.25
##  8     8     4     4     4    19  4.26  3.1   5.39 12.5 
##  9     9    12    12    12     8 10.8   9.13  8.15  5.56
## 10    10     7     7     7     8  4.82  7.26  6.42  7.91
## 11    11     5     5     5     8  5.68  4.74  5.73  6.89

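The key point is that when several variables are to be spread, all of them are passed to values_from.

pivot_wider	spread	Description
id_cols	NA	Variables that identify each instance (row)
names_from	key	Variable holding the names of the new variables
values_from	value	Variable holding the values to spread
names_sep	sep	Separator used when joining names (in spread, sep joins the name of the key variable and its value, so the correspondence is only approximate)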
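
The construction of l_anscombe is not shown either. A possible sketch, assuming id is the row number of anscombe, builds it with pivot_longer and then confirms that pivot_wider restores the original eight columns:

```r
library(tidyr)

# Assumed construction of l_anscombe from the wide anscombe data.
l_anscombe <- tibble::as_tibble(datasets::anscombe) %>%
  tibble::rowid_to_column("id") %>%
  pivot_longer(cols = -id, names_to = c(".value", "group"),
               names_pattern = "(.)(.)")

wide <- l_anscombe %>%
  pivot_wider(id_cols = id, names_from = group,
              values_from = c(x, y), names_sep = "")

# Apart from the id column, the values should match the original data.
round_trip <- isTRUE(all.equal(as.data.frame(wide[, -1]),
                               datasets::anscombe,
                               check.attributes = FALSE))
round_trip
```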

Advanced example (part 2)

Let's spread the following dataset, in which the variable name is split across two variables, into wide form.

l2_iris

## # A tibble: 600 x 5
##       id Species part  measure value
##    <int> <fct>   <chr> <chr>   <dbl>
##  1     1 setosa  Sepal Length    5.1
##  2     1 setosa  Sepal Width     3.5
##  3     1 setosa  Petal Length    1.4
##  4     1 setosa  Petal Width     0.2
##  5     2 setosa  Sepal Length    4.9
##  6     2 setosa  Sepal Width     3  
##  7     2 setosa  Petal Length    1.4
##  8     2 setosa  Petal Width     0.2
##  9     3 setosa  Sepal Length    4.7
## 10     3 setosa  Sepal Width     3.2
## # … with 590 more rows

This works like the Anscombe example, but because the variable name is split across two variables here, names_from lists them in the order in which they should be joined.

l2_iris %>% 
   tidyr::pivot_wider(id_cols = c(id, Species),
                      names_from = c(measure, part), values_from = value,
                      names_sep = ".")

## # A tibble: 150 x 6
##       id Species Length.Sepal Width.Sepal Length.Petal Width.Petal
##    <int> <fct>          <dbl>       <dbl>        <dbl>       <dbl>
##  1     1 setosa           5.1         3.5          1.4         0.2
##  2     2 setosa           4.9         3            1.4         0.2
##  3     3 setosa           4.7         3.2          1.3         0.2
##  4     4 setosa           4.6         3.1          1.5         0.2
##  5     5 setosa           5           3.6          1.4         0.2
##  6     6 setosa           5.4         3.9          1.7         0.4
##  7     7 setosa           4.6         3.4          1.4         0.3
##  8     8 setosa           5           3.4          1.5         0.2
##  9     9 setosa           4.4         2.9          1.4         0.2
## 10    10 setosa           4.9         3.1          1.5         0.1
## # … with 140 more rows
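
Listing the variables in the opposite order, c(part, measure), gives back the original iris column names. This is a sketch under the assumption that l2_iris was built from iris with a row-number id; the construction is not shown in the original post.

```r
library(tidyr)

# Assumed construction of l2_iris.
l2_iris <- tibble::as_tibble(datasets::iris) %>%
  tibble::rowid_to_column("id") %>%
  pivot_longer(cols = -c(id, Species), names_to = c("part", "measure"),
               names_sep = "\\.", values_to = "value")

# part first, then measure: names are joined as "<part>.<measure>"
wide <- l2_iris %>%
  pivot_wider(id_cols = c(id, Species),
              names_from = c(part, measure), values_from = value,
              names_sep = ".")

names(wide)
```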

When there is no identifier

What happens when we spread data like the following, which has no information identifying the instances (rows), into wide form?

ln_iris

## # A tibble: 600 x 3
##    Species key          value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # … with 590 more rows

ln_iris %>% 
  tidyr::pivot_wider(names_from = key, values_from = value)

## # A tibble: 3 x 5
##   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <fct>       <list<dbl>> <list<dbl>>  <list<dbl>> <list<dbl>>
## 1 setosa             [50]        [50]         [50]        [50]
## 2 versicolor         [50]        [50]         [50]        [50]
## 3 virginica          [50]        [50]         [50]        [50]

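The result is a data frame whose values are list columns (tidyr also warns that the values are not uniquely identified). As shown below, each list element holds the measurements stratified by Species. Presumably this is because, without information identifying the individual measurements (instances), the values cannot be matched up row by row, so they are collected into lists, which need no such correspondence.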

ln_iris %>% 
  tidyr::pivot_wider(names_from = key, values_from = value) %>% str()

## Classes 'tbl_df', 'tbl' and 'data.frame':    3 obs. of  5 variables:
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
##  $ Sepal.Length: list<dbl> [1:3] 
##   ..$ : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##   ..$ : num  7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
##   ..$ : num  6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3 6.7 7.2 ...
##   ..@ ptype: num 
##  $ Sepal.Width : list<dbl> [1:3] 
##   ..$ : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##   ..$ : num  3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 ...
##   ..$ : num  3.3 2.7 3 2.9 3 3 2.5 2.9 2.5 3.6 ...
##   ..@ ptype: num 
##  $ Petal.Length: list<dbl> [1:3] 
##   ..$ : num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##   ..$ : num  4.7 4.5 4.9 4 4.6 4.5 4.7 3.3 4.6 3.9 ...
##   ..$ : num  6 5.1 5.9 5.6 5.8 6.6 4.5 6.3 5.8 6.1 ...
##   ..@ ptype: num 
##  $ Petal.Width : list<dbl> [1:3] 
##   ..$ : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##   ..$ : num  1.4 1.5 1.5 1.3 1.5 1.3 1.6 1 1.3 1.4 ...
##   ..$ : num  2.5 1.9 2.1 1.8 2.2 2.1 1.7 1.8 1.8 2.5 ...
##   ..@ ptype: num

This form has one advantage. When you want statistics per stratum, passing a function through the values_fn parameter makes it easy to compute them per variable and per level.

ln_iris %>%  
  tidyr::pivot_wider(names_from = key, values_from = value,
                     values_fn = list(value = mean))

## # A tibble: 3 x 5
##   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <fct>             <dbl>       <dbl>        <dbl>       <dbl>
## 1 setosa             5.01        3.43         1.46       0.246
## 2 versicolor         5.94        2.77         4.26       1.33 
## 3 virginica          6.59        2.97         5.55       2.03
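
As a sanity check, the same means can be computed with base R's aggregate and compared with the values_fn result. The sketch assumes that ln_iris is simply iris pivoted to long form without an id; means_wide and means_base are names introduced here.

```r
library(tidyr)

# Assumed construction of ln_iris: long form without an identifier.
ln_iris <- tibble::as_tibble(datasets::iris) %>%
  pivot_longer(cols = -Species, names_to = "key", values_to = "value")

# Mean per Species and variable via values_fn
means_wide <- ln_iris %>%
  pivot_wider(names_from = key, values_from = value,
              values_fn = list(value = mean))

# The same statistic with base R, computed on the original wide data
means_base <- aggregate(. ~ Species, data = datasets::iris, FUN = mean)

same_means <- isTRUE(all.equal(as.matrix(means_wide[, -1]),
                               as.matrix(means_base[, -1]),
                               check.attributes = FALSE))
same_means
```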

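Select helpers

Select helpers are functions for selecting variable names by condition, and are also used by dplyr's select function. As the name suggests, they cannot be used on their own, only inside a selecting function. The functions are defined in the tidyselect package, but tidyr and dplyr re-export them, so you can use them without being aware of tidyselect.
The main helpers are as follows.

Function	Description
starts_with	Returns variable names that start with the given string
ends_with	Returns variable names that end with the given string
contains	Returns variable names that contain the given string
everything	Returns all variable names
one_of	Returns variable names that match any of the given strings
matches	Returns variable names that match the given regular expression
num_range	Returns variable names of the form prefix plus number (for example x1, x2)
last_col	Returns the last variable name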

`starts_with` function

iris %>% 
   tidyr::pivot_longer(cols = -starts_with("Sp"), names_to = "key")

## # A tibble: 600 x 3
##    Species key          value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # … with 590 more rows

`ends_with` function

iris %>% 
   tidyr::pivot_longer(cols = -ends_with("es"), names_to = "key")

## # A tibble: 600 x 3
##    Species key          value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # … with 590 more rows

`everything` function

anscombe %>% 
   tidyr::pivot_longer(cols = everything(),
                       names_to = c(".value", "group"), names_pattern = "(.)(.)")

## # A tibble: 44 x 3
##    group     x     y
##    <chr> <dbl> <dbl>
##  1 1        10  8.04
##  2 2        10  9.14
##  3 3        10  7.46
##  4 4         8  6.58
##  5 1         8  6.95
##  6 2         8  8.14
##  7 3         8  6.77
##  8 4         8  5.76
##  9 1        13  7.58
## 10 2        13  8.74
## # … with 34 more rows

`one_of` function

iris %>% 
   tidyr::pivot_longer(cols = one_of("Sepal.Length", "Sepal.Width",
                                     "Petal.Length", "Petal.Width"),
                       names_to = "key")

## # A tibble: 600 x 3
##    Species key          value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # … with 590 more rows

`num_range` function

anscombe %>% 
   dplyr::select(-num_range("y", 1:4)) %>% 
   tidyr::pivot_longer(cols = num_range("x", 1:4),
                       names_to = c(".value", "group"), names_pattern = "(.)(.)")

## # A tibble: 44 x 2
##    group     x
##    <chr> <dbl>
##  1 1        10
##  2 2        10
##  3 3        10
##  4 4         8
##  5 1         8
##  6 2         8
##  7 3         8
##  8 4         8
##  9 1        13
## 10 2        13
## # … with 34 more rows

`last_col` function

iris %>% 
   tidyr::pivot_longer(cols = -last_col(), names_to = "key")

## # A tibble: 600 x 3
##    Species key          value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # … with 590 more rows
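
The helper table also lists contains and matches, which have no example on this page. A minimal sketch in the same style, with iris_tbl, petal_long and width_long as names introduced here for illustration:

```r
library(tidyr)

iris_tbl <- tibble::as_tibble(datasets::iris)

# contains(): gather only the columns whose names contain "Petal".
petal_long <- iris_tbl %>%
  pivot_longer(cols = contains("Petal"), names_to = "key")

# matches(): select by regular expression; here, names ending in "Width".
width_long <- iris_tbl %>%
  pivot_longer(cols = matches("Width$"), names_to = "key")

unique(petal_long$key)
unique(width_long$key)
```

Columns that are not selected (for example Sepal.Length in the first call) stay in the result as they are and are repeated for each gathered row.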

Enjoy!

Advice and corrections regarding this blog are welcome via the データ分析勉強会 (data analysis study group) or GitHub.

CC BY-NC-SA 4.0 , Sampo Suzuki

Project Cabinet Blog
