Packages and Datasets

In addition to the standard packages of R version 3.4.4 (2018-03-15), this page uses the following additional package.

Package	Version	Description
tidyverse	1.2.1	Easily Install and Load the ‘Tidyverse’

　
This page also uses the following dataset.

Dataset	Package	Version	Description
issues	NA	NA	Redmine Issues

Expanding Redmine Data

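When ticket data is retrieved through Redmine's REST API, the resulting data frame contains list-type variables (list columns), as shown below (see the details of the ticket data).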

## # A tibble: 17,618 x 20
##       id project tracker status priority author category subject
##    <int> <list>  <list>  <list> <list>   <list> <list>   <chr>  
##  1 29855 <list ~ <list ~ <list~ <list [~ <list~ <list [~ add_wo~
##  2 29853 <list ~ <list ~ <list~ <list [~ <list~ <list [~ "Defau~
##  3 29852 <list ~ <list ~ <list~ <list [~ <list~ <list [~ Cannot~
##  4 29849 <list ~ <list ~ <list~ <list [~ <list~ <list [~ Displa~
##  5 29848 <list ~ <list ~ <list~ <list [~ <list~ <list [~ Query ~
##  6 29840 <list ~ <list ~ <list~ <list [~ <list~ <list [~ Cant u~
##  7 29838 <list ~ <list ~ <list~ <list [~ <list~ <list [~ time l~
##  8 29830 <list ~ <list ~ <list~ <list [~ <list~ <list [~ Issue ~
##  9 29826 <list ~ <list ~ <list~ <list [~ <list~ <list [~ Can't ~
## 10 29824 <list ~ <list ~ <list~ <list [~ <list~ <list [~ Add us~
## # ... with 17,608 more rows, and 12 more variables: description <chr>,
## #   done_ratio <int>, custom_fields <list>, created_on <chr>,
## #   updated_on <chr>, closed_on <chr>, fixed_version <list>,
## #   assigned_to <list>, start_date <chr>, due_date <chr>,
## #   estimated_hours <dbl>, parent <list>
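Because the `issues` dataset itself is not distributed with this page, the shape of those `<list>` columns can be imitated with a small hand-made tibble. This is a minimal sketch under an assumption: each element is a list holding an `id` and a `name`, the same shape that the kable output further down prints for `category`. The object `issues_mock` and all of its values are hypothetical.

```r
# Minimal sketch with hypothetical data: list columns whose elements are
# id/name pairs, imitating the <list> columns printed above.
issues_mock <- tibble::tibble(
  id      = c(101L, 102L, 103L),
  project = list(list(id = 1L, name = "Redmine"),
                 list(id = 1L, name = "Redmine"),
                 list(id = 1L, name = "Redmine")),
  status  = list(list(id = 1L, name = "New"),
                 list(id = 5L, name = "Closed"),
                 list(id = 1L, name = "New"))
)

print(issues_mock)                   # project and status print as <list>
print(class(issues_mock$status))     # "list"
print(issues_mock$status[[2]]$name)  # "Closed"
```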

　
The purrr package is convenient for expanding these list-type variables.

issues %>% 
  dplyr::select(id, project, tracker, status, priority) %>% 
  # Replace each list column with the value of its "name" element
  dplyr::mutate(project = purrr::map_chr(project, 'name'),
                tracker = purrr::map_chr(tracker, 'name'),
                status = purrr::map_chr(status, 'name'),
                priority = purrr::map_chr(priority, 'name'))

## # A tibble: 17,618 x 5
##       id project tracker status    priority
##    <int> <chr>   <chr>   <chr>     <chr>   
##  1 29855 Redmine Defect  Confirmed Normal  
##  2 29853 Redmine Defect  New       Normal  
##  3 29852 Redmine Defect  Closed    Normal  
##  4 29849 Redmine Defect  New       Normal  
##  5 29848 Redmine Defect  New       Normal  
##  6 29840 Redmine Defect  Closed    Normal  
##  7 29838 Redmine Patch   New       Normal  
##  8 29830 Redmine Patch   New       Normal  
##  9 29826 Redmine Defect  New       Normal  
## 10 29824 Redmine Feature New       Normal  
## # ... with 17,608 more rows
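The same pipeline can be tried without the real data. The sketch below assumes a small hypothetical tibble (`issues_mock`, made-up values) with the same four list columns and calls `dplyr::mutate()` directly instead of using the pipe, so that no package needs to be attached.

```r
# Hypothetical stand-in for `issues`: four list columns of id/name pairs.
issues_mock <- tibble::tibble(
  id       = c(101L, 102L),
  project  = list(list(id = 1L, name = "Redmine"), list(id = 1L, name = "Redmine")),
  tracker  = list(list(id = 1L, name = "Defect"),  list(id = 2L, name = "Feature")),
  status   = list(list(id = 1L, name = "New"),     list(id = 5L, name = "Closed")),
  priority = list(list(id = 2L, name = "Normal"),  list(id = 2L, name = "Normal"))
)

expanded <- dplyr::mutate(
  issues_mock,
  # A string as .f makes map_chr() extract the element called "name"
  # from every list and return the results as a character vector.
  project  = purrr::map_chr(project,  "name"),
  tracker  = purrr::map_chr(tracker,  "name"),
  status   = purrr::map_chr(status,   "name"),
  priority = purrr::map_chr(priority, "name")
)
print(expanded)  # every former list column is now <chr>
```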

　
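The purrr::map family of functions applies the operation given as the second argument .f (a function, an operator, and so on) to each element of the first argument .x. Here .f is given a name (`name`, the name of an element inside each list), so an index lookup¹ is performed. In other words, it is equivalent to the lookup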

issues$project[[n]]$name

　
evaluated for each n = 1, 2, ..., with purrr::map_chr collecting the results into a character vector.
¹ [, [[ and $, which are used to access values in vectors and lists, are R's element-access (extraction) operators.
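The equivalence can be checked directly. This sketch uses a hypothetical two-element list column `project` and compares `purrr::map_chr(project, "name")` with an explicit base R loop over `project[[n]]$name`.

```r
# Hypothetical list column: each element is a list with an id and a name.
project <- list(list(id = 1L, name = "Redmine"),
                list(id = 2L, name = "Plugins"))

# purrr: the string "name" becomes an extractor function.
via_purrr <- purrr::map_chr(project, "name")

# Base R: the explicit project[[n]]$name lookup for n = 1, 2, ...
via_index <- vapply(seq_along(project),
                    function(n) project[[n]]$name,
                    character(1))

print(identical(via_purrr, via_index))  # TRUE
```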
　

Handling NA Values

However, when a list-type variable contains NA, as in the following example,

issues %>% 
  dplyr::select(id, category, assigned_to) %>% 
  head(15) %>% knitr::kable()

id	category	assigned_to
29855	list(id = 2, name = "Issues")	NA
29853	list(id = 35, name = "Core Plugins")	NA
29852	list(id = 26, name = "Text formatting")	NA
29849	list(id = 58, name = "Issues list")	NA
29848	list(id = 21, name = "Database")	NA
29840	list(id = 32, name = "REST API")	NA
29838	list(id = 13, name = "Time tracking")	NA
29830	list(id = 2, name = "Issues")	NA
29826	list(id = 4, name = "Documents")	NA
29824	list(id = 10, name = "UI")	NA
29820	list(id = 30, name = "Code cleanup/refactoring")	list(id = 332, name = "Go MAEDA")
29819	list(id = 2, name = "Issues")	NA
29817	list(id = 29, name = "Email receiving")	NA
29816	list(id = 42, name = "My page")	NA
29791	list(id = 1, name = "Wiki")	NA

　
the same expansion stops with an error and cannot be processed.

issues %>% 
  dplyr::select(id, category, assigned_to) %>% 
  dplyr::mutate(category = purrr::map_chr(category, 'name'),
                assigned_to = purrr::map_chr(assigned_to, 'name'))

## Error in mutate_impl(.data, dots): Evaluation error: Result 25 is not a length 1 atomic vector.
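The failure can be reproduced on a tiny hypothetical list column (`category`, made-up values) with one bare NA. The exact error text differs between purrr versions, so the sketch only captures the condition object rather than asserting a particular message.

```r
# Hypothetical list column in which the second ticket has no category (NA).
category <- list(list(id = 2L, name = "Issues"),
                 NA,
                 list(id = 21L, name = "Database"))

# A bare NA has no element called "name", so the extractor returns NULL
# (length 0) and map_chr() rejects it: every result must be one string.
err <- tryCatch(purrr::map_chr(category, "name"),
                error = function(e) e)
print(conditionMessage(err))
```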

　
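In such cases the purrr::map_* processing has to be applied only to the elements that are not NA. To apply processing conditionally, use purrr::map_if. Because purrr::map_if returns a list, its result has to be converted to a vector in a later step (here with purrr::map_chr).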

issues %>% 
  dplyr::select(id, category, assigned_to) %>% 
  dplyr::mutate(
    # Extract "name" only where the element is not NA (result is a list)
    tmp1 = purrr::map_if(category, !is.na(category), 'name'),
    # Flatten that list into a character vector; NA stays NA
    tmp2 = purrr::map_chr(tmp1, 1L),
    tmp3 = purrr::map_if(assigned_to, !is.na(assigned_to), 'name'),
    tmp4 = purrr::map_chr(tmp3, 1L))

## # A tibble: 17,618 x 7
##       id category   assigned_to tmp1      tmp2            tmp3      tmp4 
##    <int> <list>     <list>      <list>    <chr>           <list>    <chr>
##  1 29855 <list [2]> <lgl [1]>   <chr [1]> Issues          <lgl [1]> <NA> 
##  2 29853 <list [2]> <lgl [1]>   <chr [1]> Core Plugins    <lgl [1]> <NA> 
##  3 29852 <list [2]> <lgl [1]>   <chr [1]> Text formatting <lgl [1]> <NA> 
##  4 29849 <list [2]> <lgl [1]>   <chr [1]> Issues list     <lgl [1]> <NA> 
##  5 29848 <list [2]> <lgl [1]>   <chr [1]> Database        <lgl [1]> <NA> 
##  6 29840 <list [2]> <lgl [1]>   <chr [1]> REST API        <lgl [1]> <NA> 
##  7 29838 <list [2]> <lgl [1]>   <chr [1]> Time tracking   <lgl [1]> <NA> 
##  8 29830 <list [2]> <lgl [1]>   <chr [1]> Issues          <lgl [1]> <NA> 
##  9 29826 <list [2]> <lgl [1]>   <chr [1]> Documents       <lgl [1]> <NA> 
## 10 29824 <list [2]> <lgl [1]>   <chr [1]> UI              <lgl [1]> <NA> 
## # ... with 17,608 more rows
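The two-step map_if approach can be compared with a one-step alternative on hypothetical data. `is.na()` applied to a list is TRUE only for elements that are a length-one NA, which is why it works as the map_if condition. The alternative, which is not the method used on this page, passes `.default = NA_character_` so that the extractor returns NA whenever "name" cannot be found; the sketch assumes a purrr version whose extractor accepts `.default`, as documented for `purrr::map`.

```r
# Hypothetical list column with a missing category (NA) in the middle.
category <- list(list(id = 2L, name = "Issues"),
                 NA,
                 list(id = 21L, name = "Database"))

# is.na() on a list flags only the elements that are a bare NA.
print(is.na(category))  # FALSE TRUE FALSE

# Approach used above: extract "name" where the element is not NA,
# then flatten the resulting list into a character vector.
tmp      <- purrr::map_if(category, !is.na(category), "name")
two_step <- purrr::map_chr(tmp, 1L)

# Alternative: let the extractor fall back to NA_character_ whenever
# "name" cannot be found.
one_step <- purrr::map_chr(category, "name", .default = NA_character_)

print(identical(two_step, one_step))  # TRUE
```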

In real tickets, any field that is not mandatory can contain NA, so always keep NA handling in mind when expanding list variables.
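Before expanding, it can help to check which list columns actually contain NA. The sketch below counts bare NA elements in every list column of a hypothetical tibble (`issues_mock`, made-up values); the same two lines could be applied to a real ticket data frame.

```r
# Hypothetical ticket data with NA in two list columns.
issues_mock <- tibble::tibble(
  id          = c(1L, 2L, 3L),
  category    = list(list(id = 2L, name = "Issues"), NA,
                     list(id = 4L, name = "Documents")),
  assigned_to = list(NA, NA, list(id = 9L, name = "Developer A"))
)

# Keep only the list columns, then count the bare NA elements in each.
list_cols <- Filter(is.list, issues_mock)
na_counts <- purrr::map_int(list_cols, function(col) sum(is.na(col)))
print(na_counts)  # category: 1, assigned_to: 2
```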
　

ENJOY!

CC BY-NC-SA 4.0, Sampo Suzuki

Project Cabinet Blog
