Plotting Decision Trees (C50 package)
Text Update: 09/18, 2019 (JST)

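The C50 package is a convenient way to build decision tree models and rule-based classification models with the C5.0 algorithm. However, C50 v0.1.2 as currently published on CRAN has a bug that prevents some decision tree models from being plotted.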

Packages and Datasets

In addition to the standard packages of R version 3.6.1 (2019-07-05), this page uses the following additional packages.
 

| Package | Version | Description |
|---|---|---|
| C50 | 0.1.2 | C5.0 Decision Trees and Rule-Based Models, GitHub version |
| knitr | 1.24 | A General-Purpose Package for Dynamic Report Generation in R |
| partykit | 1.2.5 | A Toolkit for Recursive Partytioning |
| tidyverse | 1.2.1 | Easily Install and Load the ‘Tidyverse’ |

 
This page also uses the following dataset.
 

| Dataset | Package | Version | Description |
|---|---|---|---|
| credit | N/A | N/A | Statlog (German Credit Data) Data Set, UCI ML Repository |

 

The problem with the C50 package

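The C50 package published on CRAN has a bug like the one reported in "Error in plot and then plot() does not match summary()", so some decision tree models cannot be plotted. The bug is fixed in the development version published at GitHub topepo/C5.0, so use that GitHub version if you want to plot decision tree models built with the C5.0 algorithm. Because it is a development version, it may contain other bugs; keep that in mind when using it.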

 

Installing the GitHub version

To install from GitHub, use the devtools package.

devtools::install_github("topepo/C5.0")

 
Note that on Japanese-language Windows the installation fails with an error, so use another platform or use a Docker Container Image.
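Because the table above lists the GitHub version under the same version number, 0.1.2, as the CRAN release, the version number alone may not tell you which build is installed. The following is a minimal sketch of one way to check, assuming the installer recorded a `RemoteType` field in the package's DESCRIPTION (devtools normally does this for GitHub installs). The helper name `installed_from_github` is hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE when `pkg` is installed and its DESCRIPTION
# carries RemoteType "github" (a field written by devtools/remotes).
installed_from_github <- function(pkg) {
  # A package that cannot be loaded at all is reported as FALSE.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  desc <- utils::packageDescription(pkg)
  # CRAN and base packages have no RemoteType field, so this is FALSE for them.
  identical(desc$RemoteType, "github")
}

installed_from_github("C50")
```

If this returns FALSE, the installed C50 most likely came from CRAN (or is missing), and the plotting bug described above may still apply.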

 

Plotting a decision tree model

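Using the GitHub version of the C50 package, let's build a decision tree model from the credit data used in Chapter 5 of "Machine Learning with R" (Japanese edition: 『Rによる機械学習』) and plot it. The data is read directly from the original author's GitHub repository.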

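library(tidyverse)  # provides the %>% pipe used below
# Read the credit data directly from GitHub and fit a C5.0 decision tree
credit_model <- read.csv("https://raw.githubusercontent.com/dataspelunking/MLwR/master/Machine%20Learning%20with%20R%20(2nd%20Ed.)/Chapter%2005/credit.csv", stringsAsFactors = TRUE) %>% 
  C50::C5.0(default ~ ., data = .)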

credit_model %>% 
  summary()
## 
## Call:
## C5.0.formula(formula = default ~ ., data = .)
## 
## 
## C5.0 [Release 2.07 GPL Edition]      Wed Sep 18 21:46:21 2019
## -------------------------------
## 
## Class specified by attribute `outcome'
## 
## Read 1000 cases (17 attributes) from undefined.data
## 
## Decision tree:
## 
## checking_balance in {> 200 DM,unknown}: no (457/60)
## checking_balance in {< 0 DM,1 - 200 DM}:
## :...credit_history in {perfect,very good}:
##     :...housing in {other,rent}:
##     :   :...other_credit in {none,store}: yes (23/1)
##     :   :   other_credit = bank:
##     :   :   :...percent_of_income <= 1: no (2)
##     :   :       percent_of_income > 1: yes (7/1)
##     :   housing = own:
##     :   :...savings_balance = < 100 DM: yes (20/7)
##     :       savings_balance in {> 1000 DM,500 - 1000 DM}: no (4)
##     :       savings_balance = 100 - 500 DM:
##     :       :...months_loan_duration <= 18: yes (4)
##     :       :   months_loan_duration > 18: no (3)
##     :       savings_balance = unknown:
##     :       :...months_loan_duration <= 27: yes (2)
##     :           months_loan_duration > 27: no (3)
##     credit_history in {critical,good,poor}:
##     :...months_loan_duration <= 22:
##         :...purpose = car0: no (2)
##         :   purpose = business:
##         :   :...months_loan_duration <= 18: no (16)
##         :   :   months_loan_duration > 18: yes (3/1)
##         :   purpose = education:
##         :   :...savings_balance in {< 100 DM,> 1000 DM,100 - 500 DM,
##         :   :   :                   500 - 1000 DM}: yes (9/1)
##         :   :   savings_balance = unknown: no (5)
##         :   purpose = renovations:
##         :   :...other_credit = bank: no (0)
##         :   :   other_credit = store: yes (1)
##         :   :   other_credit = none:
##         :   :   :...housing = other: yes (1)
##         :   :       housing in {own,rent}: no (8/1)
##         :   purpose = car:
##         :   :...other_credit = store: no (0)
##         :   :   other_credit = bank: yes (13/4)
##         :   :   other_credit = none:
##         :   :   :...credit_history in {critical,poor}: no (34/4)
##         :   :       credit_history = good:
##         :   :       :...savings_balance in {> 1000 DM,500 - 1000 DM,
##         :   :           :                   unknown}: no (15/4)
##         :   :           savings_balance = 100 - 500 DM:
##         :   :           :...housing in {other,own}: no (4)
##         :   :           :   housing = rent: yes (1)
##         :   :           savings_balance = < 100 DM:
##         :   :           :...years_at_residence <= 2: yes (10/1)
##         :   :               years_at_residence > 2:
##         :   :               :...job = management: yes (2)
##         :   :                   job in {skilled,unemployed,
##         :   :                           unskilled}: no (15/4)
##         :   purpose = furniture/appliances:
##         :   :...savings_balance = 100 - 500 DM: yes (4)
##         :       savings_balance in {< 100 DM,> 1000 DM,500 - 1000 DM,unknown}:
##         :       :...employment_duration = 4 - 7 years: no (20)
##         :           employment_duration in {< 1 year,> 7 years,1 - 4 years,
##         :           :                       unemployed}:
##         :           :...months_loan_duration > 16:
##         :               :...employment_duration in {< 1 year,
##         :               :   :                       > 7 years}: no (15/3)
##         :               :   employment_duration in {1 - 4 years,unemployed}:
##         :               :   :...savings_balance in {500 - 1000 DM,
##         :               :       :                   unknown}: yes (0)
##         :               :       savings_balance = > 1000 DM: no (1)
##         :               :       savings_balance = < 100 DM:
##         :               :       :...checking_balance = < 0 DM: yes (12)
##         :               :           checking_balance = 1 - 200 DM: no (4/1)
##         :               months_loan_duration <= 16:
##         :               :...existing_loans_count > 1: no (19/1)
##         :                   existing_loans_count <= 1:
##         :                   :...other_credit = store: yes (2)
##         :                       other_credit in {bank,none}:
##         :                       :...phone = no: no (49/10)
##         :                           phone = yes:
##         :                           :...amount <= 1424: no (5)
##         :                               amount > 1424: yes (8/2)
##         months_loan_duration > 22:
##         :...savings_balance = 500 - 1000 DM: yes (4/1)
##             savings_balance = > 1000 DM:
##             :...employment_duration in {< 1 year,> 7 years,1 - 4 years,
##             :   :                       unemployed}: no (3)
##             :   employment_duration = 4 - 7 years: yes (1)
##             savings_balance = 100 - 500 DM:
##             :...credit_history in {critical,poor}: no (14/3)
##             :   credit_history = good:
##             :   :...other_credit = bank: no (1)
##             :       other_credit in {none,store}: yes (13/2)
##             savings_balance = unknown:
##             :...checking_balance = 1 - 200 DM: no (18/1)
##             :   checking_balance = < 0 DM:
##             :   :...credit_history = critical: no (1)
##             :       credit_history in {good,poor}: yes (12/3)
##             savings_balance = < 100 DM:
##             :...months_loan_duration > 47: yes (21/2)
##                 months_loan_duration <= 47:
##                 :...housing = other:
##                     :...percent_of_income <= 2: no (7)
##                     :   percent_of_income > 2: yes (9/3)
##                     housing = rent:
##                     :...other_credit = store: yes (0)
##                     :   other_credit = bank: no (1)
##                     :   other_credit = none:
##                     :   :...years_at_residence > 3: yes (11/1)
##                     :       years_at_residence <= 3:
##                     :       :...percent_of_income <= 3: no (3)
##                     :           percent_of_income > 3: yes (3)
##                     housing = own:
##                     :...employment_duration = > 7 years: no (14/5)
##                         employment_duration = 4 - 7 years:
##                         :...job in {management,skilled,
##                         :   :       unemployed}: yes (10/1)
##                         :   job = unskilled: no (1)
##                         employment_duration = unemployed:
##                         :...years_at_residence <= 2: yes (4)
##                         :   years_at_residence > 2: no (3)
##                         employment_duration = 1 - 4 years:
##                         :...years_at_residence > 3: no (5)
##                         :   years_at_residence <= 3:
##                         :   :...purpose in {business,car,car0,education,
##                         :       :           renovations}: yes (11/1)
##                         :       purpose = furniture/appliances: no (5)
##                         employment_duration = < 1 year:
##                         :...years_at_residence > 3: yes (7)
##                             years_at_residence <= 3:
##                             :...other_credit = bank: no (0)
##                                 other_credit = store: yes (1)
##                                 other_credit = none:
##                                 :...checking_balance = 1 - 200 DM: no (8/2)
##                                     checking_balance = < 0 DM:
##                                     :...job in {management,skilled,
##                                         :       unemployed}: yes (3)
##                                         job = unskilled: no (3/1)
## 
## 
## Evaluation on training data (1000 cases):
## 
##      Decision Tree   
##    ----------------  
##    Size      Errors  
## 
##      66  132(13.2%)   <<
## 
## 
##     (a)   (b)    <-classified as
##    ----  ----
##     668    32    (a): class no
##     100   200    (b): class yes
## 
## 
##  Attribute usage:
## 
##  100.00% checking_balance
##   54.30% credit_history
##   48.70% months_loan_duration
##   43.30% savings_balance
##   29.40% purpose
##   24.70% other_credit
##   21.40% employment_duration
##   19.10% housing
##    9.40% years_at_residence
##    8.30% existing_loans_count
##    6.20% phone
##    3.40% job
##    3.10% percent_of_income
##    1.30% amount
## 
## 
## Time: 0.0 secs
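
As a quick check on how to read the evaluation part of this output, the error count and error rate can be recomputed from the confusion matrix. The sketch below simply copies the four counts from the output above by hand; the variable names are illustrative only.

```r
# Confusion matrix copied from the summary() output above
# (rows: actual class, columns: class assigned by the tree).
conf_mat <- matrix(c(668,  32,
                     100, 200),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("no", "yes"),
                                   classified_as = c("no", "yes")))

# Misclassified cases are the off-diagonal cells.
errors <- sum(conf_mat) - sum(diag(conf_mat))
error_rate <- errors / sum(conf_mat)

errors      # 132
error_rate  # 0.132
```

This matches the "132(13.2%)" shown in the Errors column: 32 "no" cases classified as "yes" plus 100 "yes" cases classified as "no", out of 1000 training cases.
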
credit_model %>% 
  plot()

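As you can see, this huge decision tree model could be plotted. To enlarge it, choose "Open image" from the right-click menu so that only the image is displayed; you can then zoom in. Note that the partykit package is used for plotting.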
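
When working in an R session rather than in the browser, another way to get a zoomable picture of a large tree is to draw it on an oversized PDF device. The following is a minimal sketch, not part of the original workflow: it uses a small stand-in model fitted to the built-in iris data so that it runs without downloading anything, and the page size and file name are arbitrary assumptions.

```r
library(C50)

# Small stand-in model on a built-in dataset (not the credit data above).
iris_model <- C5.0(Species ~ ., data = iris)

# Draw the tree on a large PDF page; width and height are in inches
# and were chosen arbitrarily. A wide tree needs a wide page.
out_file <- file.path(tempdir(), "c50_tree.pdf")
pdf(out_file, width = 30, height = 15)
plot(iris_model)
dev.off()

out_file
```

For the credit model, the same pattern should apply with `credit_model` in place of `iris_model`. The plot method for C5.0 objects also appears to accept a `subtree` argument for drawing only the nodes below a given split, which may help with trees of this size; check the package documentation before relying on it.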

 
Enjoy!  

For advice or corrections regarding this blog, please get in touch via データ分析勉強会 (Data Analysis Study Group) or GitHub.

CC BY-NC-SA 4.0 , Sampo Suzuki