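The C50 package makes it easy to build decision tree models and classification rule models with the C5.0 algorithm. However, C50 v0.1.2 as currently published on CRAN has a bug that prevents some decision tree models from being plotted.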
Packages and Datasets
In addition to the standard packages shipped with R version 3.6.1 (2019-07-05), this page uses the following additional packages.
Package | Version | Description |
---|---|---|
C50 | 0.1.2 | C5.0 Decision Trees and Rule-Based Models, GitHub version |
knitr | 1.24 | A General-Purpose Package for Dynamic Report Generation in R |
partykit | 1.2.5 | A Toolkit for Recursive Partytioning |
tidyverse | 1.2.1 | Easily Install and Load the ‘Tidyverse’ |
This page also uses the following dataset.
Dataset | Package | Version | Description |
---|---|---|---|
credit | N/A | N/A | Statlog (German Credit Data) Data Set, UCI ML Repository |
Problems with the C50 package
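The C50 package published on CRAN has a bug, reported in "Error in plot and then plot() does not match summary()", that prevents some decision tree models from being plotted. The bug is fixed in the development version published at GitHub topepo/C5.0, so use that GitHub version if you want to plot decision tree models built with the C5.0 algorithm. Because it is a development version, it may contain other bugs; keep that in mind when using it.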
Installing the GitHub version
Use the devtools package to install from GitHub.
devtools::install_github("topepo/C5.0")
Note that on Japanese-language Windows environments the installation fails with an error, so either use another platform or use the Docker Container Image.
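The package table on this page lists the GitHub build under the same version number, 0.1.2, as the CRAN release, so the version alone may not tell you which build is installed. The following is a minimal sketch of one way to check. It assumes that the installer (devtools/remotes) recorded a `RemoteType` field in the installed DESCRIPTION file, which CRAN installs normally do not have. `c50_source` and `install_dev` are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: report where an installed package came from.
# Assumption: devtools/remotes write a "RemoteType" field (e.g. "github")
# into the installed DESCRIPTION; CRAN installs leave it unset.
c50_source <- function(pkg = "C50") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return("not installed")
  }
  remote_type <- utils::packageDescription(pkg, fields = "RemoteType")
  if (is.na(remote_type)) "CRAN or other" else remote_type
}

print(c50_source())

# Install the development version only when explicitly requested.
# install_dev is a hypothetical switch so the sketch has no side effects.
install_dev <- FALSE
if (install_dev) {
  devtools::install_github("topepo/C5.0")
}
```

If the helper reports something other than "github", setting `install_dev` to `TRUE` and rerunning would trigger the same install command shown above.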
Plotting the decision tree model
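Using the GitHub version of the C50 package, we build a decision tree model from the credit data used in Chapter 5 of "Rによる機械学習" (the Japanese edition of Machine Learning with R), loaded directly from the original author's GitHub repository, and then plot it.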
library(tidyverse)  # provides the %>% pipe
# Read the credit data directly from GitHub and fit a C5.0 decision tree
credit_model <- read.csv("https://raw.githubusercontent.com/dataspelunking/MLwR/master/Machine%20Learning%20with%20R%20(2nd%20Ed.)/Chapter%2005/credit.csv", stringsAsFactors = TRUE) %>%
  C50::C5.0(default ~ ., data = .)
credit_model %>%
  summary()
##
## Call:
## C5.0.formula(formula = default ~ ., data = .)
##
##
## C5.0 [Release 2.07 GPL Edition] Wed Sep 18 21:46:21 2019
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 1000 cases (17 attributes) from undefined.data
##
## Decision tree:
##
## checking_balance in {> 200 DM,unknown}: no (457/60)
## checking_balance in {< 0 DM,1 - 200 DM}:
## :...credit_history in {perfect,very good}:
## :...housing in {other,rent}:
## : :...other_credit in {none,store}: yes (23/1)
## : : other_credit = bank:
## : : :...percent_of_income <= 1: no (2)
## : : percent_of_income > 1: yes (7/1)
## : housing = own:
## : :...savings_balance = < 100 DM: yes (20/7)
## : savings_balance in {> 1000 DM,500 - 1000 DM}: no (4)
## : savings_balance = 100 - 500 DM:
## : :...months_loan_duration <= 18: yes (4)
## : : months_loan_duration > 18: no (3)
## : savings_balance = unknown:
## : :...months_loan_duration <= 27: yes (2)
## : months_loan_duration > 27: no (3)
## credit_history in {critical,good,poor}:
## :...months_loan_duration <= 22:
## :...purpose = car0: no (2)
## : purpose = business:
## : :...months_loan_duration <= 18: no (16)
## : : months_loan_duration > 18: yes (3/1)
## : purpose = education:
## : :...savings_balance in {< 100 DM,> 1000 DM,100 - 500 DM,
## : : : 500 - 1000 DM}: yes (9/1)
## : : savings_balance = unknown: no (5)
## : purpose = renovations:
## : :...other_credit = bank: no (0)
## : : other_credit = store: yes (1)
## : : other_credit = none:
## : : :...housing = other: yes (1)
## : : housing in {own,rent}: no (8/1)
## : purpose = car:
## : :...other_credit = store: no (0)
## : : other_credit = bank: yes (13/4)
## : : other_credit = none:
## : : :...credit_history in {critical,poor}: no (34/4)
## : : credit_history = good:
## : : :...savings_balance in {> 1000 DM,500 - 1000 DM,
## : : : unknown}: no (15/4)
## : : savings_balance = 100 - 500 DM:
## : : :...housing in {other,own}: no (4)
## : : : housing = rent: yes (1)
## : : savings_balance = < 100 DM:
## : : :...years_at_residence <= 2: yes (10/1)
## : : years_at_residence > 2:
## : : :...job = management: yes (2)
## : : job in {skilled,unemployed,
## : : unskilled}: no (15/4)
## : purpose = furniture/appliances:
## : :...savings_balance = 100 - 500 DM: yes (4)
## : savings_balance in {< 100 DM,> 1000 DM,500 - 1000 DM,unknown}:
## : :...employment_duration = 4 - 7 years: no (20)
## : employment_duration in {< 1 year,> 7 years,1 - 4 years,
## : : unemployed}:
## : :...months_loan_duration > 16:
## : :...employment_duration in {< 1 year,
## : : : > 7 years}: no (15/3)
## : : employment_duration in {1 - 4 years,unemployed}:
## : : :...savings_balance in {500 - 1000 DM,
## : : : unknown}: yes (0)
## : : savings_balance = > 1000 DM: no (1)
## : : savings_balance = < 100 DM:
## : : :...checking_balance = < 0 DM: yes (12)
## : : checking_balance = 1 - 200 DM: no (4/1)
## : months_loan_duration <= 16:
## : :...existing_loans_count > 1: no (19/1)
## : existing_loans_count <= 1:
## : :...other_credit = store: yes (2)
## : other_credit in {bank,none}:
## : :...phone = no: no (49/10)
## : phone = yes:
## : :...amount <= 1424: no (5)
## : amount > 1424: yes (8/2)
## months_loan_duration > 22:
## :...savings_balance = 500 - 1000 DM: yes (4/1)
## savings_balance = > 1000 DM:
## :...employment_duration in {< 1 year,> 7 years,1 - 4 years,
## : : unemployed}: no (3)
## : employment_duration = 4 - 7 years: yes (1)
## savings_balance = 100 - 500 DM:
## :...credit_history in {critical,poor}: no (14/3)
## : credit_history = good:
## : :...other_credit = bank: no (1)
## : other_credit in {none,store}: yes (13/2)
## savings_balance = unknown:
## :...checking_balance = 1 - 200 DM: no (18/1)
## : checking_balance = < 0 DM:
## : :...credit_history = critical: no (1)
## : credit_history in {good,poor}: yes (12/3)
## savings_balance = < 100 DM:
## :...months_loan_duration > 47: yes (21/2)
## months_loan_duration <= 47:
## :...housing = other:
## :...percent_of_income <= 2: no (7)
## : percent_of_income > 2: yes (9/3)
## housing = rent:
## :...other_credit = store: yes (0)
## : other_credit = bank: no (1)
## : other_credit = none:
## : :...years_at_residence > 3: yes (11/1)
## : years_at_residence <= 3:
## : :...percent_of_income <= 3: no (3)
## : percent_of_income > 3: yes (3)
## housing = own:
## :...employment_duration = > 7 years: no (14/5)
## employment_duration = 4 - 7 years:
## :...job in {management,skilled,
## : : unemployed}: yes (10/1)
## : job = unskilled: no (1)
## employment_duration = unemployed:
## :...years_at_residence <= 2: yes (4)
## : years_at_residence > 2: no (3)
## employment_duration = 1 - 4 years:
## :...years_at_residence > 3: no (5)
## : years_at_residence <= 3:
## : :...purpose in {business,car,car0,education,
## : : renovations}: yes (11/1)
## : purpose = furniture/appliances: no (5)
## employment_duration = < 1 year:
## :...years_at_residence > 3: yes (7)
## years_at_residence <= 3:
## :...other_credit = bank: no (0)
## other_credit = store: yes (1)
## other_credit = none:
## :...checking_balance = 1 - 200 DM: no (8/2)
## checking_balance = < 0 DM:
## :...job in {management,skilled,
## : unemployed}: yes (3)
## job = unskilled: no (3/1)
##
##
## Evaluation on training data (1000 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 66 132(13.2%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 668 32 (a): class no
## 100 200 (b): class yes
##
##
## Attribute usage:
##
## 100.00% checking_balance
## 54.30% credit_history
## 48.70% months_loan_duration
## 43.30% savings_balance
## 29.40% purpose
## 24.70% other_credit
## 21.40% employment_duration
## 19.10% housing
## 9.40% years_at_residence
## 8.30% existing_loans_count
## 6.20% phone
## 3.40% job
## 3.10% percent_of_income
## 1.30% amount
##
##
## Time: 0.0 secs
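The evaluation block in the summary above reports 132 errors (13.2%) on the training data. As a small worked check, the sketch below recomputes that figure from the printed confusion matrix using base R only. The matrix values are copied by hand from the output, and the variable names are illustrative.

```r
# Confusion matrix copied from the summary() output above
# (rows: actual class, columns: class assigned by the tree).
confusion <- matrix(
  c(668,  32,
    100, 200),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("no", "yes"), predicted = c("no", "yes"))
)

n_cases    <- sum(confusion)                   # 1000 training cases
n_errors   <- n_cases - sum(diag(confusion))   # off-diagonal cells: 32 + 100
error_rate <- n_errors / n_cases

# Share of actual defaulters ("yes") that the tree classifies as "no"
miss_rate_yes <- confusion["yes", "no"] / sum(confusion["yes", ])

cat(sprintf("errors: %d of %d (%.1f%%)\n", n_errors, n_cases, 100 * error_rate))
cat(sprintf("missed defaults: %.1f%%\n", 100 * miss_rate_yes))
```

The first line reproduces the 132 (13.2%) shown in the summary. The second shows that a third of the actual defaults are misclassified even on the training data, which the overall error rate alone does not reveal.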
# Plot the fitted tree
credit_model %>%
  plot()
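As shown, this very large decision tree model can now be plotted. To zoom in, choose "Open image" from the right-click menu so that the image is displayed on its own, where it can be enlarged. Note that the plot is drawn using the partykit package.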
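For a tree this large, another option is to write the plot to a wide PDF file, which stays sharp when zoomed. The following is only a sketch under stated assumptions: it uses the built-in iris data as a small stand-in so that it is self-contained, and the file name and page size are illustrative values rather than recommendations. With the model from this page, you would plot `credit_model` instead and likely need a much larger page.

```r
# Assumption: C50 (which relies on partykit for plotting) is installed.
# iris is only a stand-in dataset so the sketch runs on its own.
out_file <- file.path(tempdir(), "c50_tree.pdf")  # illustrative file name
plotted  <- FALSE

if (requireNamespace("C50", quietly = TRUE) &&
    requireNamespace("partykit", quietly = TRUE)) {
  model <- C50::C5.0(Species ~ ., data = iris)

  # Open a wide PDF device, draw the tree, then close the device.
  # Width and height are in inches; enlarge them for bigger trees.
  pdf(out_file, width = 30, height = 12)
  plot(model)
  invisible(dev.off())

  plotted <- TRUE
}

print(plotted)
```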
Enjoy!