Overview
I. Building a CHAID decision tree with the SAS/EM interface
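CHAID (Chi-Square Automatic Interaction Detector) uses the chi-square test to decide whether two categories of a predictor should be merged; the predictor that produces the most significant class difference becomes the node's splitting variable. The p-value computed at each node determines whether the tree keeps growing, so CHAID does not need the separate pruning step that C4.5 and CART require.
CHAID differs from CART and C4.5 in two ways. First, CHAID handles only categorical variables, so a continuous variable must first be binned into intervals and converted to a categorical one. Second, the approach to pruning differs: CART and C4.5 first overfit the training data and then prune, whereas CHAID stops growing branches before overfitting occurs.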
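The split-selection and stopping rule described above can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm as implemented in R's CHAID package or in SAS/EM: it omits the category-merging step and the Bonferroni adjustment that full CHAID applies, and every name in it (chi2_sf, chi_square_test, bin_continuous, pick_split) is hypothetical. It assumes each row is a dict of categorical values.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-h) * sum_{i < df/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: start from df = 1 (erfc) and add the recurrence terms.
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi_square_test(xs, ys):
    """Pearson chi-square test of independence for two categorical lists."""
    n = len(xs)
    cell, row, col = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    if df == 0:
        return stat, 1.0  # a constant variable carries no information
    return stat, chi2_sf(stat, df)


def bin_continuous(values, cut_points):
    """Convert numbers to interval indices so CHAID can treat them as categories."""
    return [sum(v > c for c in cut_points) for v in values]


def pick_split(rows, predictors, target, alpha=0.05):
    """Return (predictor, p_value) with the smallest p-value, or None to stop growing."""
    ys = [r[target] for r in rows]
    best = None
    for name in predictors:
        _, p = chi_square_test([r[name] for r in rows], ys)
        if best is None or p < best[1]:
            best = (name, p)
    if best is None or best[1] > alpha:
        return None  # no significant split: the node becomes a leaf
    return best
```

Calling pick_split recursively on the subsets defined by the chosen predictor would grow the tree; because a node stops as soon as no predictor is significant at alpha, no later pruning pass is involved.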
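II. Implementation in R and SAS code
The sample data come from an R package (mlbench). The main goal is to check whether R and SAS produce the same decision tree. The results do differ, probably because the run parameters are not the same; this still needs investigation.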
1) Build CHAID tree using R
# Train data:
library(partykit)
library("CHAID")
data("BreastCancer", package = "mlbench")
# Build model:
# alpha2 is not shown in the source and is left at its default.
ctrl <- chaid_control(alpha3 = -1, alpha4 = 0.05,
                      minsplit = 2, minbucket = 5, minprob = 0.01,
                      stump = FALSE, maxheight = 6)
b_chaid <- chaid(Class ~ Cl.thickness + Cell.size + Cell.shape + Marg.adhesion +
                   Epith.c.size + Bare.nuclei + Bl.cromatin + Normal.nucleoli +
                   Mitoses,
                 data = BreastCancer, na.action = na.pass, control = ctrl)
plot(b_chaid)
2) Build CHAID tree using SAS/EM
SAS/EM Chaid Tree:
SAS Code:
proc iml;
submit /R;
#setInternet2(TRUE)
#install.packages("CHAID", repos="http://R-Forge.R-project.org")
# Train data:
library(partykit)
library("CHAID")
data("BreastCancer", package = "mlbench")
# Build model:
b_chaid <- chaid(Class ~ Cl.thickness +
                   Cell.size + Cell.shape + Marg.adhesion +
                   Epith.c.size + Bare.nuclei + Bl.cromatin + Normal.nucleoli +
                   Mitoses,
                 data = BreastCancer)
png("D:/sbjgay/Chaid_r_plot.png")
plot(b_chaid)
dev.off()
endsubmit;
call ImportDataSetFromR("work.BreastCancer", "BreastCancer");
run;quit;
filename rulecode "c:\temp\em_chaid_rules.sas";
*------------------------------------------------------------*;
* Tree: Run ARBOR procedure;
*------------------------------------------------------------*;
proc arbor data=work.BreastCancer
Leafsize=1
Splitsize=2
Mincatsize = 5
Maxbranch=10
Maxdepth=6
Criterion=PROBCHISQ
alpha = 0.05
Padjust= CHAIDAFTER
DEPTH
MAXRULES=5
MAXSURRS=0
Missing=USEINSEARCH
Exhaustive=0
event='malignant'
;
input Cl_thickness Cell_size
Cell_shape Marg_adhesion
Epith_c_size Bare_nuclei
Bl_cromatin Normal_nucleoli Mitoses / level=nominal;
target Class / level=NOMINAL
Criterion=PROBCHISQ;
Performance DISK
NodeSize=20000;
Assess NoValidata
measure=MISC;
SUBTREE LARGEST;
MAKEMACRO
NLEAVES=nleaves;
save
MODEL=Tree_EMTREE
SEQUENCE=Tree_OUTSEQ
IMPORTANCE=Tree_OUTIMPORT
NODESTAT=Tree_OUTNODES
SUMMARY=Tree_OUTSUMMARY
STATSBYNODE=Tree_OUTSTATS
Topology=Tree_OUTTOPOLOGY
Path = Tree_OUTPATH
Rules=Tree_OUTRules
;
code file=rulecode;
run;
quit;
3) Build CHAID tree using TreeDisc.sas in SAS 9.3
NOTE: Treedisc.sas does not work in 9.4.
%inc 'c:\temp\chaid\xmacro.sas';
%inc 'c:\temp\chaid\treedisc.sas';
data set2;
set breastcancer;
run;
%treedisc(data=set2, depvar=class, freq=, ordinal=,
nominal=Cl_thickness Cell_size Cell_shape Marg_adhesion
Epith_c_size Bare_nuclei Bl_cromatin Normal_nucleoli Mitoses,
alpha=0.05,
outtree=trd,
options=noformat,
trace=long);
%treedisc(intree=trd, draw=graphics);
NOTE: The trees built by the three methods above are different. The cause has not been determined; differing parameter settings are the likely candidate.