Machine Learning Basics 03: The Decision Tree Algorithm

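I'm 内向康乃馨, a blogger at 靠谱客. This article introduces Machine Learning Basics 03: the decision tree algorithm. I'm sharing it here in the hope that it serves as a useful reference.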

Basic concepts of the decision tree algorithm

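A decision tree is a tree-shaped structure made up of three parts: decision nodes (internal nodes), branches, and leaf nodes.

Decision node: represents a test, usually on one attribute of the object being classified.

Leaf node: stores a class label and represents one possible classification result.

Branch: a directed edge leaving a decision node; each branch corresponds to one possible value (or range of values) of the attribute tested at that node.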
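
The three parts can be sketched as a small data structure. This is a hand-written illustration only, not how scikit-learn stores its trees; the names `Node` and `classify`, and the toy attributes `color` and `alcohol`, are hypothetical and do not come from the original post.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Decision node: `attribute` names the attribute being tested.
    # Leaf node: `attribute` is None and `label` holds the class label.
    attribute: Optional[str] = None
    label: Optional[str] = None
    # Branches: one child node per possible value of the tested attribute.
    branches: Dict[str, "Node"] = field(default_factory=dict)


def classify(node: Node, sample: Dict[str, str]) -> Optional[str]:
    """Walk from the root to a leaf, following the branch that matches the sample."""
    while node.attribute is not None:
        node = node.branches[sample[node.attribute]]
    return node.label


# Toy tree with made-up attributes and labels, for illustration only.
toy_tree = Node(
    attribute="color",
    branches={
        "red": Node(
            attribute="alcohol",
            branches={"high": Node(label="class_0"), "low": Node(label="class_1")},
        ),
        "white": Node(label="class_2"),
    },
)

print(classify(toy_tree, {"color": "red", "alcohol": "low"}))  # prints class_1
```

Classifying a sample is just a walk from the root: each decision node looks up one attribute of the sample, the matching branch selects the next node, and the leaf that is finally reached supplies the class label.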

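Decision trees are a typical classification method.

The data is first preprocessed, and an induction algorithm is used to generate readable rules and a decision tree; the tree is then used to analyze new data.

In essence, a decision tree classifies data by applying a sequence of rules.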

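Advantages of decision trees

The reasoning process is easy to understand and can be expressed in If-Then form;

The reasoning depends entirely on the values of the attribute variables;

Attribute variables that contribute nothing to the target variable are ignored automatically, which also provides a reference for judging the importance of attributes and for reducing the number of variables.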
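
The first and third points can be seen directly in scikit-learn. The sketch below is not part of the original post: it fits a shallow tree on the wine dataset, prints it with `tree.export_text` as nested threshold tests that read like If-Then rules, and lists the attributes with non-zero `feature_importances_`. The values `max_depth=2` and `random_state=0` are arbitrary choices made here to keep the output short and repeatable.

```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=33
)

# A shallow tree keeps the printed rules short (max_depth=2 is an arbitrary choice).
clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_train, y_train)

# Print the fitted tree as nested threshold tests, readable as If-Then rules.
rules = tree.export_text(clf, feature_names=list(wine.feature_names))
print(rules)

# Attributes that are never used in a split get an importance of exactly 0.
importances = clf.feature_importances_
for name, importance in zip(wine.feature_names, importances):
    if importance > 0:
        print(f"{name}: {importance:.2f}")
```

A tree of depth 2 has at most three decision nodes, so at most three of the thirteen wine attributes can receive a non-zero importance; the rest are effectively ignored by the model.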

Making predictions on the wine dataset with a decision tree

# Compute the training-set accuracy for a tree of depth 1
import matplotlib.pyplot as plt
from sklearn import tree
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target
# Split the data into a training set and a test set (30% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=33)
# Configure the decision tree; the maximum depth is set to 1 here
clf = tree.DecisionTreeClassifier(max_depth=1)
clf.fit(X_train, y_train)
score_train = round(clf.score(X_train, y_train), 2)
print(f'Training set score: {score_train}')

# Compute the decision tree's accuracy at several maximum depths
depth = [1, 3, 5, 7, 9]
train = []
test = []
for i in depth:
    clf = tree.DecisionTreeClassifier(max_depth=i)
    clf.fit(X_train, y_train)
    score_train = round(clf.score(X_train, y_train), 2)
    train.append(score_train)
    score_test = round(clf.score(X_test, y_test), 2)
    test.append(score_test)
    print(f'Tree depth {i}: training set score {score_train}, test set score {score_test}')
# Plot both curves against depth (the training curve is drawn once, with markers)
plt.plot(depth, train, marker="x", color="r", label="training score")
plt.plot(depth, test, marker="o", color="g", label="testing score")
plt.xlabel("max_depth")
plt.ylabel("accuracy")
plt.legend()
plt.show()

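# The plot above shows that both test and training accuracy are high at depth 5,
# so draw the decision tree model with a maximum depth of 5
clf = tree.DecisionTreeClassifier(max_depth=5)
clf.fit(X_train, y_train)
tree.plot_tree(decision_tree=clf, filled=True)
plt.show()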
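
One detail worth checking: `max_depth` is only an upper bound. With the default settings, the tree stops growing once every leaf is pure, so the fitted tree may be shallower than the limit. A minimal sketch for inspecting this follows; `random_state=0` is an assumption added here so the run is repeatable, since the original code does not set it.

```python
from sklearn import datasets, tree
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=33
)

clf = tree.DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X_train, y_train)

# Depth and leaf count of the tree that was actually grown.
actual_depth = clf.get_depth()
n_leaves = clf.get_n_leaves()
test_score = clf.score(X_test, y_test)
print(f"actual depth: {actual_depth}, leaves: {n_leaves}, test accuracy: {test_score:.2f}")
```

If the reported depth is below the limit, raising `max_depth` further cannot change the fitted tree, which is one reason the accuracy curves can flatten out at larger depths.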