概述
sklearn随机森林模型:ValueError: Unknown label type: 'unknown'
目录
sklearn随机森林模型:ValueError: Unknown label type: 'unknown'
问题:
解决:
完整错误:
问题:
分类的标签数据被客户错误地标注为了浮点型;
让学生去做出现了问题,有问题的程序如下;
def random_forest_selection(df):
clf = RandomForestClassifier(n_estimators=100,
random_state=42,
class_weight = 'balanced',
min_samples_leaf = 4,
min_samples_split = 4,
max_depth = 5,
n_jobs = -1,)
clf_X = df.drop(['label'], axis=1)
clf_y = df.label
model = clf.fit(clf_X,clf_y)
#feat_importances = pd.DataFrame(model.feature_importances_, index=clf_X.columns, columns=["importance"])
feature_importances = pd.DataFrame({'feature': clf_X.columns, 'importance': model.feature_importances_})
feature_importances.sort_values(by='importance', ascending=False, inplace=True)
#feat_importances[:10].plot(figsize(12,5),kind='bar',)
plot_feature_importances(feature_importances, n = 10, threshold = 0.95)
most = min(50,df.shape[1]//5)
return feature_importances.feature[0:100].values
my_important = random_forest_selection(df_in)
my_important
解决:
初始数据类型为float64
转换时,正确的书写格式为Int64
这样转换可以转换过去,但是在进入模型的时候会发生问题,
所以,终极的正确的的处理方式是:
缺失值填充后进行数据格式转换;且使用小写int64
df_in = df_in.fillna(0)
df_in['label'] = df_in['label'].astype('int64')
df_in = df_origin
df_in['label'] = df_in['label'].astype("int")
#'Int64'
#df_in['label'] = df_in['label'].astype("Int64")
df_in['label'].value_counts()
# df_in['label'].describe()
完整错误:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-41-511d3bf43def> in <module>
----> 1 my_important = random_forest_selection(df_in)
2 my_important
<ipython-input-40-4490edd7b355> in random_forest_selection(df)
12 clf_y = df.label
13
---> 14 model = clf.fit(clf_X,clf_y)
15 #feat_importances = pd.DataFrame(model.feature_importances_, index=clf_X.columns, columns=["importance"])
16 feature_importances = pd.DataFrame({'feature': clf_X.columns, 'importance': model.feature_importances_})
D:anacondalibsite-packagessklearnensemble_forest.py in fit(self, X, y, sample_weight)
329 self.n_outputs_ = y.shape[1]
330
--> 331 y, expanded_class_weight = self._validate_y_class_weight(y)
332
333 if getattr(y, "dtype", None) != DOUBLE or not y.flags.contiguous:
D:anacondalibsite-packagessklearnensemble_forest.py in _validate_y_class_weight(self, y)
557
558 def _validate_y_class_weight(self, y):
--> 559 check_classification_targets(y)
560
561 y = np.copy(y)
D:anacondalibsite-packagessklearnutilsmulticlass.py in check_classification_targets(y)
181 if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',
182 'multilabel-indicator', 'multilabel-sequences']:
--> 183 raise ValueError("Unknown label type: %r" % y_type)
184
185
ValueError: Unknown label type: 'unknown'
最后
以上就是文艺宝贝为你收集整理的sklearn随机森林模型:ValueError: Unknown label type: ‘unknown‘sklearn随机森林模型:ValueError: Unknown label type: 'unknown'的全部内容,希望文章能够帮你解决sklearn随机森林模型:ValueError: Unknown label type: ‘unknown‘sklearn随机森林模型:ValueError: Unknown label type: 'unknown'所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
发表评论 取消回复