Watermelon Book (Machine Learning) exercise solutions: chapter11_11.1 Relief feature selection algorithm

Overview

Implement the Relief algorithm in code and examine the results it produces on watermelon dataset 3.0.
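For reference, the relevance statistic that the script accumulates for feature $j$ can be written as below. The notation is an assumption borrowed from the textbook's description of Relief: $x_{i,\mathrm{nh}}$ and $x_{i,\mathrm{nm}}$ denote the near-hit and near-miss of sample $x_i$, and the sum runs over all samples.

```latex
\delta^{j} = \sum_{i} \Bigl( -\operatorname{diff}\bigl(x_i^{j},\, x_{i,\mathrm{nh}}^{j}\bigr)^{2}
           + \operatorname{diff}\bigl(x_i^{j},\, x_{i,\mathrm{nm}}^{j}\bigr)^{2} \Bigr),
\qquad
\operatorname{diff}(a, b) =
\begin{cases}
0 \text{ if } a = b,\ 1 \text{ otherwise} & \text{(discrete feature)} \\
|a - b| & \text{(continuous feature)}
\end{cases}
```

A feature scores high when samples sit close to their near-hit and far from their near-miss along that feature.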

# -*- coding: gbk -*-
"""
Author: Victoria
Date: 201.11.14 16:00
"""
import numpy as np
import operator
import xlrd  # note: xlrd 2.0+ reads only .xls; an older xlrd (or another reader) is needed for .xlsx

def diff(x, y, discrete=True):
    """
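    Difference between two values of one feature.
    Input:
        x: value
        y: value
        discrete: True for a 0/1 mismatch test, False for the absolute
            difference (the textbook assumes continuous values are
            normalized to [0, 1])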
    """
    if discrete:
        if x==y:
            return 0
        else:
            return 1
    else:
        return abs(x-y)

def nearest(X, y, x, x_label):
    """
    Compute near-hit and near-miss in binary classification problem.
    Input:
        X: list, instances
        y: list, labels
        x: instance
        x_label: label for instance x
    Return:
        near_hit: list, instance
        near_miss: list, instance
    """
    near_hit_dist = float("inf")
    near_miss_dist = float("inf")
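    for i in range(len(X)):
        # Squared Euclidean distance over all features (values must be numeric).
        dist = np.sum((np.array(x) - np.array(X[i]))**2)
        if y[i]==x_label:
            # dist != 0 skips x itself (and any exact duplicate of x).
            if dist < near_hit_dist and dist!=0: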
                near_hit = X[i]
                near_hit_dist = dist
        else:
            if dist < near_miss_dist:
                near_miss = X[i]
                near_miss_dist = dist
    return near_hit, near_miss

def relief(X, y, k, discrete):
    """
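    Rank features by their Relief relevance statistic and select the top k.
    Input:
        X: list, instances
        y: list, labels
        k: number of features to select
        discrete: list, whether the corresponding feature is discrete or not
    Return:
        select_features_index: list, indices of the k highest-scoring features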
    """
    N = len(X)
    d = len(X[0])
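    # Near-hit and near-miss do not depend on the feature, so find them once per sample.
    neighbors = [nearest(X, y, X[j], y[j]) for j in range(N)]
    delta = []
    for i in range(d):
        delta_i = 0
        for j in range(N):
            near_hit, near_miss = neighbors[j]
            delta_i += -diff(X[j][i], near_hit[i], discrete[i])**2 + diff(X[j][i], near_miss[i], discrete[i])**2
        delta.append(delta_i)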
    print(delta)
    features_and_delta = zip(range(d), delta)
    # Sort (feature index, statistic) pairs from most to least relevant.
    features_and_delta = sorted(features_and_delta, key=operator.itemgetter(1), reverse=True)
    print(features_and_delta)
    select_features_index = [features_and_delta[i][0] for i in range(k)]
    return select_features_index

if __name__=="__main__":
    workbook = xlrd.open_workbook("../../数据/3.0.xlsx")
    sheet = workbook.sheet_by_name("Sheet1")
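    # Sheet layout as read here: each of the 17 columns is one sample;
    # the last row (index 8) holds the labels and the rows above it the 8 features.
    X = []
    for i in range(17):
        X.append(sheet.col_values(i)[:-1])
    y = sheet.row_values(8)

    # The first six features are discrete; density and sugar content are continuous.
    relief(X, y, 1, [1, 1, 1, 1, 1, 1, 0, 0])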
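The same statistic can also be sketched compactly with NumPy. This is only an illustrative variant, not the script above: the function name `relief_scores` and the four-sample toy data are hypothetical, the neighbor distance is built from the per-feature diff values (so discrete features contribute 0 or 1 rather than the squared gap between their numeric codes), and a sample is excluded from its own neighbor search by index instead of by `dist != 0`.

```python
import numpy as np

def relief_scores(X, y, discrete):
    """Return one Relief relevance statistic per feature (binary labels)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    discrete = np.asarray(discrete, dtype=bool)
    n, d = X.shape
    scores = np.zeros(d)
    for j in range(n):
        # Per-feature diff between sample j and every sample: 0/1 for
        # discrete features, absolute difference for continuous ones.
        gap = np.abs(X - X[j])
        gap[:, discrete] = (gap[:, discrete] != 0).astype(float)
        dist = (gap ** 2).sum(axis=1)
        dist[j] = np.inf  # exclude the sample itself by index
        same = (y == y[j])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        scores += gap[miss] ** 2 - gap[hit] ** 2
    return scores

# Hypothetical toy data: feature 0 (discrete) matches the label exactly,
# feature 1 (continuous) is unrelated to it.
X_toy = [[0, 0.1], [0, 0.9], [1, 0.2], [1, 0.8]]
y_toy = [0, 0, 1, 1]
scores = relief_scores(X_toy, y_toy, discrete=[True, False])
print(scores)
```

On this toy data the label-aligned feature 0 receives the larger statistic. The sketch assumes every class has at least two samples, so that each sample has a near-hit.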

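Running the script on watermelon dataset 3.0 gives the following relevance statistics for the features [color, root, knock sound, texture, navel, touch, density, sugar content]: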

[-4, 6, -1, 8, 2, -3, -0.40418499999999996, 0.26350200000000007]

As the statistics show, Relief ranks texture as the best feature (8), followed by root (6).
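The full ordering can be read off by sorting the reported statistics. The snippet below is a small sketch that reuses the numbers printed above; the English feature names are translations and the helper `rank_features` is a hypothetical name introduced for illustration.

```python
# Feature names (translated) paired with the statistics reported above.
features = ["color", "root", "knock sound", "texture",
            "navel", "touch", "density", "sugar content"]
delta = [-4, 6, -1, 8, 2, -3, -0.40418499999999996, 0.26350200000000007]

def rank_features(names, scores):
    """Return (name, score) pairs sorted from most to least relevant."""
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)

ranking = rank_features(features, delta)
for name, score in ranking:
    print(name, score)
```

Only texture, root, navel and sugar content end up with a positive statistic; the remaining four features score below zero on this run.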
