Overview
Official documentation: http://lijiancheng0614.github.io/scikit-learn/modules/generated/sklearn.preprocessing.StandardScaler.html
Github: https://github.com/scikit-learn/scikit-learn.git
LearnCode/scikit-learn/sklearn/preprocessing/data.py
class StandardScaler
Description:
class StandardScaler(BaseEstimator, TransformerMixin):
"""Standardize features by removing the mean and scaling to unit variance
Centering and scaling happen independently on each feature by computing
the relevant statistics on the samples in the training set. Mean and
standard deviation are then stored to be used on later data using the
`transform` method.
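// (Annotation, not part of the original docstring.) A minimal sketch of the fit/transform
// workflow described above; the toy data is made up purely for illustration:
>>> from sklearn.preprocessing import StandardScaler
>>> X_train = [[0., 0.], [0., 0.], [1., 1.], [1., 1.]]
>>> scaler = StandardScaler().fit(X_train)      # per-feature mean and std are computed here
>>> scaler.mean_                                 # stored statistics
array([0.5, 0.5])
>>> scaler.transform(X_train)                    # (X - mean_) / scale_, feature by feature
array([[-1., -1.],
       [-1., -1.],
       [ 1.,  1.],
       [ 1.,  1.]])
>>> scaler.transform([[2., 2.]])                 # later data reuses the stored statistics
array([[3., 3.]])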
Standardization of a dataset is a common requirement for many
machine learning estimators: they might behave badly if the
individual features do not more or less look like standard normally
distributed data (e.g. Gaussian with 0 mean and unit variance).
// RBF: Radial Basis Function
For instance many elements used in the objective function of
a learning algorithm (such as the RBF kernel of Support Vector
Machines or the L1 and L2 regularizers of linear models) assume that
all features are centered around 0 and have variance in the same
order. If a feature has a variance that is orders of magnitude larger
than others, it might dominate the objective function and make the
estimator unable to learn from other features correctly as expected.
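// (Annotation.) In practice the scaler is therefore often placed in front of such an
// estimator; a hypothetical sketch, assuming some training set X_train, y_train already exists:
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.preprocessing import StandardScaler
>>> # scaling first keeps a single high-variance feature from dominating the RBF distances
>>> clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
>>> clf = clf.fit(X_train, y_train)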
// CSC: Compressed Sparse Column; CSR: Compressed Sparse Row
This scaler can also be applied to sparse CSR or CSC matrices by passing
`with_mean=False` to avoid breaking the sparsity structure of the data.
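// (Annotation.) A small sketch of scaling a sparse CSR matrix without centering,
// assuming SciPy is available; sparsity is preserved because only scaling is applied:
>>> import scipy.sparse as sp
>>> from sklearn.preprocessing import StandardScaler
>>> X_sparse = sp.csr_matrix([[1., 0., 2.], [0., 0., 4.]])
>>> scaler = StandardScaler(with_mean=False).fit(X_sparse)
>>> sp.issparse(scaler.transform(X_sparse))      # the result is still sparse
True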
Read more in the :ref:`User Guide <preprocessing_scaler>`.
Parameters
----------
with_mean : boolean, True by default
If True, center the data before scaling.
This does not work (and will raise an exception) when attempted on
sparse matrices, because centering them entails building a dense
matrix which in common use cases is likely to be too large to fit in
memory.
with_std : boolean, True by default
If True, scale the data to unit variance (or equivalently,
unit standard deviation).
copy : boolean, optional, default True
If False, try to avoid a copy and do inplace scaling instead.
This is not guaranteed to always work inplace; e.g. if the data is
not a NumPy array or scipy.sparse CSR matrix, a copy may still be
returned.
Attributes
----------
scale_ : ndarray, shape (n_features,)
Per feature relative scaling of the data.
.. versionadded:: 0.17
*scale_* is recommended instead of deprecated *std_*.
mean_ : array of floats with shape [n_features]
The mean value for each feature in the training set.
var_ : array of floats with shape [n_features]
The variance for each feature in the training set. Used to compute
`scale_`
n_samples_seen_ : int
The number of samples processed by the estimator. Will be reset on
new calls to fit, but increments across ``partial_fit`` calls.
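// (Annotation.) The attributes above can be inspected directly, and ``partial_fit`` keeps
// updating them batch by batch; a sketch with made-up numbers:
>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()
>>> scaler = scaler.partial_fit(np.array([[0.], [2.]]))   # first batch
>>> scaler = scaler.partial_fit(np.array([[4.], [6.]]))   # second batch
>>> scaler.n_samples_seen_                                 # counts both batches
4
>>> scaler.mean_, scaler.var_                              # statistics over all 4 samples
(array([3.]), array([5.]))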
See also
--------
:func:`sklearn.preprocessing.scale` to perform centering and
scaling without using the ``Transformer`` object oriented API
:class:`sklearn.decomposition.RandomizedPCA` with `whiten=True`
to further remove the linear correlation across features.
"""
    def __init__(self, copy=True, with_mean=True, with_std=True):
        self.with_mean = with_mean
        self.with_std = with_std
        self.copy = copy
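Before wrapping up, here is a short usage sketch of these constructor flags (the numbers are made up for illustration and are not taken from the library's own documentation): with `with_std=False` the data is only centered, while the default configuration performs the full (X - mean_) / scale_ computation, which can be checked by hand:
>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler
>>> X = np.array([[1., 10.], [2., 20.], [3., 30.]])
>>> StandardScaler(with_std=False).fit_transform(X)        # centering only
array([[ -1., -10.],
       [  0.,   0.],
       [  1.,  10.]])
>>> scaler = StandardScaler().fit(X)                       # default: center and scale
>>> np.allclose(scaler.transform(X), (X - scaler.mean_) / scaler.scale_)
True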
Finally
The above is the whole of this short introduction to data standardization with StandardScaler, as collected and organized by 繁荣滑板; hopefully it helps with the StandardScaler-related development questions you ran into.