Artificial Intelligence: Pandas Data Structures 1. Basic Functions


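I am 清秀服饰, a blogger on 靠谱客. This article introduces Pandas data structures, part 1: basic functions. I am sharing it here in the hope that it serves as a useful reference.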

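NumPy is designed mainly for numerical data, but real-world work often also involves strings, time series, and other types. That is where Pandas (Python Data Analysis Library) comes in.

Pandas is an open-source Python library built on top of NumPy and widely used for data analysis, data cleaning, and data preparation. Data scientists frequently work with tabular data (for example .csv, .tsv, and .xlsx files), and Pandas makes it convenient to load, process, and analyze such data in a SQL-like manner. It works even better in combination with Matplotlib and Seaborn.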

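pandas meets the following needs:

Data structures with automatic or explicit data alignment along an axis, which prevents many common errors caused by misaligned data or by data from different sources that are indexed differently.
Integrated time series functionality.
Data structures that handle both time series and non-time-series data.
Arithmetic operations and reductions (such as summing along an axis) that can be performed according to metadata (axis labels).
Flexible handling of missing data.

In any machine learning project, a large share of the time before building a model goes into preparing data and analyzing basic trends and patterns, which is why pandas is needed.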

1. Basic Functions

Only a few basic functions are shown here; for more, consult the official documentation.

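1.1 Importing

For convenience, pandas is usually imported under the alias pd. Because pandas depends on NumPy, make sure NumPy is installed before using it. The examples below also call NumPy directly under the alias np.

import numpy as np
import pandas as pd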

1.2 Series

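A pandas Series is like a single column of a table: a one-dimensional array that can hold any data type.

A Series consists of an index and a column of values. Its constructor is:

pandas.Series(data, index, dtype, name, copy)

data: the data, which can be a dict, an ndarray, a list, or a scalar
index: the index labels; if omitted, integer labels starting at 0 are used
dtype: the data type; inferred from the data by default
name: the name of the Series
copy: whether to copy the input data; False by default in older pandas versions

In short, a Series is an array with labels.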
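As a small illustration of the constructor parameters listed above, the sketch below builds a Series with an explicit index, dtype, and name. The fruit labels and prices are hypothetical values chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
prices = pd.Series(
    data=np.array([1.5, 2.0, 3.25]),
    index=["apple", "banana", "cherry"],
    dtype="float64",
    name="price",
)

print(prices.index.tolist())   # ['apple', 'banana', 'cherry']
print(prices.dtype)            # float64
print(prices.name)             # price
print(prices["banana"])        # 2.0
```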

1.2.1 Creation

(1) From a list


# A column of data with an index
[in]: s1 = pd.Series([1, 2, 3, 4, 5])
[in]: print(s1)
[out]:
0    1
1    2
2    3
3    4
4    5
dtype: int64

(2) From an ndarray


# The index is a b c d e
[in]: s2 = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
[in]: print(s2)
[out]:
a    0
b    1
c    2
d    3
e    4
dtype: int32

(3) From a dict


# Create a dict; its keys become the index
[in]: temp_dict = {"name": "zhangsan", "age": 27, "tel": 10086}
# Pass the dict directly as data
[in]: s3 = pd.Series(temp_dict)
[in]: print(s3)
[out]:
name    zhangsan
age           27
tel        10086
dtype: object

(4) From a scalar

When data is a scalar, specify index; the scalar is repeated to match the length of the index.

[in]: s4 = pd.Series(1., index=list("abcde"))
[in]: print(s4)
[out]:
a    1.0
b    1.0
c    1.0
d    1.0
e    1.0
dtype: float64

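1.2.2 Slicing and indexing

A Series can be used much like an ndarray. Take the s2 created earlier as an example. Because s2 has string labels, integer positions are passed through .iloc here; a plain s2[0] on a label-indexed Series is deprecated in recent pandas versions.

# Integer (positional) indexing
[in]: print(s2.iloc[0])
[in]: print("-" * 50)  # separator
[out]:
0
--------------------------------------------------
# Slicing
[in]: print(s2[:3])
[in]: print("-" * 50)  # separator
[out]:
a    0
b    1
c    2
dtype: int32
--------------------------------------------------
# Fancy indexing
[in]: print(s2.iloc[[1, 0, 2]])
[in]: print("-" * 50)  # separator
[out]:
b    1
a    0
c    2
dtype: int32
--------------------------------------------------
# Boolean indexing
[in]: print(s2[s2 > 2])
[out]:
d    3
e    4
dtype: int32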
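The difference between selecting by position and selecting by label is easy to miss, so here is a minimal sketch that puts the two side by side. It rebuilds s2 so the block runs on its own, and uses the .iloc (position) and .loc (label) accessors.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.arange(5), index=["a", "b", "c", "d", "e"])

first_by_position = s2.iloc[0]    # element at position 0
first_by_label = s2.loc["a"]      # element whose label is "a"
picked = s2.iloc[[1, 0, 2]]       # fancy indexing by position
label_slice = s2.loc["b":"d"]     # label slice: both endpoints are included
position_slice = s2.iloc[1:3]     # positional slice: the end is excluded

print(picked.index.tolist())      # ['b', 'a', 'c']
print(label_slice.tolist())       # [1, 2, 3]
print(position_slice.tolist())    # [1, 2]
```

Note that the label slice returns three elements while the positional slice with the "same" bounds returns two.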

A Series can also be used like a dict:

[in]: s2 = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
[in]: print(s2["a"])
[out]:
0

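1.2.3 Series compared with Python's dict

Both express a mapping. A dict maps arbitrary keys to arbitrary values, whereas a Series maps typed keys to typed values, and that typing matters.

Just as the type-specific compiled code behind a NumPy array makes it more efficient than a plain Python list for certain operations, the type information of a Pandas Series makes it more efficient than a Python dict for certain operations. In addition, unlike a dict, a Series supports array-style operations such as slicing.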
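The comparison above can be sketched in a few lines. The subject names and scores are hypothetical; the point is only to show which dict habits carry over and what the shared dtype adds.

```python
import pandas as pd

# Hypothetical figures, for illustration only.
scores_dict = {"math": 90.0, "physics": 85.5, "art": 70.0}
scores = pd.Series(scores_dict)

# Dict-style operations carry over to the Series.
print("math" in scores)             # True
print(scores.get("music", 0.0))     # 0.0

# Unlike a dict, all values share one dtype ...
print(scores.dtype)                 # float64

# ... so arithmetic applies to every value without an explicit loop,
bonus = scores + 5
bonus_dict = {k: v + 5 for k, v in scores_dict.items()}  # dict equivalent

# and label slices work (both endpoints included).
print(scores.loc["math":"physics"].tolist())   # [90.0, 85.5]
```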

1.2.4 Operations with NumPy

Vectorized operations

For simple vectorized operations a Series behaves the same as an ndarray. Again using s2:


# Element-wise addition
[in]: result1 = s2 + s2
[in]: print(result1)
[in]: print("-" * 50)  # separator
[out]:
a    0
b    2
c    4
d    6
e    8
dtype: int32
--------------------------------------------------
# Element-wise multiplication
[in]: result2 = s2 * s2
[in]: print(result2)
[in]: print("-" * 50)  # separator
[out]:
a     0
b     1
c     4
d     9
e    16
dtype: int32
--------------------------------------------------
# Add 2 to every element
[in]: result3 = s2 + 2
[in]: print(result3)
[out]:
a    2
b    3
c    4
d    5
e    6
dtype: int32
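NumPy's own element-wise functions can also be applied to a Series directly. The sketch below, which rebuilds s2 so it is self-contained, suggests that the result stays a Series and keeps the original index.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.arange(5), index=["a", "b", "c", "d", "e"])

roots = np.sqrt(s2)        # element-wise ufunc; the result is still a Series
squares = np.square(s2)    # same index, squared values
total = s2.sum()           # reduction over all values

print(type(roots).__name__)     # Series
print(roots.index.tolist())     # ['a', 'b', 'c', 'd', 'e']
print(squares.tolist())         # [0, 1, 4, 9, 16]
print(total)                    # 10
```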

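1.2.5 Automatic alignment

Where a Series differs from an ndarray is that operations on Series are aligned by index value by default rather than by relative position. Take s2[1:] and s2[:-1] as an example:

[in]: print(s2[1:])
[out]:
b    1
c    2
d    3
e    4
dtype: int32
[in]: print(s2[:-1])
[out]:
a    0
b    1
c    2
d    3
dtype: int32

[in]: result = s2[1:] + s2[:-1]
[in]: print(result)
[out]:
a    NaN
b    2.0
c    4.0
d    6.0
e    NaN
dtype: float64

For two Series that cannot be fully aligned, as above, the index of the result is the union of the two indexes, and the positions that cannot be aligned are treated as missing values. Because NaN is a floating-point value, the result dtype becomes float64.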
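When NaN for unmatched labels is not what you want, one option is the Series.add method with its fill_value parameter. The following is a minimal sketch of that alternative, not something the original example uses.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(np.arange(5), index=["a", "b", "c", "d", "e"])
left, right = s2.iloc[1:], s2.iloc[:-1]

# Plain "+" leaves NaN wherever a label exists on only one side.
plain = left + right

# add() with fill_value substitutes 0 for the missing side before adding.
filled = left.add(right, fill_value=0)

print(plain.isna().tolist())   # [True, False, False, False, True]
print(filled.tolist())         # [0.0, 2.0, 4.0, 6.0, 4.0]
```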

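1.2.6 Missing-value detection

1. Series.isna(): detect missing values

Series.isna()

Detects missing values: True where a value is missing, False otherwise.

2. Series.fillna(): fill missing values

Series.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

value: scalar, dict, Series, or DataFrame (the value used to fill NaN; it cannot be a list).
method: {'backfill', 'bfill', 'pad', 'ffill', None}, default None (fill with the following or preceding value). Deprecated since pandas 2.1 in favor of Series.ffill() and Series.bfill().
axis: {0 or 'index'} (the axis to fill along; a Series has only one).
limit: int, default None (the maximum number of consecutive values to fill).

3. Series.dropna(): drop missing values

Series.dropna(axis=0, inplace=False, how=None)

axis: {0 or 'index'}. A Series has a single axis, so this parameter exists only for compatibility with DataFrame.dropna, where axis=0 drops rows that contain missing values and axis=1 drops columns.
inplace: if True, the Series itself is modified and None is returned.
how: not used for a Series. In DataFrame.dropna the default 'any' drops a row (or column) if any value in it is NA, while how='all' drops it only when every value is NA.


# Create a Series with missing values
[in]: s = pd.Series([1, 2, None, 4, None])
# Detect missing values
[in]: print(s.isna())
[in]: print("-" * 50)
[out]:
0    False
1    False
2     True
3    False
4     True
dtype: bool
--------------------------------------------------
# Fill missing values with 0.
[in]: print(s.fillna(0.))
[out]:
0    1.0
1    2.0
2    0.0
3    4.0
4    0.0
dtype: float64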
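The example above covers isna() and fillna() with a constant. As a complement, here is a short sketch of directional filling and of dropna() on the same data; it uses Series.ffill() and Series.bfill() rather than the method argument of fillna().

```python
import pandas as pd

s = pd.Series([1, 2, None, 4, None])

forward = s.ffill()     # propagate the last valid value forward
backward = s.bfill()    # pull the next valid value backward
dropped = s.dropna()    # remove missing entries, keeping the original labels

print(forward.tolist())          # [1.0, 2.0, 2.0, 4.0, 4.0]
print(backward.isna().tolist())  # [False, False, False, False, True]
print(dropped.index.tolist())    # [0, 1, 3]
```

The last element stays missing after bfill() because no later valid value exists to copy from.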

1.2.7 The name attribute

The name attribute can be set when the Series is defined:

[in]: s = pd.Series(np.arange(5), name="something")
[in]: print(s.name)
[out]:
something
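One place where the name becomes visible is when a Series is turned into a DataFrame. The sketch below is an illustration of that behavior and of Series.rename, which returns a renamed copy.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), name="something")

renamed = s.rename("other")   # returns a new Series; s keeps its name
frame = s.to_frame()          # the name becomes the column label

print(renamed.name)             # other
print(s.name)                   # something
print(frame.columns.tolist())   # ['something']
```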

1.2.8 The index and values attributes

Use Series.index to view the index and Series.values to view the values:

[in]: s = pd.Series(np.arange(5), index=list("abcde"))
[in]: print(s.index)
[out]:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
[in]: print(s.values)
[out]:
[0 1 2 3 4]
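Both attributes can be taken a step further. As a minimal sketch: Series.to_numpy() returns the data as a NumPy array, much like values, and assigning to index replaces all labels at once, provided the new list has the same length.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=list("abcde"))

values = s.to_numpy()       # the data as a NumPy array
s.index = list("vwxyz")     # replace all labels; the length must match

print(type(values).__name__)   # ndarray
print(s.index.tolist())        # ['v', 'w', 'x', 'y', 'z']
print(s["v"])                  # 0
```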

1.3 DataFrame

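A DataFrame is the two-dimensional data structure in pandas. It can be thought of as an Excel worksheet, a SQL table, or a dict of Series objects. It holds an ordered set of columns, and each column can have a different value type (numeric, string, boolean).

A DataFrame has both a row index and a column index, and can be viewed as a dict of Series that share one index. The index parameter specifies the row labels and columns specifies the column labels; if they are not passed, they are set according to the data passed in.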
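The "dict of Series sharing one index" view can be made concrete with a short sketch. The names, ages, and heights are hypothetical; the two Series deliberately have different lengths to show how the shared index is formed.

```python
import pandas as pd

# Hypothetical data for illustration.
ages = pd.Series([11, 10], index=["xiaohong", "xiaoming"])
heights = pd.Series(
    [138.5, 140.0, 150.0], index=["xiaohong", "xiaoming", "xiaozhang"]
)

df = pd.DataFrame({"age": ages, "height": heights})

print(df.index.tolist())              # ['xiaohong', 'xiaoming', 'xiaozhang']
print(df.columns.tolist())            # ['age', 'height']
print(type(df["age"]).__name__)       # Series: each column is a Series
print(df.loc["xiaozhang", "age"])     # nan: no age was given for this label
```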

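1.3.1 Two-dimensional ndarray

A DataFrame behaves like a two-dimensional ndarray with labeled rows and columns. Its constructor is:

pandas.DataFrame(data, index, columns, dtype, copy)

data: the data (ndarray, Series, map, list, dict, and similar types).
index: the index values, also called row labels.
columns: the column labels; defaults to RangeIndex (0, 1, 2, …, n).
dtype: the data type.
copy: whether to copy the input data; False by default in older pandas versions.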
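To connect the constructor to the "two-dimensional ndarray" idea, the sketch below builds a DataFrame from a small 2-D array, once with explicit labels and once with the defaults. The labels r1, r2 and x, y, z are made up for the example.

```python
import numpy as np
import pandas as pd

array = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns: [[0, 1, 2], [3, 4, 5]]

df = pd.DataFrame(array, index=["r1", "r2"], columns=["x", "y", "z"])
default = pd.DataFrame(array)        # no labels given: integer defaults

print(df.shape)                   # (2, 3)
print(df.loc["r2", "y"])          # 4
print(default.index.tolist())     # [0, 1]
print(default.columns.tolist())   # [0, 1, 2]
```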

1.3.2 From lists

A DataFrame can be created from a single list or from nested lists.

1. From a single list

[in]: data = [1, 2, 3, 4, 5]
[in]: df = pd.DataFrame(data)
[in]: print(df)
[out]:
   0
0  1
1  2
2  3
3  4
4  5

2. From nested lists, with column labels specified

[in]: data = [['xiaoming', 10], ['xiaohong', 11], ['xiaozhang', 12]]
[in]: df = pd.DataFrame(data, columns=['Name', 'Age'])
[in]: print(df)
[out]:
        Name  Age
0   xiaoming   10
1   xiaohong   11
2  xiaozhang   12

1.3.3 From dicts

1. From a dict of lists or arrays

Note: all of the lists (or arrays) must have the same length.

[in]: data = {'Name': ['xiaoming', 'zhangsan', 'lisi'], 'Age':[10, 11, 21]}
[in]: df = pd.DataFrame(data)
[in]: print(df)
[out]:
       Name  Age
0  xiaoming   10
1  zhangsan   11
2      lisi   21

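2. From a list of dicts

A list of dicts can be passed as input data to create a DataFrame. The dict keys become the column names by default, and any missing entries are filled with NaN automatically.

[in]: data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 6, 'c': 7}]
[in]: df = pd.DataFrame(data, index=["first", "second"])  # specify the row index
[in]: print(df)
[out]:
        a  b    c
first   1  2  NaN
second  5  6  7.0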
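A natural follow-up is to check where the NaN values ended up. The sketch below rebuilds the same DataFrame and inspects it with isna(), dtype, and fillna(), which work on a DataFrame much as they do on a Series.

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 6, "c": 7}]
df = pd.DataFrame(data, index=["first", "second"])

missing_in_c = int(df["c"].isna().sum())   # number of missing values in column c
filled = df.fillna(0)                      # replace every NaN with 0

print(missing_in_c)                 # 1
print(df["c"].dtype)                # float64: NaN forces a float column
print(filled.loc["first", "c"])     # 0.0
```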


Category: Deep Learning
Published: 2023-10-18 23:51:33
