我是靠谱客的博主 狂野小松鼠,最近开发中收集的这篇文章主要介绍『Python』Excel文件的读取以及DataFrame的相关操作 (4)—— 常用查询语句,觉得挺不错的,现在分享给大家,希望可以做个参考。
概述
数据准备:
从泰坦尼克号数据集内拿10行数据用作演示
df = pd.read_csv(r'C:showtrain.csv')
print(df)
"""
age
workclass
fnlwgt
... hours-per-week
native-country
income
0
39
State-gov
77516
...
40
United-States
<=50K
1
50
Self-emp-not-inc
83311
...
13
United-States
<=50K
2
38
Private
215646
...
40
United-States
<=50K
3
53
Private
234721
...
40
United-States
<=50K
4
28
Private
338409
...
40
Cuba
<=50K
5
37
Private
284582
...
40
United-States
<=50K
6
49
Private
160187
...
16
Jamaica
<=50K
7
52
Self-emp-not-inc
209642
...
45
United-States
>50K
8
31
Private
45781
...
50
United-States
>50K
9
42
Private
159449
...
40
United-States
>50K
"""
1.单列查询,获取存在于列表的所有行
search_df = df[df['Search_column].isin(Search_value_list)]
# 筛选age列等于39、28、49的所有行
search_age = [39, 28, 49]
age_df = df[df['age'].isin(search_age)]
print(age_df)
"""
age
workclass
fnlwgt
... hours-per-week
native-country
income
0
39
State-gov
77516
...
40
United-States
<=50K
4
28
Private
338409
...
40
Cuba
<=50K
6
49
Private
160187
...
16
Jamaica
<=50K
"""
2.多列查询,获取同时满足n个条件的行
search_df = df[(df['Col_1'] == val_1) & (df['Col_2'] == val_2) & ...]
new_df = df[(df['hours-per-week'] == 40) &
(df['age'] == 28) &
(df['native-country'] == 'Cuba')]
print(new_df)
"""
age workclass
fnlwgt
... hours-per-week
native-country
income
4
28
Private
338409
...
40
Cuba
<=50K
"""
3.单列查询,模糊匹配
首先使用方法df['column'].str.extract 配合正则表达式(RE) 拿到相应的索引,然后根据索引拿到想要的DataFrame
# 例如,查找workclass列内包含了'-'的所有行
needed_idx = df['workclass'].str.extract('(.*-.*)').dropna().index
new_df = df.loc[needed_idx]
print(new_df)
"""
age
workclass
fnlwgt
... hours-per-week
native-country
income
0
39
State-gov
77516
...
40
United-States
<=50K
1
50
Self-emp-not-inc
83311
...
13
United-States
<=50K
7
52
Self-emp-not-inc
209642
...
45
United-States
>50K
"""
4.逆查询
常配合方法1使用new_df = df[~df['column'].isin(no_need_value_list)]
,当我们知道
a
a
a 与
b
b
b 的并集为全集,且手上只有
a
a
a 的信息(或者
a
a
a更容易获取)时,该方法能方便地拿到符合
b
b
b 条件的所有行
no_need_value = [' Cuba', ' Jamaica']
new_df = df[~df['native-country'].isin(no_need_value)]
print(new_df)
"""
age
workclass
fnlwgt
... hours-per-week
native-country
income
0
39
State-gov
77516
...
40
United-States
<=50K
1
50
Self-emp-not-inc
83311
...
13
United-States
<=50K
2
38
Private
215646
...
40
United-States
<=50K
3
53
Private
234721
...
40
United-States
<=50K
5
37
Private
284582
...
40
United-States
<=50K
7
52
Self-emp-not-inc
209642
...
45
United-States
>50K
8
31
Private
45781
...
50
United-States
>50K
9
42
Private
159449
...
40
United-States
>50K
"""
最后
以上就是狂野小松鼠为你收集整理的『Python』Excel文件的读取以及DataFrame的相关操作 (4)—— 常用查询语句的全部内容,希望文章能够帮你解决『Python』Excel文件的读取以及DataFrame的相关操作 (4)—— 常用查询语句所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
发表评论 取消回复