数据准备:
从泰坦尼克号数据集内拿10行数据用作演示
复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91df = pd.read_csv(r'C:showtrain.csv') print(df) """ age workclass fnlwgt ... hours-per-week native-country income 0 39 State-gov 77516 ... 40 United-States <=50K 1 50 Self-emp-not-inc 83311 ... 13 United-States <=50K 2 38 Private 215646 ... 40 United-States <=50K 3 53 Private 234721 ... 40 United-States <=50K 4 28 Private 338409 ... 40 Cuba <=50K 5 37 Private 284582 ... 40 United-States <=50K 6 49 Private 160187 ... 16 Jamaica <=50K 7 52 Self-emp-not-inc 209642 ... 45 United-States >50K 8 31 Private 45781 ... 50 United-States >50K 9 42 Private 159449 ... 40 United-States >50K """
1.单列查询,获取存在于列表的所有行
search_df = df[df['Search_column].isin(Search_value_list)]
复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37# 筛选age列等于39、28、49的所有行 search_age = [39, 28, 49] age_df = df[df['age'].isin(search_age)] print(age_df) """ age workclass fnlwgt ... hours-per-week native-country income 0 39 State-gov 77516 ... 40 United-States <=50K 4 28 Private 338409 ... 40 Cuba <=50K 6 49 Private 160187 ... 16 Jamaica <=50K """
2.多列查询,获取同时满足n个条件的行
search_df = df[(df['Col_1'] == val_1) & (df['Col_2'] == val_2) & ...]
复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20new_df = df[(df['hours-per-week'] == 40) & (df['age'] == 28) & (df['native-country'] == 'Cuba')] print(new_df) """ age workclass fnlwgt ... hours-per-week native-country income 4 28 Private 338409 ... 40 Cuba <=50K """
3.单列查询,模糊匹配
首先使用方法df['column'].str.extract 配合正则表达式(RE) 拿到相应的索引,然后根据索引拿到想要的DataFrame
复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37# 例如,查找workclass列内包含了'-'的所有行 needed_idx = df['workclass'].str.extract('(.*-.*)').dropna().index new_df = df.loc[needed_idx] print(new_df) """ age workclass fnlwgt ... hours-per-week native-country income 0 39 State-gov 77516 ... 40 United-States <=50K 1 50 Self-emp-not-inc 83311 ... 13 United-States <=50K 7 52 Self-emp-not-inc 209642 ... 45 United-States >50K """
4.逆查询
常配合方法1使用new_df = df[~df['column'].isin(no_need_value_list)]
,当我们知道
a
a
a 与
b
b
b 的并集为全集,且手上只有
a
a
a 的信息(或者
a
a
a更容易获取)时,该方法能方便地拿到符合
b
b
b 条件的所有行
复制代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76no_need_value = [' Cuba', ' Jamaica'] new_df = df[~df['native-country'].isin(no_need_value)] print(new_df) """ age workclass fnlwgt ... hours-per-week native-country income 0 39 State-gov 77516 ... 40 United-States <=50K 1 50 Self-emp-not-inc 83311 ... 13 United-States <=50K 2 38 Private 215646 ... 40 United-States <=50K 3 53 Private 234721 ... 40 United-States <=50K 5 37 Private 284582 ... 40 United-States <=50K 7 52 Self-emp-not-inc 209642 ... 45 United-States >50K 8 31 Private 45781 ... 50 United-States >50K 9 42 Private 159449 ... 40 United-States >50K """
最后
以上就是狂野小松鼠最近收集整理的关于『Python』Excel文件的读取以及DataFrame的相关操作 (4)—— 常用查询语句的全部内容,更多相关『Python』Excel文件的读取以及DataFrame的相关操作内容请搜索靠谱客的其他文章。
本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
发表评论 取消回复