概述
通过pandas获取筛选数据的方法:可以选择索引切片,也可以通过字段名筛数据:
筛选数据:
1.索引的形式:
(1)一个索引:food_info.loc[0]
(2)多个索引:food_info.loc[3:6];food_info.loc[[2,5,10]
2.通过字段名:food_info["NDB_No"];food_info[["Zinc_(mg)", "Copper_(mg)"]]#真实奇怪,获取多个列时,需要2个【】【】
3.将矩阵转化为python列表list:food_info.columns.tolist()
4.运算:获取最大值,food_info["Energ_Kcal"].max()
5.排序:sort_values(by,axis=0,ascending=True,inplace=False,kind="quicksort",na_position="last")
参数 | 说明 |
by | 指定列名(axis=0或’index’)或索引值(axis=1或’columns’) |
axis | 若axis=0或’index’,则按照指定列中数据大小排序;若axis=1或’columns’,则按照指定索引中数据大小排序,默认axis=0 |
ascending | 是否按指定列的数组升序排列,默认为True,即升序排列 |
inplace | 是否用排序后的数据集替换原来的数据,默认为False,即不替换 |
kind | |
na_position | {‘first’,‘last’},设定缺失值的显示位置 |
通过pandas获取筛选数据的方法:可以选择索引切片,也可以通过字段名筛数据:
方法一:通过索引取值,操作类似切片,按行索引取值,类似SQL
1)通过1个索引获取数据:
# -*- coding: utf-8 -*-
import pandas
#读取文件
food_info = pandas.read_csv("/Users/liyili2/Downloads/tang/pandas_data/food_info.csv")
#pandas.loc[i]:按照行索引号取数,如果索引号不存在,会报错提示。
result1=food_info.loc[0]
result2=food_info.loc[6]
result3=food_info.loc[8000]
2)通过多个索引获取数据:
# -*- coding: utf-8 -*-
import pandas
#读取文件
food_info = pandas.read_csv("/Users/liyili2/Downloads/tang/pandas_data/food_info.csv")
result4=food_info.loc[3:6]
print(result4)#返回索引号3-6的数据
two_five_ten = [2,5,10]#返回索引2,5,10的值,#方法一
print(food_info.loc[two_five_ten])
print(food_info.loc[[2,5,10]])#返回索引2,5,10的值,#方法二
总的:
# -*- coding: utf-8 -*-
import pandas
#读取文件
food_info = pandas.read_csv("/Users/liyili2/Downloads/tang/pandas_data/food_info.csv")
#pandas.loc[i]:按照行索引号取数,如果索引号不存在,会报错提示。
result1=food_info.loc[0]
print("result1的结果",result1.dtype)
print(result1)
print("
")
result2=food_info.loc[6]
print("result2的结果")
print(result2)
print("
")
result3=food_info.loc[8000]
print("result3的结果")
print(result3)
result4=food_info.loc[3:6]
print("result4的结果:返回索引号3-6的数据")
print(result4)
#返回索引2,5,10的值
#方法一
two_five_ten = [2,5,10]
print("方法一,result5的结果:返回索引号2,5,10的数据")
print(food_info.loc[two_five_ten])
#方法二
print("方法二,result5的结果:返回索引号2,5,10的数据")
print(food_info.loc[[2,5,10]])
方法二:通过字段名获取筛选数据
单独获取一列数据:
# -*- coding: utf-8 -*-
import pandas
#读取文件
food_info = pandas.read_csv("/Users/liyili2/Downloads/tang/pandas_data/food_info.csv")
ndb_col = food_info["NDB_No"]
print("获取列名为:NDB_No的数据")
print(ndb_col)
col_name = "NDB_No"
ndb_col = food_info[col_name]
获取多列数据:
# -*- coding: utf-8 -*-
import pandas
#读取文件
food_info = pandas.read_csv("/Users/liyili2/Downloads/tang/pandas_data/food_info.csv")
columns = ["Zinc_(mg)", "Copper_(mg)"]
zinc_copper = food_info[columns]
print(zinc_copper)
zinc_copper = food_info[["Zinc_(mg)", "Copper_(mg)"]]
print(zinc_copper)
# -*- coding: utf-8 -*-
import pandas
#读取文件
food_info = pandas.read_csv("/Users/liyili2/Downloads/tang/pandas_data/food_info.csv")
print(food_info.columns)
print(type(food_info.columns))
print(food_info.head(2))
col_names = food_info.columns.tolist()#可以使用tolist()函数转化为list
print(type(col_names))
print(col_names)
gram_columns = []
for c in col_names:
if c.endswith("(g)"):#记住了,新知识点
gram_columns.append(c)
gram_df = food_info[gram_columns]
print("
")
print(gram_df.head(3))
结果:
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
'Cholestrl_(mg)'],
dtype='object')
<class 'pandas.core.indexes.base.Index'>
NDB_No
Shrt_Desc
...
FA_Poly_(g)
Cholestrl_(mg)
0
1001
BUTTER WITH SALT
...
3.043
215.0
1
1002
BUTTER WHIPPED WITH SALT
...
3.012
219.0
[2 rows x 36 columns]
<class 'list'>
['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)', 'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)', 'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)', 'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)', 'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)', 'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)', 'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg', 'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)', 'Cholestrl_(mg)']
Water_(g)
Protein_(g)
Lipid_Tot_(g)
...
FA_Sat_(g)
FA_Mono_(g)
FA_Poly_(g)
0
15.87
0.85
81.11
...
51.368
21.021
3.043
1
15.87
0.85
81.11
...
50.489
23.426
3.012
2
0.24
0.28
99.48
...
61.924
28.732
3.694
[3 rows x 10 columns]
3.pandas中数据类型
#object - For string values
#int - For integer values
#float - For float values
#datetime - For time values
#bool - For Boolean values
#print(food_info.dtypes)
4.pandas运算
# -*- coding: utf-8 -*-
import pandas
#读取文件
food_info = pandas.read_csv("/Users/liyili2/Downloads/tang/pandas_data/food_info.csv")
print(food_info.head(3))
col_names = food_info.columns.tolist()#可以使用tolist()函数转化为list
print(food_info["Iron_(mg)"])
div_1000 = food_info["Iron_(mg)"] / 1000#"字段Iron_(mg)的所有数据都要除以1000"
print(div_1000)
add_100 = food_info["Iron_(mg)"] + 100#"字段Iron_(mg)的所有数据都要加上100"
print(add_100)
sub_100 = food_info["Iron_(mg)"] - 100#"字段Iron_(mg)的所有数据都要减去100"
print(sub_100)
mult_2 = food_info["Iron_(mg)"]*2#"字段Iron_(mg)的所有数据都要乘以2"
print(mult_2)
water_energy = food_info["Water_(g)"] * food_info["Energ_Kcal"]
iron_grams = food_info["Iron_(mg)"] / 1000
result=food_info["Iron_(g)"] = iron_grams
print(result)
weighted_protein = food_info["Protein_(g)"] * 2
weighted_fat = -0.75 * food_info["Lipid_Tot_(g)"]
initial_rating = weighted_protein + weighted_fat
print(initial_rating)
max_calories = food_info["Energ_Kcal"].max()#获取所在行的最大值
normalized_calories = food_info["Energ_Kcal"] / max_calories
normalized_protein = food_info["Protein_(g)"] / food_info["Protein_(g)"].max()
normalized_fat = food_info["Lipid_Tot_(g)"] / food_info["Lipid_Tot_(g)"].max()
food_info["Normalized_Protein"] = normalized_protein
food_info["Normalized_Fat"] = normalized_fat
5.排序:
sort_values(by,axis=0,ascending=True,inplace=False,kind="quicksort",na_position="last")
参数 | 说明 |
by | 指定列名(axis=0或’index’)或索引值(axis=1或’columns’) |
axis | 若axis=0或’index’,则按照指定列中数据大小排序;若axis=1或’columns’,则按照指定索引中数据大小排序,默认axis=0 |
ascending | 是否按指定列的数组升序排列,默认为True,即升序排列 |
inplace | 是否用排序后的数据集替换原来的数据,默认为False,即不替换 |
kind | |
na_position | {‘first’,‘last’},设定缺失值的显示位置 |
参考:https://blog.csdn.net/MsSpark/article/details/83154128
例子:
利用字典dict创建数据框
import numpy as np
import pandas as pd
print("原始数据")
df=pd.DataFrame({'col1':['A','A','B',np.nan,'D','C'],
'col2':[2,1,9,8,7,7],
'col3':[0,1,9,4,2,8]
})
print(df)
print("
")
print("依据第一列排序,并将该列空值放在首位")
print(df.sort_values(by=['col1'],na_position='first'))
print("
")
print("依据第二、三列,数值降序排序")
print(df.sort_values(by=['col2','col3'],ascending=False))
print("
")
print("根据第一列中数值排序,按降序排列,并替换原数据")
df.sort_values(by=['col1'],ascending=False,inplace=True,
na_position='first')
print(df)
x = pd.DataFrame({'x1':[1,2,2,3],'x2':[4,3,2,1],'x3':[3,2,4,1]})
print("
")
print(x)
print("
")
print("按照索引值为0的行,即第一行的值来降序排序")
print(x.sort_values(by =0,ascending=False,axis=1))
结果展示:
原始数据 | 依据第一列排序,并将该列空值放在首位 |
col1 col2 col3 0 A 2 0 1 A 1 1 2 B 9 9 3 NaN 8 4 4 D 7 2 5 C 7 8 | col1 col2 col3 3 NaN 8 4 0 A 2 0 1 A 1 1 2 B 9 9 5 C 7 8 4 D 7 2 |
原始数据 | 依据第二、三列,数值降序排序 |
col1 col2 col3 0 A 2 0 1 A 1 1 2 B 9 9 3 NaN 8 4 4 D 7 2 5 C 7 8 | col1 col2 col3 2 B 9 9 3 NaN 8 4 5 C 7 8 4 D 7 2 0 A 2 0 1 A 1 1 |
原始数据 | 根据第一列中数值排序,按降序排列,并替换原数据 |
col1 col2 col3 0 A 2 0 1 A 1 1 2 B 9 9 3 NaN 8 4 4 D 7 2 5 C 7 8 | col1 col2 col3 3 NaN 8 4 4 D 7 2 5 C 7 8 2 B 9 9 0 A 2 0 1 A 1 1 |
原始数据 | 按照索引值为0的行,即第一行的值来降序排序 |
x1 x2 x3 0 1 4 3 1 2 3 2 2 2 2 4 3 3 1 1 | x2 x3 x1 0 4 3 1 1 3 2 2 2 2 4 2 3 1 1 3 |
最后
以上就是完美纸飞机为你收集整理的Python数据分析与机器学习实战-07.pandas索引与计算的全部内容,希望文章能够帮你解决Python数据分析与机器学习实战-07.pandas索引与计算所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
发表评论 取消回复