Overview
Stage-one study summary: the Datawhale Pandas group-learning program has now come to a pause. Joyful Pandas (快乐pandas) runs on self-study, TA Q&A sessions, and regular check-ins; the course material is written in good detail and is reinforced with reflection questions and exercises. My own grasp of the material is still not thorough enough, and this set of comprehensive exercises took me two days on and off, so I really need to set aside time to review what I have learned.
Working through the fundamentals systematically helps build an overall picture of the major functional modules, so that when you actually use pandas you know roughly what exists and where to look. I will therefore continue with the Pandas (Part 2) course and try to finish the whole tutorial.
- Dataset download: datasets for the comprehensive exercises of the pandas tutorial (Part 1)
- Previous posts:
- Pandas Study Notes 1: basic operations (file I/O), data structures (Series, DataFrame), common basic functions, and sorting
- Pandas Study Notes 2: a detailed guide to Pandas indexing
- Pandas Study Notes 3: what can Pandas groupby do?
- Pandas Study Notes 4: Pandas reshaping, pivot tables & reshaping methods
- Pandas Study Notes 5: combining data (append & assign, combine & update, concat, merge & join)
Quick links to the exercises:
- I. Shanghai Vehicle License Plate Auctions, 2002-2018
- (1)
- (2)
- (3)
- (4)
- (5)
- (6)
- II. Cargo Flight Traffic at Russian Airports, 2007-2019
- (1)
- (2)
- (3)
- (4)
- (5)
- (6)
- III. The Spread of COVID-19 in the United States
- (1)
- (2)
- (3)
- (4)
- (5)
- (6)
import numpy as np
import pandas as pd
import re
I. Shanghai Vehicle License Plate Auctions, 2002-2018
vehicle = pd.read_csv('数据集/2002年-2018年上海机动车拍照拍卖.csv')
vehicle.head()
 | Date | Total number of license issued | lowest price | avg price | Total number of applicants
---|---|---|---|---|---
0 | 2-Jan | 1400 | 13600 | 14735 | 3718 |
1 | 2-Feb | 1800 | 13100 | 14057 | 4590 |
2 | 2-Mar | 2000 | 14300 | 14662 | 5190 |
3 | 2-Apr | 2300 | 16000 | 16334 | 4806 |
4 | 2-May | 2350 | 17800 | 18357 | 4665 |
(1)
Q: In which auction did the winning rate first fall below 5%?
vehicle_1 = vehicle.assign(probability=vehicle['Total number of license issued'] / vehicle['Total number of applicants'])
vehicle_1[vehicle_1['probability']<0.05].head(1)
 | Date | Total number of license issued | lowest price | avg price | Total number of applicants | probability
---|---|---|---|---|---|---
159 | 15-May | 7482 | 79000 | 79099 | 156007 | 0.047959 |
(2)
Q: For each year, compute the following statistics of the lowest auction price: maximum, mean, and 0.75 quantile, displayed together in a single table.
vehicle_2 = vehicle.assign(year=vehicle['Date'].apply(lambda x: x.split('-')[0]))
vehicle_2.head()
groupby_ = vehicle_2.groupby('year')
new_dataframe = pd.DataFrame()
new_dataframe['max'] = groupby_['lowest price'].max()
new_dataframe['mean'] = groupby_['lowest price'].mean()
new_dataframe['quantile'] = groupby_['lowest price'].quantile(0.75)
new_dataframe
year | max | mean | quantile
---|---|---|---
10 | 44900 | 38008.333333 | 41825.0 |
11 | 53800 | 47958.333333 | 51000.0 |
12 | 68900 | 61108.333333 | 65325.0 |
13 | 90800 | 79125.000000 | 82550.0 |
14 | 74600 | 73816.666667 | 74000.0 |
15 | 85300 | 80575.000000 | 83450.0 |
16 | 88600 | 85733.333333 | 87475.0 |
17 | 93500 | 90616.666667 | 92350.0 |
18 | 89000 | 87825.000000 | 88150.0 |
2 | 30800 | 20316.666667 | 24300.0 |
3 | 38500 | 31983.333333 | 36300.0 |
4 | 44200 | 29408.333333 | 38400.0 |
5 | 37900 | 31908.333333 | 35600.0 |
6 | 39900 | 37058.333333 | 39525.0 |
7 | 53800 | 45691.666667 | 48950.0 |
8 | 37300 | 29945.454545 | 34150.0 |
9 | 36900 | 31333.333333 | 34150.0 |
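Because year was kept as a string, the rows above sort lexicographically (10, 11, ..., 2, 3, ...). A minimal alternative sketch that casts year to int and computes all three statistics in a single agg call (the name stats is just for illustration):
stats = (vehicle_2.assign(year=vehicle_2['year'].astype(int))
         .groupby('year')['lowest price']
         .agg(max='max', mean='mean', quantile=lambda s: s.quantile(0.75)))
stats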
(3)
Q: Split the first (date) column into two columns, one for the year (in 20xx format) and one for the month (English abbreviation), insert them as the first and second columns, delete the original first column, and shift the remaining columns back accordingly.
Year = vehicle['Date'].apply(lambda x: str(20) + ("%02d" % int(x.split('-')[0])))
Month = vehicle['Date'].apply(lambda x: x.split('-')[1])
vehicle_3 = vehicle.drop(columns='Date')
vehicle_3.insert(0, 'Year', Year)
vehicle_3.insert(1, 'Month', Month)
vehicle_3.head()
 | Year | Month | Total number of license issued | lowest price | avg price | Total number of applicants
---|---|---|---|---|---|---
0 | 2002 | Jan | 1400 | 13600 | 14735 | 3718 |
1 | 2002 | Feb | 1800 | 13100 | 14057 | 4590 |
2 | 2002 | Mar | 2000 | 14300 | 14662 | 5190 |
3 | 2002 | Apr | 2300 | 16000 | 16334 | 4806 |
4 | 2002 | May | 2350 | 17800 | 18357 | 4665 |
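A shorter sketch of the same split using the vectorised .str accessor, assuming every Date string looks like '2-Jan':
parts = vehicle['Date'].str.split('-', expand=True)
vehicle_3b = vehicle.drop(columns='Date')
vehicle_3b.insert(0, 'Year', '20' + parts[0].str.zfill(2))
vehicle_3b.insert(1, 'Month', parts[1])
vehicle_3b.head()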
(4)
Q: Now make the row index a multi-level index, with the year on the outer level and the variable names of the original second to fifth columns on the inner level, and use the month as the column index.
vehicle_4 = vehicle_3.set_index(['Year', 'Total number of license issued', 'lowest price', 'avg price', 'Total number of applicants'])
vehicle_4.head()
Year | Total number of license issued | lowest price | avg price | Total number of applicants | Month
---|---|---|---|---|---
2002 | 1400 | 13600 | 14735 | 3718 | Jan
 | 1800 | 13100 | 14057 | 4590 | Feb
 | 2000 | 14300 | 14662 | 5190 | Mar
 | 2300 | 16000 | 16334 | 4806 | Apr
 | 2350 | 17800 | 18357 | 4665 | May
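The set_index call above keeps the four numeric columns as index levels. If the question is read as wanting the variable names themselves on the inner level, with the months as columns, a hedged sketch of that layout built from vehicle_3 (an alternative reading, not a correction of the output above):
vehicle_4b = (vehicle_3.melt(id_vars=['Year', 'Month'], var_name='variable')
              .pivot_table(index=['Year', 'variable'], columns='Month', values='value'))
vehicle_4b.head()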
(5)
Q: In general, the difference between a month's lowest price and the previous month's lowest price has the same sign as the difference between the two months' average prices. Which auction dates do not follow this pattern?
vehicle_5 = vehicle[['Date','lowest price','avg price']]
for index in range(1, len(vehicle_5)):
    if ((vehicle_5.iloc[index]['lowest price'] - vehicle_5.iloc[index-1]['lowest price'])
            * (vehicle_5.iloc[index]['avg price'] - vehicle_5.iloc[index-1]['avg price'])) < 0:
        print(vehicle_5.iloc[index]['Date'])
3-Oct
3-Nov
4-Jun
5-Jan
5-Feb
5-Sep
6-May
6-Sep
7-Jan
7-Feb
7-Dec
12-Oct
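The same check can be vectorised with diff(), assuming the rows are already in chronological order; a sketch:
diffs = vehicle[['lowest price', 'avg price']].diff()
mask = (diffs['lowest price'] * diffs['avg price']) < 0  # opposite signs
print(vehicle.loc[mask, 'Date'].tolist())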
(6)
Q: Define the issuance gain of a month as that month's number of licenses issued minus the mean of the previous two months' issuance, filling the first two months with 0. Find the dates at which the extreme values of the gain occur.
vehicle_6 = vehicle[['Date','Total number of license issued']]
gain = [0,0]
string = 'Total number of license issued'
for index in range(2, len(vehicle_6)):
    gain.append(vehicle.iloc[index][string] - np.mean([vehicle.iloc[index-1][string], vehicle.iloc[index-2][string]]))
vehicle_6['gain'] = gain
print('Max gain occurred at:', vehicle_6[vehicle_6['gain'] == vehicle_6['gain'].max()]['Date'].values)
print('Min gain occurred at:', vehicle_6[vehicle_6['gain'] == vehicle_6['gain'].min()]['Date'].values)
Max gain occurred at: ['8-Jan']
Min gain occurred at: ['8-Apr']
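The loop can be replaced by rolling(): the mean of the two previous months is a 2-month rolling mean shifted down one row, and the two leading NaN values are filled with 0. A sketch:
issued = vehicle['Total number of license issued']
gain = (issued - issued.rolling(2).mean().shift(1)).fillna(0)
print('max gain at:', vehicle.loc[gain.idxmax(), 'Date'])
print('min gain at:', vehicle.loc[gain.idxmin(), 'Date'])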
II. Cargo Flight Traffic at Russian Airports, 2007-2019
Q2 = pd.read_csv('数据集/2007年-2019年俄罗斯货运航班运载量.csv')
Q2.head()
 | Airport name | Year | January | February | March | April | May | June | July | August | September | October | November | December | Whole year | Airport coordinates
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | Abakan | 2019 | 44.70 | 66.21 | 72.7 | 75.82 | 100.34 | 78.38 | 63.88 | 73.06 | 66.74 | 75.44 | 110.5 | 89.8 | 917.57 | (Decimal('91.399735'), Decimal('53.751351')) |
1 | Aikhal | 2019 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | (Decimal('111.543324'), Decimal('65.957161')) |
2 | Loss | 2019 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | (Decimal('125.398355'), Decimal('58.602489')) |
3 | Amderma | 2019 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | (Decimal('61.577429'), Decimal('69.759076')) |
4 | Anadyr (Carbon) | 2019 | 81.63 | 143.01 | 260.9 | 304.36 | 122.00 | 106.87 | 84.99 | 130.00 | 102.00 | 118.00 | 94.0 | 199.0 | 1746.76 | (Decimal('177.738273'), Decimal('64.713433')) |
(1)
Q: Compute the total cargo flight volume for each year.
groupby2 = Q2.groupby('Year')
print('Total cargo flight volume per year:')
groupby2['Whole year'].sum()
Total cargo flight volume per year:
Year
2007    659438.23
2008    664682.46
2009    560809.77
2010    693033.98
2011    818691.71
2012    846388.03
2013    792337.08
2014    729457.12
2015    630208.97
2016    679370.15
2017    773662.28
2018    767095.28
2019    764606.27
Name: Whole year, dtype: float64
(2)
Q: Are the airports recorded each year the same?
print('Number of airports recorded in each year:')
display(groupby2['Airport name'].count())
Q2_2 = Q2.groupby('Airport name')['Year']
print('Number of years each airport appears in the records:')
display(Q2_2.count())
print('%d airports were not recorded in every year.' %
      len(Q2_2.count()[Q2_2.count() < (2019 - 2007 + 1)]))
Number of airports recorded in each year:
Year
2007    292
2008    292
2009    292
2010    292
2011    292
2012    292
2013    292
2014    292
2015    292
2016    292
2017    292
2018    248
2019    251
Name: Airport name, dtype: int64
Number of years each airport appears in the records:
Airport name
Abakan           13
Achinsk          11
Aikhal           13
Amderma          13
Anadyr           12
                 ..
Лешуконское      13
Мотыгино         13
Нюрба            13
Среднеколымск    13
Таксимо          13
Name: Year, Length: 297, dtype: int64
63 airports were not recorded in every year.
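A more direct way to check whether the airports are identical every year is to compare each year's set of airport names against the first year's; a short sketch:
airports_per_year = Q2.groupby('Year')['Airport name'].apply(set)
print(airports_per_year.apply(lambda s: s == airports_per_year.iloc[0]))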
(3)
Q: For each year from 2010 to 2015, compute the proportion of airports whose whole-year cargo volume is recorded as 0.
Q2_3 = Q2.set_index(['Year'])
groupby_3 = Q2_3.sort_index().loc[2010:2015].groupby('Year')
display(groupby_3['Whole year'].agg(lambda x: print('Year:', x.index[0], '\tproportion: %.2f' % (len(x[x==0]) / len(x) * 100), '%')))
Year: 2010 	proportion: 76.71 %
Year: 2011 	proportion: 77.05 %
Year: 2012 	proportion: 77.05 %
Year: 2013 	proportion: 77.05 %
Year: 2014 	proportion: 77.05 %
Year: 2015 	proportion: 77.05 %
Year
2010    None
2011    None
2012    None
2013    None
2014    None
2015    None
Name: Whole year, dtype: object
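Printing inside agg works, but the aggregation itself then returns None for every year. A sketch that returns the percentage instead, so the result is a usable Series:
zero_ratio = (Q2[(Q2['Year'] >= 2010) & (Q2['Year'] <= 2015)]
              .groupby('Year')['Whole year']
              .agg(lambda x: round(len(x[x == 0]) / len(x) * 100, 2)))
print(zero_ratio)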
(4)
Q: If an airport has at least 5 years in which every monthly cargo record is 0, remove all of that airport's records (for every year) from the table and return the processed table.
Q2_4 = Q2.set_index('Airport name')
groupby_4 = Q2_4.groupby('Airport name')
zero_airport = pd.DataFrame(groupby_4['Whole year'].apply( lambda x: len(x[x==0])>5 ))
Q2_4.drop(zero_airport[zero_airport['Whole year']==True].index).head()
Airport name | Year | January | February | March | April | May | June | July | August | September | October | November | December | Whole year | Airport coordinates
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Abakan | 2019 | 44.70 | 66.21 | 72.70 | 75.82 | 100.34 | 78.38 | 63.88 | 73.06 | 66.74 | 75.44 | 110.50 | 89.80 | 917.57 | (Decimal('91.399735'), Decimal('53.751351')) |
Anadyr (Carbon) | 2019 | 81.63 | 143.01 | 260.90 | 304.36 | 122.00 | 106.87 | 84.99 | 130.00 | 102.00 | 118.00 | 94.00 | 199.00 | 1746.76 | (Decimal('177.738273'), Decimal('64.713433')) |
Anapa (Vitjazevo) | 2019 | 45.92 | 53.15 | 54.00 | 54.72 | 52.00 | 67.45 | 172.31 | 72.57 | 70.00 | 63.00 | 69.00 | 82.10 | 856.22 | (Decimal('37.341511'), Decimal('45.003748')) |
Arkhangelsk (Talagy) | 2019 | 85.61 | 118.70 | 131.39 | 144.82 | 137.95 | 140.18 | 128.56 | 135.68 | 124.75 | 139.60 | 210.27 | 307.10 | 1804.61 | (Decimal('40.714892'), Decimal('64.596138')) |
Astrakhan (Narimanovo) | 2019 | 51.75 | 61.08 | 65.60 | 71.84 | 71.38 | 63.95 | 164.86 | 79.46 | 85.21 | 87.23 | 79.06 | 99.16 | 980.58 | (Decimal('47.999896'), Decimal('46.287344')) |
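Note that the filter above uses 'Whole year' == 0 as a proxy for all-zero months and a strict > 5 threshold. A sketch of the more literal reading of the question (drop an airport when 5 or more of its years have every monthly record equal to 0); this is an assumption about the intended wording, not the original answer:
month_cols = Q2.columns[2:14]  # January ... December
all_zero_years = Q2[month_cols].eq(0).all(axis=1).groupby(Q2['Airport name']).sum()
Q2_4b = Q2[~Q2['Airport name'].isin(all_zero_years[all_zero_years >= 5].index)]
Q2_4b.head()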
(5)
Q: Divide all airports into four regions (east, south, west, north) in a reasonable way, and report which region had the largest total cargo volume over 2017-2019.
Q2 = pd.read_csv('数据集/2007年-2019年俄罗斯货运航班运载量.csv')
Q2_5 = Q2.set_index(['Year']).sort_index().loc[2017:2019].loc[:, ['Whole year','Airport coordinates']]
# extract longitude and latitude from the coordinate strings
longitude = []
latitude = []
for i in range(0, len(Q2_5['Airport coordinates'])):
    string = re.findall(r"\d+\.?\d*", str(Q2_5['Airport coordinates'].iloc[i]))
    if string == []:
        longitude.append(np.nan)
        latitude.append(np.nan)
    else:
        longitude.append(float(string[0]))
        latitude.append(float(string[1]))
Q2_5['Longitude'] = longitude
Q2_5['Latitude'] = latitude
Q2_5 = Q2_5.dropna()
# drop the airports whose coordinates are 'Not found'
cuts_long = pd.cut(Q2_5['Longitude'],bins=[0, Q2_5['Longitude'].median(), 180])
Q2_5['cuts_long'] = cuts_long
cuts_la = pd.cut(Q2_5['Latitude'],bins=[0, Q2_5['Latitude'].median(), 90])
Q2_5['cuts_la'] = cuts_la
display(Q2_5.groupby(['cuts_long', 'cuts_la']).count())
groupby_5 = Q2_5.groupby(['cuts_long', 'cuts_la'])
print('Region with the largest 2017-2019 total cargo volume: longitude range %s, latitude range %s'
      % (groupby_5['Whole year'].sum().idxmax()[0], groupby_5['Whole year'].sum().idxmax()[1]))
cuts_long | cuts_la | Whole year | Airport coordinates | Longitude | Latitude
---|---|---|---|---|---
(0.0, 82.049] | (0.0, 58.498] | 187 | 187 | 187 | 187
 | (58.498, 90.0] | 196 | 196 | 196 | 196
(82.049, 180.0] | (0.0, 58.498] | 195 | 195 | 195 | 195
 | (58.498, 90.0] | 185 | 185 | 185 | 185
Region with the largest 2017-2019 total cargo volume: longitude range (0.0, 82.049], latitude range (0.0, 58.498]
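The coordinate-parsing loop can also be written with the .str accessor; a sketch that assumes every valid coordinate string contains two decimal numbers (rows such as 'Not found' become NaN and are dropped as before):
coords = Q2_5['Airport coordinates'].astype(str).str.findall(r'\d+\.\d+')
Q2_5['Longitude'] = coords.str[0].astype(float)
Q2_5['Latitude'] = coords.str[1].astype(float)
Q2_5 = Q2_5.dropna()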
(6)
Q: In statistics, rank is often used to express an ordering. Define the rank of an airport for a given month of a given year as the ranking of that month's cargo volume among all 12 months of that year for that airport (for example, if airport xx's January 2019 volume was the highest of the 12 months of 2019, its rank is 1). To compare the relative size of each month's traffic, sum the ranks of all airports for that month and call the result the month's composite rank index. Compute the composite rank index for the 12 months of 2016.
Q2_6 = Q2[Q2['Year']==2016]
month = Q2_6.columns[2:14]
Rank = pd.DataFrame(index=Q2_6.index, columns=month)
for index in Q2_6.index:
    aa = Q2_6.loc[index, month].sort_values(ascending=False)
    rank = 1
    for mon in aa.index:
        Rank.loc[index, mon] = rank
        rank = rank + 1
print(Rank.apply(lambda x: x.sum()))
January      3406
February     3076
March        2730
April        2432
May          2276
June         2047
July         1854
August       1527
September    1269
October      1009
November      728
December      422
dtype: int64
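The double loop is what DataFrame.rank() does directly; an equivalent sketch (method='first' breaks ties by position, mirroring the loop above):
month_cols = Q2.columns[2:14]
ranks = Q2.loc[Q2['Year'] == 2016, month_cols].rank(axis=1, ascending=False, method='first')
print(ranks.sum().astype(int))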
III. The Spread of COVID-19 in the United States
Q3_confirmed = pd.read_csv('数据集/美国确证数.csv')
Q3_death = pd.read_csv('数据集/美国死亡数.csv')
display(Q3_confirmed.head())
display(Q3_death.head())
 | UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | ... | 2020/4/17 | 2020/4/18 | 2020/4/19 | 2020/4/20 | 2020/4/21 | 2020/4/22 | 2020/4/23 | 2020/4/24 | 2020/4/25 | 2020/4/26
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 84001001 | US | USA | 840 | 1001 | Autauga | Alabama | US | 32.539527 | -86.644082 | ... | 26 | 25 | 26 | 28 | 30 | 32 | 33 | 36 | 36 | 37 |
1 | 84001003 | US | USA | 840 | 1003 | Baldwin | Alabama | US | 30.727750 | -87.722071 | ... | 103 | 109 | 112 | 117 | 123 | 132 | 143 | 147 | 147 | 161 |
2 | 84001005 | US | USA | 840 | 1005 | Barbour | Alabama | US | 31.868263 | -85.387129 | ... | 15 | 18 | 20 | 22 | 28 | 29 | 30 | 32 | 32 | 33 |
3 | 84001007 | US | USA | 840 | 1007 | Bibb | Alabama | US | 32.996421 | -87.125115 | ... | 24 | 26 | 28 | 32 | 32 | 34 | 33 | 34 | 34 | 38 |
4 | 84001009 | US | USA | 840 | 1009 | Blount | Alabama | US | 33.982109 | -86.567906 | ... | 20 | 20 | 21 | 22 | 26 | 29 | 31 | 31 | 31 | 34 |
5 rows × 107 columns
 | UID | iso2 | iso3 | code3 | FIPS | Admin2 | Province_State | Country_Region | Lat | Long_ | ... | 2020/4/17 | 2020/4/18 | 2020/4/19 | 2020/4/20 | 2020/4/21 | 2020/4/22 | 2020/4/23 | 2020/4/24 | 2020/4/25 | 2020/4/26
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 84001001 | US | USA | 840 | 1001 | Autauga | Alabama | US | 32.539527 | -86.644082 | ... | 2 | 2 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
1 | 84001003 | US | USA | 840 | 1003 | Baldwin | Alabama | US | 30.727750 | -87.722071 | ... | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
2 | 84001005 | US | USA | 840 | 1005 | Barbour | Alabama | US | 31.868263 | -85.387129 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 84001007 | US | USA | 840 | 1007 | Bibb | Alabama | US | 32.996421 | -87.125115 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 84001009 | US | USA | 840 | 1009 | Blount | Alabama | US | 33.982109 | -86.567906 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 108 columns
(1)
Q: Use corr() to compute the correlation coefficient between county population (each row is one county) and the number of deaths on the last recorded date in the table.
Q3_death[['Population', '2020/4/26']].corr()
 | Population | 2020/4/26
---|---|---
Population | 1.000000 | 0.403844 |
2020/4/26 | 0.403844 | 1.000000 |
(2)
Q: As of April 1, compute for each state the proportion of counties with zero infections.
# the counts are cumulative over time, so only the 2020/4/1 column is needed
Q3_2 = Q3_confirmed[['Admin2', 'Province_State', '2020/4/1']]
groupby_Q3_2 = Q3_2.groupby('Province_State')
# display(groupby_Q3_2.first())
groupby_Q3_2['2020/4/1'].apply(lambda x: len(x[x==0]) / len(x) )
Province_State
Alabama                 0.119403
Alaska                  0.793103
Arizona                 0.000000
Arkansas                0.293333
California              0.137931
Colorado                0.218750
Connecticut             0.000000
Delaware                0.000000
District of Columbia    0.000000
Florida                 0.164179
Georgia                 0.125786
Hawaii                  0.200000
Idaho                   0.386364
Illinois                0.480392
Indiana                 0.108696
Iowa                    0.404040
Kansas                  0.609524
Kentucky                0.441667
Louisiana               0.062500
Maine                   0.250000
Maryland                0.041667
Massachusetts           0.142857
Michigan                0.192771
Minnesota               0.367816
Mississippi             0.060976
Missouri                0.391304
Montana                 0.625000
Nebraska                0.752688
Nevada                  0.470588
New Hampshire           0.100000
New Jersey              0.000000
New Mexico              0.424242
New York                0.080645
North Carolina          0.180000
North Dakota            0.547170
Ohio                    0.181818
Oklahoma                0.376623
Oregon                  0.277778
Pennsylvania            0.104478
Rhode Island            0.000000
South Carolina          0.065217
South Dakota            0.560606
Tennessee               0.115789
Texas                   0.452756
Utah                    0.482759
Vermont                 0.142857
Virginia                0.270677
Washington              0.128205
West Virginia           0.472727
Wisconsin               0.319444
Wyoming                 0.347826
Name: 2020/4/1, dtype: float64
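The same proportions can be written as the mean of a boolean mask, since every zero-infection county contributes a True; a one-line sketch:
Q3_confirmed.groupby('Province_State')['2020/4/1'].apply(lambda x: (x == 0).mean())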
(3)
Q: Find the three counties where confirmed cases first appeared.
Q3_3 = Q3_confirmed.set_index('Admin2')
# Q3_3.head()
Date = Q3_3.columns[10:]
top3_adamin = {}
for date in Date:
    aa = Q3_3.loc[:, date]
    first_index = aa[aa > 0].index
    for first in first_index:
        if first not in top3_adamin:
            top3_adamin[first] = date
        else:
            continue
    if len(top3_adamin) > 3:
        break
for key, value in top3_adamin.items():
    print('{value} : first case(s) reported in {key}'.format(value=value, key=key))
# pd.DataFrame([top3_adamin])
2020/1/22 : first case(s) reported in King
2020/1/24 : first case(s) reported in Cook
2020/1/26 : first case(s) reported in Maricopa
2020/1/26 : first case(s) reported in Los Angeles
2020/1/26 : first case(s) reported in Orange
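The loop prints five counties because three counties tie on 2020/1/26 (and county names such as Orange repeat across states). A sketch of the same search with idxmax on a boolean frame, which returns the first non-zero date for every county at once:
conf = Q3_confirmed.set_index('Admin2')
date_cols = conf.columns[10:]                      # same date slice as above
first_case = conf[date_cols].gt(0).idxmax(axis=1)  # first date with cases per county
has_case = conf[date_cols].sum(axis=1) > 0         # skip counties that never report a case
print(pd.to_datetime(first_case[has_case]).sort_values().head(3))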
(4)
Q: Compute the single-day increase in deaths for each state, and report which state had the largest single-day increase in confirmed cases, and on which day (taken jointly over all states and all days, not computed separately for each).
Q3_4_confirmed = Q3_confirmed.set_index('Province_State').iloc[:, 10:].groupby('Province_State').sum()
Q3_4_death = Q3_death.set_index('Province_State').iloc[:, 11:].groupby('Province_State').sum()
Date = Q3_4_death.columns[0:]
death_gain = []
confirmed_gain = []
for date in range(len(Date)-1):
    death_gain.append(Q3_4_death.iloc[:, date+1] - Q3_4_death.iloc[:, date])
    confirmed_gain.append(Q3_4_confirmed.iloc[:, date+1] - Q3_4_confirmed.iloc[:, date])
death_gain = pd.DataFrame(np.array(death_gain).T, index=Q3_4_death.index, columns=Q3_4_death.columns[1:])
confirmed_gain = pd.DataFrame(np.array(confirmed_gain).T, index=Q3_4_confirmed.index, columns=Q3_4_death.columns[1:])
print('Daily death increases by state:')
display(death_gain)
print('Date of the largest single-day confirmed increase for each state:')
confirmed_gain.idxmax(axis=1)
Daily death increases by state:
Province_State | 2020/1/23 | 2020/1/24 | 2020/1/25 | 2020/1/26 | 2020/1/27 | 2020/1/28 | 2020/1/29 | 2020/1/30 | 2020/1/31 | 2020/2/1 | ... | 2020/4/17 | 2020/4/18 | 2020/4/19 | 2020/4/20 | 2020/4/21 | 2020/4/22 | 2020/4/23 | 2020/4/24 | 2020/4/25 | 2020/4/26
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Alabama | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 15 | 5 | 4 | 6 | 20 | 13 | 6 | 7 | 0 | 10 |
Alaska | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Arizona | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5 | 39 | 3 | 7 | 18 | 23 | 18 | 17 | 7 | 2 |
Arkansas | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | 2 | 1 | 0 | 3 | 2 | 1 | 3 |
California | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 81 | 103 | 35 | 48 | 57 | 139 | 112 | 88 | 68 | 36 |
Colorado | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 17 | 17 | 31 | 0 | 63 | 23 | 44 | 121 | 10 | 4 |
Connecticut | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 65 | 49 | 40 | 204 | 92 | 121 | 93 | 128 | 98 | 65 |
Delaware | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 9 | 6 | 0 | 5 | 10 | 7 | 3 | 8 | 12 | 8 |
District of Columbia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5 | 5 | 5 | 9 | 7 | 15 | 12 | 14 | 12 | 13 |
Florida | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 57 | 23 | 26 | 48 | 45 | 26 | 94 | 59 | 9 | 20 |
Georgia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 56 | 23 | 14 | 86 | 25 | 48 | 32 | 18 | 7 | 11 |
Hawaii | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 1 | 0 | 1 |
Idaho | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 2 | 1 | 1 | 3 | 3 | 3 | 0 | 2 | 0 |
Illinois | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 60 | 125 | 31 | 59 | 120 | 98 | 124 | 106 | 80 | 59 |
Indiana | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 46 | 23 | 17 | 15 | 58 | 31 | 40 | 35 | 45 | 27 |
Iowa | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4 | 10 | 1 | 4 | 4 | 7 | 6 | 11 | 5 | 6 |
Kansas | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 3 | 8 | 9 | 7 | 3 | 1 | 5 | 2 | 1 |
Kentucky | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 7 | 2 | 2 | 14 | 17 | 14 | 6 | 9 | 5 | 3 |
Louisiana | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 57 | 54 | 29 | 32 | 76 | 69 | 68 | 61 | 43 | 26 |
Maine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 3 | 2 | 1 | 1 | 3 | 5 | 3 | 0 | 3 |
Maryland | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 15 | 87 | 40 | 29 | 42 | 58 | 52 | 45 | 64 | 14 |
Massachusetts | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 135 | 156 | 301 | 0 | 267 | 0 | 0 | 0 | 764 | 41 |
Michigan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 229 | 80 | 82 | 75 | 106 | 238 | 162 | 106 | 67 | 158 |
Minnesota | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 24 | 10 | 13 | 9 | 17 | 19 | 21 | 21 | 23 | 28 |
Mississippi | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 11 | 12 | 7 | 10 | 14 | 10 | 8 | 8 | 12 | 6 |
Missouri | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 14 | 13 | 2 | 1 | 24 | 14 | 13 | 19 | 9 | 2 |
Montana | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 2 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 |
Nebraska | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 3 | 0 | 0 | 0 | 20 | 0 | 0 | 0 |
Nevada | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 10 | 10 | 7 | 1 | 4 | 9 | 18 | 5 | 10 | 0 |
New Hampshire | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
New Jersey | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 322 | 231 | 294 | 134 | 24 | 630 | 276 | 257 | 231 | 24 |
New Mexico | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 8 | 9 | 2 | 3 | 0 | 13 | 7 | 6 | 9 | 6 |
New York | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 452 | 520 | 480 | 947 | 615 | 460 | 540 | 212 | 1054 | 260 |
North Carolina | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 27 | 10 | 13 | 18 | 27 | 20 | 16 | 13 | 11 | 19 |
North Dakota | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 4 | 1 | 1 | 0 | 1 | 1 |
Ohio | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 27 | 33 | 20 | 38 | 48 | 53 | 46 | 34 | 21 | 17 |
Oklahoma | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 9 | 3 | 21 | 6 | 9 | 9 | 6 | 1 |
Oregon | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 6 | 2 | 2 | 1 | 3 | 0 | 5 | 3 | 1 | 5 |
Pennsylvania | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 80 | 121 | 234 | 72 | 266 | 59 | 51 | 23 | 46 | 23 |
Rhode Island | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | -3 | 0 | 0 | 0 |
South Carolina | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5 | 3 | 1 | 3 | 1 | 16 | 10 | 7 | 9 | 8 |
South Dakota | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
Tennessee | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 5 | 1 | 6 | 4 | 5 | 0 | 14 | -2 | 8 | 3 |
Texas | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 37 | 25 | 14 | 17 | 26 | 23 | 48 | 20 | 14 | 13 |
Utah | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 10 | 2 | 0 | 0 | 3 | 0 | 3 | 4 | 1 | 0 |
Vermont | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 2 | 0 | 1 | 2 | 0 | 3 | 1 | 2 | 0 |
Virginia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 33 | 0 | 0 | 20 | 0 | 165 | 24 | 38 | 26 | 12 |
Washington | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 24 | 10 | 23 | 7 | 21 | 22 | 25 | 11 | 15 | 12 |
West Virginia | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 24 | 1 | 0 | 2 |
Wisconsin | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 9 | 6 | 8 | 10 | 14 | 4 | 10 | 5 | 3 | 6 |
Wyoming | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
51 rows × 95 columns
Date of the largest single-day confirmed increase for each state:
Province_State
Alabama                 2020/4/26
Alaska                  2020/3/28
Arizona                 2020/4/23
Arkansas                2020/4/23
California              2020/4/20
Colorado                2020/4/24
Connecticut             2020/4/22
Delaware                2020/4/26
District of Columbia    2020/4/8
Florida                 2020/4/2
Georgia                 2020/4/14
Hawaii                  2020/4/3
Idaho                   2020/4/2
Illinois                2020/4/24
Indiana                 2020/4/25
Iowa                    2020/4/25
Kansas                  2020/4/23
Kentucky                2020/4/12
Louisiana               2020/4/2
Maine                   2020/4/2
Maryland                2020/4/8
Massachusetts           2020/4/24
Michigan                2020/4/3
Minnesota               2020/4/17
Mississippi             2020/4/19
Missouri                2020/4/10
Montana                 2020/4/2
Nebraska                2020/4/23
Nevada                  2020/3/30
New Hampshire           2020/4/15
New Jersey              2020/4/22
New Mexico              2020/4/22
New York                2020/4/15
North Carolina          2020/4/25
North Dakota            2020/4/18
Ohio                    2020/4/19
Oklahoma                2020/4/19
Oregon                  2020/4/5
Pennsylvania            2020/4/23
Rhode Island            2020/4/15
South Carolina          2020/4/9
South Dakota            2020/4/15
Tennessee               2020/4/23
Texas                   2020/4/9
Utah                    2020/4/10
Vermont                 2020/4/11
Virginia                2020/4/25
Washington              2020/4/2
West Virginia           2020/4/19
Wisconsin               2020/4/25
Wyoming                 2020/4/21
dtype: object
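The listing above gives the peak day of each state separately; the question also asks for the single state-and-day pair with the largest confirmed increase across all states and all days. A sketch using diff(axis=1) and one global idxmax (it assumes the same ten leading non-date columns that the code above slices away):
conf_by_state = (Q3_confirmed.set_index('Province_State').iloc[:, 10:]
                 .groupby('Province_State').sum())
confirmed_daily = conf_by_state.diff(axis=1).iloc[:, 1:]  # drop the first all-NaN column
state, day = confirmed_daily.stack().idxmax()
print(state, day, int(confirmed_daily.loc[state, day]))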
(5)
Q: **For each state, build a confirmed-and-deaths table. The first column is the date, starting from the day the state first recorded a death; the second and third columns are the confirmed count and the death count. Save each state as a separate csv file named "StateName.csv".**
Q3_5_confirmed = Q3_confirmed.set_index('Province_State').iloc[:, 10:].groupby('Province_State').sum().T
Q3_5_death = Q3_death.set_index('Province_State').iloc[:, 11:].groupby('Province_State').sum().T
zhou = Q3_5_confirmed.columns[:]  # list of state names ("zhou" means state)
for name in list(zhou):
    confirmed = Q3_5_confirmed.loc[:, name]
    death = Q3_5_death.loc[:, name]
    for date in Q3_5_confirmed.index:
        if death.loc[date] > 0:
            start_day = date
            break
    save_data = pd.DataFrame({'confirmed num': confirmed.loc[start_day:], 'death num': death.loc[start_day:]})
    save_data.to_csv(name + '.csv')
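The inner date loop that searches for the first death can be replaced by idxmax on a boolean series; a sketch of the same export, assuming every state has at least one recorded death:
for name in Q3_5_confirmed.columns:
    start_day = Q3_5_death[name].gt(0).idxmax()  # first date with a recorded death
    out = pd.DataFrame({'confirmed num': Q3_5_confirmed.loc[start_day:, name],
                        'death num': Q3_5_death.loc[start_day:, name]})
    out.to_csv(name + '.csv')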
(6)
Q: **For April 1 through April 10, build tables of newly confirmed cases and new deaths. The first column is the state name, and the second and third columns are the new confirmed count and the new death count. Save them as ten separate csv files named "date.csv".**
Q3_6_death_gain = death_gain.loc[:, '2020/4/1':'2020/4/10']
Q3_6_confirm_gain = confirmed_gain.loc[:, '2020/4/1':'2020/4/10']
# display(Q3_6_death_gain)
# display(Q3_6_confirm_gain)
Date = Q3_6_death_gain.columns[0:]
# pd.DataFrame({'add_confirm_num':Q3_6_confirm_gain.loc[:, date], 'add_death_num':Q3_6_death_gain.loc[:, date]})
for date in list(Date):
    save_data = pd.DataFrame({'add_confirm_num': Q3_6_confirm_gain.loc[:, date], 'add_death_num': Q3_6_death_gain.loc[:, date]})
    save_data.to_csv(date.replace('/', '-') + '.csv')
If you found this article useful, please give it a like!
Your support is the greatest encouragement for me.