概述
pandas 读取文件
文档参考《深入浅出 Pandas 利用Python 进行数据处理与分析》,还有pandas 官方文档,https://www.pypandas.cn/docs/
https://pandas.pydata.org/
https://www.gairuo.com/p/pandas
1 读取的 csv 文件
import numpy as np
import pandas as pd
分类
格式 | 文件格式 | 读取函数 | 写入(输出)函数 | |
---|---|---|---|---|
0 | binary | Excel | read_excel | to_excel |
1 | text | CSV | read_csv read_table | to_csv |
2 | text | JSON | read_json | to_json |
3 | text | 网页表格 HTML | read_html | to_html |
4 | text | 剪贴板 | read_clipboard | to_clipboard |
5 | SQL | SQL | read_sql | to_sql |
6 | XML | read_xml | NaN | read_xml |
7 | text | Markdown | NaN | to_markdown |
读取本地相对路径
df = pd.read_csv('tmp.csv')
df.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
读取本地绝对路径
df2 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/tmp.csv')
df2.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
读取网络文档
# pd.read_csv('data/my/my.data')
df = pd.read_csv('https://www.gairuo.com/file/data/dataset/GDP-China.csv')
df.head()
年份 | 国民总收入 | 国内生产总值 | 第一产业增加值 | 第二产业增加值 | 第三产业增加值 | 人均国内生产总值 | |
---|---|---|---|---|---|---|---|
0 | 2018 | 896915.6 | 900309.5 | 64734.0 | 366000.9 | 469574.6 | 64644 |
1 | 2017 | 820099.5 | 820754.3 | 62099.5 | 332742.7 | 425912.1 | 59201 |
2 | 2016 | 737074.0 | 740060.8 | 60139.2 | 296547.7 | 383373.9 | 53680 |
3 | 2015 | 683390.5 | 685992.9 | 57774.6 | 282040.3 | 346178.0 | 50028 |
4 | 2014 | 642097.6 | 641280.6 | 55626.3 | 277571.8 | 308082.5 | 47005 |
分隔符
df2 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team3.csv')
df2.head()
nametteamtQ1tQ2tQ3tQ4 | |
---|---|
0 | LivertEt89t21t24t64 |
1 | ArrytCt36t37t37t57 |
2 | AcktAt57t60t18t84 |
3 | EorgetCt93t96t71t78 |
4 | OahtDt65t49t61t86 |
df3 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team3.csv',sep = 't')
df3.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
表头
df6 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv')
df6.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
df7 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',header = 1)
df7.head()
Liver | E | 89 | 21 | 24 | 64 | |
---|---|---|---|---|---|---|
0 | Arry | C | 36 | 37 | 37 | 57 |
1 | Ack | A | 57 | 60 | 18 | 84 |
2 | Eorge | C | 93 | 96 | 71 | 78 |
3 | Oah | D | 65 | 49 | 61 | 86 |
4 | Harlie | C | 24 | 13 | 87 | 43 |
df8 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',header = [0,1,3])
df8.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
Liver | E | 89 | 21 | 24 | 64 | |
Ack | A | 57 | 60 | 18 | 84 | |
0 | Eorge | C | 93 | 96 | 71 | 78 |
1 | Oah | D | 65 | 49 | 61 | 86 |
2 | Harlie | C | 24 | 13 | 87 | 43 |
3 | Acob | B | 61 | 95 | 94 | 8 |
4 | Lfie | A | 9 | 10 | 99 | 37 |
列名
注意与 col 参数的区别
df9 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',names = ['name','team'])
df9.head()
name | team | ||||
---|---|---|---|---|---|
name | team | Q1 | Q2 | Q3 | Q4 |
Liver | E | 89 | 21 | 24 | 64 |
Arry | C | 36 | 37 | 37 | 57 |
Ack | A | 57 | 60 | 18 | 84 |
Eorge | C | 93 | 96 | 71 | 78 |
如果文件不包含列名,那么应该设置header=None,列名列表中 不允许有重复值。
df9 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',names = ['name','team'],header = None)
df9.head()
name | team | ||||
---|---|---|---|---|---|
name | team | Q1 | Q2 | Q3 | Q4 |
Liver | E | 89 | 21 | 24 | 64 |
Arry | C | 36 | 37 | 37 | 57 |
Ack | A | 57 | 60 | 18 | 84 |
Eorge | C | 93 | 96 | 71 | 78 |
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = False)
df10.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = 0)
df10.head()
team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|
name | |||||
Liver | E | 89 | 21 | 24 | 64 |
Arry | C | 36 | 37 | 37 | 57 |
Ack | A | 57 | 60 | 18 | 84 |
Eorge | C | 93 | 96 | 71 | 78 |
Oah | D | 65 | 49 | 61 | 86 |
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = ['name','team'])
df10.head()
Q1 | Q2 | Q3 | Q4 | ||
---|---|---|---|---|---|
name | team | ||||
Liver | E | 89 | 21 | 24 | 64 |
Arry | C | 36 | 37 | 37 | 57 |
Ack | A | 57 | 60 | 18 | 84 |
Eorge | C | 93 | 96 | 71 | 78 |
Oah | D | 65 | 49 | 61 | 86 |
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = [0,1])
df10.head()
Q1 | Q2 | Q3 | Q4 | ||
---|---|---|---|---|---|
name | team | ||||
Liver | E | 89 | 21 | 24 | 64 |
Arry | C | 36 | 37 | 37 | 57 |
Ack | A | 57 | 60 | 18 | 84 |
Eorge | C | 93 | 96 | 71 | 78 |
Oah | D | 65 | 49 | 61 | 86 |
使用部分列
df11 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = [0,1])
df11.head()
name | team | |
---|---|---|
0 | Liver | E |
1 | Arry | C |
2 | Ack | A |
3 | Eorge | C |
4 | Oah | D |
df12 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name','Q1'])
df12.head()
name | Q1 | |
---|---|---|
0 | Liver | 89 |
1 | Arry | 36 |
2 | Ack | 57 |
3 | Eorge | 93 |
4 | Oah | 65 |
指定列顺序,其实是df的筛选功能
df12 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name','Q1'])[['Q1','name']]
df12.head()
Q1 | name | |
---|---|---|
0 | 89 | Liver |
1 | 36 | Arry |
2 | 57 | Ack |
3 | 93 | Eorge |
4 | 65 | Oah |
返回序列
将squeeze设置为True,如果文件只包含一列,则返回一个Series, 如果有多列,则还是返回DataFrame。
df13 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name','Q1'],squeeze = True)
df13.head()
name | Q1 | |
---|---|---|
0 | Liver | 89 |
1 | Arry | 36 |
2 | Ack | 57 |
3 | Eorge | 93 |
4 | Oah | 65 |
df14 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name'],squeeze = True)
df14.head()
表头前缀
如果原始数据没有列名,可以指定一个前缀加序数的名称
df14 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',prefix = 'c_',header = None)
df14.head()
c_0 | c_1 | c_2 | c_3 | c_4 | c_5 | |
---|---|---|---|---|---|---|
0 | name | team | Q1 | Q2 | Q3 | Q4 |
1 | Liver | E | 89 | 21 | 24 | 64 |
2 | Arry | C | 36 | 37 | 37 | 57 |
3 | Ack | A | 57 | 60 | 18 | 84 |
4 | Eorge | C | 93 | 96 | 71 | 78 |
数据类型
df15 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv')
df15.dtypes
name object
team object
Q1 int64
Q2 int64
Q3 int64
Q4 int64
dtype: object
df16 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',dtype = {'Q1':np.float64})
df16.dtypes
name object
team object
Q1 float64
Q2 int64
Q3 int64
Q4 int64
dtype: object
df16.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89.0 | 21 | 24 | 64 |
1 | Arry | C | 36.0 | 37 | 37 | 57 |
2 | Ack | A | 57.0 | 60 | 18 | 84 |
3 | Eorge | C | 93.0 | 96 | 71 | 78 |
4 | Oah | D | 65.0 | 49 | 61 | 86 |
df16 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = [2,3,4,5],dtype = np.float64)
df16.head()
Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|
0 | 89.0 | 21.0 | 24.0 | 64.0 |
1 | 36.0 | 37.0 | 37.0 | 57.0 |
2 | 57.0 | 60.0 | 18.0 | 84.0 |
3 | 93.0 | 96.0 | 71.0 | 78.0 |
4 | 65.0 | 49.0 | 61.0 | 86.0 |
df16.dtypes
Q1 float64
Q2 float64
Q3 float64
Q4 float64
dtype: object
读取指定行
pd.read_csv(data, nrows = 1000)
类似列表的序列或者可调用对象
跳过前三行 pd.read_csv(data, skiprows=2)
跳过前三行 pd.read_csv(data, skiprows=range(2))
跳过指定行 pd.read_csv(data, skiprows=[24,234,141])
跳过指定行 pd.read_csv(data, skiprows=np.array([2, 6, 11]))
隔行跳过 pd.read_csv(data, skiprows=lambda x: x % 2 != 0)
Excel
url = '/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team.xlsx'
df2 = pd.read_excel(url)
df2.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
url2 = '/Users/xinmin/DataAnalysis/1.深入浅出Pandas/tmp.xlsx'
df3 = pd.read_excel(url2)
df3.head()
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
df3 = pd.read_excel(url2,sheet_name = 1)
df3.head()
Liver | E | 89 | 21 | 24 | 64 | |
---|---|---|---|---|---|---|
0 | Arry | C | 36 | 37 | 37 | 57 |
1 | Ack | A | 57 | 60 | 18 | 84 |
2 | Eorge | C | 93 | 96 | 71 | 78 |
3 | Oah | D | 65 | 49 | 61 | 86 |
4 | Harlie | C | 24 | 13 | 87 | 43 |
df4 = pd.read_excel(url2,sheet_name ='Sheet2')
df4.tail()
Liver | E | 89 | 21 | 24 | 64 | |
---|---|---|---|---|---|---|
16 | Henry | A | 91 | 15 | 75 | 17 |
17 | William | C | 80 | 68 | 3 | 26 |
18 | Max | E | 97 | 75 | 41 | 3 |
19 | Lucas | A | 60 | 41 | 77 | 62 |
20 | Ethan | D | 79 | 45 | 89 | 88 |
df5 = pd.read_excel(url2,sheet_name = [0,1,'Sheet2'])
df5
{0: name team Q1 Q2 Q3 Q4
0 Liver E 89 21 24 64
1 Arry C 36 37 37 57
2 Ack A 57 60 18 84
3 Eorge C 93 96 71 78
4 Oah D 65 49 61 86
.. ... ... .. .. .. ..
95 Gabriel C 48 59 87 74
96 Austin7 C 21 31 30 43
97 Lincoln4 C 98 93 1 20
98 Eli E 11 74 58 91
99 Ben E 21 43 41 74
[100 rows x 6 columns],
1: Liver E 89 21 24 64
0 Arry C 36 37 37 57
1 Ack A 57 60 18 84
2 Eorge C 93 96 71 78
3 Oah D 65 49 61 86
4 Harlie C 24 13 87 43
5 Acob B 61 95 94 8
6 Lfie A 9 10 99 37
7 Reddie D 64 93 57 72
8 Oscar A 77 9 26 67
9 Leo B 17 4 33 79
10 Logan B 9 89 35 65
11 Archie C 83 89 59 68
12 Theo C 51 86 87 27
13 Thomas B 80 48 56 41
14 James E 48 77 52 11
15 Joshua A 63 4 80 30
16 Henry A 91 15 75 17
17 William C 80 68 3 26
18 Max E 97 75 41 3
19 Lucas A 60 41 77 62
20 Ethan D 79 45 89 88,
'Sheet2': Liver E 89 21 24 64
0 Arry C 36 37 37 57
1 Ack A 57 60 18 84
2 Eorge C 93 96 71 78
3 Oah D 65 49 61 86
4 Harlie C 24 13 87 43
5 Acob B 61 95 94 8
6 Lfie A 9 10 99 37
7 Reddie D 64 93 57 72
8 Oscar A 77 9 26 67
9 Leo B 17 4 33 79
10 Logan B 9 89 35 65
11 Archie C 83 89 59 68
12 Theo C 51 86 87 27
13 Thomas B 80 48 56 41
14 James E 48 77 52 11
15 Joshua A 63 4 80 30
16 Henry A 91 15 75 17
17 William C 80 68 3 26
18 Max E 97 75 41 3
19 Lucas A 60 41 77 62
20 Ethan D 79 45 89 88}
df6 = pd.read_excel(url2,header = None)
df6.head()
0 | 1 | 2 | 3 | 4 | 5 | |
---|---|---|---|---|---|---|
0 | name | team | Q1 | Q2 | Q3 | Q4 |
1 | Liver | E | 89 | 21 | 24 | 64 |
2 | Arry | C | 36 | 37 | 37 | 57 |
3 | Ack | A | 57 | 60 | 18 | 84 |
4 | Eorge | C | 93 | 96 | 71 | 78 |
df7 = pd.read_excel(url2,header = None,sheet_name = 'Sheet2')
df7.head()
0 | 1 | 2 | 3 | 4 | 5 | |
---|---|---|---|---|---|---|
0 | Liver | E | 89 | 21 | 24 | 64 |
1 | Arry | C | 36 | 37 | 37 | 57 |
2 | Ack | A | 57 | 60 | 18 | 84 |
3 | Eorge | C | 93 | 96 | 71 | 78 |
4 | Oah | D | 65 | 49 | 61 | 86 |
df8 = pd.read_excel(url2,header = [0,1])
df8
name | team | Q1 | Q2 | Q3 | Q4 | |
---|---|---|---|---|---|---|
Liver | E | 89 | 21 | 24 | 64 | |
0 | Arry | C | 36 | 37 | 37 | 57 |
1 | Ack | A | 57 | 60 | 18 | 84 |
2 | Eorge | C | 93 | 96 | 71 | 78 |
3 | Oah | D | 65 | 49 | 61 | 86 |
4 | Harlie | C | 24 | 13 | 87 | 43 |
... | ... | ... | ... | ... | ... | ... |
94 | Gabriel | C | 48 | 59 | 87 | 74 |
95 | Austin7 | C | 21 | 31 | 30 | 43 |
96 | Lincoln4 | C | 98 | 93 | 1 | 20 |
97 | Eli | E | 11 | 74 | 58 | 91 |
98 | Ben | E | 21 | 43 | 41 | 74 |
99 rows × 6 columns
数据输出
输出Excel 文件path_to_excel.xlsx,sheet_name 是tmp ,index = False 不设置索引,header = None 是不设置表头
df18.to_excel(‘path_to_excel.xlsx’,sheet_name = ‘tmp’,index = False,header = None)
将多个df分不同sheet导入一个Excel文件中
with pd.ExcelWriter(‘path_to_file.xlsx’) as writer:
df1.to_excel(writer, sheet_name=‘Sheet1’)
df2.to_excel(writer, sheet_name=‘Sheet2’)
最后
以上就是儒雅手机为你收集整理的pandas 读取文件的全部内容,希望文章能够帮你解决pandas 读取文件所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
发表评论 取消回复