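I am 儒雅手机, a blogger on 靠谱客. This article, collected during recent development work, covers reading files with pandas. I found it useful and am sharing it here as a reference.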

Overview

Reading files with pandas

References: the book 《深入浅出 Pandas 利用Python 进行数据处理与分析》 and the official pandas documentation, https://www.pypandas.cn/docs/
https://pandas.pydata.org/
https://www.gairuo.com/p/pandas

1 Reading CSV files
import numpy as np
import pandas as pd

Categories

   Format  File format      Read function         Write (output) function
0  binary  Excel            read_excel            to_excel
1  text    CSV              read_csv, read_table  to_csv
2  text    JSON             read_json             to_json
3  text    HTML web table   read_html             to_html
4  text    Clipboard        read_clipboard        to_clipboard
5  SQL     SQL              read_sql              to_sql
6  text    XML              read_xml              to_xml
7  text    Markdown         NaN                   to_markdown
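The read and write functions in the table come in pairs, so a quick way to get a feel for them is an in-memory round trip. The sketch below is only an illustration: the sample data and the variable names are made up for this example, and io.StringIO stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data, modelled on the team table used in this article.
df = pd.DataFrame({"name": ["Liver", "Arry"], "team": ["E", "C"], "Q1": [89, 36]})

# CSV: to_csv returns a string when no path is given; read_csv parses it back.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: the same pattern with to_json / read_json (one record per row).
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(from_csv)
print(from_json)
```

Passing a file path instead of the StringIO object works the same way for both pairs.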

Reading from a local relative path

df = pd.read_csv('tmp.csv')
df.head()
    name team  Q1  Q2  Q3  Q4
0  Liver    E  89  21  24  64
1   Arry    C  36  37  37  57
2    Ack    A  57  60  18  84
3  Eorge    C  93  96  71  78
4    Oah    D  65  49  61  86

Reading from a local absolute path

df2 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/tmp.csv')
df2.head()
    name team  Q1  Q2  Q3  Q4
0  Liver    E  89  21  24  64
1   Arry    C  36  37  37  57
2    Ack    A  57  60  18  84
3  Eorge    C  93  96  71  78
4    Oah    D  65  49  61  86

Reading a file from the network

# pd.read_csv('data/my/my.data')
df = pd.read_csv('https://www.gairuo.com/file/data/dataset/GDP-China.csv')
df.head()
     年份     国民总收入    国内生产总值  第一产业增加值   第二产业增加值   第三产业增加值  人均国内生产总值
0  2018  896915.6  900309.5  64734.0  366000.9  469574.6     64644
1  2017  820099.5  820754.3  62099.5  332742.7  425912.1     59201
2  2016  737074.0  740060.8  60139.2  296547.7  383373.9     53680
3  2015  683390.5  685992.9  57774.6  282040.3  346178.0     50028
4  2014  642097.6  641280.6  55626.3  277571.8  308082.5     47005

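Separator

The file team3.csv is tab-separated, so with the default comma separator each line is read into a single column:

df2 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team3.csv')
df2.head()
  name\tteam\tQ1\tQ2\tQ3\tQ4
0    Liver\tE\t89\t21\t24\t64
1     Arry\tC\t36\t37\t37\t57
2      Ack\tA\t57\t60\t18\t84
3    Eorge\tC\t93\t96\t71\t78
4      Oah\tD\t65\t49\t61\t86

Passing sep='\t' splits the columns correctly:

df3 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team3.csv',sep = '\t')
df3.head()
    name team  Q1  Q2  Q3  Q4
0  Liver    E  89  21  24  64
1   Arry    C  36  37  37  57
2    Ack    A  57  60  18  84
3  Eorge    C  93  96  71  78
4    Oah    D  65  49  61  86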
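The effect of the sep parameter can be reproduced without any file on disk. The following is a minimal sketch that assumes a small tab-separated string (made up here as a stand-in for a file like team3.csv) and compares the default separator with sep="\t".

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for a tab-separated file.
tsv_text = "name\tteam\tQ1\nLiver\tE\t89\nArry\tC\t36\n"

# The default separator is a comma, so every line ends up in one column.
one_col = pd.read_csv(io.StringIO(tsv_text))

# sep="\t" splits each line on tab characters.
split = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(one_col.shape)  # (2, 1)
print(split.shape)    # (2, 3)
```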

Header

df6 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv')
df6.head()
    name team  Q1  Q2  Q3  Q4
0  Liver    E  89  21  24  64
1   Arry    C  36  37  37  57
2    Ack    A  57  60  18  84
3  Eorge    C  93  96  71  78
4    Oah    D  65  49  61  86
df7 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',header = 1)
df7.head()
    Liver  E  89  21  24  64
0    Arry  C  36  37  37  57
1     Ack  A  57  60  18  84
2   Eorge  C  93  96  71  78
3     Oah  D  65  49  61  86
4  Harlie  C  24  13  87  43
df8 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',header = [0,1,3])
df8.head()
     name team  Q1  Q2  Q3  Q4
    Liver    E  89  21  24  64
      Ack    A  57  60  18  84
0   Eorge    C  93  96  71  78
1     Oah    D  65  49  61  86
2  Harlie    C  24  13  87  43
3    Acob    B  61  95  94   8
4    Lfie    A   9  10  99  37
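To make the header values above easier to compare, here is a small sketch on an in-memory CSV. The data and variable names are hypothetical; the point is only how the choice of header changes which line supplies the column names and how many data rows remain.

```python
import io

import pandas as pd

# Hypothetical three-row sample with a header line.
csv_text = "name,team,Q1\nLiver,E,89\nArry,C,36\nAck,A,57\n"

# Default: line 0 is the header.
default = pd.read_csv(io.StringIO(csv_text))

# header=1: line 1 becomes the header, and line 0 is discarded.
second = pd.read_csv(io.StringIO(csv_text), header=1)

# header=None: no header line; columns are numbered 0, 1, 2 and
# the original header line is kept as the first data row.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(default.columns), len(default))
print(list(second.columns), len(second))
print(list(no_header.columns), len(no_header))
```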

Column names

Note the difference from the col parameter

df9 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',names = ['name','team'])
df9.head()
                   name  team
name  team Q1 Q2     Q3    Q4
Liver E    89 21     24    64
Arry  C    36 37     37    57
Ack   A    57 60     18    84
Eorge C    93 96     71    78

If the file contains no column names, set header=None. Duplicate values are not allowed in the list of column names.

df9 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',names = ['name','team'],header = None)
df9.head()
                   name  team
name  team Q1 Q2     Q3    Q4
Liver E    89 21     24    64
Arry  C    36 37     37    57
Ack   A    57 60     18    84
Eorge C    93 96     71    78
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = False)
df10.head()
    name team  Q1  Q2  Q3  Q4
0  Liver    E  89  21  24  64
1   Arry    C  36  37  37  57
2    Ack    A  57  60  18  84
3  Eorge    C  93  96  71  78
4    Oah    D  65  49  61  86
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = 0)
df10.head()
      team  Q1  Q2  Q3  Q4
name
Liver    E  89  21  24  64
Arry     C  36  37  37  57
Ack      A  57  60  18  84
Eorge    C  93  96  71  78
Oah      D  65  49  61  86
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = ['name','team'])
df10.head()
            Q1  Q2  Q3  Q4
name  team
Liver E     89  21  24  64
Arry  C     36  37  37  57
Ack   A     57  60  18  84
Eorge C     93  96  71  78
Oah   D     65  49  61  86
df10 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',index_col = [0,1])
df10.head()
            Q1  Q2  Q3  Q4
name  team
Liver E     89  21  24  64
Arry  C     36  37  37  57
Ack   A     57  60  18  84
Eorge C     93  96  71  78
Oah   D     65  49  61  86
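A common follow-up question is how names interacts with a file that already has a header row. The sketch below is an illustration on made-up data (the replacement names player, group and score are hypothetical): it passes one name per column, with and without header=0, and then builds a two-level index with index_col.

```python
import io

import pandas as pd

# Hypothetical sample with a header line.
csv_text = "name,team,Q1\nLiver,E,89\nArry,C,36\n"
new_names = ["player", "group", "score"]

# names plus header=0: the original header line is replaced by new_names.
renamed = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

# names alone: the file is treated as header-less, so the original
# header line is kept as the first data row.
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)

# index_col with two labels builds a two-level (MultiIndex) row index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=["name", "team"])

print(renamed)
print(kept)
print(indexed)
```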

Using a subset of columns

df11 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = [0,1])
df11.head()
    name team
0  Liver    E
1   Arry    C
2    Ack    A
3  Eorge    C
4    Oah    D
df12 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name','Q1'])
df12.head()
    name  Q1
0  Liver  89
1   Arry  36
2    Ack  57
3  Eorge  93
4    Oah  65

Specifying the column order; this is really just DataFrame column selection

df12 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name','Q1'])[['Q1','name']]
df12.head()
   Q1   name
0  89  Liver
1  36   Arry
2  57    Ack
3  93  Eorge
4  65    Oah
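The reason the reordering step is needed is that usecols only selects columns and does not reorder them. The sketch below shows this on hypothetical in-memory data, and also shows that usecols accepts a callable that is evaluated against each column name.

```python
import io

import pandas as pd

# Hypothetical sample data.
csv_text = "name,team,Q1,Q2\nLiver,E,89,21\nArry,C,36,37\n"

# usecols ignores the order of the list; columns come back in file order.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Q1", "name"])

# Reordering is ordinary DataFrame column selection.
reordered = subset[["Q1", "name"]]

# A callable keeps every column whose name makes it return True.
scores = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col.startswith("Q"))

print(list(subset.columns))     # ['name', 'Q1']
print(list(reordered.columns))  # ['Q1', 'name']
print(list(scores.columns))     # ['Q1', 'Q2']
```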

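Returning a Series

With squeeze set to True, a Series is returned if the parsed data contains only one column; with multiple columns a DataFrame is still returned. Note that the squeeze parameter was deprecated in pandas 1.4 and removed in 2.0, so the calls below only work on older versions.

df13 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name','Q1'],squeeze = True)
df13.head()
    name  Q1
0  Liver  89
1   Arry  36
2    Ack  57
3  Eorge  93
4    Oah  65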
df14 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = ['name'],squeeze = True)
df14.head()
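On pandas versions where read_csv no longer accepts squeeze, the same result can be obtained by calling DataFrame.squeeze("columns") on the result. This is a minimal sketch on made-up in-memory data, not the original author's code.

```python
import io

import pandas as pd

# Hypothetical sample data.
csv_text = "name,Q1\nLiver,89\nArry,36\n"

# One selected column: squeezing along the column axis yields a Series.
names = pd.read_csv(io.StringIO(csv_text), usecols=["name"]).squeeze("columns")

# Two columns: squeeze("columns") leaves the DataFrame unchanged.
both = pd.read_csv(io.StringIO(csv_text)).squeeze("columns")

print(type(names).__name__)  # Series
print(type(both).__name__)   # DataFrame
```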

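Header prefix

If the raw data has no column names, you can generate names made of a prefix followed by a sequence number. Note that the prefix parameter was deprecated in pandas 1.4 and removed in 2.0, so the call below only works on older versions.

df14 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',prefix = 'c_',header = None)
df14.head()
     c_0   c_1 c_2 c_3 c_4 c_5
0   name  team  Q1  Q2  Q3  Q4
1  Liver     E  89  21  24  64
2   Arry     C  36  37  37  57
3    Ack     A  57  60  18  84
4  Eorge     C  93  96  71  78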
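On pandas versions without the prefix parameter, one possible substitute is to read with header=None and then rename the numbered columns with DataFrame.add_prefix. The sketch below assumes a small header-less sample made up for this example.

```python
import io

import pandas as pd

# Hypothetical header-less sample data.
csv_text = "Liver,E,89\nArry,C,36\n"

# header=None numbers the columns 0, 1, 2.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# add_prefix turns those labels into c_0, c_1, c_2.
prefixed = raw.add_prefix("c_")

print(list(prefixed.columns))  # ['c_0', 'c_1', 'c_2']
```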

Data types

df15 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv')
df15.dtypes
name    object
team    object
Q1       int64
Q2       int64
Q3       int64
Q4       int64
dtype: object
df16 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',dtype = {'Q1':np.float64})
df16.dtypes
name     object
team     object
Q1      float64
Q2        int64
Q3        int64
Q4        int64
dtype: object
df16.head()
    name team    Q1  Q2  Q3  Q4
0  Liver    E  89.0  21  24  64
1   Arry    C  36.0  37  37  57
2    Ack    A  57.0  60  18  84
3  Eorge    C  93.0  96  71  78
4    Oah    D  65.0  49  61  86
df16 = pd.read_csv('/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team2.csv',usecols = [2,3,4,5],dtype = np.float64)
df16.head()
     Q1    Q2    Q3    Q4
0  89.0  21.0  24.0  64.0
1  36.0  37.0  37.0  57.0
2  57.0  60.0  18.0  84.0
3  93.0  96.0  71.0  78.0
4  65.0  49.0  61.0  86.0
df16.dtypes
Q1    float64
Q2    float64
Q3    float64
Q4    float64
dtype: object
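The same dtype behaviour can be checked on in-memory data. The following sketch uses a made-up two-row sample and compares the inferred types with a per-column override passed as a dict.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data.
csv_text = "name,Q1,Q2\nLiver,89,21\nArry,36,37\n"

# Without dtype, whole-number columns are inferred as int64.
inferred = pd.read_csv(io.StringIO(csv_text))

# A dict overrides only the listed columns; Q2 is still inferred.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"Q1": np.float64})

print(inferred.dtypes)
print(typed.dtypes)
```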

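Reading specific rows

Read only the first 1000 rows: pd.read_csv(data, nrows = 1000)

skiprows accepts an integer, a list-like sequence, or a callable

Skip the first two lines: pd.read_csv(data, skiprows=2)

Skip the first two lines: pd.read_csv(data, skiprows=range(2))

Skip the specified lines (0-based line numbers): pd.read_csv(data, skiprows=[24,234,141])

Skip the specified lines (0-based line numbers): pd.read_csv(data, skiprows=np.array([2, 6, 11]))

Skip every other line (the odd-numbered ones): pd.read_csv(data, skiprows=lambda x: x % 2 != 0)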
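Because skiprows counts lines from the top of the file, including the header line, its effect is easy to misread. The sketch below runs each variant on a hypothetical six-line sample so that the skipped lines can be checked directly.

```python
import io

import pandas as pd

# Hypothetical sample: line 0 is the header, lines 1 to 5 are data.
csv_text = "name,Q1\nLiver,89\nArry,36\nAck,57\nEorge,93\nOah,65\n"

# nrows=2 reads the header plus the first two data rows.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

# skiprows=2 drops lines 0 and 1, so "Arry,36" is taken as the header.
skipped_int = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# A list skips exactly those line numbers; line 0 (the header) is kept.
skipped_list = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 3])

# A callable is called with each line number; True means skip.
every_other = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i % 2 != 0)

print(first_two["name"].tolist())     # ['Liver', 'Arry']
print(list(skipped_int.columns))      # ['Arry', '36']
print(skipped_list["name"].tolist())  # ['Arry', 'Eorge', 'Oah']
print(every_other["name"].tolist())   # ['Arry', 'Eorge']
```

If the header line should survive an integer skiprows, a list or callable that leaves line 0 alone is one way to do it.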

Excel

url = '/Users/xinmin/DataAnalysis/1.深入浅出Pandas/team.xlsx'
df2 = pd.read_excel(url)
df2.head()
    name team  Q1  Q2  Q3  Q4
0  Liver    E  89  21  24  64
1   Arry    C  36  37  37  57
2    Ack    A  57  60  18  84
3  Eorge    C  93  96  71  78
4    Oah    D  65  49  61  86
url2 = '/Users/xinmin/DataAnalysis/1.深入浅出Pandas/tmp.xlsx'
df3 = pd.read_excel(url2)
df3.head()
    name team  Q1  Q2  Q3  Q4
0  Liver    E  89  21  24  64
1   Arry    C  36  37  37  57
2    Ack    A  57  60  18  84
3  Eorge    C  93  96  71  78
4    Oah    D  65  49  61  86
df3 = pd.read_excel(url2,sheet_name = 1)
df3.head()
    Liver  E  89  21  24  64
0    Arry  C  36  37  37  57
1     Ack  A  57  60  18  84
2   Eorge  C  93  96  71  78
3     Oah  D  65  49  61  86
4  Harlie  C  24  13  87  43
df4 = pd.read_excel(url2,sheet_name ='Sheet2')
df4.tail()
      Liver  E  89  21  24  64
16    Henry  A  91  15  75  17
17  William  C  80  68   3  26
18      Max  E  97  75  41   3
19    Lucas  A  60  41  77  62
20    Ethan  D  79  45  89  88
df5 = pd.read_excel(url2,sheet_name = [0,1,'Sheet2'])
df5
{0:         name team  Q1  Q2  Q3  Q4
 0      Liver    E  89  21  24  64
 1       Arry    C  36  37  37  57
 2        Ack    A  57  60  18  84
 3      Eorge    C  93  96  71  78
 4        Oah    D  65  49  61  86
 ..       ...  ...  ..  ..  ..  ..
 95   Gabriel    C  48  59  87  74
 96   Austin7    C  21  31  30  43
 97  Lincoln4    C  98  93   1  20
 98       Eli    E  11  74  58  91
 99       Ben    E  21  43  41  74
 
 [100 rows x 6 columns],
 1:       Liver  E  89  21  24  64
 0      Arry  C  36  37  37  57
 1       Ack  A  57  60  18  84
 2     Eorge  C  93  96  71  78
 3       Oah  D  65  49  61  86
 4    Harlie  C  24  13  87  43
 5      Acob  B  61  95  94   8
 6      Lfie  A   9  10  99  37
 7    Reddie  D  64  93  57  72
 8     Oscar  A  77   9  26  67
 9       Leo  B  17   4  33  79
 10    Logan  B   9  89  35  65
 11   Archie  C  83  89  59  68
 12     Theo  C  51  86  87  27
 13   Thomas  B  80  48  56  41
 14    James  E  48  77  52  11
 15   Joshua  A  63   4  80  30
 16    Henry  A  91  15  75  17
 17  William  C  80  68   3  26
 18      Max  E  97  75  41   3
 19    Lucas  A  60  41  77  62
 20    Ethan  D  79  45  89  88,
 'Sheet2':       Liver  E  89  21  24  64
 0      Arry  C  36  37  37  57
 1       Ack  A  57  60  18  84
 2     Eorge  C  93  96  71  78
 3       Oah  D  65  49  61  86
 4    Harlie  C  24  13  87  43
 5      Acob  B  61  95  94   8
 6      Lfie  A   9  10  99  37
 7    Reddie  D  64  93  57  72
 8     Oscar  A  77   9  26  67
 9       Leo  B  17   4  33  79
 10    Logan  B   9  89  35  65
 11   Archie  C  83  89  59  68
 12     Theo  C  51  86  87  27
 13   Thomas  B  80  48  56  41
 14    James  E  48  77  52  11
 15   Joshua  A  63   4  80  30
 16    Henry  A  91  15  75  17
 17  William  C  80  68   3  26
 18      Max  E  97  75  41   3
 19    Lucas  A  60  41  77  62
 20    Ethan  D  79  45  89  88}
df6 = pd.read_excel(url2,header = None)
df6.head()
       0     1   2   3   4   5
0   name  team  Q1  Q2  Q3  Q4
1  Liver     E  89  21  24  64
2   Arry     C  36  37  37  57
3    Ack     A  57  60  18  84
4  Eorge     C  93  96  71  78
df7 = pd.read_excel(url2,header = None,sheet_name = 'Sheet2')
df7.head()
       0  1   2   3   4   5
0  Liver  E  89  21  24  64
1   Arry  C  36  37  37  57
2    Ack  A  57  60  18  84
3  Eorge  C  93  96  71  78
4    Oah  D  65  49  61  86
df8 = pd.read_excel(url2,header = [0,1])
df8
        name team  Q1  Q2  Q3  Q4
       Liver    E  89  21  24  64
0       Arry    C  36  37  37  57
1        Ack    A  57  60  18  84
2      Eorge    C  93  96  71  78
3        Oah    D  65  49  61  86
4     Harlie    C  24  13  87  43
..       ...  ...  ..  ..  ..  ..
94   Gabriel    C  48  59  87  74
95   Austin7    C  21  31  30  43
96  Lincoln4    C  98  93   1  20
97       Eli    E  11  74  58  91
98       Ben    E  21  43  41  74

99 rows × 6 columns

Data output

Write the Excel file path_to_excel.xlsx with the sheet name tmp; index=False omits the index and header=False omits the header row

df18.to_excel('path_to_excel.xlsx', sheet_name='tmp', index=False, header=False)

Write several DataFrames to separate sheets of a single Excel file

with pd.ExcelWriter('path_to_file.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')
