Basic DataFrame usage in pandas: creating a DataFrame and reading data from it


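I am 端庄期待, a blogger on 靠谱客. This article, collected during recent development work, covers basic DataFrame usage in pandas: creating a DataFrame and reading data from it. I found it useful and am sharing it here as a reference.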

Overview

Creating a DataFrame

Creating from a dictionary

import pandas as pd
# Named data_dict rather than dict so the built-in dict type is not shadowed
data_dict = {'id':[1,2,3,4,5,6],'name':['Alice','Bob','Cindy','Eric','Helen','Grace ']}
print(type(data_dict))  # the type is dict
dictDataframe = pd.DataFrame(data_dict)
print(type(dictDataframe))
print('the content of dictDataframe is :\n',dictDataframe)

Output:

<class 'dict'>
<class 'pandas.core.frame.DataFrame'>
the content of dictDataframe is :
    id    name
0   1   Alice
1   2     Bob
2   3   Cindy
3   4    Eric
4   5   Helen
5   6  Grace
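
As a small variation on the example above, the following sketch builds a DataFrame from a dictionary and then inspects its shape and column labels. The names `records` and `df_from_dict` and the three-row sample data are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column,
# and each list holds that column's values (all lists must have equal length).
records = {'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Cindy']}
df_from_dict = pd.DataFrame(records)

print(df_from_dict.shape)          # (number of rows, number of columns)
print(list(df_from_dict.columns))  # column labels, in dictionary key order
print(df_from_dict.dtypes)         # per-column data types
```

Checking the shape and column labels right after construction is a quick way to confirm that the dictionary was interpreted as intended.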

Creating from a file

df = pd.read_csv('D:/study/python/test.txt')
print(type(df))
print(df)

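Output (the file is space-delimited, but read_csv splits on commas by default, so each line is read into a single column):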

<class 'pandas.core.frame.DataFrame'>
         id name python
0   message from python
1   message from python
2   message from python
3   message from python
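
To split a space-delimited file into separate columns, one option is to pass the `sep` argument to `read_csv`. The sketch below uses an in-memory string in place of a real file so that it runs anywhere; the content of `text` is a made-up stand-in and does not reproduce the original test.txt.

```python
import io
import pandas as pd

# Hypothetical file content: space-separated fields with a header row.
text = "id name lang\n1 Alice python\n2 Bob python\n"

# The default separator is a comma, so every line lands in a single column.
df_default = pd.read_csv(io.StringIO(text))

# Passing sep=' ' splits each line on single spaces.
df_spaces = pd.read_csv(io.StringIO(text), sep=' ')

print(df_default.shape)  # one column
print(df_spaces.shape)   # three columns
print(df_spaces)
```

If the fields are separated by a varying amount of whitespace, `sep=r'\s+'` is a more tolerant choice.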

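Reading data from a DataFrame

Reading by row

Use the loc and iloc indexers.
The difference: iloc selects data by the position of the row, so its arguments must be integer positions (a single integer, a list of integers, or an integer slice); loc selects data by index label, so the argument type depends on the type of the index.

By default the row index is a sequence of integers starting from 0, but a custom row index can also be specified.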
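
The distinction between positions and labels can be sketched as follows. The frame `df_demo`, its string labels, and the scores are hypothetical and chosen only so that labels and positions differ.

```python
import pandas as pd

# Hypothetical frame with string labels so that labels and positions differ.
df_demo = pd.DataFrame({'score': [97, 39, 75]}, index=['a', 'b', 'c'])

by_position = df_demo.iloc[0]               # first row, by integer position
by_label = df_demo.loc['a']                 # same row, by index label
several_by_position = df_demo.iloc[[0, 2]]  # list of positions
several_by_label = df_demo.loc[['a', 'c']]  # list of labels

# Both access paths reach the same data.
print(by_position.equals(by_label))
print(several_by_position.equals(several_by_label))
```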

Case 1: the default index

import pandas as pd
df_idx = pd.DataFrame({'name':['lili','pingguo'],'score':['97','39']})
print(df_idx)

Output:

      name score
0     lili    97
1  pingguo    39

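Here the row index labels are 0, 1, ..., the same as the row positions, so loc and iloc return the same row for a single integer: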

print(df_idx.loc[1])

Output (here 1 is the row index label):

name     pingguo
score         39
Name: 1, dtype: object

print(df_idx.iloc[1])

Output (here 1 is the row position):

name     pingguo
score         39
Name: 1, dtype: object
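
Even with the default index, the two indexers are not fully interchangeable: a label slice with loc includes its end label, while a position slice with iloc excludes its end position. A minimal sketch, reusing the two-row frame from this case (the variable names `label_slice` and `position_slice` are illustrative):

```python
import pandas as pd

df_idx = pd.DataFrame({'name': ['lili', 'pingguo'], 'score': ['97', '39']})

label_slice = df_idx.loc[0:1]      # label slice: both endpoints are included
position_slice = df_idx.iloc[0:1]  # position slice: end excluded, as with Python lists

print(len(label_slice))     # number of rows selected by loc
print(len(position_slice))  # number of rows selected by iloc
```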

Case 2: a custom row index, here date strings:

import pandas as pd
df_idx = pd.DataFrame(
    data=[[   0,    0,    2],
          [1478, 3877, 3674],
          [1613, 4088, 3991]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['RA', 'RB', 'RC']
)
print(df_idx)

Output:

            RA    RB    RC
05-01-11     0     0     2
05-02-11  1478  3877  3674
05-03-11  1613  4088  3991

In this case the difference between iloc and loc becomes visible:

df_idx.loc[1]

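Running the code above raises an error (a KeyError in recent pandas versions), because no row carries the label 1.
iloc works without problems: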

print(df_idx.iloc[1])

Output:

RA    1478
RB    3877
RC    3674
Name: 05-02-11, dtype: int64
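
The failed label lookup can also be handled explicitly. The sketch below tries `loc[1]` and falls back to positional access when the label is missing. Catching `TypeError` alongside `KeyError` is a defensive assumption for older pandas releases, and the fallback strategy itself is only an illustration, not something the original article prescribes.

```python
import pandas as pd

df_idx = pd.DataFrame(
    data=[[0, 0, 2], [1478, 3877, 3674], [1613, 4088, 3991]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['RA', 'RB', 'RC'],
)

try:
    row = df_idx.loc[1]  # no row is labeled 1, so this lookup fails
except (KeyError, TypeError):
    # Fall back to positional access: the second row of the frame.
    row = df_idx.iloc[1]

print(row.name)  # the index label of the selected row
```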

With loc, only the row index labels work here. Note that a label slice includes both endpoints:

print(df_idx.loc['05-02-11'])
print("-------fancy separator----------------------")
print(df_idx.loc['05-02-11':'05-03-11'])

Output:

RA    1478
RB    3877
RC    3674
Name: 05-02-11, dtype: int64
-------fancy separator----------------------
            RA    RB    RC
05-02-11  1478  3877  3674
05-03-11  1613  4088  3991
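
As a small extension beyond row-only access, both indexers also accept a second argument that selects columns. The sketch below reuses the frame from Case 2; the variable names `cell_by_label`, `cell_by_position`, and `rows_and_cols` are illustrative.

```python
import pandas as pd

df_idx = pd.DataFrame(
    data=[[0, 0, 2], [1478, 3877, 3674], [1613, 4088, 3991]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['RA', 'RB', 'RC'],
)

cell_by_label = df_idx.loc['05-02-11', 'RB']  # row label, column label
cell_by_position = df_idx.iloc[1, 1]          # row position, column position

# A label slice of rows combined with a list of column labels.
rows_and_cols = df_idx.loc['05-02-11':'05-03-11', ['RA', 'RC']]

print(cell_by_label, cell_by_position)
print(rows_and_cols)
```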