Grouping data by time in Python: splitting a list of table rows by year+month


Shared by 懦弱水杯, a blogger on 靠谱客. This post covers grouping data by time in Python, specifically splitting a list of table rows by year+month, and is offered here as a reference.

Overview

I have the following CSV file:

    # simulate a csv file
    from StringIO import StringIO

    data = StringIO("""
    2012-04-01,00:10, A, 10
    2012-04-01,00:20, B, 11
    2012-04-01,00:30, B, 12
    2012-04-02,00:10, A, 18
    2012-05-02,00:20, A, 14
    2012-05-02,00:30, B, 11
    2012-05-03,00:10, A, 10
    2012-06-03,00:20, B, 13
    2012-06-03,00:30, C, 12
    """.strip())

I would like to group this data by year+month and then by category (i.e. A, B, C), with each group being a view of the original data:

    2012-04, A
    >> array[0,] => 2012-04-01,00:10, A, 10
    >> array[3,] => 2012-04-02,00:10, A, 18
    2012-04, B
    >> array[1,] => 2012-04-01,00:20, B, 11
    >> array[2,] => 2012-04-01,00:30, B, 12
    2012-05, A
    >> array[4,] => 2012-05-02,00:20, A, 14
    ...

Then, for each group, I would like to iterate and plot it using the same function.

I have seen a similar question on splitting dates by day, "Split list of datetimes into days", and I am able to do that in case a) below. However, I am having trouble turning it into a year+month split in case b).

Here is the snippet that I have so far with the issue that I am running into:

    #! /usr/bin/python
    import numpy as np
    import csv
    import os
    from datetime import datetime

    def strToDate(string):
        # Parse a 'YYYY-MM-DD' string into a datetime
        d = datetime.strptime(string, '%Y-%m-%d')
        return d

    def strToMonthDate(string):
        # Parse a 'YYYY-MM-DD' string and snap it to the first day of its month
        d = datetime.strptime(string, '%Y-%m-%d')
        d_by_month = datetime(d.year, d.month, 1)
        return d_by_month

    # simulate a csv file
    from StringIO import StringIO

    data = StringIO("""
    2012-04-01,00:10, A, 10
    2012-04-01,00:20, B, 11
    2012-04-01,00:30, B, 12
    2012-04-02,00:10, A, 18
    2012-05-02,00:20, A, 14
    2012-05-02,00:30, B, 11
    2012-05-03,00:10, A, 10
    2012-06-03,00:20, B, 13
    2012-06-03,00:30, C, 12
    """.strip())

    arr = np.genfromtxt(data, delimiter=',', dtype=object)

    # a) If we were to just group by dates
    # Get unique dates
    #keys = np.unique(arr[:,0])
    #keys1 = np.unique(arr[:,2])
    # Group by unique dates
    #for key in keys:
    #    print key
    #    for key1 in keys1:
    #        group = arr[ (arr[:,0]==key) & (arr[:,2]==key1) ]
    #        if group.size:
    #            print "\t" + key1
    #            print group
    #    print "\n"

    # b) But if we want to group by year+month in the dates
    dates_by_month = np.array(map(strToMonthDate, arr[:,0]))
    keys2 = np.unique(dates_by_month)

    print dates_by_month
    # >> [datetime.datetime(2012, 4, 1, 0, 0), datetime.datetime(2012, 4, 1, 0, 0), ...
    print "\n"
    print keys2
    # >> [2012-04-01 00:00:00 2012-05-01 00:00:00 2012-06-01 00:00:00]

    for key in keys2:
        print key
        print type(key)
        group = arr[dates_by_month==key]
        print group
        print "\n"

Question: I get the monthly keys, but for every group all I get is [2012-04-01 00:10 A 10]. Each key in keys2 is of type datetime.datetime. Any idea what could be wrong? Suggestions for alternative implementations are welcome. I would prefer not to use an itertools.groupby solution, as it returns an iterator rather than an array, which is less suitable for plotting.

Edit1: Problem solved. The issue was that dates_by_month, which is used for advanced indexing in case b), should be an np.array rather than the list that map returns: dates_by_month = np.array(map(strToMonthDate, arr[:,0])). I have fixed this in the snippet above, and the example now works.
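The stated goal is to group by year+month and then by category, ending up with row positions such as array[0,] and array[3,] for "2012-04, A". One possible way to get there is sketched below. It is not part of the original snippet: it assumes Python 3 (hence io.StringIO and print as a function), and the helper name group_row_indices is made up for illustration. It collects row indices per (year-month, category) key, so the indices can later be used to index the original array.

```python
import csv
from collections import defaultdict
from datetime import datetime
from io import StringIO

# Same sample rows as in the question.
data = StringIO("""\
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""")


def group_row_indices(rows):
    """Map (year-month, category) to the list of matching row indices.

    Hypothetical helper: column 0 is assumed to hold a YYYY-MM-DD date
    and column 2 the category, as in the sample data.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        date = datetime.strptime(row[0].strip(), "%Y-%m-%d")
        key = (date.strftime("%Y-%m"), row[2].strip())
        groups[key].append(i)
    return dict(groups)


rows = [row for row in csv.reader(data) if row]  # skip empty lines
groups = group_row_indices(rows)

# Visit the groups in month order, then category order.
for (month, category), indices in sorted(groups.items()):
    print(month, category, indices)
```

Each list of indices could then be passed to the same plotting function, for example by indexing the NumPy array with it.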

Solution

I found where the issue was in my original solution.

In case b), the line

    dates_by_month = map(strToMonthDate, arr[:,0])

returns a list rather than a NumPy array. Comparing a list with == yields a single boolean rather than an element-wise mask, so the advanced indexing

    group = arr[dates_by_month==key]

does not select the matching rows. If I instead use

    dates_by_month = np.array(map(strToMonthDate, arr[:,0]))

then the grouping works as expected.
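The difference between the two cases can be seen in a small, self-contained sketch. This is an illustration rather than the original code: it assumes Python 3 with NumPy installed, uses only three of the sample rows, and the names to_month_start, months_list and months_arr are made up here. Since map returns a lazy iterator on Python 3, the sketch builds the list with a list comprehension instead.

```python
from datetime import datetime

import numpy as np

# A small subset of the sample rows, stored as an object array.
arr = np.array([
    ["2012-04-01", "00:10", "A", "10"],
    ["2012-04-01", "00:20", "B", "11"],
    ["2012-05-02", "00:20", "A", "14"],
], dtype=object)


def to_month_start(text):
    """Parse 'YYYY-MM-DD' and snap it to the first day of its month."""
    d = datetime.strptime(text, "%Y-%m-%d")
    return datetime(d.year, d.month, 1)


key = datetime(2012, 4, 1)

# Plain list: == compares the whole list with one datetime.
months_list = [to_month_start(s) for s in arr[:, 0]]
list_result = months_list == key   # a single Python bool

# NumPy array: == is applied element by element.
months_arr = np.array(months_list, dtype=object)
mask = months_arr == key           # one boolean per row
group = arr[mask]                  # rows whose month matches key

print(list_result)
print(mask)
print(group)
```

With the list, the comparison collapses to one value, so there is no per-row mask to index with. With the array, the mask has one entry per row and selects the two April rows.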