Overview
I have the following CSV file:
# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())
which I would like to group by year+month plus category (i.e. A, B, C).
I would like the final data to be grouped by month and then by category,
as a view of the original data:
2012-04, A
>> array[0,] => 2012-04-01,00:10, A, 10
>> array[3,] => 2012-04-02,00:10, A, 18
2012-04, B
>> array[1,] => 2012-04-01,00:20, B, 11
>> array[2,] => 2012-04-01,00:30, B, 12
2012-05, A
>> array[4,] => 2012-05-02,00:20, A, 14
...
Then, for each group, I would like to iterate and plot the rows using the same function.
I have seen a similar question on splitting dates by day:
Split list of datetimes into days
and I am able to do so in case a) below. But I am having trouble turning that into a year+month split in case b).
Here is the snippet that I have so far, with the issue that I am running into:
#! /usr/bin/python
import numpy as np
import csv
import os
from datetime import datetime
def strToDate(string):
    d = datetime.strptime(string, '%Y-%m-%d')
    return d

def strToMonthDate(string):
    # Truncate the date to the first day of its month
    d = datetime.strptime(string, '%Y-%m-%d')
    d_by_month = datetime(d.year, d.month, 1)
    return d_by_month
# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())
arr = np.genfromtxt(data, delimiter=',', dtype=object)
# a) If we were to just group by dates
# Get unique dates
#keys = np.unique(arr[:,0])
#keys1 = np.unique(arr[:,2])
# Group by unique dates
#for key in keys:
#    print key
#    for key1 in keys1:
#        group = arr[ (arr[:,0]==key) & (arr[:,2]==key1) ]
#        if group.size:
#            print "\t" + key1
#            print group
#    print "\n"
# b) But if we want to group by year+month in the dates
dates_by_month = np.array(map(strToMonthDate, arr[:,0]))
keys2 = np.unique(dates_by_month)
print dates_by_month
# >> [datetime.datetime(2012, 4, 1, 0, 0), datetime.datetime(2012, 4, 1, 0, 0), ...
print "\n"
print keys2
# >> [2012-04-01 00:00:00 2012-05-01 00:00:00 2012-06-01 00:00:00]
for key in keys2:
    print key
    print type(key)
    group = arr[dates_by_month==key]
    print group
    print "\n"
Question: I get the monthly keys, but for each group all I get is [2012-04-01 00:10 A 10]. Each key in keys2 is of type datetime.datetime. Any idea what could be wrong? Suggestions for alternative implementations are welcome. I would prefer not to use an itertools.groupby solution, as it returns an iterator rather than an array, which is less suitable for plotting.
Edit 1: Problem solved. The dates_by_month sequence used for advanced (boolean) indexing in case b) has to be an np.array rather than the list that map returns: dates_by_month = np.array(map(strToMonthDate, arr[:,0])). I have fixed this in the snippet above, and the example now works.
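To illustrate why the list version fails, here is a minimal sketch in Python 3 (the snippet above is Python 2). The names months, key, list_result and mask are made up for this illustration and are not part of the original script. Note that in Python 3 map returns a lazy iterator, so the equivalent fix would be np.array(list(map(...))) or a list comprehension.

```python
from datetime import datetime

import numpy as np

# Hypothetical month keys, standing in for dates_by_month
months = [datetime(2012, 4, 1), datetime(2012, 4, 1), datetime(2012, 5, 1)]
key = datetime(2012, 4, 1)

# list == datetime: Python compares the whole list with one object,
# so the result is a single bool, not one value per element
list_result = (months == key)

# ndarray == datetime: NumPy compares element by element,
# which gives a boolean mask that can be used for indexing
mask = (np.array(months) == key)

print(list_result)  # False
print(mask)         # [ True  True False]
```

With the list, arr[dates_by_month==key] is therefore arr[False]. My assumption is that NumPy versions of that time treated a bare False like the integer index 0, which would explain why every group came back as the first row; current NumPy versions handle a scalar boolean index differently.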
Solution
I found where the issue was in my original solution.
In case b), the
dates_by_month = map(strToMonthDate, arr[:,0])
returns a list instead of a NumPy array. The advanced (boolean) indexing:
group = arr[dates_by_month==key]
therefore does not work: comparing a list with key yields a single boolean rather than one value per row. If I instead write:
dates_by_month = np.array(map(strToMonthDate, arr[:,0]))
then the grouping works as expected.
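For completeness, the full grouping by year+month and then by category could look roughly like the following in Python 3. This is only a sketch under a few assumptions: the file is read with the csv module instead of np.genfromtxt, and the names to_month, months, categories and groups are hypothetical helpers introduced here.

```python
import csv
from datetime import datetime
from io import StringIO

import numpy as np

# Simulate the CSV file from the question
data = StringIO("""2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""")

# Read every field as text and strip the space that follows each comma
arr = np.array([[field.strip() for field in row] for row in csv.reader(data)])


def to_month(text):
    """Parse 'YYYY-MM-DD' and truncate it to the first day of the month."""
    d = datetime.strptime(text, "%Y-%m-%d")
    return datetime(d.year, d.month, 1)


# Must be arrays (not lists) so that == produces a per-row boolean mask
months = np.array([to_month(s) for s in arr[:, 0]])
categories = arr[:, 2]

groups = {}
for month in sorted(set(months)):
    for cat in np.unique(categories):
        mask = (months == month) & (categories == cat)
        if mask.any():  # skip month/category pairs with no rows
            groups[(month.strftime("%Y-%m"), str(cat))] = arr[mask]

for (month, cat), rows in groups.items():
    print(month, cat)
    print(rows)
```

Each value in groups is a 2-D array, so it can be passed straight to a plotting function. One caveat: boolean indexing returns a copy, not a view of arr. If the original row positions are needed, np.flatnonzero(mask) gives the matching indices.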