概述
我一直在尝试在Python中实现以下SAS代码,尽管我在将代码转换为将表转换为所需代码的过程中没有遇到任何问题,但我一直通过每次更改参数来手动执行该函数。
SAS代码:
%LET SOURCEDATA =
PRACTICE.RURAL_SHOPS;
%MACRO DATAPROCESSOR(YEAR =, QUARTER =, COL1 =, COL2 =, KEYCOL =);
PROC SQL;
CREATE TABLE RURALSHOP&YEAR&QUARTER AS
SELECT
&KEYCOL,
SUM(SHOPNO) AS SHOPNO&YEAR&QUARTER,
SUM(OPSHOP) AS OPSHOP&YEAR&QUARTER,
SUM(CLSHOP) AS CLSHOP&YEAR&QUARTER
FROM
&SOURCEDATA
WHERE
&COL1 = &YEAR AND &COL2 = &QUARTER
GROUP BY
&KEYCOL;
QUIT;
%MEND DATAPROCESSOR;
%MACRO REPEAT;
%DO I = 2014% TO 2019;
%DO J = 1% TO 4;
%DATAPROCESSOR(YEAR = &I, QUARTER = &J, COL1 = YEAR, COL2 = QUARTER, KEYCOL = AREACODE);
%END;
%END;
%MEND;
%REPEAT;
有没有一种方法可以实现使用Python中的循环自动执行功能的部分?这是我目前在Python中得到的东西:
sample_data = pd.DataFrame({"YEAR" : [2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018, 2019, 2019],
"QUARTER" : [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
"AREACODE" : [100001, 100001, 100002, 100002, 100003, 100003, 100004, 100004, 100005, 100005, 100006, 100006],
"SHOPNO" : [10, 12, 31, 5, 6, 9, 1, 3, 4, 0, 10, 2],
"OPSHOP" : [1, 1, 3, 6, 2, 1, 0, 1, 0, 0, 2, 1],
"CLSHOP" : [0, 0, 5, 1, 2, 0, 2, 1, 3, 5, 6, 0]})
def createmaster(df, keycol, keycol1, keycol2, year, quarter):
df = df[(df["{}".format(keycol1)] == year) & (df["{}".format(keycol2)] == quarter)].groupby("{}".format(keycol)).sum()
df = df.rename(columns = {"SHOPNO" : "SHOPNO{}{}".format(year, quarter),
"OPSHOP" : "OPSHOPNO{}{}".format(year, quarter),
"CLSHOP" : "CLSHOP{}{}".format(year, quarter)})
df = df.drop(columns = ["YEAR", "QUARTER"])
return df
解决方案
在SAS中,您应该PROC MEANS使用一条CLASS语句来执行此处的操作,该语句几乎不需要任何代码即可生成您所要求的内容。
通过一次遍历数据,然后两次遍历汇总数据来完成此操作,而没有什么复杂的过程:仅是摘要,快速遍历以固定输出的名称,然后进行转置。我在这里编写SAS只是为了说明Python也应该如何。这使用sashelp.prdsale与您的数据集非常相似的结构化数据集。
proc means data=sashelp.prdsale nway;
class country region year quarter;
var actual predict;
output out=prd_means sum(actual)= sum(predict)=;
run;
data prd_names;
set prd_means;
name = catx('_','actual',year,quarter);
value = actual;
output;
name = catx('_','predict',year,quarter);
value = predict;
output;
keep name value country region;
run;
proc transpose data=prd_names out=prd_vars;
by country region;
var value;
id name;
run;
因此,在Python中,我们以相同的方式进行操作-获取汇总摘要,然后根据需要调整其形状。我将用来saspy获取prdsale数据集;您可以在样本数据集上使用相同的代码。
import pandas as pd
#Everything from here to ...
import saspy
sas = saspy.SASsession(cfgname='winiomIWA')
prdsale = sas.sasdata2dataframe(table='prdsale',libref='sashelp')
#...here is just getting the dataframe set up - use your own dataframe here.
sums = prdsale.groupby(['COUNTRY','REGION','YEAR','QUARTER']).agg ('sum').unstack(['YEAR','QUARTER'])
print(sums)
实际上,您所需要做的就是最后一行(和导入)旁边的.groupby([... var list to summarize by, inc. year and quarter at the end]).agg('sum').unstack(['YEAR','QUARTER'])。
Here, groupby tells it how to group the data when calculating the sums, agg says to sum the variables, and unstack says to reshape it afterwards for those two variables (meaning turn rows into columns for those variables).
I also suspect pivot_table() would do just as well...
pv = prdsale.pivot_table(index=['COUNTRY','REGION'],columns=['YEAR','QUARTER'],
values=['ACTUAL','PREDICT'],aggfunc='sum')
Both return basically the same thing, I don't know which is faster if that matters here. See this question/answer for another example of this (though there's not really any aggregation going on there).
最后
以上就是超帅悟空为你收集整理的python内建方法会自动调用吗,有没有一种方法可以使用循环在Python中自动执行用户功能?...的全部内容,希望文章能够帮你解决python内建方法会自动调用吗,有没有一种方法可以使用循环在Python中自动执行用户功能?...所遇到的程序开发问题。
如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。
发表评论 取消回复