Problem description
I have a pivoted pandas DataFrame (sales by region) that was created from another pandas DataFrame (sales by store) using the pivot_table method.
As an example:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'store': ['A', 'B', 'C', 'D', 'E'] * 7,
    'region': ['NW', 'NW', 'SW', 'NE', 'NE'] * 7,
    'date': ['2017-03-30'] * 5 + ['2017-04-05'] * 5 + ['2017-04-07'] * 5 +
            ['2017-04-12'] * 5 + ['2017-04-13'] * 5 + ['2017-04-17'] * 5 +
            ['2017-04-20'] * 5,
    'sales': [30, 1, 133, 9, 1, 30, 3, 135, 9, 11, 30, 1, 140, 15, 15,
              25, 10, 137, 9, 3, 29, 10, 137, 9, 11, 30, 19, 145, 20, 10,
              30, 8, 141, 25, 25]
})
df['date'] = pd.to_datetime(df['date'])
df_sales = df.pivot_table(index=['region'], columns=['date'],
                          aggfunc=[np.sum], margins=True)
# drop the 'All' margin column; .iloc replaces the long-deprecated .ix
df_sales = df_sales.iloc[:, range(0, df_sales.shape[1] - 1)]
My goal is to do the following to the sales data frame, df_sales.
Create a new DataFrame that summarizes sales by quarter. I could use either the original DataFrame df or df_sales.
In this example there are only two quarters (US fiscal calendar year), so the quarterly aggregated data frame would look like:
        2017Q1   2017Q2
NE          10    27
NW          31    37.5
SW         133   139.17
I take the average across all days in Q1, and likewise for Q2. For example, for the Northeast region, 'NE', the Q1 value is the average of a single day, 2017-03-30, i.e. 10, and the Q2 value is the average across 2017-04-05 to 2017-04-20, i.e.,
(20+30+12+20+30+50)/6=27
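For reference, this figure can be reproduced directly from df (a quick check, assuming the example df built above; the name ne_q2 is just for illustration):

ne_q2 = df[(df['region'] == 'NE') & (df['date'].dt.quarter == 2)]
ne_q2.groupby('date')['sales'].sum().mean()   # average of the daily NE totals -> 27.0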
Any suggestions?
ADDITIONAL NOTE: I would ideally do the quarter aggregations on the df_sales pivoted table since it's a much smaller dataframe to keep in memory. The current solution does it on the original df, but I am still seeking a way to do it in the df_sales dataframe.
Recommended answer
UPDATE:
Setup:
df.date = pd.to_datetime(df.date)
df_sales = df.pivot_table(index='region', columns='date', values='sales', aggfunc='sum')

In [318]: df_sales
Out[318]:
date    2017-03-30  2017-04-05  2017-04-07  2017-04-12  2017-04-13  2017-04-17  2017-04-20
region
NE              10          20          30          12          20          30          50
NW              31          33          31          35          39          49          38
SW             133         135         140         137         137         145         141
Solution:
In [319]: (df_sales.groupby(pd.PeriodIndex(df_sales.columns, freq='Q'), axis=1)
     ...:          .apply(lambda x: x.sum(axis=1)/x.shape[1])
     ...: )
Out[319]:
date    2017Q1      2017Q2
region
NE        10.0   27.000000
NW        31.0   37.500000
SW       133.0  139.166667
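Note that in recent pandas releases DataFrame.groupby(..., axis=1) is deprecated. A minimal sketch of an equivalent computation (assuming the df_sales built in the setup above) transposes first:

quarterly = (df_sales.T                                              # dates become the index
                     .groupby(pd.PeriodIndex(df_sales.columns, freq='Q'))
                     .mean()                                         # mean of the daily sums per quarter
                     .T)                                             # back to region-by-quarter
# mean() skips NaNs; with no missing days this matches sum / number of days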
Solution based on the original DF:
In [253]: (df.groupby(['region', pd.PeriodIndex(df.date, freq='Q-DEC')])
     ...:    .apply(lambda x: x['sales'].sum()/x['date'].nunique())
     ...:    .to_frame('avg').unstack('date')
     ...: )
Out[253]:
           avg
date    2017Q1      2017Q2
region
NE        10.0   27.000000
NW        31.0   37.500000
SW       133.0  139.166667
NOTE: df is the original DataFrame (before pivoting).
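The same result can also be reached in two explicit steps, which may read more clearly (a sketch, again assuming the original df from the question; daily and quarterly are illustrative names):

daily = df.groupby(['region', 'date'])['sales'].sum()               # total sales per region per day
quarterly = (daily
             .groupby(['region',
                       daily.index.get_level_values('date').to_period('Q')])
             .mean()                                                 # average daily total per quarter
             .unstack())                                             # regions as rows, quarters as columns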