# 各季度的汇总数据[英] aggregate data by quarter

### 问题描述

```df = pd.DataFrame(
{'store':['A','B','C','D','E']*7,
'region':['NW','NW','SW','NE','NE']*7,
'date':['2017-03-30']*5+['2017-04-05']*5+['2017-04-07']*5+['2017-04-12']*5+['2017-04-13']*5+['2017-04-17']*5+['2017-04-20']*5,
'sales':[30,1,133,9,1,30,3,135,9,11,30,1,140,15,15,25,10,137,9,3,29,10,137,9,11,30,19,145,20,10,30,8,141,25,25]
})
df['date'] = pd.to_datetime(df['date'])
df_sales = df.pivot_table(index = ['region'], columns = ['date'], aggfunc = [np.sum], margins = True)
df_sales = df_sales.ix[:,range(0, df_sales.shape-1)]
```

```2017Q1  2017Q2
10      27
31      37.5
133     139.17
```

```(20+30+12+20+30+50)/6=27
```

## 推荐答案

```df.date = pd.to_datetime(df.date)

df_sales = df.pivot_table(index='region', columns='date', values='sales', aggfunc='sum')

In : df_sales
Out:
date    2017-03-30  2017-04-05  2017-04-07  2017-04-12  2017-04-13  2017-04-17  2017-04-20
region
NE              10          20          30          12          20          30          50
NW              31          33          31          35          39          49          38
SW             133         135         140         137         137         145         141
```

```In : (df_sales.groupby(pd.PeriodIndex(df_sales.columns, freq='Q'), axis=1)
...:          .apply(lambda x: x.sum(axis=1)/x.shape)
...: )
Out:
date    2017Q1      2017Q2
region
NE        10.0   27.000000
NW        31.0   37.500000
SW       133.0  139.166667
```

```In : (df.groupby(['region', pd.PeriodIndex(df.date, freq='Q-DEC')])
...:    .apply(lambda x: x['sales'].sum()/x['date'].nunique())
...:    .to_frame('avg').unstack('date')
...: )
...:
Out:
avg
date   2017Q1      2017Q2
region
NE       10.0   27.000000
NW       31.0   37.500000
SW      133.0  139.166667
```

### 问题描述

I have a pivot pandas data frame (sales by region) that got created from another pandas data frame (sales by store) using the pivot_table method.

As an example:

```df = pd.DataFrame(
{'store':['A','B','C','D','E']*7,
'region':['NW','NW','SW','NE','NE']*7,
'date':['2017-03-30']*5+['2017-04-05']*5+['2017-04-07']*5+['2017-04-12']*5+['2017-04-13']*5+['2017-04-17']*5+['2017-04-20']*5,
'sales':[30,1,133,9,1,30,3,135,9,11,30,1,140,15,15,25,10,137,9,3,29,10,137,9,11,30,19,145,20,10,30,8,141,25,25]
})
df['date'] = pd.to_datetime(df['date'])
df_sales = df.pivot_table(index = ['region'], columns = ['date'], aggfunc = [np.sum], margins = True)
df_sales = df_sales.ix[:,range(0, df_sales.shape-1)]
```

My goal is to do the following to the sales data frame, df_sales.

Create a new dataframe that summarizes sales by quarter. I could use the original dataframe df, or the sales_df.

As of quarter here we only have only two quarters (USA fiscal calendar year) so the quarterly aggregated data frame would look like:

```2017Q1  2017Q2
10      27
31      37.5
133     139.17
```

I take the average for all days in Q1, and same for Q2. Thus, for example for the North east region, 'NE', the Q1 is the average of only one day 2017-03-30, i.e., 10, and for the Q2 is the average across 2017-04-05 to 2017-04-20, i.e.,

```(20+30+12+20+30+50)/6=27
```

Any suggestions?

ADDITIONAL NOTE: I would ideally do the quarter aggregations on the df_sales pivoted table since it's a much smaller dataframe to keep in memory. The current solution does it on the original df, but I am still seeking a way to do it in the df_sales dataframe.

## 推荐答案

UPDATE:

Setup:

```df.date = pd.to_datetime(df.date)

df_sales = df.pivot_table(index='region', columns='date', values='sales', aggfunc='sum')

In : df_sales
Out:
date    2017-03-30  2017-04-05  2017-04-07  2017-04-12  2017-04-13  2017-04-17  2017-04-20
region
NE              10          20          30          12          20          30          50
NW              31          33          31          35          39          49          38
SW             133         135         140         137         137         145         141
```

Solution:

```In : (df_sales.groupby(pd.PeriodIndex(df_sales.columns, freq='Q'), axis=1)
...:          .apply(lambda x: x.sum(axis=1)/x.shape)
...: )
Out:
date    2017Q1      2017Q2
region
NE        10.0   27.000000
NW        31.0   37.500000
SW       133.0  139.166667
```

Solution based on the original DF:

```In : (df.groupby(['region', pd.PeriodIndex(df.date, freq='Q-DEC')])
...:    .apply(lambda x: x['sales'].sum()/x['date'].nunique())
...:    .to_frame('avg').unstack('date')
...: )
...:
Out:
avg
date   2017Q1      2017Q2
region
NE       10.0   27.000000
NW       31.0   37.500000
SW      133.0  139.166667
```

NOTE: df - is the original DF (before "pivoting")