1

I have a multi-index pandas dataframe (groupbydataframe) as below:

Store  Year  Month
1      2013  1         4922.0
             2         5315.5
             3         5603.0
             4         4625.0
             5         4995.0
             6         4407.0
             7         4421.0
             8         4494.0
             9         4212.0
             10        4148.5
             11        5033.5
             12        6111.0
       2014  1         4547.5
             2         4726.5
             3         4455.0
             4         4577.5
             5         4909.0
             6         4788.0
             7         4507.0
             8         4124.0
             9         3970.5
             10        4431.0
             11        5220.0
             12        6466.0
       2015  1         4690.0
             2         4467.5
             3         4294.5
             4         4256.0
             5         4529.0
             6         4102.0
                       ...   
1112   2013  2        10991.0
             3        12322.0
             4        10705.0
             5        12096.0
             6        11029.0
             7        10419.0
             8         8994.0
             9         9448.0
             10        9019.0
             11        9294.0
             12       11163.0
       2014  1         8156.5
             2         8862.5
             3         8606.5
             4         9662.0
             5        10375.0
             6         8755.0
             7         8262.0
             8         8454.0
             9         7621.5
             10        8779.0
             11        9638.0
             12        9592.0
       2015  1         9345.0
             2         8466.5
             3         8457.0
             4         8715.0
             5         9125.0
             6         8528.0
             7         7728.0

The last column is "Sales".

  1. If FindYear=2014 and FindMonth=01 how do I extract sales for 2013-12 to 2014-02 for Store=1?
  2. if FindYear and FindMonth are lists with length(FindYear)=length(Store). How do do (1) without looping through all stores individually?
  3. Or am I looking at this the wrong way? do I need to unstack() and go from there?

1 Answer 1

1

You can first convert second and third level of Multiindex to_period:

s = pd.Series({(1, 2013, 2): 5315.5, (1, 2014, 8): 4124.0, (1, 2014, 9): 3970.5, (1112, 2013, 8): 8994.0, (1, 2013, 12): 6111.0, (1, 2015, 3): 4294.5, (1112, 2014, 10): 8779.0, (1, 2014, 10): 4431.0, (1112, 2013, 9): 9448.0, (1112, 2014, 12): 9592.0, (1, 2015, 2): 4467.5, (1, 2014, 11): 5220.0, (1112, 2013, 10): 9019.0, (1, 2015, 1): 4690.0, (1112, 2013, 11): 9294.0, (1112, 2015, 2): 8466.5, (1, 2013, 9): 4212.0, (1112, 2015, 1): 9345.0, (1112, 2013, 4): 10705.0, (1, 2013, 8): 4494.0, (1112, 2014, 9): 7621.5, (1112, 2013, 5): 12096.0, (1, 2014, 4): 4577.5, (1, 2013, 11): 5033.5, (1, 2015, 6): 4102.0, (1112, 2013, 6): 11029.0, (1, 2014, 5): 4909.0, (1, 2013, 10): 4148.5, (1, 2015, 5): 4529.0, (1112, 2014, 8): 8454.0, (1112, 2013, 7): 10419.0, (1, 2014, 6): 4788.0, (1, 2015, 4): 4256.0, (1, 2014, 7): 4507.0, (1112, 2014, 5): 10375.0, (1112, 2015, 3): 8457.0, (1112, 2014, 4): 9662.0, (1, 2013, 5): 4995.0, (1112, 2013, 2): 10991.0, (1, 2014, 1): 4547.5, (1112, 2014, 7): 8262.0, (1, 2013, 4): 4625.0, (1112, 2013, 3): 12322.0, (1, 2014, 2): 4726.5, (1112, 2014, 6): 8755.0, (1, 2013, 7): 4421.0, (1, 2014, 3): 4455.0, (1112, 2014, 1): 8156.5, (1, 2013, 6): 4407.0, (1112, 2014, 11): 9638.0, (1, 2014, 12): 6466.0, (1, 2013, 1): 4922.0, (1112, 2015, 7): 7728.0, (1112, 2013, 12): 11163.0, (1112, 2014, 3): 8606.5, (1112, 2015, 4): 8715.0, (1112, 2014, 2): 8862.5, (1, 2013, 3): 5603.0, (1112, 2015, 6): 8528.0, (1112, 2015, 5): 9125.0})
per = pd.DatetimeIndex(s.index.map(lambda x: pd.to_datetime(str(x[1]) 
                                                          + str(x[2]), format='%Y%m')))
        .to_period('m')

print (per)
PeriodIndex(['2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06',
             '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12',
             '2014-01', '2014-02', '2014-03', '2014-04', '2014-05', '2014-06',
             '2014-07', '2014-08', '2014-09', '2014-10', '2014-11', '2014-12',
             '2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06',
             '2013-02', '2013-03', '2013-04', '2013-05', '2013-06', '2013-07',
             '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01',
             '2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07',
             '2014-08', '2014-09', '2014-10', '2014-11', '2014-12', '2015-01',
             '2015-02', '2015-03', '2015-04', '2015-05', '2015-06', '2015-07'],
            dtype='int64', freq='M')

Then assign new Multiindex, where second level is PeriodIndex:

new_index = list(zip(df.index.get_level_values('Store'), per))
s.index = pd.MultiIndex.from_tuples(new_index, names = ('Store','Period'))
print (s)
Store  Period 
1      2013-01     4922.0
       2013-02     5315.5
       2013-03     5603.0
       2013-04     4625.0
       2013-05     4995.0
       2013-06     4407.0
       2013-07     4421.0
       2013-08     4494.0
       2013-09     4212.0
       2013-10     4148.5
       2013-11     5033.5
       2013-12     6111.0
       2014-01     4547.5
       2014-02     4726.5
       2014-03     4455.0
...
...
...

Selecting with loc:

print (s.loc[1,pd.Period('2014-01')])
4547.5

print (s.loc[1,pd.Period('2013-12'):pd.Period('2014-02')])
Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
Name: Sales, dtype: float64

per1 = pd.Period('2014-01')
per2 = pd.Period('2015-01')
per = [per1, per2]

print (s.loc[1, per1 - 1: per1 + 1])
Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
Name: Sales, dtype: float64

Multiple selecting by list comprehension:

print ([s.loc[1, x - 1: x + 1] for x in per])
[Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
Name: Sales, dtype: float64, Store  Period 
1      2014-12    6466.0
       2015-01    4690.0
       2015-02    4467.5
Name: Sales, dtype: float64]

which can be concated:

print (pd.concat([s.loc[1, x - 1: x + 1] for x in per]))
Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
       2014-12    6466.0
       2015-01    4690.0
       2015-02    4467.5
Name: Sales, dtype: float64
Sign up to request clarification or add additional context in comments.

13 Comments

Thanks @jezrael. I did not think outside the box!! Converting "year" and "month" to "year-month" datetime index is the most straightforward way!
consider a multi-index dataframe with Store and YearMonth (e.g: 2014-01) as levels[0] and levels[1] respectively. Is there a way to perform in-place conversion of YearMonth to DatetimeIndex with freq='M'
I think I have figured out how to do it inplace. s.index.set_levels(per, level='YearMonth', inplace=True) but haven't tested it properly. However this still uses per which is separate sequence to force levels to new.
There is another possible solution - convert MultiIndex to columns, then create new period column and delete unnecessary columns. Then set index with period column. But it seems for me less effective.
when i try print (s.loc[1, per1 - 1: per1 + 1]), I get invalid syntax on colon (:) operator. I am using anaconda 3.5 python ? If I print(per1-1) and print(per1+1) it looks fine, but somehow it complains when I try with : operator
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.