I have a DataFrame with a two-level index, "DATE" (monthly data) and "ID", and a column named Volume. For every unique ID, I want to add a new column that holds the average of the Volume column.

The basic idea is to figure out which months are above the yearly average for each ID.

list(df.index)

(Timestamp('1970-09-30 00:00:00'), 12167.0)

print(df.index.name)

None

I could not find a tutorial that addresses this.

Can someone please point me in the right direction?

                    SHRCD  EXCHCD   SICCD     PRC     VOL       RET    SHROUT  \
DATE       PERMNO                                                               
1970-08-31 10559.0   10.0     1.0  5311.0  35.000  1692.0  0.030657   12048.0   
           12626.0   10.0     1.0  5411.0  46.250   926.0  0.088235    6624.0   
           12749.0   11.0     1.0  5331.0  45.500  5632.0  0.126173   34685.0   
           13100.0   11.0     1.0  5311.0  22.000  1759.0  0.171242   15107.0   
           13653.0   10.0     1.0  5311.0  13.125   141.0  0.220930    1337.0   
           13936.0   11.0     1.0  2331.0  11.500   270.0 -0.053061    3942.0   
           14322.0   11.0     1.0  5311.0  64.750  6934.0  0.024409  154187.0   
           16969.0   10.0     1.0  5311.0  42.875  1069.0  0.186851   13828.0   
           17072.0   10.0     1.0  5311.0  14.750   777.0  0.026087    5415.0   
           17304.0   10.0     1.0  5311.0  24.875  1939.0  0.058511    8150.0 
  • Thank you so much. The problem is that I have to group not only by ID but also by the year of the 'DATE' index, meaning I have to somehow extract the year from it. Commented Nov 11, 2018 at 5:23
  • Is it possible to create some sample data with the expected output? Commented Nov 11, 2018 at 5:27
  • I hope I did that. For example, for each PERMNO I want the yearly average of volume, so I need to access the DATE index, but I do not know how. Commented Nov 11, 2018 at 5:43
  • Does df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['Volume'].transform('mean') do what you need? Commented Nov 11, 2018 at 5:44
  • It does not throw an error, so I hope it worked. I am just puzzled how you came up with index.get_level_values(0).year. Can you tell me how you found that, so I can help myself in the future? Commented Nov 11, 2018 at 5:54

1 Answer

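You can group by the year of the DATE index level together with PERMNO and use transform, which returns a Series the same size as the original DataFrame, so it can be assigned directly as a new column: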

print (df)
                    VOL
DATE       PERMNO      
1970-08-31 10559.0    1
           10559.0    2
           12749.0    3
1971-08-31 13100.0    4
           13100.0    5

df['avg'] = df.groupby([df.index.get_level_values(0).year, 'PERMNO'])['VOL'].transform('mean')
print (df)
                    VOL  avg
DATE       PERMNO           
1970-08-31 10559.0    1  1.5
           10559.0    2  1.5
           12749.0    3  3.0
1971-08-31 13100.0    4  4.5
           13100.0    5  4.5
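
To get from the yearly average to the original goal (which months are above it), one option is to compare VOL with the new column. The following is only a minimal sketch: the sample data is made up for illustration, the column name above_avg is my own choice, and it assumes the index levels are named DATE and PERMNO as in the printout above.

```python
import pandas as pd

# Hypothetical sample data mirroring the (DATE, PERMNO) MultiIndex in the question
index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("1970-08-31"), 10559.0),
        (pd.Timestamp("1970-09-30"), 10559.0),
        (pd.Timestamp("1970-08-31"), 12749.0),
        (pd.Timestamp("1971-08-31"), 13100.0),
        (pd.Timestamp("1971-09-30"), 13100.0),
    ],
    names=["DATE", "PERMNO"],
)
df = pd.DataFrame({"VOL": [1, 2, 3, 4, 5]}, index=index)

# get_level_values returns the DATE level as a DatetimeIndex, which exposes .year
years = df.index.get_level_values("DATE").year

# Yearly mean of VOL per PERMNO, broadcast back to every row
df["avg"] = df.groupby([years, "PERMNO"])["VOL"].transform("mean")

# Boolean flag (name chosen for this sketch): True where the month exceeds
# that PERMNO's average for the same year
df["above_avg"] = df["VOL"] > df["avg"]

print(df)
```

Selecting the level by name (get_level_values("DATE")) behaves the same as by position (get_level_values(0)) here. Filtering with df[df["above_avg"]] would then keep only the months above the yearly average.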