
I have raw data of securities, out of which I need to create multiple portfolios of securities based on certain filtering criteria. I am used to working in C++ and am not clear on how the below can be implemented in Python.

I tried to make different dataframes using nested for-loops:

i - used to loop through years from 2007 to 2017 (column yr in raw data)

j - used to loop through regions from 1 to 4 (column Region in raw data)

for i in range(2007, 2018):
    for j in range(1, 5):
        dfij_filter = (df['yr'] == i) & (df['Region'] == j)
        dfij = df[dfij_filter]  # filter the raw data, not dfij itself
        dfij = dfij.join(dfco.groupby('ISSUER_NAME')['E_SCORE'].mean(), on='ISSUER_NAME', rsuffix='_ry')
        dfij = dfij.join(dfco.groupby('ISSUER_NAME')['P_SCORE'].mean(), on='ISSUER_NAME', rsuffix='_ry')
        dfij = dfij.join(dfco.groupby('ISSUER_NAME')['Q_SCORE'].mean(), on='ISSUER_NAME', rsuffix='_ry')
        dfij = dfij.drop_duplicates(subset=['ISSUER_NAME'], keep=False)
        dfij_E = dfij.sort_values('E_SCORE_ry', ascending=False)
        dfij_ETOP = dfij_E.iloc[:50, :]
        dfij_P = dfij.sort_values('P_SCORE_ry', ascending=False)
        dfij_PTOP = dfij_P.iloc[:50, :]
        dfij_Q = dfij.sort_values('Q_SCORE_ry', ascending=False)  # was sorting on E_SCORE_ry
        dfij_QTOP = dfij_Q.iloc[:50, :]

I need to create different dataframes and then apply a few functions on those dataframes. Essentially the flow is:

Step 1: Filter by year (column yr)
Step 2: Filter by region (column Region)
Step 3: Calculate an average E score, average P score and average Q score for that year and region (E, P, Q are different columns)
Step 4: Arrange the securities in descending order of average E score
Step 5: Pick the top 50 securities and put them in a dataframe

Repeat Steps 4 and 5 for the P and Q scores as well.

Essentially creating 11 * 4 * 3 dataframes (11 years * 4 regions * 3 scores).

These dataframes can then be used for backtesting purposes
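The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: column names (yr, Region, ISSUER_NAME, E_SCORE, P_SCORE, Q_SCORE) are taken from the question, and results are stored in a dictionary keyed by (year, region, score) tuples rather than dynamically named variables.

```python
import pandas as pd

def build_portfolios(df, top_n=50):
    """Group by year and region, average each score per issuer,
    then keep the top_n issuers per score column."""
    portfolios = {}  # keys: (yr, Region, score_column) tuples
    for (yr, region), grp in df.groupby(['yr', 'Region']):
        # Step 3: average scores per issuer within this year/region slice
        means = grp.groupby('ISSUER_NAME')[['E_SCORE', 'P_SCORE', 'Q_SCORE']].mean()
        for score in ['E_SCORE', 'P_SCORE', 'Q_SCORE']:
            # Steps 4-5: sort descending and keep the top issuers
            portfolios[(yr, region, score)] = means[score].nlargest(top_n)
    return portfolios
```

The portfolio for, say, year 2010, region 1, E score would then be `portfolios[(2010, 1, 'E_SCORE')]`.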

Any help would be greatly appreciated. Thanks

  • What doesn't work / needs to be fixed? Commented Jul 26, 2018 at 19:57
  • Thanks for your response Joerg. I do not know how to create dataframes within nested for loops. In the code above I cannot dynamically create dataframes with names such as dfij (--> df20071 for eg). I know this can be done in dictionary but I'm not sure how that works here. Commented Jul 26, 2018 at 20:21

2 Answers


You can use a dictionary to store your dataframes. This has the added benefits of enabling O(1) lookup and grouping your related data. You need not use a nested loop for this; you can use dict + groupby with an input dataframe df:

dfs = dict(tuple(df.groupby(['yr', 'Region'])))

This creates a dictionary dfs mapping each combination of "yr" and "Region" to a dataframe. You can access the dataframe for year 2010 and region 1 via dfs[(2010, 1)].
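A quick demonstration with toy data (column names as in the question) of what this mapping looks like:

```python
import pandas as pd

# Toy data just to illustrate the mapping
df = pd.DataFrame({
    'yr':          [2010, 2010, 2011],
    'Region':      [1, 2, 1],
    'ISSUER_NAME': ['A', 'B', 'C'],
})

dfs = dict(tuple(df.groupby(['yr', 'Region'])))

# Each key is a (yr, Region) tuple; each value is the matching sub-dataframe
print(sorted(dfs.keys()))                      # [(2010, 1), (2010, 2), (2011, 1)]
print(dfs[(2010, 1)]['ISSUER_NAME'].tolist())  # ['A']
```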

Now to modify your dataframes, you can simply iterate your dictionary as you would any other dictionary:

ETOP, PTOP, QTOP = {}, {}, {}

for key in dfs:
    dfs[key] = dfs[key].join(dfco.groupby('ISSUER_NAME')['E_SCORE'].mean(), ...)
    ...
    dfs[key] = dfs[key].drop_duplicates(subset=['ISSUER_NAME'], keep=False)
    ...
    E = dfs[key].sort_values('E_SCORE_ry', ascending=False)
    ETOP[key] = E.head(50)
    ...

Notice I have created dictionaries ETOP, PTOP, QTOP to store result dataframes, each indexed by the same ('yr', 'region') key structure. This way, you can easily access, modify or combine results for any particular combination.
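Filling in the elided lines, a complete version of that loop might look like the sketch below. It assumes, as in the question, that dfco holds the per-issuer score history; the per-issuer averages are computed once up front rather than inside the loop.

```python
import pandas as pd

def top_portfolios(dfs, dfco, top_n=50):
    """dfs maps (yr, Region) -> dataframe with an ISSUER_NAME column;
    dfco holds the per-issuer E/P/Q score history."""
    ETOP, PTOP, QTOP = {}, {}, {}
    # Per-issuer averages, computed once outside the loop
    avg = dfco.groupby('ISSUER_NAME')[['E_SCORE', 'P_SCORE', 'Q_SCORE']].mean()
    for key, d in dfs.items():
        # Overlapping score columns from avg get the '_ry' suffix
        d = d.join(avg, on='ISSUER_NAME', rsuffix='_ry')
        d = d.drop_duplicates(subset=['ISSUER_NAME'], keep=False)
        ETOP[key] = d.sort_values('E_SCORE_ry', ascending=False).head(top_n)
        PTOP[key] = d.sort_values('P_SCORE_ry', ascending=False).head(top_n)
        QTOP[key] = d.sort_values('Q_SCORE_ry', ascending=False).head(top_n)
    return ETOP, PTOP, QTOP
```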


2 Comments

Thanks jpp. In the line dfs[key] = dfs[key].drop_duplicates(subset=['ISSUER_NAME'], keep=False), how can I drop duplicates irrespective of case?
You can use dfs[key] = dfs[key][~dfs[key]['ISSUER_NAME'].str.lower().duplicated()]. See pd.Series.duplicated for more details.
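A quick illustration of that case-insensitive de-duplication on a toy column:

```python
import pandas as pd

d = pd.DataFrame({'ISSUER_NAME': ['Acme', 'ACME', 'Beta']})
# Lower-case before testing for duplicates, so 'Acme' and 'ACME' collide;
# duplicated() marks all but the first occurrence by default
out = d[~d['ISSUER_NAME'].str.lower().duplicated()]
print(out['ISSUER_NAME'].tolist())  # ['Acme', 'Beta']
```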
This iterates over the groups produced by grouping on yr and Region; each v is the sub-dataframe for one (yr, Region) combination:

for k, v in df.groupby(['yr', 'Region']):
    print(v)

2 Comments

Thanks for your help ZJS. I need to create different dataframes and then apply few functions on those dataframes: Essentially the flow is: Yr filter --> Region filter --> Calc an average E score value for that yr and region --> arrange the securities in descending order of average E score --> Pick top 50 securities and put them in a dataframe These dataframes can then be used for backtesting purposes.
Please add an explanation of your code, it will greatly increase the quality of the answer.
