
I have raw data of securities, out of which I need to create multiple portfolios of securities based on certain filtering criteria. I am used to working in C++ and am not clear on how the below can be implemented in Python.

I tried to make different dataframes using nested for-loops:

i - used to loop through years from 2007 to 2017 (column yr in raw data)

j - used to loop through regions from 1 to 4 (column Region in raw data)

for i in range(2007, 2018):
    for j in range(1, 5):
        dfij_filter = (df['yr'] == i) & (df['Region'] == j)
        dfij = df[dfij_filter]  # filter the raw data, not dfij itself
        dfij = dfij.join(dfco.groupby('ISSUER_NAME')['E_SCORE'].mean(), on='ISSUER_NAME', rsuffix='_ry')
        dfij = dfij.join(dfco.groupby('ISSUER_NAME')['P_SCORE'].mean(), on='ISSUER_NAME', rsuffix='_ry')
        dfij = dfij.join(dfco.groupby('ISSUER_NAME')['Q_SCORE'].mean(), on='ISSUER_NAME', rsuffix='_ry')
        dfij = dfij.drop_duplicates(subset=['ISSUER_NAME'], keep=False)
        dfij_E = dfij.sort_values('E_SCORE_ry', ascending=False)
        dfij_ETOP = dfij_E.iloc[:50, :]
        dfij_P = dfij.sort_values('P_SCORE_ry', ascending=False)
        dfij_PTOP = dfij_P.iloc[:50, :]
        dfij_Q = dfij.sort_values('Q_SCORE_ry', ascending=False)  # was sorting on E_SCORE_ry
        dfij_QTOP = dfij_Q.iloc[:50, :]

I need to create different dataframes and then apply a few functions on those dataframes. Essentially the flow is:

Step 1: Filter by year (column yr)
Step 2: Filter by region (column Region)
Step 3: Calculate an average E score, average P score and average Q score for that year and region (E, P, Q are different columns)
Step 4: Arrange the securities in descending order of average E score
Step 5: Pick the top 50 securities and put them in a dataframe

Repeat Steps 4 and 5 for the P and Q scores as well.

Essentially creating 11 * 4 * 3 dataframes (11 years * 4 regions * 3 scores).

These dataframes can then be used for backtesting purposes
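The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: column names (yr, Region, ISSUER_NAME, E_SCORE, P_SCORE, Q_SCORE) are taken from the question, and results are stored in a dictionary keyed by (year, region, score) tuples rather than dynamically named variables.

```python
import pandas as pd

def build_portfolios(df, top_n=50):
    """Group by year and region, average each score per issuer,
    then keep the top_n issuers per score column."""
    portfolios = {}  # keys: (yr, Region, score_column) tuples
    for (yr, region), grp in df.groupby(['yr', 'Region']):
        # Step 3: average scores per issuer within this year/region slice
        means = grp.groupby('ISSUER_NAME')[['E_SCORE', 'P_SCORE', 'Q_SCORE']].mean()
        for score in ['E_SCORE', 'P_SCORE', 'Q_SCORE']:
            # Steps 4-5: sort descending and keep the top issuers
            portfolios[(yr, region, score)] = means[score].nlargest(top_n)
    return portfolios
```

The portfolio for, say, year 2010, region 1, E score would then be `portfolios[(2010, 1, 'E_SCORE')]`.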

Any help would be greatly appreciated. Thanks

  • What doesn't work / needs to be fixed? Commented Jul 26, 2018 at 19:57
  • Thanks for your response Joerg. I do not know how to create dataframes within nested for loops. In the code above I cannot dynamically create dataframes with names such as dfij (--> df20071 for eg). I know this can be done in dictionary but I'm not sure how that works here. Commented Jul 26, 2018 at 20:21

2 Answers


You can use a dictionary to store your dataframes. This has the added benefits of enabling O(1) lookup and grouping your related data. You need not use a nested loop for this; you can use dict + groupby with an input dataframe df:

dfs = dict(tuple(df.groupby(['yr', 'Region'])))

This creates a dictionary dfs mapping each combination of "yr" and "Region" to a dataframe. You can access the dataframe for year 2010 and region 1 via dfs[(2010, 1)].
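A quick demonstration with toy data (column names as in the question) of what this mapping looks like:

```python
import pandas as pd

# Toy data just to illustrate the mapping
df = pd.DataFrame({
    'yr':          [2010, 2010, 2011],
    'Region':      [1, 2, 1],
    'ISSUER_NAME': ['A', 'B', 'C'],
})

dfs = dict(tuple(df.groupby(['yr', 'Region'])))

# Each key is a (yr, Region) tuple; each value is the matching sub-dataframe
print(sorted(dfs.keys()))                      # [(2010, 1), (2010, 2), (2011, 1)]
print(dfs[(2010, 1)]['ISSUER_NAME'].tolist())  # ['A']
```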

Now to modify your dataframes, you can simply iterate your dictionary as you would any other dictionary:

ETOP, PTOP, QTOP = {}, {}, {}

for key in dfs:
    dfs[key] = dfs[key].join(dfco.groupby('ISSUER_NAME')['E_SCORE'].mean(), ...)
    ...
    dfs[key] = dfs[key].drop_duplicates(subset=['ISSUER_NAME'], keep=False)
    ...
    E = dfs[key].sort_values('E_SCORE_ry', ascending=False)
    ETOP[key] = E.head(50)
    ...

Notice I have created dictionaries ETOP, PTOP, QTOP to store result dataframes, each indexed by the same ('yr', 'region') key structure. This way, you can easily access, modify or combine results for any particular combination.
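Filling in the elided lines, a complete version of that loop might look like the sketch below. It assumes, as in the question, that dfco holds the per-issuer score history; the per-issuer averages are computed once up front rather than inside the loop.

```python
import pandas as pd

def top_portfolios(dfs, dfco, top_n=50):
    """dfs maps (yr, Region) -> dataframe with an ISSUER_NAME column;
    dfco holds the per-issuer E/P/Q score history."""
    ETOP, PTOP, QTOP = {}, {}, {}
    # Per-issuer averages, computed once outside the loop
    avg = dfco.groupby('ISSUER_NAME')[['E_SCORE', 'P_SCORE', 'Q_SCORE']].mean()
    for key, d in dfs.items():
        # Overlapping score columns from avg get the '_ry' suffix
        d = d.join(avg, on='ISSUER_NAME', rsuffix='_ry')
        d = d.drop_duplicates(subset=['ISSUER_NAME'], keep=False)
        ETOP[key] = d.sort_values('E_SCORE_ry', ascending=False).head(top_n)
        PTOP[key] = d.sort_values('P_SCORE_ry', ascending=False).head(top_n)
        QTOP[key] = d.sort_values('Q_SCORE_ry', ascending=False).head(top_n)
    return ETOP, PTOP, QTOP
```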


2 Comments

Thanks jpp. In the line dfs[key] = dfs[key].drop_duplicates(subset=['ISSUER_NAME'], keep=False), how can I drop duplicates irrespective of case?
You can use dfs[key] = dfs[key][~dfs[key]['ISSUER_NAME'].str.lower().duplicated()]. See pd.Series.duplicated for more details.
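A quick illustration of that case-insensitive de-duplication on a toy column:

```python
import pandas as pd

d = pd.DataFrame({'ISSUER_NAME': ['Acme', 'ACME', 'Beta']})
# Lower-case before testing for duplicates, so 'Acme' and 'ACME' collide;
# duplicated() marks all but the first occurrence by default
out = d[~d['ISSUER_NAME'].str.lower().duplicated()]
print(out['ISSUER_NAME'].tolist())  # ['Acme', 'Beta']
```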
This iterates over the groups produced by grouping on yr and Region; each v is the sub-dataframe for one (yr, Region) combination:

for k, v in df.groupby(['yr', 'Region']):
    print(v)

2 Comments

Thanks for your help ZJS. I need to create different dataframes and then apply few functions on those dataframes: Essentially the flow is: Yr filter --> Region filter --> Calc an average E score value for that yr and region --> arrange the securities in descending order of average E score --> Pick top 50 securities and put them in a dataframe These dataframes can then be used for backtesting purposes.
Please add an explanation of your code, it will greatly increase the quality of the answer.
