0

Today I've been working with five DataFrames that are almost the same, but for different courses. They are named df2b2015, df4b2015, df6b2015, df8b2015, and df2m2015.

Each of those DataFrames has a column whose name includes the course code: prom_lect2b_rbd in df2b2015, prom_lect4b_rbd in df4b2015, and so on.

I want to append those DataFrames, but because every column has a different name, they don't go together. I'm trying to turn every one of those columns into a prom_lect_rbd column, so I can then append them without problem.

Is there a way I can do that with a for loop and regex? If not, is there some other way to do it?

Thanks!

PS: I know some things, like I can turn the columns into what I want using:

re.sub(r'\d(b|m)', '', a)

Where a is the column's name. But I can't find a way to combine that with loops and column renaming.
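
For illustration, here is a minimal sketch of what that substitution does to the column names mentioned above. The names list is only an assumption made for the example; the real names live in the DataFrames themselves.

```python
import re

# Column names as described in the question; this list exists only for illustration.
names = ["prom_lect2b_rbd", "prom_lect4b_rbd", "prom_lect6b_rbd", "prom_lect2m_rbd"]

# Remove one digit followed by 'b' or 'm' from each name.
cleaned = [re.sub(r"\d(b|m)", "", name) for name in names]
print(cleaned)
```

Every name collapses to the same string, which is what makes the DataFrames appendable afterwards.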

Edit:

DataFrame(s) look like this:

df2b2015:

rbd   prom_lect2b_rbd
 1          5
 2          6

df4b2015:

rbd   prom_lect4b_rbd
 1          8
 2          9

etc.

2
  • Good news is, there is a way! However, you're far more likely to get someone to help you if you provide sample data and expected results. Read minimal reproducible example. That said, all you need is to show 2 dataframes with 2 rows each and the appropriate column names. State that it needs to be a general solution to accommodate more than 2 dataframes. Commented Sep 12, 2018 at 21:25
  • Sorry, that was my bad. I'll edit it if someone has a better answer than mine! Commented Sep 12, 2018 at 21:31

2 Answers

1

Managed to do it. Probably not the most Pythonic way, but it does what I wanted:

import re

dfs = [df2b2015, df4b2015, df6b2015, df8b2015, df2m2015]
cols_lect = ['prom_lect2b_rbd', 'prom_lect4b_rbd', 'prom_lect6b_rbd',
             'prom_lect8b_rbd', 'prom_lect2m_rbd']

# Pair each DataFrame with its course-specific column and strip the course code
for df, col in zip(dfs, cols_lect):
    df.rename(columns={col: re.sub(r'\d(b|m)', '', col)}, inplace=True)
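
A possible variation, sketched under assumptions: rename accepts a function for columns, so the explicit list of column names may not be needed. The two small DataFrames below are stand-ins built from the sample data in the question, and strip_course is a hypothetical helper name. Note that the pattern is applied to every column name, so it would also alter any other column containing a digit followed by b or m.

```python
import re
import pandas as pd

# Stand-ins for the real DataFrames, using the sample rows from the question.
df2b2015 = pd.DataFrame({"rbd": [1, 2], "prom_lect2b_rbd": [5, 6]})
df4b2015 = pd.DataFrame({"rbd": [1, 2], "prom_lect4b_rbd": [8, 9]})

def strip_course(name):
    # Hypothetical helper: drop a digit followed by 'b' or 'm' from a column name.
    return re.sub(r"\d(b|m)", "", name)

# rename applies the function to every column name and returns a new DataFrame.
renamed = [df.rename(columns=strip_course) for df in (df2b2015, df4b2015)]

# With matching column names, the frames stack into one.
combined = pd.concat(renamed, ignore_index=True)
print(combined)
```

This leaves the original DataFrames untouched, which may or may not be what you want compared with inplace=True.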

0

Something like this, with .filter(regex=)? It does assume there is only one matching column per dataframe but your example permits that.

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.rand(10,3),columns=['prom_lect2b_rbd','foo','bar'])
df2 = pd.DataFrame(np.random.rand(10,3),columns=['prom_lect4b_rbd','foo','bar'])

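for df in [df1, df2]:
    # Take the single column whose name starts with 'prom_lect'
    colname = df.filter(regex=r'^prom_lect').columns[0]
    # inplace=True is required; otherwise rename returns a new DataFrame that is discarded
    df.rename(columns={colname: 'prom_lect_rbd'}, inplace=True)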

print(df1)
print(df2)
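
Since this relies on exactly one matching column per DataFrame, a guarded version might look like the sketch below. The function name rename_single_match and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

def rename_single_match(df, pattern, new_name):
    # Hypothetical helper: rename the one column matching `pattern` to `new_name`.
    matches = df.filter(regex=pattern).columns
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    # Returns a new DataFrame; the input is left unchanged.
    return df.rename(columns={matches[0]: new_name})

ok = pd.DataFrame({"prom_lect2b_rbd": [5, 6], "foo": [0, 1]})
result = rename_single_match(ok, r"^prom_lect", "prom_lect_rbd")
print(result.columns.tolist())
```

Raising on zero or multiple matches surfaces a broken assumption early instead of silently renaming the wrong column.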
