How to reorder columns in a Pandas dataframe based on other dataframe columns

Question

Suppose these dataframes:

import pandas as pd

df_one = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_2':[5,6,7,8], 'col_3':[9,10,11,12]})
df_two = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_3': [9,10,11,12], '2_col':[5, 6, 7, 8]})

In reality these dataframes come from different txt files, so the columns represent the same concepts but appear in a different order, and some of them have slightly different names. Both datasets have 33 columns.

How can I give df_two the same structure as df_one, meaning the same column order and the same column names?

The final objective is to merge both df into a single consolidated one.

I have tried this:

cols = df_one.columns.to_list()  # get columns names from df_one
df_two = df_two.reindex(columns=cols)

but this produces NaN values in 'col_2' (df_two has no column with that exact name; it is called '2_col'):

   col_1  col_2  col_3
0      1    NaN      9
1      2    NaN     10
2      3    NaN     11
3      4    NaN     12

I also tried to first change col names in df_two and then reorder:

df_two.columns = cols
df_two = df_two.reindex(columns=cols)

but this is also wrong (col_2 now has the values of col_3):

   col_1  col_2  col_3
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8

Thanks for your suggestions.

EDIT BASED ON COMMENTS:

The actual column names are more like 'Date' & 'iDate', 'Contract' & 'nContract', 'Premium' & 'iPremium'. I used numbers in the example (probably a bad idea), but the real names do not contain matching numbers.

How can I map the order of columns in df_two? (Say, col 1 of df_one is the same as col 1 of df_two, col 2 of df_one is col 3 of df_two, and col 3 of df_one is col 2 of df_two.) I would then rename the columns in df_two to match df_one.
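One way to express such a mapping is a plain dictionary from df_two's names to df_one's names. The following is a minimal sketch using the example frames from the question; `name_map` and `df_two_aligned` are illustrative names, and the dictionary is assumed to be written by hand:

```python
import pandas as pd

df_one = pd.DataFrame({'col_1': [1, 2, 3, 4], 'col_2': [5, 6, 7, 8], 'col_3': [9, 10, 11, 12]})
df_two = pd.DataFrame({'col_1': [1, 2, 3, 4], 'col_3': [9, 10, 11, 12], '2_col': [5, 6, 7, 8]})

# Hypothetical hand-written mapping: df_two name -> df_one name.
# Columns whose names already match do not need an entry.
name_map = {'2_col': 'col_2'}

# Rename first, then select the columns in df_one's order.
df_two_aligned = df_two.rename(columns=name_map)[df_one.columns]

# Stack the two frames into one consolidated frame.
combined = pd.concat([df_one, df_two_aligned], ignore_index=True)
print(combined)
```

With 33 columns the dictionary is longer, but it only has to be written once, and selecting with `df_one.columns` raises a KeyError if any target column is still missing after the rename.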

First rename the columns so that df_one and df_two have the same column names (using df_one.rename(columns={'col_one':'col_two', ...})). Then df_one[df_two.columns] will do the job.
– pythonic833, Commented Jun 12, 2020 at 23:18
What is the commonality between the columns? Are they all numbered?
– Umar.H, Commented Jun 12, 2020 at 23:32
So, I ended up renaming the columns manually as @pythonic833 suggested (actually by passing the names to the 'names' argument of the read_csv function) and then concatenating both dfs with pd.concat.
– naccode, Commented Jun 16, 2020 at 21:36
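The approach described in the last comment could look roughly like the sketch below. The file contents are made up for illustration (an in-memory `io.StringIO` stands in for the second txt file), and the column names follow the 'Date' / 'iDate' style pairs mentioned in the question:

```python
import io
import pandas as pd

# Stand-in for the second text file; its header uses different names and order.
raw_two = io.StringIO(
    "iDate,nContract,iPremium\n"
    "2020-01-01,A1,100\n"
    "2020-01-02,B2,200\n"
)

# names= supplies our own labels, listed in the file's column order;
# header=0 tells pandas to discard the file's original header row.
df_two = pd.read_csv(raw_two, names=['Date', 'Contract', 'Premium'], header=0)

# Made-up first frame with the target names in a different order.
df_one = pd.DataFrame({'Contract': ['C3'], 'Premium': [300], 'Date': ['2020-01-03']})

# concat matches columns by label, so the differing order is not a problem.
combined = pd.concat([df_one, df_two], ignore_index=True)
print(combined)
```

Renaming at read time keeps the mapping in one place, next to the file it describes.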

BENY · Accepted Answer · 2020-06-12 23:54:00Z

1

We can do

import numpy as np

# Sort the values within each row in descending order across the two columns
df[['col_2','col_3']] = -np.sort(-df[['col_2','col_3']].values, axis=1)
df
   col_1  col_2  col_3
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8

answered Jun 12, 2020 at 23:54

BENY


MrNobody33 · Accepted Answer · 2020-06-13 20:25:49Z

1

I assumed that every column name contains at least one number, so you can order df_two by that number and then rename the columns. You can try something like this:

import pandas as pd
import re
df_one = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_2':[5,6,7,8], 'col_3':[9,10,11,12]})
df_two = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_3': [9,10,11,12], '2_col':[5, 6, 7, 8]})


print('df_two old:\n\n',df_two,'\n')  

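def findnum(col):
    # Return the first run of digits in the column name as an integer
    return int(re.findall(r'\d+', col)[0])

# Order df_two's columns by their number, then adopt df_one's names
df_two = df_two[sorted(df_two.columns, key=findnum)]
df_two.columns = df_one.columns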

print('df_two new: \n')
print(df_two)

Output:

df_two old:

    col_1  col_3  2_col
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8 

df_two new: 

   col_1  col_2  col_3
0      1      5      9
1      2      6     10
2      3      7     11
3      4      8     12

If the common part of the names is more like 'Contract' & 'ContractNum', as you said, you can try something like this:

import pandas as pd
df_one = pd.DataFrame({'Contract':[1, 2, 3, 4], 'Date':[5,6,7,8], 'Provider':[9,10,11,12]})
df_two = pd.DataFrame({'iDate':[1, 2, 3, 4], 'ContractNum': [9,10,11,12], 'nProvider':[5, 6, 7, 8]})

print('df_one:\n', df_one,'\n')
print('df_two:\n', df_two,'\n')

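def func(pal):
    # Return the position of the df_one column whose name is contained in pal
    for i, val in enumerate(df_one.columns):
        if val.lower() in pal.lower():
            return i
    raise ValueError(f'no column of df_one matches {pal!r}')

df_two = df_two[sorted(df_two.columns, key=func)]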
print('df_two sorted: ')
print(df_two,'\n')
df_two.columns=df_one.columns

print('df_two new colnames: ')
print(df_two,'\n')

Output:

df_one:
    Contract  Date  Provider
0         1     5         9
1         2     6        10
2         3     7        11
3         4     8        12 

df_two:
    iDate  ContractNum  nProvider
0      1            9          5
1      2           10          6
2      3           11          7
3      4           12          8 

df_two sorted: 
   ContractNum  iDate  nProvider
0            9      1          5
1           10      2          6
2           11      3          7
3           12      4          8 

df_two new colnames: 
   Contract  Date  Provider
0         9     1         5
1        10     2         6
2        11     3         7
3        12     4         8
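A variation on the same substring idea is to build an explicit rename dictionary and check it before applying it, which may be safer with 33 columns where one name could accidentally contain another. This is only a sketch: `build_name_map` is a hypothetical helper, not part of pandas, and it assumes each df_two name contains exactly one df_one name:

```python
import pandas as pd

df_one = pd.DataFrame({'Contract': [1, 2], 'Date': [5, 6], 'Provider': [9, 10]})
df_two = pd.DataFrame({'iDate': [1, 2], 'ContractNum': [9, 10], 'nProvider': [5, 6]})

def build_name_map(target_cols, source_cols):
    """Map each source name to the single target name it contains (case-insensitive)."""
    name_map = {}
    for src in source_cols:
        hits = [tgt for tgt in target_cols if tgt.lower() in src.lower()]
        if len(hits) != 1:
            raise ValueError(f"{src!r} matches {hits}; expected exactly one target")
        name_map[src] = hits[0]
    # Two source columns must not collapse onto the same target name.
    if len(set(name_map.values())) != len(name_map):
        raise ValueError("two source columns map to the same target")
    return name_map

name_map = build_name_map(df_one.columns, df_two.columns)

# Rename, then select the columns in df_one's order.
df_two_aligned = df_two.rename(columns=name_map)[df_one.columns]
print(name_map)
print(df_two_aligned)
```

Printing the dictionary makes it easy to review the inferred pairs by eye before concatenating.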

edited Jun 13, 2020 at 20:25

answered Jun 12, 2020 at 23:28

MrNobody33


2 Comments

naccode Over a year ago

This is a great answer, but the names were an example (probably bad). The actual names are more like 'Contract' & 'ContractNum', 'Date' & 'iDate', 'Provider' & 'nProvider'.

MrNobody33 Over a year ago

Ok, I just edited my answer to that case, @naccode. Hope it will help.

Umar.H · Accepted Answer · 2020-06-12 23:59:48Z

0

If the numbers are the common parameter between the columns, we can extract them, map them to df_one's column names with a custom dictionary, and assign the result back as df_two's columns.

df_two.columns = df_two.columns.str.extract(r"(\d+)")[0].map(
    {col.split("_")[1]: col for col in df_one.columns}
).tolist()
#{'1': 'col_1', '2': 'col_2', '3': 'col_3'} <- dict
#['col_1', 'col_3', 'col_2'] <- map output that we re-assign.

print(df_two)

   col_1  col_3  col_2
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8

Then you can combine the two with pd.concat([df_one, df_two]); concat aligns columns by name, so df_two's column order does not need to match df_one's.
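As a small check of that last step, the sketch below concatenates two frames that share the same labels in a different order (the values are shortened versions of the question's example data):

```python
import pandas as pd

df_one = pd.DataFrame({'col_1': [1, 2], 'col_2': [5, 6], 'col_3': [9, 10]})
# Same labels as df_one, but in a different order.
df_two = pd.DataFrame({'col_1': [3, 4], 'col_3': [11, 12], 'col_2': [7, 8]})

# Values are matched by column label, not by position.
combined = pd.concat([df_one, df_two], ignore_index=True)
print(combined)
```

So once the names agree, an explicit reorder of df_two is optional; it mainly helps when inspecting the frames side by side.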

answered Jun 12, 2020 at 23:59

Umar.H
