0

Suppose these dataframes:

import pandas as pd

df_one = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_2':[5,6,7,8], 'col_3':[9,10,11,12]})
df_two = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_3': [9,10,11,12], '2_col':[5, 6, 7, 8]})

In reality these dataframes come from different txt files, so each column represents the same concept in both, but the column order differs and some columns have slightly different names. Both datasets have 33 columns.

How can I reorder the second df so that it has the same structure as the first, meaning the same column order and the same column names as df_one?

The final objective is to merge both dataframes into a single consolidated one.

I have tried this:

cols = df_one.columns.to_list()  # get columns names from df_one
df_two = df_two.reindex(columns=cols)

but this gets NaN values in 'col_2':

   col_1  col_2  col_3
0      1    NaN      9
1      2    NaN     10
2      3    NaN     11
3      4    NaN     12

I also tried to first change col names in df_two and then reorder:

df_two.columns = cols
df_two = df_two.reindex(columns=cols)

but this is also wrong (col_2 now has the values of col_3):

   col_1  col_2  col_3
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8

Thanks for your suggestions.

EDIT BASED ON COMMENTS:

The actual column names are more like 'Date' & 'iDate', 'Contract' & 'nContract', 'Premium' & 'iPremium'. I used numbered names in the example (probably a bad idea), but the real names do not contain matching numbers.

How can I map the order of columns in df_two? (Say, column 1 of df_one is the same as column 1 of df_two, column 2 of df_one is column 3 of df_two, and column 3 of df_one is column 2 of df_two.) I would then rename the columns of df_two to match df_one.

3
  • 4
    First rename columns, so that df_one and df_two have the same column names (using df_one.rename(columns={'col_one':'col_two', ...})). Then df_one[df_two.columns] will do the job. Commented Jun 12, 2020 at 23:18
  • What is the commonality between the columns? Are they all numbered? Commented Jun 12, 2020 at 23:32
  • So, I ended up renaming the columns manually, as @pythonic833 suggested (actually by passing the names to the 'names' argument of the read_csv function), and then concatenating both dfs with pd.concat. Commented Jun 16, 2020 at 21:36
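
The rename-then-reorder approach from the comments above can be sketched as follows. This is a minimal sketch, assuming the correspondence between the two sets of names is known in advance. The name_map dictionary and the sample values are hypothetical; the column names are borrowed from the question's edit.

```python
import pandas as pd

# Hypothetical sample frames; column names follow the question's edit.
df_one = pd.DataFrame({'Date': [1, 2], 'Contract': [3, 4], 'Premium': [5, 6]})
df_two = pd.DataFrame({'iPremium': [50, 60], 'iDate': [10, 20], 'nContract': [30, 40]})

# Assumed mapping, written by hand: df_two name -> df_one name.
name_map = {'iDate': 'Date', 'nContract': 'Contract', 'iPremium': 'Premium'}

# Rename first, then select the columns in df_one's order.
# The selection raises a KeyError if the mapping left a df_one name uncovered.
df_two_aligned = df_two.rename(columns=name_map)[df_one.columns]

# Stack both frames into one consolidated frame with a fresh row index.
combined = pd.concat([df_one, df_two_aligned], ignore_index=True)
print(combined)
```

With 33 columns the dictionary is tedious to write, but it makes every pairing explicit and does not depend on the names sharing any pattern.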

3 Answers

1

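We can sort the values of col_2 and col_3 within each row in descending order (negate, sort ascending, negate back). Note that this relies on the values, not on the column names:

import numpy as np
df = df_one.copy()  # assumption: the answer does not say which frame df is
df[['col_2','col_3']] = -np.sort(-df[['col_2','col_3']].values, axis=1)
df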
   col_1  col_2  col_3
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8

1

I assumed that every column name contains at least one number, so you can order df_two by that number and then rename the columns. You can try something like this:

import pandas as pd
import re
df_one = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_2':[5,6,7,8], 'col_3':[9,10,11,12]})
df_two = pd.DataFrame({'col_1':[1, 2, 3, 4], 'col_3': [9,10,11,12], '2_col':[5, 6, 7, 8]})


print('df_two old:\n\n',df_two,'\n')  

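def findnum(col):
    # Return the first run of digits in the column name as an integer.
    return int(re.findall(r'\d+', col)[0])

# Order df_two's columns by that number, then take over df_one's names.
df_two = df_two[sorted(df_two.columns, key=findnum)]
df_two.columns = df_one.columns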

print('df_two new: \n')
print(df_two)

Output:

df_two old:

    col_1  col_3  2_col
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8 

df_two new: 

   col_1  col_2  col_3
0      1      5      9
1      2      6     10
2      3      7     11
3      4      8     12

If the common part of the names is text, like 'Contract' & 'ContractNum' as you said, you can try something like this:

import pandas as pd
df_one = pd.DataFrame({'Contract':[1, 2, 3, 4], 'Date':[5,6,7,8], 'Provider':[9,10,11,12]})
df_two = pd.DataFrame({'iDate':[1, 2, 3, 4], 'ContractNum': [9,10,11,12], 'nProvider':[5, 6, 7, 8]})

print('df_one:\n', df_one,'\n')
print('df_two:\n', df_two,'\n')

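def func(pal):
    # Return the position of the df_one column whose name is contained in pal.
    for i, val in enumerate(df_one.columns):
        if val.lower() in pal.lower():
            return i
    raise ValueError(f'no column of df_one matches {pal!r}')

# Order df_two's columns by the position of their df_one counterpart.
df_two = df_two[sorted(df_two.columns, key=func)]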
print('df_two sorted: ')
print(df_two,'\n')
df_two.columns=df_one.columns

print('df_two new colnames: ')
print(df_two,'\n')

Output:

df_one:
    Contract  Date  Provider
0         1     5         9
1         2     6        10
2         3     7        11
3         4     8        12 

df_two:
    iDate  ContractNum  nProvider
0      1            9          5
1      2           10          6
2      3           11          7
3      4           12          8 

df_two sorted: 
   ContractNum  iDate  nProvider
0            9      1          5
1           10      2          6
2           11      3          7
3           12      4          8 

df_two new colnames: 
   Contract  Date  Provider
0         9     1         5
1        10     2         6
2        11     3         7
3        12     4         8
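
The substring matching above takes the first df_one name that fits, so an ambiguous name would be paired silently. A stricter variant can be sketched as follows. build_rename_map is a hypothetical helper, not a pandas function, and the sketch assumes each df_two name contains exactly one df_one name (case-insensitively).

```python
import pandas as pd

def build_rename_map(source_cols, target_cols):
    """Map each source column to the single target name it contains."""
    mapping = {}
    for src in source_cols:
        # Collect every target name that occurs inside the source name.
        hits = [t for t in target_cols if t.lower() in src.lower()]
        if len(hits) != 1:
            raise ValueError(f"{src!r} matched {hits}; expected exactly one match")
        mapping[src] = hits[0]
    # Two source columns must not claim the same target name.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two source columns map to the same target name")
    return mapping

df_one = pd.DataFrame({'Contract': [1, 2], 'Date': [5, 6], 'Provider': [9, 10]})
df_two = pd.DataFrame({'iDate': [1, 2], 'ContractNum': [9, 10], 'nProvider': [5, 6]})

rename_map = build_rename_map(df_two.columns, df_one.columns)
# Rename, then reorder to match df_one.
df_two = df_two.rename(columns=rename_map)[df_one.columns]
print(df_two)
```

Failing loudly on an ambiguous or unmatched name is a design choice here: with 33 columns, a wrong pairing is easier to catch as an error than by inspecting the merged data.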

2 Comments

This is a great answer, but the names were an example (probably bad). The actual names are more like 'Contract' & 'ContractNum', 'Date' & 'iDate', 'Provider' & 'nProvider'.
Ok, I just edited my answer to that case, @naccode. Hope it will help.
0

If the numbers are the common element between the column names, we can extract them, map them to the df_one names through a dictionary, and assign the result back as the new column names.

df_two.columns = df_two.columns.str.extract(r"(\d+)")[0].map(
    {col.split("_")[1]: col for col in df_one.columns}
).tolist()
#{'1': 'col_1', '2': 'col_2', '3': 'col_3'} <- dict
#['col_1', 'col_3', 'col_2'] <- map output that we re-assign.

print(df_two)

   col_1  col_3  col_2
0      1      9      5
1      2     10      6
2      3     11      7
3      4     12      8

Then you can combine both frames with pd.concat([df_one, df_two]).
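
A short sketch of that last step, assuming df_two already carries df_one's names: pd.concat matches columns by label, so the differing column order of the two frames does not need to be fixed first. The sample values and the ignore_index=True argument are additions for illustration; the latter gives the result a fresh row index.

```python
import pandas as pd

df_one = pd.DataFrame({'col_1': [1, 2], 'col_2': [5, 6], 'col_3': [9, 10]})
# Same names as df_one but in a different order, as after the renaming step.
df_two = pd.DataFrame({'col_1': [3, 4], 'col_3': [11, 12], 'col_2': [7, 8]})

# Columns are matched by label, not by position.
combined = pd.concat([df_one, df_two], ignore_index=True)
print(combined)
```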
