Sort dataframe column according to another dataframe's column

Question

I have two dataframes, df and df1, and both contain file paths like this.

    import pandas as pd
    df = pd.DataFrame({"X1": ['f', 'f', 'o', 'o', 'b', 'b'],
                       "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav', 'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav']})


    X1  X2
0   f   fb/FOO1/bar0.wav
1   f   fb/FOO1/bar1.wav
2   o   fb/FOO2/bar2.wav
3   o   fb/FOO2/bar3.wav
4   b   fb/FOO3/bar4.wav
5   b   fb/FOO3/bar5.wav

and another dataframe,

    df1 = pd.DataFrame({"X1": ['b', 'o', 'b', 'f', 'o', 'f'],
                        "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus', 'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus']})

    X1  X2
0   b   fb1/FOO3/bar5.opus
1   o   fb1/FOO2/bar2.opus
2   b   fb1/FOO3/bar4.opus
3   f   fb1/FOO1/bar1.opus
4   o   fb1/FOO2/bar3.opus
5   f   fb1/FOO1/bar0.opus

Now I want to sort the X2 column (file paths) of the second dataframe, df1, so that it follows the order of the file paths in the first dataframe, df. The output should look like this:

    X1  X2
0   f   fb1/FOO1/bar0.opus
1   f   fb1/FOO1/bar1.opus
2   o   fb1/FOO2/bar2.opus
3   o   fb1/FOO2/bar3.opus
4   b   fb1/FOO3/bar4.opus
5   b   fb1/FOO3/bar5.opus

This just looks like df1.sort_values('X2'), or am I missing something?
– Erfan, Commented Oct 7, 2020 at 16:33
No, that won't work because the root directory differs between the two dataframes: see fb in df and fb1 in df1. @Erfan
– adikh, Commented Oct 7, 2020 at 16:34
OK, so the file paths are different in the two dataframes. What is the key to sort by? The bar part of the file path?
– Sebastien D, Commented Oct 7, 2020 at 16:38
Yes, the file paths differ, but the FOO and bar parts are the same in both dataframes. I need to sort by those two middle parts (FOO and bar). @SebastienD
– adikh, Commented Oct 7, 2020 at 16:41

Sebastien D · Accepted Answer · 2020-10-07 16:47:13Z

You could create a sorter dictionary that lets you sort your values with a custom key:

# Build a key from the file name without its directory or extension (a regex would also work)
sorter_dict = dict(zip(df.X2.apply(lambda x: x.split('/')[-1].split('.')[0]), df.index))
# {'bar0': 0, 'bar1': 1, 'bar2': 2, 'bar3': 3, 'bar4': 4, 'bar5': 5}

# On df1, create a temporary column holding the same key
df1['temp'] = df1.X2.apply(lambda x: x.split('/')[-1].split('.')[0])
# Map each key to its position in df
df1['sorter'] = df1.temp.map(sorter_dict)
# Sort by that position
df1 = df1.sort_values('sorter')
# Delete the helper columns
del df1['temp'], df1['sorter']

Output

| X1   | X2                 |
|:-----|:-------------------|
| f    | fb1/FOO1/bar0.opus |
| f    | fb1/FOO1/bar1.opus |
| o    | fb1/FOO2/bar2.opus |
| o    | fb1/FOO2/bar3.opus |
| b    | fb1/FOO3/bar4.opus |
| b    | fb1/FOO3/bar5.opus |
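
With pandas 1.1 or later, the same idea can probably be written without helper columns by passing a `key` function to `sort_values`. The following is only a sketch under some assumptions: the names `path_key`, `order` and `result` are made up for illustration, every path is assumed to have exactly one root directory and one extension to strip, and the resulting keys are assumed to be unique within df. Unlike the code above, it keys on both the FOO and the bar part rather than on the file name alone.

```python
import pandas as pd

df = pd.DataFrame({"X1": ['f', 'f', 'o', 'o', 'b', 'b'],
                   "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav',
                          'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav']})
df1 = pd.DataFrame({"X1": ['b', 'o', 'b', 'f', 'o', 'f'],
                    "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus',
                           'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus']})


def path_key(paths: pd.Series) -> pd.Series:
    """Hypothetical helper: drop the root directory and the extension.

    'fb/FOO1/bar0.wav' -> 'FOO1/bar0'
    """
    without_root = paths.str.split("/", n=1).str[1]
    return without_root.str.rsplit(".", n=1).str[0]


# Position of each key in the reference dataframe df (keys must be unique)
order = pd.Series(range(len(df)), index=path_key(df["X2"]))

# Sort df1 by the position its key has in df, then renumber the rows
result = (
    df1.sort_values("X2", key=lambda s: path_key(s).map(order))
       .reset_index(drop=True)
)
print(result)
```

Keys in df1 that do not occur in df map to NaN and, with the default `na_position`, end up at the bottom of the result.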

Joel Leeb-du Toit · Answer · 2020-10-07 16:55:03Z

This works if the root directory and the file extension each have a consistent length within a dataframe. Create a new column holding the part of the path you want to sort by, reorder df1 by that column, and then drop the new column:

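# Strip the root directory and the extension: 'fb/' and '.wav' in df, 'fb1/' and '.opus' in df1
df['X3'] = df['X2'].astype(str).str[3:-4]
df1['X3'] = df1['X2'].astype(str).str[4:-5]

# Reorder df1 so that its keys follow the order of the keys in df
df1 = df1.set_index('X3')
df1 = df1.reindex(index=df['X3'])
df1 = df1.reset_index()

# Drop the helper column from both dataframes
df1 = df1.drop('X3', axis=1)
df = df.drop('X3', axis=1)

df1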
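
If the root directory or the extension does not have a fixed length, the fixed slices above would cut the paths in the wrong place. One possible variant, sketched here under assumptions, extracts the key with a regular expression and aligns the rows with a left merge. The helper `add_key`, the column name `key` and the variable `aligned` are illustrative names only, and the pattern assumes that each path starts with a single root directory and ends with a single extension.

```python
import pandas as pd

df = pd.DataFrame({"X1": ['f', 'f', 'o', 'o', 'b', 'b'],
                   "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav',
                          'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav']})
df1 = pd.DataFrame({"X1": ['b', 'o', 'b', 'f', 'o', 'f'],
                    "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus',
                           'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus']})


def add_key(frame: pd.DataFrame, column: str = "X2") -> pd.DataFrame:
    """Hypothetical helper: return a copy with a 'key' column such as 'FOO1/bar0'."""
    out = frame.copy()
    # Skip the first path component, capture the middle, skip the extension
    out["key"] = out[column].str.extract(r"^[^/]+/(.+)\.[^./]+$", expand=False)
    return out


# A left merge keeps the key order of the left frame (df), so df1's rows
# come out in df's order; validate= raises an error if a key is duplicated.
aligned = (
    add_key(df)[["key"]]
    .merge(add_key(df1), on="key", how="left", validate="one_to_one")
    .drop(columns="key")
)
print(aligned)
```

The originals df and df1 are left unchanged because `add_key` works on copies, so no helper column needs to be dropped from them afterwards.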
