
I have two dataframes, df and df1, and both contain file paths like this:

    import pandas as pd

    df = pd.DataFrame({"X1": ['f','f','o','o','b','b'],
                       "X2": ['fb/FOO1/bar0.wav','fb/FOO1/bar1.wav','fb/FOO2/bar2.wav','fb/FOO2/bar3.wav','fb/FOO3/bar4.wav','fb/FOO3/bar5.wav']})


    X1  X2
0   f   fb/FOO1/bar0.wav
1   f   fb/FOO1/bar1.wav
2   o   fb/FOO2/bar2.wav
3   o   fb/FOO2/bar3.wav
4   b   fb/FOO3/bar4.wav
5   b   fb/FOO3/bar5.wav

and another dataframe,

    df1 = pd.DataFrame({"X1": ['b','o','b','f','o','f'],
                        "X2": ['fb1/FOO3/bar5.opus','fb1/FOO2/bar2.opus','fb1/FOO3/bar4.opus','fb1/FOO1/bar1.opus','fb1/FOO2/bar3.opus','fb1/FOO1/bar0.opus']})

    X1  X2
0   b   fb1/FOO3/bar5.opus
1   o   fb1/FOO2/bar2.opus
2   b   fb1/FOO3/bar4.opus
3   f   fb1/FOO1/bar1.opus
4   o   fb1/FOO2/bar3.opus
5   f   fb1/FOO1/bar0.opus

Now I want to sort the second dataframe, df1, so that its X2 column (file paths) follows the order of the file paths in the first dataframe, df. The output should look like this:

    X1  X2
0   f   fb1/FOO1/bar0.opus
1   f   fb1/FOO1/bar1.opus
2   o   fb1/FOO2/bar2.opus
3   o   fb1/FOO2/bar3.opus
4   b   fb1/FOO3/bar4.opus
5   b   fb1/FOO3/bar5.opus
  • This just looks like: df1.sort_values('X2') or am I missing something? Commented Oct 7, 2020 at 16:33
  • No, it won't work because the root directory differs between the two dataframes: see fb and fb1 in df and df1 respectively. @Erfan Commented Oct 7, 2020 at 16:34
  • OK, so the file paths differ between the two dfs. So what is the key to sort on? The bar part of the file path? Commented Oct 7, 2020 at 16:38
  • Yes, the file paths differ, but the 'FOO and bar' part is the same in both dataframes. I need to sort them according to those middle two parts (FOO and bar). @SebastienD Commented Oct 7, 2020 at 16:41

2 Answers

1

You could create a sorter dictionary that lets you sort the values with a custom key:

# Build a key from the file-name part of each path (this could also be done with a regex)
sorter_dict = dict(zip(df.X2.apply(lambda x: x.split('/')[-1].split('.')[0]), df.index))
# {'bar0': 0, 'bar1': 1, 'bar2': 2, 'bar3': 3, 'bar4': 4, 'bar5': 5}

# On df1, create a temporary column holding the same file-name part
df1['temp'] = df1.X2.apply(lambda x: x.split('/')[-1].split('.')[0])
# Look up each name's position in df via the sorter dict
df1['sorter'] = df1.temp.map(sorter_dict)
# Sort by that position
df1 = df1.sort_values('sorter')
# Delete the helper columns, which are no longer needed
del df1['temp'], df1['sorter']

Output

| X1   | X2                 |
|:-----|:-------------------|
| f    | fb1/FOO1/bar0.opus |
| f    | fb1/FOO1/bar1.opus |
| o    | fb1/FOO2/bar2.opus |
| o    | fb1/FOO2/bar3.opus |
| b    | fb1/FOO3/bar4.opus |
| b    | fb1/FOO3/bar5.opus |
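The dictionary above keys on the file name only (bar0, bar1, ...), whereas the comments under the question ask for the middle two parts (FOO and bar). A minimal sketch of that variant follows, assuming every path has the shape root/FOO/bar.ext and that the resulting keys are unique in df. The helper name `path_key` is made up for illustration, and the `key` argument of `sort_values` needs pandas 1.1 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    "X1": ['f', 'f', 'o', 'o', 'b', 'b'],
    "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav',
           'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav'],
})
df1 = pd.DataFrame({
    "X1": ['b', 'o', 'b', 'f', 'o', 'f'],
    "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus',
           'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus'],
})

def path_key(paths: pd.Series) -> pd.Series:
    """Hypothetical helper: strip the root directory and the extension,
    e.g. 'fb/FOO1/bar0.wav' becomes 'FOO1/bar0'."""
    without_root = paths.str.split('/', n=1).str[1]
    return without_root.str.rsplit('.', n=1).str[0]

# Position of each key in df (assumes the keys are unique in df)
order = pd.Series(range(len(df)), index=path_key(df["X2"]))

# Sort df1 by the position its key has in df
result = (
    df1.sort_values("X2", key=lambda s: path_key(s).map(order))
       .reset_index(drop=True)
)
print(result)
```

No helper columns are added to df1 here, so nothing has to be deleted afterwards. A key in df1 that does not occur in df maps to NaN and is placed last by `sort_values`.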

1

This could work if the file paths have a consistent length within each dataframe. Simply create a new column with the part that you want to sort by, use it to reindex df1 in df's order, and then drop the new column:

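# Slice off the root directory and the extension; the offsets assume
# a fixed-length prefix ('fb/' vs 'fb1/') and suffix ('.wav' vs '.opus')
df['X3'] = df['X2'].astype(str).str[3:-4]
df1['X3'] = df1['X2'].astype(str).str[4:-5]

# Reorder df1 so that its keys follow the order of df's keys
df1 = df1.set_index('X3')
df1 = df1.reindex(index=df['X3'])
df1 = df1.reset_index()

# Drop the helper column from both dataframes
df1 = df1.drop('X3', axis = 1)
df = df.drop('X3', axis = 1)

df1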
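If the root directories or extensions do not have a fixed length, the key could be extracted with a regular expression instead of slicing. This is only a sketch under assumptions: the pattern, and the `key`, `key1` and `_key` names, are illustrative and not from the answer above, and the pattern expects paths of the form root/.../name.ext with unique keys in df1.

```python
import pandas as pd

df = pd.DataFrame({
    "X1": ['f', 'f', 'o', 'o', 'b', 'b'],
    "X2": ['fb/FOO1/bar0.wav', 'fb/FOO1/bar1.wav', 'fb/FOO2/bar2.wav',
           'fb/FOO2/bar3.wav', 'fb/FOO3/bar4.wav', 'fb/FOO3/bar5.wav'],
})
df1 = pd.DataFrame({
    "X1": ['b', 'o', 'b', 'f', 'o', 'f'],
    "X2": ['fb1/FOO3/bar5.opus', 'fb1/FOO2/bar2.opus', 'fb1/FOO3/bar4.opus',
           'fb1/FOO1/bar1.opus', 'fb1/FOO2/bar3.opus', 'fb1/FOO1/bar0.opus'],
})

# Capture everything between the first '/' and the last '.'
# (assumed path shape: root/.../name.ext)
pattern = r'^[^/]+/(.+)\.[^.]+$'
key = df["X2"].str.extract(pattern, expand=False)    # e.g. 'FOO1/bar0'
key1 = df1["X2"].str.extract(pattern, expand=False)

# Index df1 by its key, pull the rows in df's key order, then drop the key
result = (
    df1.assign(_key=key1)
       .set_index("_key")
       .reindex(key)
       .reset_index(drop=True)
)
print(result)
```

As with the slicing version, `reindex` requires the keys in df1 to be unique, and a key that exists in df but not in df1 produces a row of NaN values.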
