python sort a list of strings based on substrings using pandas

Question

I have an Excel sheet with four columns: Filename, SNR, Dynamic Range, and Level.

Filename	SNR	Dynamic Range	Level
1___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HPOF.xlsx	5	11	8
19___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS32_HPOF.xlsx	15	31	23
10___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS16_HPOF.xlsx	10	21	24
28___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS48_HPOF.xlsx	20	41	23
37___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HP4.xlsx	25	51	12

I need to reorder the rows by the first column of the table (Filename) so that the number after FS near the end of each name (FS8, FS16, FS32, FS48) goes from least to greatest, i.e.

Filename	SNR	Dynamic Range	Level
1___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HPOF.xlsx	5	11	8
37___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HP4.xlsx	25	51	12
10___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS16_HPOF.xlsx	10	21	24
19___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS32_HPOF.xlsx	15	31	23
28___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS48_HPOF.xlsx	20	41	23

I don't want to change the actual Excel file. I was hoping to use pandas because I am doing some other manipulation later on.

I tried this

df.sort_values(by='Xls Filename', key=lambda col: col.str.contains('_FS'),ascending=True)

but it didn't work.

Thank you in advance!

akuiper · Accepted Answer · 2021-08-09 22:31:41Z

Extract the pattern, find the sort index using argsort and then sort with the sort index:

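# extract the number to sort by into a Series
# (a raw string keeps the regex escapes \d, \w and \. intact)
fs = df.Filename.str.extract(r'FS(\d+)_\w+\.xlsx$', expand=False)

# find the sort positions using `argsort` and reorder the data frame with them;
# `iloc` is used because `argsort` returns positions, not index labels
df.iloc[fs.astype(int).argsort()]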

#                                                                       Filename  ...  Level
#0    1___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HPOF.xlsx  ...      8
#4    37___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HP4.xlsx  ...     12
#2  10___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS16_HPOF.xlsx  ...     24
#1  19___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS32_HPOF.xlsx  ...     23
#3  28___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS48_HPOF.xlsx  ...     23

The regex FS(\d+)_\w+\.xlsx$ captures the digits that immediately follow FS and precede _\w+\.xlsx at the end of the filename.
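To see what the pattern captures, here is a minimal sketch that uses only the standard re module on a few of the filenames from the question. The helper name fs_value and the non-matching name report.xlsx are made up for illustration.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
FS_PATTERN = re.compile(r"FS(\d+)_\w+\.xlsx$")

names = [
    "1___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HPOF.xlsx",
    "37___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS8_HP4.xlsx",
    "28___SLATE_FPGA_BESBEV_TX_AMIC_9.6MHz_Normal_IN1_G0_0_HQ_DEC0_FS48_HPOF.xlsx",
    "report.xlsx",  # hypothetical name with no FS<number> part
]


def fs_value(name):
    """Return the number after FS as an int, or None if the name does not match."""
    match = FS_PATTERN.search(name)
    return int(match.group(1)) if match else None


captured = [fs_value(name) for name in names]
print(captured)
```

Names that do not end in FS<digits>_<something>.xlsx give None here; in pandas the same rows come back from str.extract as missing values.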

If some filenames might not match the pattern, extract returns NaN for them, and NaN cannot be cast to int, so convert to float instead. Rows without a match then sort to the end:

df.iloc[fs.astype(float).values.argsort()]
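As an alternative sketch, closer to what the question attempted, the same ordering can probably be had from sort_values with its key argument (available since pandas 1.1). The attempt in the question used str.contains('_FS'), which returns True for every row and so carries no ordering information; the key has to return the number itself. The shortened filenames, the non-matching notes.xlsx row, and the helper name fs_number are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical, shortened sample data; the last filename has no FS<number> part.
df = pd.DataFrame({
    "Filename": [
        "1___X_FS8_HPOF.xlsx",
        "19___X_FS32_HPOF.xlsx",
        "10___X_FS16_HPOF.xlsx",
        "37___X_FS8_HP4.xlsx",
        "notes.xlsx",
    ],
    "SNR": [5, 15, 10, 25, 0],
})


def fs_number(col: pd.Series) -> pd.Series:
    """Map each filename to the number after FS; unmatched rows become NaN."""
    # Cast to float rather than int so that NaN values are allowed.
    return col.str.extract(r"FS(\d+)_\w+\.xlsx$", expand=False).astype(float)


# kind="stable" keeps rows with equal FS numbers in their original order;
# na_position="last" puts the unmatched rows at the end.
sorted_df = df.sort_values(
    by="Filename", key=fs_number, kind="stable", na_position="last"
)
print(sorted_df)
```

sort_values returns a new DataFrame, so neither df nor the Excel file it was read from is modified.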

edited Aug 9, 2021 at 22:31

answered Aug 9, 2021 at 22:22

akuiper


1 Comment

akuiper Over a year ago

Great. Glad it helps!
