Find pattern in pandas dataframe, reorder it row-wise, and reset index

Question

This is a multipart problem. I have found solutions for each separate part, but when I try to combine these solutions, I don't get the outcome I want.

Let's say this is my dataframe:

import pandas as pd

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
df

    Values  Vals
0     1     6
1     3     7
2     6     7
3     7     9
4     7     5
5     8     3
6     4     1

Let's say I want to find the pattern [6, 7, 7] in the 'Values' column. I can use a modified version of the second solution given here: Pandas: How to find a particular pattern in a dataframe column?

pattern = [6, 7, 7]

pat_i = [df[i-len(pattern):i] # Get the matching rows
 for i in range(len(pattern), len(df) + 1) # for each run of 3 consecutive elements (+1 so the last window is checked too)
 if all(df['Values'][i-len(pattern):i] == pattern)] # if the pattern matched
pat_i

[   Values  Vals
 2       6     7
 3       7     9
 4       7     5]

The only way I've found to narrow this down to just index values is the following:

pat_i = [df.index[i-len(pattern):i] # Get the index of the matching rows
 for i in range(len(pattern), len(df) + 1) # for each run of 3 consecutive elements (+1 so the last window is checked too)
 if all(df['Values'][i-len(pattern):i] == pattern)] # if the pattern matched
pat_i

[RangeIndex(start=2, stop=5, step=1)]

Once I've found the pattern, I want to reorder it to [7, 7, 6] within the original dataframe, moving each entire row along with its value. In other words, going by the index, I want to get output that looks like this:

df.reindex([0, 1, 3, 4, 2, 5, 6])

    Values  Vals
0     1     6
1     3     7
3     7     9
4     7     5
2     6     7
5     8     3
6     4     1

Then, finally, I want to reset the index so that the values in all the columns stay in their new, re-ordered positions:

    Values  Vals
0     1     6
1     3     7
2     7     9
3     7     5
4     6     7
5     8     3
6     4     1

In order to use pat_i as a basis for re-ordering, I've tried to modify the second solution given here: Python Pandas: How to move one row to the first row of a Dataframe?

target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]

However, I can't figure out how to exploit the pat_i RangeIndex object to use it with this code. The solution, when I find it, will be applied to hundreds of dataframes, each one of which will contain the [6, 7, 7] pattern that needs to be re-ordered in one place, but not the same place in each dataframe.
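
For reference, here is a minimal sketch of one way the RangeIndex in pat_i could feed the idx list from that snippet. It assumes the pattern occurs exactly once and that the dataframe has the default 0..n-1 index; the names match, rotated and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Values': [1, 3, 6, 7, 7, 8, 4], 'Vals': [6, 7, 7, 9, 5, 3, 1]})
pattern = [6, 7, 7]

# Same search as above; len(df) + 1 so a match at the very end is not skipped.
pat_i = [df.index[i - len(pattern):i]
         for i in range(len(pattern), len(df) + 1)
         if (df['Values'].iloc[i - len(pattern):i].to_numpy() == pattern).all()]

# Assumption: exactly one match, so pat_i[0] is the RangeIndex of interest.
match = list(pat_i[0])             # plain list of the matched index labels
rotated = match[1:] + match[:1]    # first row of the pattern moves to the end

# Rows before the match, the rotated match, then rows after the match.
idx = list(range(match[0])) + rotated + list(range(match[-1] + 1, len(df)))
result = df.reindex(idx).reset_index(drop=True)
print(result)
```

Converting the RangeIndex to a list is what makes it usable with ordinary list concatenation, in the same way target_row is used above.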

Any help appreciated...and I'm sure there must be an elegant, pythonic way of doing this, as it seems like it should be a common enough challenge. Thank you.

itwasthekix · Accepted Answer · 2021-01-27 17:07:24Z

I more or less rewrote your code. I set aside the indexes that come before and after the pattern, reordered the indexes of the pattern itself, and concatenated all three into a new index. Then I use the new index to reorder the data and reset the index.

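import pandas as pd

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
pattern = [6, 7, 7]
new_order = [1, 2, 0] # new order of pattern

pat_i = None
# Only test windows that start where the first pattern value occurs
for i in list(df[df['Values'] == pattern[0]].index):
    window = df['Values'][i:i+len(pattern)]
    # Length check skips a truncated window at the end of the dataframe
    if len(window) == len(pattern) and all(window == pattern):
        pat_i = df[i:i+len(pattern)]
        break # stop at the first match
if pat_i is None:
    raise ValueError('pattern not found')
front_ind = list(range(0, pat_i.index[0])) # rows before the pattern
back_ind = list(range(pat_i.index[-1]+1, len(df))) # rows after the pattern
pat_ind = [pat_i.index[i] for i in new_order] # pattern rows in their new order
new_ind = front_ind + pat_ind + back_ind
df = df.loc[new_ind].reset_index(drop=True)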

df
Out[82]: 
   Values  Vals
0       1     6
1       3     7
2       7     9
3       7     5
4       6     7
5       8     3
6       4     1
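
Since the question mentions hundreds of dataframes, the same steps could be wrapped in a function. This is only a sketch: reorder_pattern and the frames dictionary are hypothetical names, and it assumes the first match is the one to reorder and that positions (not index labels) are what matter.

```python
import pandas as pd

def reorder_pattern(df, column, pattern, new_order):
    """Hypothetical helper: reorder the rows of the first match of `pattern`."""
    n = len(pattern)
    values = df[column].tolist()
    for start in range(len(values) - n + 1):
        if values[start:start + n] == list(pattern):
            positions = list(range(len(df)))
            # Replace the matched positions with their reordered counterparts.
            positions[start:start + n] = [start + k for k in new_order]
            return df.iloc[positions].reset_index(drop=True)
    raise ValueError(f"pattern {pattern} not found in column {column!r}")

# Made-up collection of frames; the pattern sits in a different place in each.
frames = {
    'a': pd.DataFrame({'Values': [1, 3, 6, 7, 7, 8, 4], 'Vals': [6, 7, 7, 9, 5, 3, 1]}),
    'b': pd.DataFrame({'Values': [6, 7, 7, 2], 'Vals': [10, 20, 30, 40]}),
}
reordered = {name: reorder_pattern(frame, 'Values', [6, 7, 7], [1, 2, 0])
             for name, frame in frames.items()}
```

Raising when the pattern is missing makes a dataframe without the pattern fail loudly instead of being passed through unchanged.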

edited Jan 27, 2021 at 17:07

answered Jan 27, 2021 at 1:08

itwasthekix

5 Comments

cjstevens Over a year ago

Thanks. It looks like this should work, but it does use a for loop, which I would have preferred to avoid. This is just a sample dataframe; the "real life" ones I will be using the code on are much, much bigger, and there are hundreds of them, so maybe this would be quite slow. But thanks again; your solution might inspire me to something else.

itwasthekix Over a year ago

Remember, generally .apply is faster than for, using pandas vectors is faster than .apply, and numpy is faster than everything.
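
As a rough illustration of the NumPy route, the search itself can be done without a Python loop. This sketch assumes NumPy 1.20 or later (for sliding_window_view) and that only the first match matters; whether it is actually faster on a given dataframe would need to be measured.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy 1.20+

df = pd.DataFrame({'Values': [1, 3, 6, 7, 7, 8, 4], 'Vals': [6, 7, 7, 9, 5, 3, 1]})
pattern = np.array([6, 7, 7])
new_order = np.array([1, 2, 0])  # new order of pattern

# Every run of len(pattern) consecutive values, one run per row.
windows = sliding_window_view(df['Values'].to_numpy(), len(pattern))
# Start positions of the windows that equal the pattern element by element.
starts = np.flatnonzero((windows == pattern).all(axis=1))

start = int(starts[0])  # assumption: use the first match only
order = np.arange(len(df))
order[start:start + len(pattern)] = start + new_order
result = df.iloc[order].reset_index(drop=True)
```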

itwasthekix Over a year ago

If you want speed I'll update my answer to anchor the looping in particular values. For example, in this new answer only one loop is executed.

cjstevens Over a year ago

Thanks. I got a version of your solution to work on a sample real dataframe, but for some reason only by modifying this line: "if all(df['Values'][i:i+len(pattern)] == pattern)" to this: "if all(df['Family'][i-len(pattern):i] == pattern)"

itwasthekix Over a year ago

Well, whatever works works, although your modification implies that the pattern you are supplying is backwards. Just make sure that you make the changes throughout (e.g. changing this: pat_i = df[i:i+len(pattern)]).
