
This is a multipart problem. I have found solutions for each separate part, but when I try to combine these solutions, I don't get the outcome I want.

Let's say this is my dataframe:

import pandas as pd

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
df

   Values  Vals
0       1     6
1       3     7
2       6     7
3       7     9
4       7     5
5       8     3
6       4     1

Let's say I want to find the pattern [6, 7, 7] in the 'Values' column. I can use a modified version of the second solution given here: Pandas: How to find a particular pattern in a dataframe column?

pattern = [6, 7, 7]

pat_i = [df[i-len(pattern):i] # get the matching rows
 for i in range(len(pattern), len(df) + 1) # for each window of 3 consecutive rows (+1 so the last window is checked too)
 if all(df['Values'].iloc[i-len(pattern):i] == pattern)] # if the pattern matched
pat_i

[   Values  Vals
 2       6     7
 3       7     9
 4       7     5]

The only way I've found to narrow this down to just index values is the following:

pat_i = [df.index[i-len(pattern):i] # get the index labels only
 for i in range(len(pattern), len(df) + 1) # for each window of 3 consecutive rows (+1 so the last window is checked too)
 if all(df['Values'].iloc[i-len(pattern):i] == pattern)] # if the pattern matched
pat_i

[RangeIndex(start=2, stop=5, step=1)]
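If only the start position of each match is needed, the Python-level loop can be replaced by a sliding-window comparison. A minimal sketch, assuming NumPy 1.20 or later (which provides `numpy.lib.stride_tricks.sliding_window_view`) and a column at least as long as the pattern:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns=['Values', 'Vals'])
pattern = [6, 7, 7]

# Every length-3 window of the column, as a 2-D array of shape (len(df) - 2, 3)
windows = sliding_window_view(df['Values'].to_numpy(), len(pattern))
# Position k is kept when the window starting at k equals the pattern element-wise
starts = np.flatnonzero((windows == np.asarray(pattern)).all(axis=1))
print(starts)  # [2]
```

`sliding_window_view` returns a read-only view rather than a copy, so building the windows does not duplicate the column data.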

Once I've found the pattern, I want to reorder those rows within the original dataframe so that the pattern reads [7, 7, 6], moving each entire row along with its value. In other words, going by the index, I want output that looks like this:

df.reindex([0, 1, 3, 4, 2, 5, 6])

   Values  Vals
0       1     6
1       3     7
3       7     9
4       7     5
2       6     7
5       8     3
6       4     1

Finally, I want to reset the index so that the rows keep their new order but are renumbered from 0:

   Values  Vals
0       1     6
1       3     7
2       7     9
3       7     5
4       6     7
5       8     3
6       4     1
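The two steps described above (reorder by label, then renumber) can be chained in one expression. A small sketch that hard-codes the target order from the example rather than computing it:

```python
import pandas as pd

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns=['Values', 'Vals'])

# Reorder rows by label, then renumber 0..n-1; drop=True discards the old labels
# instead of keeping them as a new 'index' column
out = df.reindex([0, 1, 3, 4, 2, 5, 6]).reset_index(drop=True)
print(out)
```

Without `drop=True`, `reset_index` would insert the old labels as an extra column named `index`.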

To use pat_i as the basis for the reordering, I've tried to modify the second solution given here: Python Pandas: How to move one row to the first row of a Dataframe?

target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]

However, I can't figure out how to use the RangeIndex inside pat_i with this code. The solution will be applied to hundreds of dataframes, each of which contains the [6, 7, 7] pattern in exactly one place, but not in the same place in each dataframe.
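A `RangeIndex` behaves like a sequence of integers, so the matched positions can be read straight off it and spliced into a new row order, in the spirit of the move-one-row trick. A minimal sketch, assuming the default `RangeIndex` (labels equal positions) and exactly one match; `new_order` is a hypothetical name for the desired arrangement of the matched rows:

```python
import pandas as pd

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns=['Values', 'Vals'])
pattern = [6, 7, 7]
new_order = [1, 2, 0]  # hypothetical: positions within the match, giving [7, 7, 6]

pat_i = [df.index[i - len(pattern):i]
         for i in range(len(pattern), len(df) + 1)
         if all(df['Values'].iloc[i - len(pattern):i] == pattern)]

match = list(pat_i[0])                 # RangeIndex(start=2, stop=5) -> [2, 3, 4]
moved = [match[k] for k in new_order]  # [3, 4, 2]
# Rows outside the match keep their place around the reordered block
idx = list(range(match[0])) + moved + list(range(match[-1] + 1, len(df)))
df = df.iloc[idx].reset_index(drop=True)
print(df)
```

Here `idx` comes out as `[0, 1, 3, 4, 2, 5, 6]`, the same order passed to `reindex` in the question.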

Any help is appreciated. I'm sure there must be an elegant, Pythonic way of doing this, as it seems like a common enough challenge. Thank you.

1 Answer


I rewrote your code. I set aside the indexes before and after the match, reordered the indexes of the matched rows, and joined everything into a new index. Then I use that new index to reorder the data.

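import pandas as pd

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
pattern = [6, 7, 7]
new_order = [1, 2, 0] # new order of pattern

pat_i = None
# only check positions where the first value of the pattern occurs
for i in list(df[df['Values'] == pattern[0]].index):
    window = df['Values'].iloc[i:i+len(pattern)]
    # skip windows cut short at the end of the frame (unequal lengths can't be compared)
    if len(window) == len(pattern) and all(window == pattern):
        pat_i = df.iloc[i:i+len(pattern)]
        break # each dataframe contains the pattern once
if pat_i is not None:
    front_ind = list(range(0, pat_i.index[0])) # rows before the match
    back_ind = list(range(pat_i.index[-1]+1, len(df))) # rows after the match
    pat_ind = [pat_i.index[j] for j in new_order] # matched rows, reordered
    new_ind = front_ind + pat_ind + back_ind
    df = df.loc[new_ind].reset_index(drop=True)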

df
Out[82]: 
   Values  Vals
0       1     6
1       3     7
2       7     9
3       7     5
4       6     7
5       8     3
6       4     1

5 Comments

Thanks. It looks like this should work, but it does use a for loop, which I would have preferred to avoid. This is just a sample dataframe; the "real life" ones I will be using the code on are much, much bigger, and there are hundreds of them, so maybe this would be quite slow. But thanks again; your solution might inspire me to something else.
Remember, as a rough rule of thumb: .apply is generally faster than a for loop, vectorized pandas operations are faster than .apply, and NumPy is faster than everything.
If you want speed, I've updated my answer to anchor the loop on particular values: it now only checks rows where 'Values' equals the first element of the pattern, so in this example the loop body runs only once.
Thanks. I got a version of your solution to work on a sample real dataframe, but for some reason only by modifying this line: "if all(df['Values'][i:i+len(pattern)] == pattern)" to this: "if all(df['Family'][i-len(pattern):i] == pattern)"
Well, whatever works, works, although your modification implies that the pattern you are supplying is backwards. Just make sure you make the change throughout (e.g. also change this: pat_i = df[i:i+len(pattern)]).
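Since avoiding the row loop was the main concern, here is a sketch that combines a sliding-window match with the index splice, with no Python loop over rows. It assumes NumPy 1.20 or later (for `sliding_window_view`) and that only the first match should be moved; `reorder_pattern` and its parameters are hypothetical names, not part of pandas, and the sketch has not been benchmarked on large frames:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def reorder_pattern(df, col, pattern, new_order):
    """Hypothetical helper: return a copy of df with the first occurrence of
    `pattern` in column `col` rearranged by `new_order` (positions within the
    match). Returns an unchanged copy if the pattern is absent."""
    n = len(pattern)
    values = df[col].to_numpy()
    if len(values) < n:
        return df.copy()
    # One boolean per window start: does values[k:k+n] equal the pattern?
    hits = (sliding_window_view(values, n) == np.asarray(pattern)).all(axis=1)
    if not hits.any():
        return df.copy()
    start = int(np.argmax(hits))  # position of the first True
    order = np.arange(len(df))
    # Overwrite the matched block with its reordered positions
    order[start:start + n] = start + np.asarray(new_order)
    # iloc is positional, so this works regardless of the existing index labels
    return df.iloc[order].reset_index(drop=True)


df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns=['Values', 'Vals'])
out = reorder_pattern(df, 'Values', [6, 7, 7], [1, 2, 0])
print(out)
```

For many dataframes, something like `[reorder_pattern(d, 'Values', pattern, new_order) for d in dfs]` still loops over the frames, but each frame is searched with vectorized NumPy operations.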
