I need to extract rows based on 3 conditions:
1. the column col1 should contain all the words in the list list_words.
2. the first row should end with the word Story.
3. the next rows should end with ac.
I've managed to make it work with the help of this question, "Extract rows based on conditions Pandas Python", but the problem is that I need to extract every row that ends with Story and all the consecutive rows after it that end with ac, not only the first one.
This is my current code:
import pandas as pd
df = pd.DataFrame({'col1': ['Draft SW Quality Assurance Plan Story', 'alex ac', 'anny ac', 'antoine ac','aze epic', 'bella ac', 'Complete SW Quality Assurance Plan Story', 'celine ac','wqas epic', 'karmen ac', 'kameilia ac', 'Update SW Quality Assurance Plan Story', 'joseph ac','Update SW Quality Assurance Plan ac', 'joseph ac'],
'col2': ['aa', 'bb', 'cc', 'dd','ee', 'ff', 'gg', 'hh', 'ii', 'jj', 'kk', 'll', 'mm', 'nn', 'oo']})
print(df)
list_words="SW Quality Plan Story"
set_words = set(list_words.split())
df["Suffix"] = df.col1.apply(lambda x: x.split()[-1])
# Condition 1: all words in col1 minus all words in set_words must be empty
df["condition_1"] = df.col1.apply(lambda x: not bool(set_words - set(x.split())))
# Condition 2: the last word should be 'Story'
df["condition_2"] = df.col1.str.endswith("Story")
# Condition 3: the last word in the next row should be ac. See `shift(-1)`
df["condition_3"] = df.col1.str.endswith("ac").shift(-1)
# Condition 4: the last word in the current row should be ac
df["condition_4"] = df.col1.str.endswith("ac")
# When all three conditions meet: new column 'conditions'
df["conditions"] = df.condition_1 & df.condition_2 & df.condition_3
df["conditions&"] = df.conditions | df.conditions.shift(1)
print(df[['condition_1', 'condition_2','condition_3' ,'condition_4']])
df.to_excel('cond.xlsx', sheet_name='Sheet1', index=True)
df["TrueFalse"] = df.conditions | df.conditions.shift(1)
df1=df[["col1", "col2", "TrueFalse", "Suffix"]][df.TrueFalse]
print(df1)
This is my output:
0 Draft SW Quality Assurance Plan Story aa True Story
1 alex ac bb True ac
6 Complete SW Quality Assurance Plan Story gg True Story
7 celine ac hh True ac
11 Update SW Quality Assurance Plan Story ll True Story
12 joseph ac mm True ac
This is the desired output:
0 Draft SW Quality Assurance Plan Story aa True Story
1 alex ac bb True ac
2 anny ac cc True ac
3 antoine ac dd True ac
6 Complete SW Quality Assurance Plan Story gg True Story
7 celine ac hh True ac
11 Update SW Quality Assurance Plan Story ll True Story
12 joseph ac mm True ac
13 Update SW Quality Assurance Plan ac nn True ac
14 joseph ac oo True ac
I need to extract all the rows that end with ac after the row that ends with Story (2nd and 3rd rows included), not just the first one.
Is it doable?
list_words (it ends with ac instead of Story), but you're right, it should be in my desired output
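One possible way to get the desired rows is to label consecutive runs of rows instead of shifting by a single row. The sketch below is only an illustration under some assumptions: the names last_word, is_start, is_break, run_id and keep are made up for this example, a run is assumed to end at the first row that is neither a matching Story row nor an ac row, and a matching Story row is kept even if no ac row follows it.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["Draft SW Quality Assurance Plan Story", "alex ac", "anny ac",
             "antoine ac", "aze epic", "bella ac",
             "Complete SW Quality Assurance Plan Story", "celine ac",
             "wqas epic", "karmen ac", "kameilia ac",
             "Update SW Quality Assurance Plan Story", "joseph ac",
             "Update SW Quality Assurance Plan ac", "joseph ac"],
    "col2": ["aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh",
             "ii", "jj", "kk", "ll", "mm", "nn", "oo"],
})

set_words = set("SW Quality Plan Story".split())
last_word = df["col1"].str.split().str[-1]

# A start row contains every word of set_words and ends with "Story".
has_all_words = df["col1"].apply(lambda s: set_words <= set(s.split()))
is_start = has_all_words & last_word.eq("Story")

# Assumption: any row that is neither a start row nor an "ac" row ends the run.
is_break = ~is_start & ~last_word.eq("ac")

# Open a new run at every start row and at every breaking row.
run_id = (is_start | is_break).cumsum()

# Keep every row of a run that was opened by a start row.
keep = is_start.astype(int).groupby(run_id).transform("max").astype(bool)

result = df[keep].assign(Suffix=last_word[keep])
print(result)
```

On the sample data this keeps the rows with index 0 to 3, 6, 7 and 11 to 14, which matches the desired output above. If a Story row should only count when the next row ends with ac, is_start could additionally be combined with last_word.eq("ac").shift(-1, fill_value=False).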