I am new to pandas. I am trying to find any of several substrings in a string, but I need to search only between a particular start and end position.
If one is present, I need to get its position and which substring matched.
Use str.replace:
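target = 'hi|love'
# rows whose sequence contains any of the target substrings
m = df['sequence'].str.contains(target)
# replace the whole string with "<1-based position>,<matched substring>"
df.loc[m, 'output'] = (df.loc[m, 'sequence']
                       .str.replace(fr'.*({target}).*',
                                    lambda x: f'{x.start(1)+1},{x.group(1)}',
                                    regex=True)
                       )
df.loc[~m, 'output'] = 'NA'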
Output:
      sequence  output
0   HelloWorld      NA
1    worldofhi    8,hi
2  worldoflove  8,love
Used input:
      sequence
0   HelloWorld
1    worldofhi
2  worldoflove
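To search only between a given start and end (here positions 8 to 11, counting from 1), slice the string first and add the offset back to the position:
target = 'hi|love'
# keep only characters at 0-based indices 7..10
s = df['sequence'].str[7:10+1]
m = s.str.contains(target)
# +7 restores the offset lost by slicing, +1 makes the position 1-based
df.loc[m, 'output'] = (s[m]
                       .str.replace(fr'.*({target}).*',
                                    lambda x: f'{x.start(1)+7+1},{x.group(1)}',
                                    regex=True)
                       )
df.loc[~m, 'output'] = 'NA'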
s[m] .str.replace(fr'.*({target})', lambda m: f'{m.start(1)+7+1},{m.group(1)}',
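As a rough alternative sketch of the same windowed search, the `pos` and `endpos` arguments of a compiled pattern's `search` method can limit the search without slicing, so no offset needs to be added back. The helper name `find_in_window` and the window `7, 11` are assumptions for illustration and not part of the answer above. Note one difference: this returns the first match inside the window, while the greedy `.*` in the `str.replace` version picks the last one.

```python
import re
import pandas as pd

def find_in_window(text, pattern, start, end):
    """Return '<1-based position>,<substring>' for the first match that lies
    entirely within text[start:end], or 'NA' if there is none.
    (Hypothetical helper, not from the original answer.)"""
    match = pattern.search(text, start, end)
    if match is None:
        return 'NA'
    # match.start() is already relative to the full string
    return f'{match.start() + 1},{match.group()}'

df = pd.DataFrame({'sequence': ['HelloWorld', 'worldofhi', 'worldoflove']})
pattern = re.compile('hi|love')

# 0-based window [7, 11), i.e. the same characters as .str[7:10+1]
df['output'] = df['sequence'].apply(find_in_window, args=(pattern, 7, 11))
print(df)
```

On the sample input this should give the same `output` column as the slicing approach. It runs a Python function per row, so it may be slower than the vectorized string methods on large frames.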