Iterating through dataframe using regular expression python

Question

I am trying to parse SI-type patterns into another column in a DataFrame, or into a list. I tried two things:

|    a             |
-------------------+
| Builder          |
| left             |
| SI_NAME lide_on  |
| SI_ID 456        |
| Scheduling Info  |

df['b']= df['a'].apply(lambda row: re.findall('\SI_\w+\s',row))

and

list_DF=[]
for index,row in df.iterrows():
    list_DF.append(re.findall('\SI_\w+\s',row[0]))

I am not able to get the expected result; the first attempt returned an empty list in the new column.

This regular expression works well on a single sentence but fails when iterating over the rows of the dataframe.
– Shweta Kamble, Commented Apr 12, 2017 at 22:20
Try df['b'] = df['a'].str.findall(r'^SI_\w+').apply(','.join) (the .apply(','.join) part is optional; it only turns each list of matches into a plain string).
– Wiktor Stribiżew, Commented Apr 12, 2017 at 22:27

Wiktor Stribiżew · Accepted Answer · 2017-04-12 22:47:19Z

You may use something like

df['b'] = df['a'].str.findall(r'^SI_\w+')

The .str accessor applies findall element-wise to the string values in the column.

The ^SI_\w+ pattern matches SI_ followed by one or more word characters, but only at the beginning of the string (because of ^); the entries you are after appear to follow this pattern. Since findall returns a list for each row, you may append .apply(','.join) or something similar to get string data in the resulting column.
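As a rough illustration of the approach described above, here is a minimal sketch run against the sample column from the question. The DataFrame is rebuilt by hand from the table in the question, and the extra column name c is only an assumption made here to show the joined-string variant next to the list variant.

```python
import pandas as pd

# Sample data copied from column "a" in the question.
df = pd.DataFrame(
    {"a": ["Builder", "left", "SI_NAME lide_on", "SI_ID 456", "Scheduling Info"]}
)

# str.findall returns a list of matches per row;
# rows with no match get an empty list.
df["b"] = df["a"].str.findall(r"^SI_\w+")

# Joining each list gives plain strings ('' where nothing matched).
# The column name "c" is illustrative only.
df["c"] = df["b"].apply(",".join)

print(df)
```

Note that "Scheduling Info" is not matched, because the pattern requires the literal SI_ at the start of the string.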
