I am trying to parse SI_-type patterns from one column of a DataFrame into another column or into a list. I tried two things:

|    a             |
-------------------+
| Builder          |
| left             |
| SI_NAME lide_on  |
| SI_ID 456        |
| Scheduling Info  |

df['b']= df['a'].apply(lambda row: re.findall('\SI_\w+\s',row))  

and

list_DF=[]
for index,row in df.iterrows():
    list_DF.append(re.findall('\SI_\w+\s',row[0]))

Neither approach gives the result I want, and the first one returned an empty list in the new column.

  • Note \S matches a non-whitespace, not S. Commented Apr 12, 2017 at 22:15
  • This regular expression works well on a single sentence but fails while iterating over the rows in the dataframe. Commented Apr 12, 2017 at 22:20
  • Try df['b'] = df['a'].str.findall(r'^SI_\w+').apply(','.join) (.apply(','.join) is redundant, but will return just strings). Commented Apr 12, 2017 at 22:27
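The point about \S in the first comment can be checked with plain re, outside pandas. The snippet below is only a sketch: the test strings 'XI_NAME lide_on' and 'SI_NAME' are made up for illustration and are not from the question's data.

```python
import re

# The question's pattern, written as a raw string.
pattern = re.compile(r'\SI_\w+\s')

# \S means "any non-whitespace character", so a literal S is not required.
any_prefix = pattern.findall('XI_NAME lide_on')

# The trailing \s demands whitespace after the token, so a value that
# ends right after the token does not match at all.
no_trailing_space = pattern.findall('SI_NAME')

# With whitespace after the token, the match includes that whitespace.
with_space = pattern.findall('SI_NAME lide_on')

print(any_prefix, no_trailing_space, with_space)
```

So the pattern is both too loose (any non-whitespace character before I_) and too strict (whitespace required after the token), which may explain empty results on values that contain only the token.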

1 Answer

You may use something like

df['b'] = df['a'].str.findall(r'^SI_\w+')

Using .str applies findall to each value in the column as a string; values that are not strings yield NaN instead of raising an error.

The ^SI_\w+ pattern matches SI_ followed by one or more word characters, and only at the beginning of the string (due to ^) - it looks like the entries you are after follow this pattern. Each row of the result holds a list of matches, which is empty where nothing matched. You may add .apply(','.join) or something like that at the end to get string data in the resulting column.
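As a minimal sketch of the above, the frame below is rebuilt from the sample values in the question. The column name c for the joined strings is an assumption added here for illustration.

```python
import pandas as pd

# Sample data copied from the question's column "a".
df = pd.DataFrame({'a': ['Builder', 'left', 'SI_NAME lide_on',
                         'SI_ID 456', 'Scheduling Info']})

# Each row gets a list of matches; rows without a leading SI_ token
# get an empty list.
df['b'] = df['a'].str.findall(r'^SI_\w+')

# Optional: collapse each list into a plain string ('' for no match).
# The column name "c" is hypothetical.
df['c'] = df['b'].apply(','.join)

print(df)
```

Because of the ^ anchor there is at most one match per row, so the joined column holds either the token itself or an empty string.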
