I have a dataframe column as follows:

df['col1']

['cat-dog asd-pwr sdf', 'cat-goat asd-pwr2 sdf', 'cat asd-pwr3 sdf']

I need to extract the following:

['asd-pwr', 'asd-pwr2', 'asd-pwr3']

i.e. the last pair of substrings joined by a hyphen (-).

I tried the following:

import re
df['col1'].str.extract(r'\s[a-zA-Z]*-[a-zA-Z]*\s', flags=re.IGNORECASE)

First of all, my regex does not even find any of the desired hyphenated pairs.
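Running the attempt above suggests two likely problems. First, str.extract needs at least one capturing group to know what to return, so a pattern with no (...) is rejected with a ValueError. Second, [a-zA-Z]* cannot match the digits in pwr2 and pwr3. The sketch below shows both; wrapping the hyphenated part in a group is just a diagnostic change for illustration, not a fix.

```python
import re

import pandas as pd

df = pd.DataFrame({'col1': ['cat-dog asd-pwr sdf', 'cat-goat asd-pwr2 sdf', 'cat asd-pwr3 sdf']})

# Attempt 1: the original pattern, which contains no capturing group.
error_message = None
try:
    df['col1'].str.extract(r'\s[a-zA-Z]*-[a-zA-Z]*\s', flags=re.IGNORECASE)
except ValueError as exc:
    # str.extract raises because there is no (...) group to return.
    error_message = str(exc)
print('Error:', error_message)

# Attempt 2: same pattern with the hyphenated part wrapped in a group.
# [a-zA-Z]* cannot match the digit in "pwr2"/"pwr3", so the trailing \s fails
# on those rows and only the first row matches; the others come back as NaN.
with_group = df['col1'].str.extract(r'\s([a-zA-Z]*-[a-zA-Z]*)\s', flags=re.IGNORECASE)
print(with_group)
```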

3 Answers

You can use:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1': ['cat-dog asd-pwr sdf', 'cat-goat asd-pwr2 sdf', 'cat asd-pwr3 sdf']})
>>> df['col1'].str.extract(r'(?:.*\W)?(\w+-\w+)')
          0
0   asd-pwr
1  asd-pwr2
2  asd-pwr3

Or, if the hyphenated pair must be preceded by either the start of the string or whitespace, you can also use

r'(?:.*\s)?(\w+-\w+)'

Details:

  • (?:.*\W)? - an optional sequence of zero or more characters other than line break characters, as many as possible, followed by a non-word character (in the second pattern, \s matches a whitespace character instead)
  • (\w+-\w+) - Group 1: one or more word characters, a hyphen, and one or more word characters.

Because .* is greedy, it consumes as much of the string as possible before backtracking, so the capturing group (the part in round brackets) matches the last hyphenated word pair in the string.
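To see what the greedy prefix does, here is a sketch using plain re.search, which is roughly what str.extract applies to each row. The helper last_pair and the extra test string 'x foo-bar-baz' are made up for illustration. The two variants agree on the sample data and only diverge when a word contains more than one hyphen, because \W can stop at an inner hyphen while \s cannot.

```python
import re

# The two variants from this answer; only the left-hand delimiter differs.
NON_WORD_VARIANT = r'(?:.*\W)?(\w+-\w+)'
WHITESPACE_VARIANT = r'(?:.*\s)?(\w+-\w+)'


def last_pair(pattern, text):
    """Hypothetical helper: return group 1 of the first match, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


samples = ['cat-dog asd-pwr sdf', 'cat-goat asd-pwr2 sdf', 'cat asd-pwr3 sdf']
for s in samples:
    print(s, '->', last_pair(NON_WORD_VARIANT, s), last_pair(WHITESPACE_VARIANT, s))

# A word with two hyphens: \W may stop at the inner hyphen, \s cannot.
multi = 'x foo-bar-baz'
print(last_pair(NON_WORD_VARIANT, multi))    # bar-baz
print(last_pair(WHITESPACE_VARIANT, multi))  # foo-bar
```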

You can use:

import re

df['col1'].str.extract(r'\s*(\w+-\w+)(?!.*-)\s*', flags=re.IGNORECASE)

Here, we use \w instead of [a-zA-Z] because you also want to extract the number after pwr.

We also use a negative lookahead, (?!.*-), to ensure that no hyphen appears anywhere after the match, i.e. the matched substring is the last hyphenated one in the string.

Result:

          0
0   asd-pwr
1  asd-pwr2
2  asd-pwr3
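As a sanity check, the sketch below runs this answer's pattern next to a stripped-down version and the greedy-prefix pattern from the first answer. The fourth row, 'no hyphen here', is a made-up input to show that a row without any hyphen yields NaN. On this data the \s* parts (which sit outside the group) and re.IGNORECASE (\w already matches both cases) do not change what group 1 captures; that is an observation on these inputs, not a general guarantee.

```python
import re

import pandas as pd

# Sample column plus a hypothetical row with no hyphen at all.
df = pd.DataFrame({'col1': ['cat-dog asd-pwr sdf', 'cat-goat asd-pwr2 sdf',
                            'cat asd-pwr3 sdf', 'no hyphen here']})

# Pattern exactly as given in this answer.
as_posted = df['col1'].str.extract(r'\s*(\w+-\w+)(?!.*-)\s*', flags=re.IGNORECASE)[0]

# Stripped-down version: no surrounding \s* and no IGNORECASE flag.
minimal = df['col1'].str.extract(r'(\w+-\w+)(?!.*-)')[0]

# Greedy-prefix pattern from the first answer, for comparison.
greedy = df['col1'].str.extract(r'(?:.*\W)?(\w+-\w+)')[0]

print(pd.DataFrame({'as_posted': as_posted, 'minimal': minimal, 'greedy': greedy}))
```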

This regex should do the trick:

\w*-\w*(?=(?:\s|$)\w*.*$)

Then take only the last element of the resulting list of matches.
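In pandas, one way to "take the last element" is str.findall followed by .str[-1], as in the sketch below (the variable names are illustrative). One caveat: when a pattern contains a capturing group, both re.findall and str.findall return what the group captured rather than the whole match, so the group inside the lookahead is written as non-capturing, (?:\s|$), in this sketch. The last lines show what the capturing form would return instead.

```python
import re

import pandas as pd

df = pd.DataFrame({'col1': ['cat-dog asd-pwr sdf', 'cat-goat asd-pwr2 sdf', 'cat asd-pwr3 sdf']})

# Non-capturing (?:...) inside the lookahead so findall returns whole matches.
pattern = r'\w*-\w*(?=(?:\s|$)\w*.*$)'

all_pairs = df['col1'].str.findall(pattern)  # one list of matches per row
last_pair = all_pairs.str[-1]                # keep only the last match in each list
print(last_pair)

# With a capturing group (\s|$), findall returns the group's text instead.
captured = re.findall(r'\w*-\w*(?=(\s|$)\w*.*$)', 'cat-dog asd-pwr sdf')
print(captured)  # [' ', ' ']
```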
