
Suppose I have a df:

import pandas as pd

df = pd.DataFrame({'col': ['ABCXDEF', 'ABCYDEF']})

How can I extract the substring between ABC and the first occurrence of DEF? Desired output:

    col
0   X
1   Y

Note that I don't want a solution based on exact positions, like:

df.col.str[3:4]
  • df.col.str.extract(r"((?<=ABC).+(?=DEF))") Commented Jun 18, 2020 at 9:16
  • Sorry, my bad, the question was incomplete: the match needs to stop at the first occurrence of DEF, for example when the string is ABCXDEFDEFDEF. Commented Jun 18, 2020 at 9:18
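The lookaround pattern in the first comment uses a greedy .+, which is why it misses the follow-up case: on ABCXDEFDEFDEF the greedy quantifier expands as far as it can, so the lookahead ends up matching the last DEF rather than the first. A small sketch comparing it with the same pattern made lazy (.+?). The second sample string is the one from the follow-up comment:

```python
import pandas as pd

df = pd.DataFrame({'col': ['ABCXDEF', 'ABCXDEFDEFDEF']})

# Greedy .+ grabs as much as possible, then backtracks only until
# (?=DEF) succeeds, so it stops before the LAST DEF.
greedy = df.col.str.extract(r"((?<=ABC).+(?=DEF))")

# Lazy .+? grows one character at a time and stops as soon as
# (?=DEF) succeeds, i.e. before the FIRST DEF.
lazy = df.col.str.extract(r"((?<=ABC).+?(?=DEF))")

print(greedy[0].tolist())  # ['X', 'XDEFDEF']
print(lazy[0].tolist())    # ['X', 'X']
```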

1 Answer


Update: to stop at the first occurrence of DEF, use a non-greedy (lazy) regex:

import pandas as pd

df = pd.DataFrame({'col': ['ABCXDEF', 'ABCYDEFDEFDEF']})
print(df.col.str.extract(r"ABC(.*?)DEF"))

The result is:

   0
0  X
1  Y
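The output above is a DataFrame with a column named 0, while the question asks for the values in col. One way to get there, sketched below, is to pass expand=False so that a single capture group comes back as a Series that can be assigned directly. The third row ('no match here') is a hypothetical input added only to show that rows without an ABC...DEF match come back as NaN:

```python
import pandas as pd

df = pd.DataFrame({'col': ['ABCXDEF', 'ABCYDEFDEFDEF', 'no match here']})

# expand=False returns a Series when the pattern has exactly one group,
# so it can overwrite the original column directly.
# .*? is lazy, so each match ends at the first DEF after ABC.
df['col'] = df.col.str.extract(r"ABC(.*?)DEF", expand=False)

print(df)
```

Note that .*? also accepts an empty match, so ABCDEF yields an empty string. Use .+? instead if at least one character between ABC and DEF is required. A named group, as in r"ABC(?P<col>.*?)DEF", is another way to get a result column called col.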