I have a pandas Series from which I need to extract all the substrings enclosed in parentheses. A string may contain several such substrings, or none at all. How can this be handled?

abc(def)ghi(jkl)aaa
jklmnopqr(jkl)
(ab)cde(ghi)
lmnoprst uvwxyz

If I use str.extract, I can obtain only one substring per string, e.g. with a.str.extract('.*\((.*)\)'). As a result, I miss the substring def.
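To make the problem concrete, here is a minimal sketch of that behaviour. It assumes the Series is named `a` and holds the four sample strings above. Because the leading `.*` is greedy, the single capture group ends up on the last parenthesised substring of each row.

```python
import pandas as pd

# Sample data from the question; the name `a` is taken from the snippet above.
a = pd.Series([
    "abc(def)ghi(jkl)aaa",
    "jklmnopqr(jkl)",
    "(ab)cde(ghi)",
    "lmnoprst uvwxyz",
])

# str.extract returns only the first match of the pattern per row.
# The greedy leading `.*` pushes the capture group to the last "(...)".
last_only = a.str.extract(r".*\((.*)\)", expand=True)

# Column 0 holds the single captured group; rows without a match are NaN.
print(last_only[0].tolist())
```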

How can this be solved?

The desired outcome is

def
jkl
ab
ghi
  • Have you tried str.extractall? See the official docs. Commented Oct 8, 2018 at 14:53
  • @mr_mo I am getting an assertion error: 1 columns passed, passed data had 19 columns Commented Oct 8, 2018 at 14:59
  • Please share code so I can help you debug. Commented Oct 8, 2018 at 15:01
  • @mr_mo It was negligence on my part. I was using pandas 0.18. After updating to pandas 0.23, everything works fine as per the accepted answer below. My apologies. Commented Oct 8, 2018 at 15:12

1 Answer

Try:

df[0].str.extractall(r'\((\w+)\)')

Output:

           0
  match     
0 0      def
  1      jkl
1 0      jkl
2 0       ab
  1      ghi
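
For completeness, here is a self-contained sketch of the same call. It assumes the strings live in a Series named `a` (as in the question) rather than in `df[0]`; the names `matches`, `flat` and `per_row` are only illustrative.

```python
import pandas as pd

# Sample data from the question; `a` is the Series name used there.
a = pd.Series([
    "abc(def)ghi(jkl)aaa",
    "jklmnopqr(jkl)",
    "(ab)cde(ghi)",
    "lmnoprst uvwxyz",
])

# One row per match, indexed by (original row, match number).
# Rows without any match (row 3 here) do not appear in the result.
matches = a.str.extractall(r"\((\w+)\)")

# Flatten to a plain Series of substrings, dropping the MultiIndex.
flat = matches[0].reset_index(drop=True)

# Alternative: keep one list of substrings per original row,
# with an empty list where nothing matched.
per_row = a.str.findall(r"\((\w+)\)")

print(flat.tolist())
print(per_row.tolist())
```

Note that `\w+` only matches letters, digits and underscores. If the parenthesised text may contain spaces or punctuation, a pattern such as `\(([^)]+)\)` might be a better fit.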