How to extract string in pandas with regex?

Question

I have a dataframe:

|     |  sid  | Hobby (times per month) |
|-----+-------+-------------------------|
|  0  |   3   |      swimming(4)        |
|-----+-------+-------------------------|
|  1  |   4   |      hiking  (1 )       |
|-----+-------+-------------------------|
|  2  |   2   |      running ( 12 )     |
|-----+-------+-------------------------|
|  3  |   5   |      fishing ( 2 )      |

How can I extract the hobby names by removing the bracketed counts in the second column, so that the result is:

|     |  sid  | Hobby (times per month) |
|-----+-------+-------------------------|
|  0  |   3   |        swimming         |
|-----+-------+-------------------------|
|  1  |   4   |        hiking           |
|-----+-------+-------------------------|
|  2  |   2   |        running          |
|-----+-------+-------------------------|
|  3  |   5   |        fishing          |

If the structure is always hobby_xyz (n_times), you can split the string on ( and keep just the first element.
– Mr. T, Commented Jun 27, 2018 at 9:47
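
The split suggested in this comment could look roughly like the sketch below. The DataFrame is rebuilt from the question's table, and the trailing .str.strip() is an addition of mine to drop the whitespace left in front of the bracket; it assumes every value really has the hobby (n) shape.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "sid": [3, 4, 2, 5],
    "Hobby (times per month)": [
        "swimming(4)", "hiking  (1 )", "running ( 12 )", "fishing ( 2 )",
    ],
})

col = "Hobby (times per month)"
# Split once on the first "(", keep the left part, then trim leftover whitespace
df[col] = df[col].str.split("(", n=1).str[0].str.strip()

print(df[col].tolist())
```

No regex is involved here, so this only works as long as the hobby name itself never contains an opening bracket.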

Aman Chhabra · Accepted Answer · 2018-06-27 09:43:41Z

1

If you want, for example, swimming(4) to become swimming, you can use the regex below:

^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$

Demo: https://regex101.com/r/sTO1Q9/1

Test Cases:

swimming(4)
hiking   (1 )
running ( 12 )
fishing( 2 )
hiking(1)

Match:

Match 1
Full match  0-11    `swimming(4)`
Group 1.    0-8 `swimming`
Match 2
Full match  12-25   `hiking   (1 )`
Group 1.    12-18   `hiking`
Match 3
Full match  26-40   `running ( 12 )`
Group 1.    26-33   `running`
Match 4
Full match  41-53   `fishing( 2 )`
Group 1.    41-48   `fishing`
Match 5
Full match  54-64   `hiking(1) `
Group 1.    54-60   `hiking`
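
To check the same pattern outside regex101, a small sketch with Python's re module might help. It matches each test case as a separate string, which is closer to how pandas would see one cell at a time (the demo presumably runs over all lines at once in multiline mode). The variable names are mine.

```python
import re

# The pattern from the answer, unchanged
pattern = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$')

# Test cases from the demo; the last one keeps its trailing space
test_cases = [
    "swimming(4)",
    "hiking   (1 )",
    "running ( 12 )",
    "fishing( 2 )",
    "hiking(1) ",
]

# group(1) holds the hobby name captured before the parentheses
names = [pattern.match(s).group(1) for s in test_cases]
print(names)
```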

Joey Gao · Accepted Answer · 2018-06-27 09:57:36Z

1

You can use the .str accessor to match the string in pandas:

df.columns = ['sid', 'Hobby']
# \w+ grabs the first run of word characters; expand=False returns a Series
df['Hobby'] = df['Hobby'].str.extract(r'(\w+)', expand=False)
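
For anyone who wants to run this end to end, here is a self-contained sketch on the question's data. The DataFrame construction is my own reconstruction of the table. The pattern here is (\w+), which requires at least one word character, and expand=False is passed because str.extract otherwise returns a one-column DataFrame in recent pandas versions.

```python
import pandas as pd

# Data reconstructed from the question, with the column already renamed
df = pd.DataFrame({
    "sid": [3, 4, 2, 5],
    "Hobby": ["swimming(4)", "hiking  (1 )", "running ( 12 )", "fishing ( 2 )"],
})

# expand=False asks for a Series rather than a one-column DataFrame
df["Hobby"] = df["Hobby"].str.extract(r"(\w+)", expand=False)

print(df)
```

Note that this keeps only the first word, so a two-word hobby would be cut at the space.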

edited Jun 27, 2018 at 9:57

answered Jun 27, 2018 at 9:48

Simas Joneliunas · Accepted Answer · 2018-06-27 10:15:46Z

0

To apply the regex in pandas, you can use Series.apply():

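import re

import pandas as pd

# Capture the leading word; allow optional whitespace and digits inside the parentheses
regexp_matcher = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$')

def remove_brackets(string):
    # findall returns the captured group for each match; an empty list means no match
    part = regexp_matcher.findall(string)
    if not part:
        return string  # leave values that do not match unchanged
    return part[0]

df = pd.DataFrame()
df['string'] = ['swimming(4)', 'swimming(4)', 'swimming(4)']
df['new_string'] = df['string'].apply(remove_brackets)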
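
One thing worth noticing about the function above is that it returns the input unchanged when the pattern does not match. The sketch below compares that with a vectorized str.extract, which gives NaN for non-matching rows unless you fill them back in. The value "chess" is made up for illustration, and the names matcher, via_apply and via_extract are mine.

```python
import re

import pandas as pd

pattern = r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$'
matcher = re.compile(pattern)

def remove_brackets(string):
    # Same logic as the answer: fall back to the input when nothing matches
    part = matcher.findall(string)
    return part[0] if part else string

# "chess" is a hypothetical value without parentheses, added to show the fallback
s = pd.Series(["swimming(4)", "running ( 12 )", "chess"])

via_apply = s.apply(remove_brackets)
# str.extract yields NaN where the pattern fails, so restore the original there
via_extract = s.str.extract(pattern, expand=False).fillna(s)

print(via_apply.tolist())
print(via_extract.tolist())
```

Both routes should give the same values here; the .str version avoids a Python-level function call per row, which may matter on large columns.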
