
I have a dataframe:


|     |  sid  | Hobby (times per month) |
|-----+-------+-------------------------|
|  0  |   3   |      swimming(4)        |
|-----+-------+-------------------------|
|  1  |   4   |      hiking  (1 )       |
|-----+-------+-------------------------|
|  2  |   2   |      running ( 12 )     |
|-----+-------+-------------------------|
|  3  |   5   |      fishing ( 2 )      |

How can I extract just the hobby names by removing the bracketed counts from the second column, to get this:

|     |  sid  | Hobby (times per month) |
|-----+-------+-------------------------|
|  0  |   3   |        swimming         |
|-----+-------+-------------------------|
|  1  |   4   |        hiking           |
|-----+-------+-------------------------|
|  2  |   2   |        running          |
|-----+-------+-------------------------|
|  3  |   5   |        fishing          |
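If you want to try the answers below, the input table above can be rebuilt as a pandas DataFrame roughly like this. This is a sketch: the spacing inside each value is copied from the table and may differ from the real data.

```python
import pandas as pd

# Rebuild the example input; spacing inside each value mirrors the table above
df = pd.DataFrame({
    'sid': [3, 4, 2, 5],
    'Hobby (times per month)': ['swimming(4)', 'hiking  (1 )', 'running ( 12 )', 'fishing ( 2 )'],
})
print(df)
```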
  • Have a look at apply (or was it map?) Commented Jun 27, 2018 at 9:39
  • If it is always of the form `hobby_xyz (n_times)`, you can split the string on `(` and keep just the first element. Commented Jun 27, 2018 at 9:47
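The split suggestion in the comment above could look like this in pandas. This sketch assumes every value has the form `name (n)`. `.str.strip()` removes the spaces left before the bracket, and a value with no `(` is simply kept whole.

```python
import pandas as pd

df = pd.DataFrame({
    'sid': [3, 4, 2, 5],
    'Hobby (times per month)': ['swimming(4)', 'hiking  (1 )', 'running ( 12 )', 'fishing ( 2 )'],
})

col = 'Hobby (times per month)'
# Split on the first "(", keep the part before it, then trim trailing spaces
df[col] = df[col].str.split('(', n=1).str[0].str.strip()
print(df[col].tolist())  # ['swimming', 'hiking', 'running', 'fishing']
```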

3 Answers


If you want, for example, `swimming(4)` to become `swimming`, you can use the regex below and keep capture group 1:

^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$

Demo: https://regex101.com/r/sTO1Q9/1

Test Cases:

swimming(4)
hiking   (1 )
running ( 12 )
fishing( 2 )
hiking(1) 

Match:

Match 1
Full match  0-11    `swimming(4)`
Group 1.    0-8 `swimming`
Match 2
Full match  12-25   `hiking   (1 )`
Group 1.    12-18   `hiking`
Match 3
Full match  26-40   `running ( 12 )`
Group 1.    26-33   `running`
Match 4
Full match  41-53   `fishing( 2 )`
Group 1.    41-48   `fishing`
Match 5
Full match  54-64   `hiking(1) `
Group 1.    54-60   `hiking`
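To check that the same pattern behaves this way in Python's `re` module, here is a quick run over the test cases above. The pattern is copied verbatim. `re.match` is sufficient because the pattern is already anchored with `^` and `$`.

```python
import re

pattern = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$')
tests = ['swimming(4)', 'hiking   (1 )', 'running ( 12 )', 'fishing( 2 )', 'hiking(1) ']

names = []
for s in tests:
    m = pattern.match(s)
    # Group 1 holds the hobby name; None means the whole line did not match
    names.append(m.group(1) if m else None)
print(names)  # ['swimming', 'hiking', 'running', 'fishing', 'hiking']
```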

You can use the `.str` accessor in pandas to extract the leading word:

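# Rename the columns so the hobby column has a short name
df.columns = ['sid', 'Hobby']
# Keep the leading run of word characters; expand=False returns a Series, not a DataFrame
df['Hobby'] = df['Hobby'].str.extract(r'(\w+)', expand=False)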
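The rename simply gives the column a short name. With bracket access, the original column name `Hobby (times per month)` can also be used directly. Below is a self-contained sketch that also anchors the pattern at the start and skips leading spaces, which is an addition not in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'sid': [3, 4, 2, 5],
    'Hobby (times per month)': ['swimming(4)', 'hiking  (1 )', 'running ( 12 )', 'fishing ( 2 )'],
})

col = 'Hobby (times per month)'
# Capture the first run of word characters, ignoring any leading whitespace;
# expand=False makes str.extract return a Series instead of a DataFrame
df[col] = df[col].str.extract(r'^\s*(\w+)', expand=False)
print(df)
```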


To apply the regex in pandas, you can use `Series.apply()`:

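import re

import pandas as pd

# Capture the leading word followed by a bracketed count, allowing spaces around it
regexp_matcher = re.compile(r'^([\w]+)[\s]*\([\s]*[\d]*[\s]*\)[\s]*$')

def remove_brackets(string):
    """Return the hobby name without the bracketed count; leave non-matching values unchanged."""
    part = regexp_matcher.findall(string)
    if not part:
        return string
    return part[0]

df = pd.DataFrame()
df['string'] = ['swimming(4)', 'hiking   (1 )', 'running ( 12 )']
df['new_string'] = df['string'].apply(remove_brackets)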
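An alternative to `apply` that stays within pandas' `.str` methods is sketched below. It mirrors `remove_brackets`, so values that don't match the pattern are kept unchanged. It uses `str.extract` plus `fillna`, with the same pattern written without the redundant character classes.

```python
import pandas as pd

df = pd.DataFrame({'string': ['swimming(4)', 'running ( 12 )', 'no brackets here']})

pattern = r'^(\w+)\s*\(\s*\d*\s*\)\s*$'
# Non-matching rows come back as NaN; fall back to the original text for those
df['new_string'] = df['string'].str.extract(pattern, expand=False).fillna(df['string'])
print(df['new_string'].tolist())  # ['swimming', 'running', 'no brackets here']
```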
