How can I remove a list of substrings from a DataFrame column?

Question

I need to remove certain substrings from the values in a DataFrame column.

Example: I have this column 'code' in a DataFrame (in reality, the DataFrame is very large):

3816R(motor) #I need '3816R'
97224(Eletro)
502812(Defletor)
97252(Defletor)
97525(Eletro)
5725 ( 56)

And I have this list to replace the values:

list = ['(motor)', '(Eletro)', '(Defletor)', '(Eletro)', '( 56)']

I've tried a lot of methods, like:

df['code'] = df['code'].str.replace(list, '')

I also tried regex=True, but none of these methods removed the substrings.

How can I do that?

Can there be cases where the parentheses contain something you need to keep? It would be more efficient to handle the generic case.
– mozway, Commented Feb 2, 2023 at 17:31

Pinyi Wang · Accepted Answer · 2023-02-02 17:49:19Z

2

You can use a regex replace with a regex "or" (alternation) condition: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/

l = ['(motor)', '(Eletro)', '(Defletor)', '( 56)']
l = [s.replace('(', r'\(').replace(')', r'\)') for s in l]  # raw strings avoid invalid-escape warnings
regex_str = f"({'|'.join(l)})"
df['code'] = df['code'].str.replace(regex_str, '', regex=True)

The regex_str will end up with something like

"(\(motor\)|\(Eletro\)|\(Defletor\)|\( 56\))"
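A minimal sketch of the same idea that lets `re.escape` do the escaping, so any other regex metacharacter in the list is handled too. The sample frame is copied from the question. The names `to_remove`, `pattern`, and `cleaned` are illustrative and not part of the original answer, and the trailing `.str.strip()` is an added assumption that leftover whitespace (as in '5725 ( 56)') should be trimmed.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"code": ["3816R(motor)", "97224(Eletro)", "502812(Defletor)",
                            "97252(Defletor)", "97525(Eletro)", "5725 ( 56)"]})

# Substrings to remove; dict.fromkeys drops duplicates while keeping order.
to_remove = ["(motor)", "(Eletro)", "(Defletor)", "(Eletro)", "( 56)"]
unique = list(dict.fromkeys(to_remove))

# re.escape makes every regex metacharacter literal, not only parentheses.
pattern = "|".join(re.escape(s) for s in unique)

# One vectorised pass over the column, then trim leftover whitespace (assumption).
cleaned = df["code"].str.replace(pattern, "", regex=True).str.strip()
print(cleaned.tolist())
```

Because the pattern is built from the list, the list itself can be passed around as an argument and the pattern rebuilt wherever it is needed.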

edited Feb 2, 2023 at 17:49

answered Feb 2, 2023 at 17:18

Pinyi Wang


10 Comments

João Felipe Holanda Over a year ago

I need to pass the list as an argument. How can I do that?

Pinyi Wang Over a year ago

Do you want to remove only the strings in the list, or any string within parentheses?

João Felipe Holanda Over a year ago

Only the string in the list

Rabinzel Over a year ago

Why do you need to pass the list as an argument? Can you explain?

Pinyi Wang Over a year ago

@JoãoFelipeHolanda in that case, you can create the regex string with an "or" condition based on the list and use that for the replace


Gerhard Rahn · Accepted Answer · 2023-02-02 17:28:05Z

0

If you are certain any and all rows follow the format provided, you could attempt the following by using a lambda function:

df['code_clean'] = df['code'].apply(lambda x: x.split('(')[0])

answered Feb 2, 2023 at 17:28

Gerhard Rahn


G S Praneeth Reddy · Accepted Answer · 2023-02-02 17:40:14Z

0

You can try the regular expression match method: https://docs.python.org/3/library/re.html#re.Pattern.match

import re  # needed for re.match; \s*\( also matches rows like '5725 ( 56)'
df['code'] = df['code'].apply(lambda x: re.match(r'^(\w+)\s*\(', x).group(1))

The first part of the regular expression, ^(\w+), creates a capturing group of the letters, digits, or underscores that come before the parenthesis. group(1) then extracts that text.
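One caveat: re.match returns None for any value the pattern does not match, and calling .group(1) on None raises AttributeError. A hedged sketch of a guarded version follows, using only the standard library. The helper name `leading_code`, the extra sample value "12345", and the choice to return unmatched values unchanged are assumptions for illustration.

```python
import re

# Three values from the question plus one hypothetical value with no parentheses.
codes = ["3816R(motor)", "97224(Eletro)", "5725 ( 56)", "12345"]

# ^(\w+) captures the leading code; \s*\( requires an opening parenthesis,
# optionally preceded by whitespace.
pattern = re.compile(r"^(\w+)\s*\(")

def leading_code(value):
    """Return the text before the first parenthesis, or the value unchanged."""
    match = pattern.match(value)
    return match.group(1) if match else value

result = [leading_code(c) for c in codes]
print(result)
```

The same function can be passed to df['code'].apply, although a row-by-row apply is generally slower than the vectorised string methods on a large frame.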

answered Feb 2, 2023 at 17:40

G S Praneeth Reddy


geekay · Accepted Answer · 2023-02-02 17:42:02Z

0

str.replace works with a single string, not a list of strings. You could loop through the list:

rmlist = ['(motor)', '(Eletro)', '(Defletor)', '(Eletro)', '( 56)']
for repl in rmlist:
    df['code'] = df['code'].str.replace(repl, '', regex=False)  # literal match; '(' is not treated as a regex group

Alternatively, if the bracketed substring is always at the end, split at "(" and discard the extra column that is generated. That is likely to be faster:

df["code"] = df["code"].str.split(pat="(", n=1, expand=True)[0]

str.split is reasonably fast
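A small sketch of the split approach on sample values from the question, plus one hypothetical value ("12345") with no parentheses to show that such rows pass through unchanged. The column name `code_clean` and the trailing `.str.strip()` are assumptions added here. The strip removes the space left behind in a value like '5725 ( 56)'.

```python
import pandas as pd

df = pd.DataFrame({"code": ["3816R(motor)", "97224(Eletro)", "5725 ( 56)", "12345"]})

# Split once at the first "(" (a single-character pattern is treated literally),
# keep the part before it, and trim leftover whitespace.
df["code_clean"] = df["code"].str.split("(", n=1).str[0].str.strip()
print(df["code_clean"].tolist())
```

Note that this removes any parenthesised suffix, not only the entries in the list, so it fits only if nothing inside parentheses needs to be kept.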

edited Feb 2, 2023 at 17:42

answered Feb 2, 2023 at 17:22

geekay


4 Comments

João Felipe Holanda Over a year ago

The dataframe is too big to loop over, so I'm looking for a method or a function.

geekay Over a year ago

str.replace is a vectorised implementation and considerably faster than the alternatives. Alternatively, why not just split at "("? Whatever comes after the opening parenthesis can be ignored.

geekay Over a year ago

You can use apply with a lambda function, but that will be very heavy for a big dataframe.

geekay Over a year ago

Also, regex is at least 10x slower than plain string replacement, so it is not a good fit for large dataframes. Avoid regex where possible if you need speed.
