Replacing a string in a dataframe python

Question

I have a (7,11000) dataframe. Some of these 7 columns contain strings. In column 2, row 1000, there is the string 'London'. I want to change it to 'Paris'. How can I do this? I searched all over the web but couldn't find a way. I tried these commands but none of them works:

df['column2'].replace('London','Paris')
df['column2'].str.replace('London','Paris')
re.sub('London','Paris',df['column2'])

I usually receive this error:

TypeError: expected string or bytes-like object

Please add the output of df.info() to this question. Secondly, when describing the size of a dataframe, the usual pattern is (rows, columns), so I think your dataframe is (11000, 7). df['column2'] = df['column2'].replace(to_replace='London', value='Paris') should work.
– Scott Boston, Commented Feb 7, 2019 at 2:20
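
A minimal sketch of the point in the comment above: `Series.replace` returns a new Series rather than changing the column in place, so the result has to be assigned back. The small frame below is an illustrative stand-in for the asker's data, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data
df = pd.DataFrame({'column2': ['London', 'Berlin', 'London']})

# Calling replace without assigning the result leaves df unchanged
df['column2'].replace('London', 'Paris')
unchanged = df['column2'].tolist()

# Assigning the result back updates the column
df['column2'] = df['column2'].replace(to_replace='London', value='Paris')
changed = df['column2'].tolist()
```

The `re.sub` attempt in the question fails for a different reason: `re.sub` expects a single string, so passing it a whole Series raises the `TypeError` quoted above.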

Charles · Accepted Answer · 2019-02-07 03:48:16Z

3

If you want to replace a single row (you mention row 1000), you can do it with .loc. If you want to replace all occurrences of 'London', you could do this:

import pandas as pd
df = pd.DataFrame({'country': ['New York', 'London']})
# regex=False treats 'London' as a literal string rather than a pattern
df.country = df.country.str.replace('London', 'Paris', regex=False)
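
For the single-row case mentioned at the start of this answer, a minimal sketch with `.loc` might look like the following. The frame and the row label `1` are illustrative assumptions; in the asker's data the label would be whatever identifies row 1000.

```python
import pandas as pd

# Hypothetical frame; row labels default to 0..n-1
df = pd.DataFrame({'country': ['New York', 'London', 'London']})

# Replace only the value at row label 1 in the 'country' column
df.loc[1, 'country'] = 'Paris'
```

Note that `.loc` selects by label. If the row is known only by position, `.iloc` with integer positions is the counterpart.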

Alternatively, you could write your own replacement function, and then use .apply:

def replace_country(string):
    if string == 'London':
        return 'Paris'
    return string

df.country = df.country.apply(replace_country)

The second method is overkill here, but it is a good example because it generalizes to more complex tasks.
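
As one illustration of a more complex task, the sketch below extends the same `.apply` pattern to ignore case and surrounding whitespace and to skip non-string cells. The function name `normalize_city` and the sample data are hypothetical.

```python
import pandas as pd

def normalize_city(value):
    # Leave non-string cells (numbers, missing values) untouched
    if not isinstance(value, str):
        return value
    # Compare case-insensitively, ignoring surrounding whitespace
    if value.strip().lower() == 'london':
        return 'Paris'
    return value

# Hypothetical data with messy spellings and a non-string cell
df = pd.DataFrame({'country': ['New York', ' london ', 'LONDON', 42]})
df['country'] = df['country'].apply(normalize_city)
```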

edited Feb 7, 2019 at 3:48

answered Feb 7, 2019 at 2:21


1 Comment

CFD Over a year ago

I didn't use RETURN :(((

Jose Angel Sanchez · Accepted Answer · 2019-02-07 02:22:30Z

0

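Before replacing, check for non-string values, since re.sub only accepts strings. Then apply each replacement with re:

import re
# Map each regex pattern to its replacement
re_map = {'London': 'Paris'}
for pattern, repl in re_map.items():
    # Skip non-string cells, which would make re.sub raise TypeError
    df['column2'] = [re.sub(pattern, repl, x) if isinstance(x, str) else x for x in df['column2']]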

answered Feb 7, 2019 at 2:22


1 Comment

Mohit Motwani Over a year ago

Avoid for loops on dataframes unless absolutely necessary. These operations are very slow. Use the built-in functions instead.
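
One built-in way to apply a pattern-to-replacement mapping like `re_map` without an explicit loop is `Series.replace` with `regex=True`. This is a sketch under the assumption that the mapping is a plain dict of patterns; the sample data and mapping values are made up.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'column2': ['London Bridge', 'Berlin', 'Madrid']})

# Hypothetical pattern -> replacement mapping, in the spirit of re_map
re_map = {r'London': 'Paris', r'Berlin': 'Rome'}

# regex=True treats the dictionary keys as regular expressions
df['column2'] = df['column2'].replace(re_map, regex=True)
```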

Brandon Bertelsen · Accepted Answer · 2019-02-07 03:49:39Z

0

These are all great answers, but many are not vectorized: they operate on each item in the series one at a time rather than on the entire series at once.

A very reliable filter + replace strategy is to create a mask or subset True/False series and then use loc with that series to replace:

mask = df.country == 'London' 
df.loc[mask, 'country'] = 'Paris'

# On 10m records:
  # this method < 1 second 
  # @Charles method 1 < 10 seconds
  # @Charles method 2 < 3.5 seconds
  # @jose method didn't bother because it would be 30 seconds or more
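
A rough way to compare the approaches on your own machine is sketched below. The synthetic data, its size, and the helper names are assumptions for illustration; actual timings depend on hardware and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data, far smaller than the 10m records mentioned above
rng = np.random.default_rng(0)
cities = rng.choice(['London', 'New York', 'Berlin'], size=100_000)
base = pd.DataFrame({'country': cities})

def with_mask(df):
    # Boolean mask plus .loc assignment
    df = df.copy()
    df.loc[df['country'] == 'London', 'country'] = 'Paris'
    return df

def with_str_replace(df):
    # Literal string replacement through the .str accessor
    df = df.copy()
    df['country'] = df['country'].str.replace('London', 'Paris', regex=False)
    return df

def with_apply(df):
    # Python-level function called once per element
    df = df.copy()
    df['country'] = df['country'].apply(lambda s: 'Paris' if s == 'London' else s)
    return df

for fn in (with_mask, with_str_replace, with_apply):
    seconds = timeit.timeit(lambda: fn(base), number=3)
    print(f'{fn.__name__}: {seconds:.3f}s for 3 runs')
```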

answered Feb 7, 2019 at 3:49

