3

I have a (7, 11000) dataframe. Some of these 7 columns contain strings. In column 2, row 1000, there is the string 'London'. I want to change it to 'Paris'. How can I do this? I searched all over the web but I couldn't find a way. I tried these commands but none of them works:

df['column2'].replace('London','Paris')
df['column2'].str.replace('London','Paris')
re.sub('London','Paris',df['column2'])

I usually receive this error:

TypeError: expected string or bytes-like object
1
  • Please add the outputs of df.info() to this question. Secondly, when describing the size of a dataframe, generally the pattern is (rows, columns). So, your dataframe I think is (11000, 7). df['column2'] = df['column2'].replace(to_replace='London', value='Paris') should work. Commented Feb 7, 2019 at 2:20
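To illustrate the comment above, here is a minimal sketch of why the three attempts in the question appear to do nothing or fail. The three-row dataframe is a hypothetical stand-in for the asker's data; only the column name 'column2' comes from the question.

```python
import re
import pandas as pd

# Hypothetical stand-in for the asker's data; 'column2' is the name used in the question.
df = pd.DataFrame({'column2': ['Berlin', 'London', 'Rome']})

# replace() returns a new Series; df is untouched unless the result is assigned back.
df['column2'].replace('London', 'Paris')
unchanged = df['column2'].tolist()

# re.sub() expects a single string, so passing a whole Series raises TypeError.
try:
    re.sub('London', 'Paris', df['column2'])
    raised = False
except TypeError:
    raised = True

# Assigning the result back makes the change stick.
df['column2'] = df['column2'].replace('London', 'Paris')
changed = df['column2'].tolist()
```

The same applies to .str.replace: it also returns a new Series, so its result has to be assigned back to the column.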

3 Answers

3

If you want to replace a single row (you mention row 1000), you can do it with .loc. If you want to replace all occurrences of 'London', you could do this:

import pandas as pd
df = pd.DataFrame({'country': ['New York', 'London'],})
df.country = df.country.str.replace('London', 'Paris')

Alternatively, you could write your own replacement function, and then use .apply:

def replace_country(string):
    if string == 'London':
        return 'Paris'
    return string

df.country = df.country.apply(replace_country)

The second method is a bit overkill, but is a good example that generalizes better for more complex tasks.
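For the single-row case mentioned at the top of this answer, a minimal sketch with .loc could look like the following. It assumes a default integer index and a column named 'column2' as in the question; the small dataframe and the row label 1 are hypothetical stand-ins for the asker's data and row 1000.

```python
import pandas as pd

# Hypothetical data: a default integer index and a column named 'column2',
# as in the question; row label 1 stands in for the asker's row 1000.
df = pd.DataFrame({'column2': ['Berlin', 'London', 'London']})

row = 1
# Only overwrite the cell if it really holds 'London'.
if df.loc[row, 'column2'] == 'London':
    df.loc[row, 'column2'] = 'Paris'

result = df['column2'].tolist()
```

Note that .loc selects by index label, not position. If the index is not the default 0..n-1 range, .iloc with the column's integer position may be the better fit.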


1 Comment

I didn't use RETURN :(((
0

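Before replacing, check for non-character values, then substitute with re:

import re
# Example pattern-to-replacement mapping (assumed; not given in the original answer).
re_map = {'London': 'Paris'}
for pattern, replacement in re_map.items():
    # re.sub works on one string at a time, so apply it to each element.
    df['column2'] = [re.sub(pattern, replacement, x) for x in df['column2']]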

1 Comment

Avoid for loops on dataframes unless absolutely necessary. These operations are very slow. Use the built-in functions instead.
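As a sketch of what the comment suggests, the same pattern-to-replacement mapping can be handed to pandas' own replace via its regex argument, with no Python-level loop. The mapping contents and the sample data below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical pattern-to-replacement mapping; the answer above does not say what re_map holds.
re_map = {r'^London$': 'Paris', r'^NYC$': 'New York'}

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({'column2': ['London', 'NYC', 'Londonderry', None]})

# Passing the dict as regex= treats each key as a regular expression.
# Non-string entries such as missing values are skipped instead of raising TypeError.
df['column2'] = df['column2'].replace(regex=re_map)

result = df['column2'].tolist()
```

The anchors ^ and $ keep 'Londonderry' from being partially rewritten; without them, any value containing 'London' would be changed.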
0

These are all great answers, but many are not vectorized: they operate on each item in the series one at a time rather than on the entire series at once.

A very reliable filter + replace strategy is to create a boolean mask (a True/False series) and then use loc with that mask to replace:

mask = df.country == 'London' 
df.loc[mask, 'country'] = 'Paris'

# On 10m records:
  # this method < 1 second 
  # @Charles method 1 < 10 seconds
  # @Charles method 2 < 3.5 seconds
  # @jose method didn't bother because it would be 30 seconds or more
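To check timings like these on your own machine, a rough harness along the following lines may help. The data size, the helper names with_mask and with_str_replace, and the choice of time.perf_counter are all assumptions made for illustration; the figures quoted above come from the answer's author, not from this sketch.

```python
import time
import pandas as pd

# Hypothetical test data; the size is an assumption and far smaller than 10m rows.
n = 100_000
base = pd.DataFrame({'country': ['London', 'New York'] * (n // 2)})

def with_mask(df):
    # Boolean mask + loc, as in this answer.
    df = df.copy()
    df.loc[df.country == 'London', 'country'] = 'Paris'
    return df

def with_str_replace(df):
    # .str.replace, as in the first answer.
    df = df.copy()
    df.country = df.country.str.replace('London', 'Paris')
    return df

timings = {}
for name, func in [('mask', with_mask), ('str.replace', with_str_replace)]:
    start = time.perf_counter()
    out = func(base)
    timings[name] = time.perf_counter() - start
    # Sanity check: neither approach should leave any 'London' entries behind.
    assert not (out.country == 'London').any()

print(timings)
```

The two approaches are not strictly equivalent: the mask matches whole cell values exactly, while .str.replace also rewrites 'London' where it appears inside a longer string.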
