Replace all occurrences of a string in a pandas dataframe (Python)

Question

I have a pandas dataframe with about 20 columns.

It is possible to replace all occurrences of a string (here a newline) by manually writing all column names:

df['columnname1'] = df['columnname1'].str.replace("\n","<br>")
df['columnname2'] = df['columnname2'].str.replace("\n","<br>")
df['columnname3'] = df['columnname3'].str.replace("\n","<br>")
...
df['columnname20'] = df['columnname20'].str.replace("\n","<br>")

This unfortunately does not work:

df = df.replace("\n","<br>")

Is there any other, more elegant solution?

Alex Riley · Accepted Answer · 2022-06-05 13:11:52Z

120

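You can use replace and pass the strings to find and their replacements as dictionary keys and values, together with regex=True so that the match is made inside each string rather than against the whole cell value: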

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n

>>> df.replace({'\n': '<br>'}, regex=True)
   a      b
0  1<br>  4<br>
1  2<br>  5
2  3      6<br>

Note that this method returns a new DataFrame instance by default (it does not modify the original), so you'll need to either reassign the output:

df = df.replace({'\n': '<br>'}, regex=True)

or specify inplace=True:

df.replace({'\n': '<br>'}, regex=True, inplace=True)
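
As a minimal sketch of why the plain call from the question appears to do nothing: without regex=True, replace compares the search string against each whole cell value, so only a cell that is exactly '\n' would change. The frame below is made-up sample data, and the names whole_cell and substring are only for illustration.

```python
import pandas as pd

# Made-up sample data: one cell is exactly "\n", the others only contain it.
df = pd.DataFrame({"a": ["1\n", "\n", "3"], "b": ["4\n", "5", "6\n"]})

# Without regex=True the search string must equal the entire cell value.
whole_cell = df.replace("\n", "<br>")

# With regex=True the pattern is searched for inside each string.
substring = df.replace({"\n": "<br>"}, regex=True)

print(whole_cell["a"].tolist())
print(substring["a"].tolist())
```

Only the second call touches cells such as '1\n', which is the behaviour the question asks for.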

edited Jun 5, 2022 at 13:11

answered Sep 6, 2014 at 9:21

Alex Riley


9 Comments

Yichuan Wang Over a year ago

This doesn't work for me! Pandas version '0.15.1', python 2.7.9, Ubuntu 14.04.

Yichuan Wang Over a year ago

Python 2.7.9 |Anaconda 2.1.0 (64-bit)| (default, Mar  9 2015, 16:20:48)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
     a    b
0  1\n  4\n
1  2\n    5
2    3  6\n
>>> df.replace({'\n': '<br>'})
     a    b
0  1\n  4\n
1  2\n    5
2    3  6\n
>>>

Nicholas Morley Over a year ago

Use backslash to match a character literally, and {n} to quantify. Thus: df.replace('\.{3}', 'stuff', regex=True)

Shane S Over a year ago

This method does not work. What does work is Mykola Zotko's method:

for col in df.columns:
    df[col] = df[col].str.replace('\n', '<br>')

Alex Riley Over a year ago

@ShaneS: it still works fine for me (Python 3.10, pandas 1.4.2). The only difference with the method you've highlighted is that df.replace({'\n': '<br>'}, regex=True) returns a new DataFrame object instead of updating the columns on the original DataFrame. So you'll need to reassign the output, e.g. df = df.replace({'\n': '<br>'}, regex=True).
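
On the escaping point raised in the comments above: with regex=True the dictionary keys are regular expressions, so characters such as '.' have a special meaning. A rough sketch, assuming the search text should be matched literally, is to pass it through re.escape from the standard library instead of adding backslashes by hand. The sample data and the names literal and pattern are made up for illustration.

```python
import re

import pandas as pd

# Made-up sample data containing literal dots.
df = pd.DataFrame({"a": ["wait...", "a.b.c"], "b": ["...", "no dots"]})

literal = "..."               # three literal dots, not "any three characters"
pattern = re.escape(literal)  # escapes the regex metacharacters in the string

# Returns a new DataFrame; df itself is left unchanged.
result = df.replace({pattern: "stuff"}, regex=True)

print(result)
```

An unescaped '...' would match any three characters, so 'a.b.c' and 'no dots' would be altered as well.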


Yichuan Wang · Accepted Answer · 2015-04-06 04:10:35Z

23

It seems pandas has changed its API to avoid ambiguity when handling regex. Now you should use:

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n

>>> df.replace({'\n': '<br>'}, regex=True)
   a      b
0  1<br>  4<br>
1  2<br>  5
2  3      6<br>

answered Apr 6, 2015 at 4:10

Yichuan Wang


2 Comments

Andrei Sura Over a year ago

You can also use the "inplace=True" to avoid creating a copy -- pandas.pydata.org/pandas-docs/stable/generated/…

Vega Over a year ago

The docs say nothing about not creating a copy. As far as I know, only two functions in pandas avoid creating a copy, so inplace does not save anything.

Mykola Zotko · Accepted Answer · 2021-08-03 20:14:20Z

6

You can iterate over all columns and use the method str.replace:

for col in df.columns:
   df[col] = df[col].str.replace('\n', '<br>')

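Whether the pattern is treated as a regular expression depends on the pandas version: before pandas 2.0, str.replace defaulted to regex=True; from 2.0 onward the default is regex=False. For a plain newline the result is the same either way.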
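
One caveat with the loop is that the .str accessor raises an error on columns that do not hold strings. A minimal sketch of one way around that, assuming a frame with mixed column types (the data is made up), is to check each column with pandas.api.types.is_string_dtype and to pass regex=False explicitly so the behaviour does not depend on the pandas version:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Made-up sample data with one text column and one numeric column.
df = pd.DataFrame({"text": ["1\n", "2"], "count": [10, 20]})

for col in df.columns:
    # Skip columns without string data; .str is not available on them.
    if is_string_dtype(df[col]):
        # regex=False requests a plain substring replacement.
        df[col] = df[col].str.replace("\n", "<br>", regex=False)

print(df)
```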

answered Aug 3, 2021 at 20:14

Mykola Zotko


Jasper Kinoti · Accepted Answer · 2016-09-01 09:48:38Z

-1

This removes every run of whitespace, newlines included, from a single column. Change the '' in ''.join to specify a replacement string, for example ' '.join to collapse each run into one space.

    df['columnname'] = [''.join(c.split()) for c in df['columnname'].astype(str)]
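
To make the effect of the join string concrete, here is a small sketch on made-up data. It only illustrates how str.split() with no arguments behaves; note that this approach acts on all whitespace, not just newlines, so it is not a drop-in equivalent of replacing '\n' with '<br>'.

```python
import pandas as pd

# Made-up sample data with spaces and newlines.
df = pd.DataFrame({"columnname": ["a  b\nc", " d\n"]})

# split() with no arguments splits on any run of whitespace and drops empty pieces.
removed = ["".join(c.split()) for c in df["columnname"].astype(str)]
collapsed = [" ".join(c.split()) for c in df["columnname"].astype(str)]

print(removed)
print(collapsed)
```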

answered Sep 1, 2016 at 9:48

Jasper Kinoti
