74

I have a pandas dataframe with about 20 columns.

It is possible to replace all occurrences of a string (here a newline) by manually writing all column names:

df['columnname1'] = df['columnname1'].str.replace("\n","<br>")
df['columnname2'] = df['columnname2'].str.replace("\n","<br>")
df['columnname3'] = df['columnname3'].str.replace("\n","<br>")
...
df['columnname20'] = df['columnname20'].str.replace("\n","<br>")

This unfortunately does not work:

df = df.replace("\n","<br>")

Is there any other, more elegant solution?

4 Answers

120

You can use replace with regex=True and pass the strings to find and their replacements as dictionary keys and values:

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n

>>> df.replace({'\n': '<br>'}, regex=True)
   a      b
0  1<br>  4<br>
1  2<br>  5
2  3      6<br>

Note that this method returns a new DataFrame instance by default (it does not modify the original), so you'll need to either reassign the output:

df = df.replace({'\n': '<br>'}, regex=True)

or specify inplace=True:

df.replace({'\n': '<br>'}, regex=True, inplace=True)
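To illustrate why the call in the question appears to do nothing, here is a minimal sketch. The frame and the variable names whole_cell and substring are made up for illustration. Without regex=True, replace only matches cells whose entire value equals the search string; with regex=True, it substitutes matching substrings inside each cell.

```python
import pandas as pd

# Hypothetical sample data; the middle cell of 'b' is exactly a newline.
df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '\n', '6\n']})

# Without regex=True, only cells that are exactly '\n' are replaced.
whole_cell = df.replace('\n', '<br>')

# With regex=True, every occurrence inside each string is replaced.
substring = df.replace({'\n': '<br>'}, regex=True)

print(whole_cell['b'].tolist())
print(substring['b'].tolist())
```

In this sketch, only the middle cell of column b changes in whole_cell, whereas every newline is rewritten in substring.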

9 Comments

This doesn't work for me! Pandas version '0.15.1', python 2.7.9, Ubuntu 14.04.
Python 2.7.9 |Anaconda 2.1.0 (64-bit)| (default, Mar 9 2015, 16:20:48)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n
>>> df.replace({'\n': '<br>'})
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n
>>>
Use a backslash to match a character literally, and {n} to quantify. Thus: df.replace(r'\.{3}', 'stuff', regex=True)
This method does not work. What does work is Mykola Zotko's method: for col in df.columns: df[col] = df[col].str.replace('\n', '<br>')
@ShaneS: it still works fine for me (Python 3.10, pandas 1.4.2). The only difference with the method you've highlighted is that df.replace({'\n': '<br>'}, regex=True) returns a new DataFrame object instead of updating the columns on the original DataFrame. So you'll need to reassign the output, e.g. df = df.replace({'\n': '<br>'}, regex=True).
23

It seems pandas has changed its API to avoid ambiguity when handling regex. Now you should use:

df.replace({'\n': '<br>'}, regex=True)

For example:

>>> df = pd.DataFrame({'a': ['1\n', '2\n', '3'], 'b': ['4\n', '5', '6\n']})
>>> df
   a    b
0  1\n  4\n
1  2\n  5
2  3    6\n

>>> df.replace({'\n': '<br>'}, regex=True)
   a      b
0  1<br>  4<br>
1  2<br>  5
2  3      6<br>

2 Comments

You can also use the "inplace=True" to avoid creating a copy -- pandas.pydata.org/pandas-docs/stable/generated/…
The docs say nothing about not creating a copy. AFAIK only two functions in pandas avoid creating a copy, so inplace does not save anything.
6

You can iterate over all columns and use the method str.replace:

for col in df.columns:
   df[col] = df[col].str.replace('\n', '<br>')

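In pandas versions before 2.0, str.replace treats the pattern as a regex by default. From pandas 2.0 on, the default is regex=False, so pass regex=True explicitly if you need pattern matching.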
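The loop above raises an error if the frame contains non-string columns, because the .str accessor only works on string data. A minimal sketch of one way around that, assuming a made-up frame with one text column and one numeric column, is to check each column with pd.api.types.is_string_dtype and to pass regex=False so the behaviour does not depend on the pandas version:

```python
import pandas as pd

# Hypothetical frame: one string column, one integer column.
df = pd.DataFrame({'text': ['a\nb', 'c'], 'n': [1, 2]})

for col in df.columns:
    # Skip columns that do not hold strings; .str would raise on them.
    if pd.api.types.is_string_dtype(df[col]):
        # regex=False makes this a literal substring replacement.
        df[col] = df[col].str.replace('\n', '<br>', regex=False)

print(df)
```

Note that on older pandas versions is_string_dtype may also report True for object columns with mixed contents, in which case str.replace turns the non-string cells into NaN.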


-1

This removes all whitespace, including newlines, from each value. You can change the '' in ''.join to specify a replacement string, for example ' '.join to collapse each run of whitespace into a single space.

    df['columnname'] = [''.join(c.split()) for c in df['columnname'].astype(str)]
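As a quick illustration of what the split/join idiom does to a single value, here is a small sketch on a made-up string. The names s, removed and collapsed are only for illustration.

```python
# Hypothetical sample value with a newline and irregular spacing.
s = 'line one\n  line   two '

# str.split() with no arguments splits on any run of whitespace.
removed = ''.join(s.split())     # joins the pieces with nothing between them
collapsed = ' '.join(s.split())  # joins the pieces with a single space

print(repr(removed))
print(repr(collapsed))
```

Because the answer calls astype(str) first, missing values become the literal string 'nan' before the join is applied, which may or may not be what you want.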
