Pandas python replace empty lines with string

Question

I have a csv which at some point becomes like this:

  57926,57927,"79961', 'dsfdfdf'",fdfdfdfd,0.40997048,5 x fdfdfdfd,
57927,57928,"fb0ec52878b165aa14ae302e6064aa636f9ca11aa11f5', 'fdfd'",fdfdfd,1.64948454,20 fdfdfdfd,"



                         US 



                "
57928,57929,"f55bf599dba600550de724a0bec11166b2c470f98aa06', 'fdfdf'",fdfdfd,0.81300813,10 fdfdfdfd,"



                         US 







                "
57929,57930,"82e6b', 'reetrtrt'",trtretrtr,0.79783365,fdfdfdf,"



                         NL

I want to get rid of this empty lines. So far I tried the following script :

df = pd.read_csv("scedon_etoimo.csv")

df = df.replace(r'\\n',' ', regex=True)

and

df=df.replace(r'\r\r\r\r\n\t\t\t\t\t\t', '',regex=True)

as this is the error I am getting. So far I haven't manage to clean my file and do the stuff I want to do. I am not sure if I am using the correct approach. I am using pandas to process my dataset. Any help?
"

the format looks invalid to me (different amount of cols perrow). is this the actual file? — Anton vBR
– Anton vBR, Commented Sep 26, 2018 at 19:33
@Gerasimos, you need to add an additional " to the end of your sample file. Right now the open quotes will break attempts to read your snippet. — Peter Maguire
– Peter Maguire, Commented Sep 26, 2018 at 19:48

rafaelc · Accepted Answer · 2018-09-26 19:36:36Z

2

I would first open and preprocess the file's data, and just then pass to pandas

lines = []
with open('file.csv') as f:
    for line in f:
        if line.strip(): lines.append(line.strip())

df = pd.read_csv(io.StringIO("\n".join(lines)))

edited Sep 26, 2018 at 19:36

answered Sep 26, 2018 at 19:31

rafaelc

59.4k15 gold badges64 silver badges87 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

ctj232 Over a year ago

Another option is to read the file into a dataframe, and then post-process with the str.strip function using the dataframe apply or assignment as in the accepted answer for: stackoverflow.com/questions/40950310/…

Gerasimos Over a year ago

I also tried df.replace('\s+', '',regex=True,inplace=True) and worked. Anyway, thank you all for the answers!

Peter Maguire · Accepted Answer · 2018-09-26 19:46:49Z

0

Based on the file snippet you provided, here is how you can replace those empty lines Pandas is storing as NaNs with a blank string.

import numpy as np
df = pd.read_csv("scedon_etoimo.csv")
df = df.replace(np.nan, "", regex=True)

This will allow you to do everything on the base Pandas DataFrame without reading through your file(s) more than once. That being said, I would also recommend preprocessing your data before loading it in as that is often times a much safer way to handle data in non-uniform layouts.

answered Sep 26, 2018 at 19:46

Peter Maguire

4015 silver badges7 bronze badges

Comments

Valdi_Bo · Accepted Answer · 2018-09-26 19:52:43Z

0

Try:

df.replace(to_replace=r'[\n\r\t]', value='', regex=True, inplace=True)

This instruction replaces each \n, \r and Tab with nothing. Due to inplace argument, no need to substitute the result to df again.

Alternative: Use to_replace=r'\s' to eliminate also spaces, maybe in selected columns only.

answered Sep 26, 2018 at 19:52

Valdi_Bo

31.1k4 gold badges29 silver badges45 bronze badges

Collectives™ on Stack Overflow

Pandas python replace empty lines with string

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related