reading file with missing values in python pandas

Question

I am trying to read a .txt file with missing values using pandas.read_csv. My data has the format:

10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686

with thousands of samples, each holding the name of the point, its GPS position, and other readings. I use this code:

myData = read_csv('~/data.txt', sep=',', na_values='')

The code is wrong, as na_values does not give NaN or any other indicator. The columns should all have the same size, but I end up with different lengths.

I don't know what exactly should be passed to na_values (I tried all sorts of things). Thanks

If you pass skiprows=1, then only a single line of the sample is left. Without that parameter I see clear NaNs in the DataFrame.
– eumiro, Commented Sep 20, 2012 at 14:25
I posted only two lines of my data to show its format. skiprows=1 does nothing about the missing data; the original file has 15000 lines, and the first lines contain some names that I don't want.
– tomasz74, Commented Sep 20, 2012 at 15:32

Andy Hayden · Accepted Answer · 2012-09-20 14:22 (edited 2019-09-18 15:04:10Z by Yevhen Kuzmovych)

15

The parameter na_values must be "list like" (see this answer).

A string is "list like" so:

na_values='abc' # would transform the letters 'a', 'b' and 'c' each into `nan`
# is equivalent to
na_values=['a','b','c']

Similarly:

na_values=''
# is equivalent to
na_values=[] # and this is not what you want!

This means that you need to use na_values=[''].
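As a minimal sketch of the list form, the two sample rows from the question can be parsed from memory. The `pd` alias, the use of `io.StringIO` in place of the real file, and `header=None` are assumptions made for illustration. How a bare string is interpreted may differ between pandas versions, so a list is the unambiguous way to spell it.

```python
import io
import pandas as pd

# The two sample rows from the question; the second row has four empty fields.
data = (
    "10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301\n"
    "10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686\n"
)

# Passing a list states the intent explicitly: treat the empty string as missing.
df = pd.read_csv(io.StringIO(data), header=None, na_values=[''])

# Count the missing values in each row.
missing_per_row = df.isna().sum(axis=1).tolist()
print(missing_per_row)
```

The first row should report no missing values and the second row four, one for each empty field.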

edited Sep 18, 2019 at 15:04 by Yevhen Kuzmovych

answered Sep 20, 2012 at 14:22 by Andy Hayden


4 Comments

tomasz74 Over a year ago

Thank you for your answer. na_values=[''] was my first try, but it does not give the desired effect. I get the same result whether I pass the argument as a list [''] or as an empty string ''. I really don't know what else to try, as it seems that missing values are not picked up automatically and I have trouble specifying them.

Andy Hayden Over a year ago

@tomasz74 It seems to work for me with your example (without the skiprows)... perhaps you need myData.T (transpose).

Andy Hayden Over a year ago

@tomasz74 After testing, it seems that both '' and the default (None) work fine for me (the columns are the same size)...

tomasz74 Over a year ago

I went through the data again after your reply. My confusion was that the output shows, next to each column name, the number of non-null values, which is different for each column. But you are right, the length is the same. Thanks a lot
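The distinction in the last comment, between a column's length and its count of non-null values, can be checked directly. This is a small sketch on the two sample rows from the question; `io.StringIO` stands in for the real file and `header=None` is an assumption, since the sample has no header row.

```python
import io
import pandas as pd

# The two sample rows from the question; the second row has four empty fields.
data = (
    "10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301\n"
    "10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686\n"
)
df = pd.read_csv(io.StringIO(data), header=None)

n_rows = len(df)                 # number of rows, identical for every column
non_null = df.count().tolist()   # non-null entries per column, NaN excluded

print(n_rows)
print(non_null)
```

Every column has `n_rows` entries; only the non-null counts vary, and those are the per-column numbers the DataFrame summary reports.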

Chang She · Accepted Answer · 2012-09-20 15:41:55Z

What version of pandas are you on? Interpreting an empty string as NaN is the default behavior for pandas, and it seems to parse the empty strings in your data snippet fine both in v0.7.3 and in current master, without using the na_values parameter at all.

In [10]: data = """\
10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686
"""

In [11]: read_csv(StringIO(data), header=None).T
Out[11]: 
                   0           1
X.1       10/08/2012  10/08/2012
X.2         12:10:10    12:10:11
X.3            name1       name2
X.4             0.81         NaN
X.5             4.02         NaN
X.6   50;18.5701400N         NaN
X.7    4;07.7693770E         NaN
X.8             7.92       10.87
X.9             10.5         1.4
X.10          0.0106      0.0099
X.11             4.3         9.7
X.12          0.0301      0.0686
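The session above relies on names already imported in the interactive shell. A self-contained version for a recent pandas might look like the sketch below. The explicit imports and the `pd` alias are assumptions, and recent versions label headerless columns with integers rather than `X.1`, `X.2`, so the printed index will differ from the output shown.

```python
from io import StringIO
import pandas as pd

data = """\
10/08/2012,12:10:10,name1,0.81,4.02,50;18.5701400N,4;07.7693770E,7.92,10.50,0.0106,4.30,0.0301
10/08/2012,12:10:11,name2,,,,,10.87,1.40,0.0099,9.70,0.0686
"""

# No na_values argument: empty fields are read as NaN by default.
df = pd.read_csv(StringIO(data), header=None)

# Transpose so that each original row becomes a column, as in the session above.
transposed = df.T
print(transposed)
```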
