Reading a text file using Pandas where some rows have empty elements?

Question

I have a dataset in a text file that looks like this:

    0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
    0    18EA0080 X       3  E9  FE  00                           0.022550 R
    0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R

I read this using

    file_pandas = pd.read_csv(fileName, delim_whitespace=True, header=None, engine='python')

And got the output

    0   0  0CF00400  X   8  66  7D  91        6E  22    03    0F    7D  0.02165   
    1   0  18EA0080  X   3  E9  FE   0  0.022550   R  None  None  None      NaN   
    2   0  00000003  X   8  D5  64  22        E1  FF    FF    FF    F0  0.02312
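A minimal sketch of why the columns shift. Plain Python string splitting stands in for the pandas parser here. This is an illustration, not pandas' own tokenizer, but delim_whitespace=True is documented as equivalent to sep=r"\s+", i.e. splitting on runs of whitespace:

```python
# Two rows copied from the sample file; the second has five empty byte fields.
full_row = "0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R"
short_row = "0    18EA0080 X       3  E9  FE  00                           0.022550 R"

# str.split() with no argument splits on runs of whitespace, so the padding
# that marked the empty fields disappears entirely.
full_tokens = full_row.split()
short_tokens = short_row.split()

print(len(full_tokens))   # 14
print(len(short_tokens))  # 9

# The timestamp is token 12 in the full row but token 7 in the short row,
# which is why it ends up under the wrong column in the DataFrame.
print(full_tokens[12], short_tokens[7])  # 0.021650 0.022550
```

Once the runs of spaces are collapsed, there is no way to tell which of the eight byte positions were blank.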

But I want this read as

    0   0  0CF00400  X   8  66  7D  91        6E  22    03    0F    7D  0.021650   R  
    1   0  18EA0080  X   3  E9  FE  00                                  0.022550   R
    2   0  00000003  X   8  D5  64  22        E1  FF    FF    FF    F0  0.023120   R

I've tried removing delim_whitespace = True and replacing it with delimiter = " ", but that just combined the first four columns in the output shown above. It did parse the rest of the data correctly, though: the remaining columns matched the original text file (apart from NaN values where the whitespace was).
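With a single-space delimiter, every individual space separates two fields, so a run of padding spaces becomes a run of empty fields (which pandas reads as NaN). A small string-level sketch of that effect, again using str.split as an illustration rather than the pandas parser itself:

```python
# A prefix of the first data row: note the run of four spaces after "0".
prefix = "0    0CF00400 X"

# Splitting on exactly one space turns every extra space into an empty field.
fields = prefix.split(" ")
print(fields)  # ['0', '', '', '', '0CF00400', 'X']
```

Because the number of empty fields depends on how much padding a value has, the field count per row no longer corresponds to the logical columns.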

I'm not sure how to proceed from here.

Side note: the 00 is being parsed as only 0. Is there a way to display 00 instead?

This looks like a fixed-width file; can you try read_fwf? Also, do you have tabs or spaces here? To preserve the 00 you'll need to pass dtype=np.object.
– EdChum, Commented Oct 19, 2016 at 15:25

akuiper · Accepted Answer · 2016-10-19 15:26:28Z


It seems like your data has fixed-width columns, so you can try pandas.read_fwf():

    from io import StringIO
    import pandas as pd

    # widths: number of characters taken by each of the 14 columns, left to right
    df = pd.read_fwf(StringIO("""0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
    0    18EA0080 X       3  E9  FE  00                           0.022550 R
    0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R"""),
                     header=None, widths=[1, 12, 2, 8, 4, 4, 4, 4, 4, 4, 4, 4, 16, 2])
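To check what these widths produce, the snippet below reruns the answer's call on the same three sample rows and inspects a few cells. The widths only fit this sample's exact padding (spaces, no tabs); as the comments below note, a real file with different spacing can need different widths.

```python
from io import StringIO

import pandas as pd

# Same three sample rows as in the answer.
data = """0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
0    18EA0080 X       3  E9  FE  00                           0.022550 R
0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R"""

widths = [1, 12, 2, 8, 4, 4, 4, 4, 4, 4, 4, 4, 16, 2]
df = pd.read_fwf(StringIO(data), header=None, widths=widths)

print(df.shape)        # (3, 14)
print(df.iloc[1, 12])  # 0.02255  -> the timestamp stays in column 12
print(df.iloc[1, 7])   # nan      -> the blank byte fields become NaN
print(df.iloc[1, 6])   # 0        -> "00" is still parsed as an integer
```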


3 Comments

Aditya Salapaka

I tried your method, but the values in columns 4-11 were all split by a space, like 6 6, 6 7, etc. However, using just read_fwf() without the widths argument worked really well! The only remaining issue is the 00 showing up as 0. I tried dtype = np.object, but dtype isn't supported with the python engine. Any suggestions?
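Without widths (or colspecs), read_fwf uses colspecs="infer" by default and guesses the column boundaries from the first infer_nrows rows (100 by default), which is likely why it coped with the real file's spacing. A sketch on the sample rows, with the default spelled out only for clarity:

```python
from io import StringIO

import pandas as pd

data = """0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
0    18EA0080 X       3  E9  FE  00                           0.022550 R
0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R"""

# Column boundaries are inferred from where non-whitespace characters
# appear across the sampled rows.
df = pd.read_fwf(StringIO(data), header=None, colspecs="infer")

print(df.shape)       # (3, 14)
print(df.iloc[1, 6])  # 0  -> the "00" problem remains
```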

akuiper

Use the converters={6: str} argument to keep that column from being converted to int. Try this: df = pd.read_fwf(file_name, header=None, converters={6: str})

Aditya Salapaka

I was only able to try that argument today, and it worked! Thanks!
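A sketch combining the inferred column layout with the converters approach from the comment above. Extending the converter from column 6 to all eight byte columns (4-11 in this sample) is an assumption, on the reasoning that any of them could hold a value like 00 or 03:

```python
from io import StringIO

import pandas as pd

data = """0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
0    18EA0080 X       3  E9  FE  00                           0.022550 R
0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R"""

# Keep the byte columns as strings so "00" or "03" is not turned into an int.
# Covering columns 4-11 (not just 6) is an assumption based on this sample.
byte_columns = {col: str for col in range(4, 12)}

df = pd.read_fwf(StringIO(data), header=None, converters=byte_columns)

print(df.iloc[1, 6])  # 00
print(df.iloc[0, 9])  # 03
```

A converter receives the raw field text before any numeric inference, which is why the leading zeros survive.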
