I have a dataset in a text file that looks like this:

    0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
    0    18EA0080 X       3  E9  FE  00                           0.022550 R
    0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R

I read this using

file_pandas = pd.read_csv(fileName, delim_whitespace = True, header = None, engine = 'python')

And got the output

    0   0  0CF00400  X   8  66  7D  91        6E  22    03    0F    7D  0.02165   
    1   0  18EA0080  X   3  E9  FE   0  0.022550   R  None  None  None      NaN   
    2   0  00000003  X   8  D5  64  22        E1  FF    FF    FF    F0  0.02312   

But I want this read as

    0   0  0CF00400  X   8  66  7D  91        6E  22    03    0F    7D  0.021650   R  
    1   0  18EA0080  X   3  E9  FE  00                                  0.022550   R
    2   0  00000003  X   8  D5  64  22        E1  FF    FF    FF    F0  0.023120   R

I've tried replacing delim_whitespace = True with delimiter = " ". That merged the first four columns of the output shown above into one, but the rest of the data was parsed correctly: the remaining columns matched the original text file, apart from NaN values where the file has blank space.

I'm not sure how to proceed from here.

Side note: the 00 is being parsed as only 0. Is there a way to display 00 instead?

This looks like a fixed-width file; can you try read_fwf? Also, do you have tabs or spaces here? To preserve the 00 you'll need to pass dtype=np.object. Commented Oct 19, 2016 at 15:25

1 Answer

Your data looks like it is laid out in fixed-width columns, so you can try pandas.read_fwf():

from io import StringIO
import pandas as pd

df = pd.read_fwf(StringIO("""0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
0    18EA0080 X       3  E9  FE  00                           0.022550 R
0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R"""), 
                 # widths are consecutive character counts; each one includes the field's leading spaces
                 header=None, widths=[1, 12, 2, 8, 4, 4, 4, 4, 4, 4, 4, 4, 16, 2])
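
For context, here is a minimal sketch of why whitespace-delimited parsing misaligns the short row while position-based parsing does not. It uses only plain string operations on two of the sample rows. The offset 62 for the timestamp is an assumption read off the sample as posted, so check it against your real file.

```python
# Two rows from the question, padded so the timestamp starts at character 62
# (that offset is an assumption taken from the posted sample).
full_row = "0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D".ljust(62) + "0.021650 R"
short_row = "0    18EA0080 X       3  E9  FE  00".ljust(62) + "0.022550 R"

# Splitting on whitespace silently drops the empty byte fields,
# so the two rows end up with different numbers of tokens.
token_counts = [len(row.split()) for row in (full_row, short_row)]
print(token_counts)  # [14, 9]

# Slicing by character position finds the timestamp in both rows.
timestamps = [row[62:70] for row in (full_row, short_row)]
print(timestamps)  # ['0.021650', '0.022550']
```

With only 9 tokens, the short row's timestamp and R slide left into the byte columns, which is the shift visible in the question's output.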

3 Comments

I tried your method, but the values in columns 4-11 each came out split by a space, like 6 6, 6 7, etc. However, using just read_fwf() without the widths argument worked really well! My only remaining issue is the 00 showing up as 0. I tried dtype = np.object, but dtype isn't supported with the python engine. Any suggestions?
Use the converters = {6:str} argument to keep that column from being converted to int. Try this: df = pd.read_fwf(file_name, header = None, converters = {6:str})
I was only able to try that argument out today, and it worked! Thanks!
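
Putting the comments together, a sketch of the approach that ended up working might look like the following. The sample is rebuilt in memory and padded so the columns line up, and the names sample, df_one and df_all are only illustrative. The dtype=str variant is an assumption: it relies on a pandas version whose read_fwf accepts dtype, which was not the case for the commenter in 2016.

```python
from io import StringIO

import pandas as pd

# The three sample rows, padded so the timestamp column lines up at character 62
# (an assumption based on the sample as posted).
prefixes = [
    "0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D",
    "0    18EA0080 X       3  E9  FE  00",
    "0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0",
]
stamps = ["0.021650 R", "0.022550 R", "0.023120 R"]
sample = "\n".join(p.ljust(62) + s for p, s in zip(prefixes, stamps)) + "\n"

# As in the comments: let read_fwf infer the column boundaries and
# keep column 6 as text so that 00 is not turned into 0.
df_one = pd.read_fwf(StringIO(sample), header=None, converters={6: str})
print(df_one.iloc[1, 6])

# Assumption: on a pandas version where read_fwf accepts dtype, reading
# every column as text protects all byte columns, not just column 6.
df_all = pd.read_fwf(StringIO(sample), header=None, dtype=str)
print(df_all.iloc[1].tolist())
```

Reading everything as text is the more defensive choice for a larger file, since any byte column that happens to contain only digits would otherwise be converted to integers. Numeric columns such as the timestamp can be converted afterwards, for example with df_all[12].astype(float).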
