Python Read fixed width files without any data type interpretation using Pandas

Question

I'm trying to set up a Python script that will be able to read in many fixed width data files and then convert them to csv. To do this I'm using pandas like this:

pandas.read_fwf('source.txt', colspecs=column_position_length).\
         to_csv('output.csv', header=column_name, index=False, encoding='utf-8')

Where column_position_length and column_name are lists containing the information needed to read and write the data.

Within these files I have long strings of numbers representing test answers. For instance, 333133322122222223133313222222221222111133313333 represents the correct answers on a multiple choice test, so it is more of a code than a numeric value. The problem is that pandas interprets these values as floats and then writes them to the csv in scientific notation (3.331333221222221e+47).

I found a lot of questions regarding this issue, but they didn't quite resolve my issue.

Solution 1 - I believe at this point the values have already been converted to floats, so this wouldn't help.
Solution 2 - According to the pandas documentation, dtype is not supported as an argument for read_fwf in Python.
Solution 3 - Use converters. The issue with converters is that you need to specify the column name or index to convert to a data type, but I would like to read all of the columns as strings.

The second option seems to be the go-to answer for reading every column in as a string, but unfortunately it just isn't supported for read_fwf. Any suggestions?

dtype is supported, and yes, setting it to object would be the optimal solution.
– DYZ, Commented May 5, 2017 at 18:11
dtype : Type name or dict of column -> type, default None. Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (Unsupported with engine='python'). Use str or object to preserve and not interpret dtype. pandas.pydata.org/pandas-docs/stable/generated/…
– Razmodius, Commented May 5, 2017 at 18:24

Razmodius · Accepted Answer · 2017-05-08 14:37:37Z

So I think I figured out a solution, but I don't know why it works. Pandas was interpreting these values as floats because there were NaN values (blank lines) in the columns. Adding keep_default_na=False to the read_fwf() parameters resolved the issue. According to the documentation:

keep_default_na : bool, default True If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they’re appended to.

I guess I'm not quite understanding how this is fixing my issue. Could anyone add any clarity on this?
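
For reference, here is a minimal sketch of reading every column as text. It assumes a pandas version in which read_fwf accepts dtype, as DYZ's comment indicates. The two-column layout, the column names, and the in-memory sample data are made up for illustration. The idea is that dtype=str stops pandas from inferring numeric types, while keep_default_na=False leaves blank fields as empty strings instead of turning them into NaN:

```python
import io

import pandas as pd

# Hypothetical sample data: a 4-character ID followed by a 48-character answer key.
answer_key = "333133322122222223133313222222221222111133313333"
lines = [
    "0001" + answer_key,
    "0002" + " " * 48,  # a record whose answer field is blank
]
source = io.StringIO("\n".join(lines) + "\n")

# Hypothetical stand-ins for column_position_length and column_name.
column_position_length = [(0, 4), (4, 52)]
column_name = ["student_id", "answers"]

df = pd.read_fwf(
    source,
    colspecs=column_position_length,
    names=column_name,
    dtype=str,              # keep every column as text, no numeric inference
    keep_default_na=False,  # blank fields stay as '' rather than becoming NaN
)

# Write to a string here; passing a path such as 'output.csv' works the same way.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With both options set, the answer key should come through digit for digit and the IDs should keep their leading zeros.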
