Trouble reading CSV data into Pandas dataframe (Python/Pandas)

Question

I'm having some trouble reading some csv data into a pandas data frame. Here's what my data looks like:

C1,            C2,              C3,              C4,            C5,  
5.0010254,     12,            0.37,          1.2672,        2039.5,
5.0499756,     12,            0.37,          1.2672,        2039.5,
5.1000244,     12,            0.37,          1.2672,        2039.5,
5.1500122,     12,            0.37,          1.2672,        2039.5,
5.2,           12,            0.37,          1.2672,        2039.5,
5.2499878,     12,            0.37,          1.2672,        2039.5,
5.2999756,     12,            0.37,          1.2672,        2039.5,
5.3500244,     12,            0.37,          1.2672,        2039.5,
5.4000122,     12,            0.37,          1.2672,        2039.5,
5.45,          12,            0.37,          1.2672,        2039.5,
5.4999878,     12,            0.37,          1.2672,        2039.5,

As you can see, the data is comma-delimited but also has a lot of spaces after the commas. I do not know if this is what is causing the trouble, but if I run:

infl = pd.read_csv('filename.txt', sep=",", header=1, na_values=["-999"])
print(infl['C2'])

I get the error:

KeyError: 'C2'

I have tried the read_csv command with and without explicitly defining the delimiter without success. Any help is appreciated!

Did you try specifying a regex delimiter like ', +'? You could also look at read_fwf if your file is in fixed-width format (each column of data has a fixed width in characters).
– BrenBarn, Commented May 5, 2015 at 17:19
Could you try infl = pd.read_csv('filename.txt', sep=",\s+", header=1, na_values=["-999"])? This will leave a trailing comma in your last column, which you can remove later.
– EdChum, Commented May 5, 2015 at 17:23
Both your question's version of read_csv and vanilla read_csv worked fine for me on your input. Are you using an old pandas, perhaps?
– Ami Tavory, Commented May 5, 2015 at 17:25

joris · Accepted Answer · 2015-05-05 17:31:31Z

5

One solution is to pass the skipinitialspace argument, to specify that all whitespace after the delimiter should be ignored:

pd.read_csv('filename.txt', sep=",", header=1, na_values=["-999"], skipinitialspace=True)

See the docstring of read_csv for all possible arguments: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
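
To see the effect on the sample from the question, here is a minimal sketch. It inlines a few of the rows through io.StringIO instead of reading 'filename.txt', and it assumes the first line shown is the header row (hence header=0, which may not match the real file if it has an extra line above the column names). The variable names raw and padded are only for illustration.

```python
import io

import pandas as pd

# A few rows from the question, inlined so the sketch runs without a file.
raw = (
    "C1,            C2,              C3,              C4,            C5,\n"
    "5.0010254,     12,            0.37,          1.2672,        2039.5,\n"
    "5.0499756,     12,            0.37,          1.2672,        2039.5,\n"
    "5.2,           12,            0.37,          1.2672,        2039.5,\n"
)

# Default parsing keeps the padding, so the column name is '   ...C2', not 'C2'.
padded = pd.read_csv(io.StringIO(raw))
print(list(padded.columns))

# skipinitialspace=True drops the whitespace that follows each delimiter.
# header=0 is an assumption: the first line of the sample is the header row.
infl = pd.read_csv(
    io.StringIO(raw),
    sep=",",
    header=0,
    na_values=["-999"],
    skipinitialspace=True,
)

# The trailing comma on every line produces one empty, unnamed column; drop it.
infl = infl.dropna(axis=1, how="all")
print(infl["C2"])
```

Printing the column list before and after is a quick way to check whether a KeyError comes from padded names rather than from the delimiter itself.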
