
I have a simple two-column CSV file called st1.csv:

GRID    St1  
1457    614  
1458    657  
1459    679  
1460    732  
1461    754  
1462    811  
1463    748  

However, when I try to read the csv file, the first column is not loaded:

a = pandas.DataFrame.from_csv('st1.csv')  
a.columns

outputs:

 Index([u'ST1'], dtype=object)

Why is the first column not being read?

  • It's assuming that the first column is the index, try a = pandas.DataFrame.from_csv('st1.csv', index_col=False) Commented Feb 20, 2014 at 8:29
  • thank you so much, this is exactly what I was missing. Commented Feb 20, 2014 at 8:38
  • I am facing the exact opposite issue when I read a CSV that was compressed (using Python and pandas). Is there any explanation for why it doesn't follow this behaviour? Commented Aug 28, 2020 at 6:05

3 Answers


Judging by your data, it looks like the delimiter you're using is a space (' ').

Try the following:

a = pandas.DataFrame.from_csv('st1.csv', sep=' ')

The other issue is that it's assuming your first column is an index, which we can also disable:

a = pandas.DataFrame.from_csv('st1.csv', index_col=None)

UPDATE:

In newer pandas versions, do:

a = pandas.DataFrame.from_csv('st1.csv', index_col=False)
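For anyone on a current pandas release, where DataFrame.from_csv has been removed, here is a minimal sketch of the same idea using pd.read_csv. It assumes the columns are separated by runs of spaces, as the sample in the question appears to be. The inline string is only a stand-in for st1.csv so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for st1.csv (assumption: columns are separated by runs of spaces).
sample = """GRID    St1
1457    614
1458    657
1459    679
"""

# sep=r"\s+" treats any run of whitespace as a single delimiter.
# read_csv does not use the first column as the index by default.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

print(df.columns.tolist())
print(df.index)
```

With a real file, replace io.StringIO(sample) with the path 'st1.csv'.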

5 Comments

interesting that in the docs there is no mention of setting index_col=False, but that's definitely part of the solution: pandas.pydata.org/pandas-docs/stable/generated/…
In Python 3: index_col=False throws an error, I used index_col=None and it works fine...
I agree with @Grant, you have to use index_col=None (even in Python 2).
@Grant & Tom - I have updated my answer to reflect this. Thank you for informing me.
Python 3.5 and pandas 0.21.1: index_col = False worked fine, but index_col = None was ignored. Strange.

For newer versions of pandas, pd.DataFrame.from_csv doesn't exist anymore, and index_col=None no longer does the trick with pd.read_csv. You'll want to use pd.read_csv with index_col=False instead:

pd.read_csv('st1.csv', index_col=False)

Example:

(so) URSA-MattM-MacBook:stackoverflow mmessersmith$ cat input.csv 
Date                        Employee        Operation        Order

2001-01-01 08:32:17         User1           Approved         #00045
2001-01-01 08:36:23         User1           Edited           #00045
2001-01-01 08:41:04         User1           Rejected         #00046
2001-01-01 08:42:56         User1           Deleted          #00046
2001-01-02 09:01:11         User1           Created          #00047
2019-10-03 17:23:45         User1           Approved         #72681

(so) URSA-MattM-MacBook:stackoverflow mmessersmith$ python
Python 3.7.4 (default, Aug 13 2019, 15:17:50) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.25.1'              
>>> df_bad_index = pd.read_csv('input.csv', delim_whitespace=True)
>>> df_bad_index
                Date Employee Operation   Order
2001-01-01  08:32:17    User1  Approved  #00045
2001-01-01  08:36:23    User1    Edited  #00045
2001-01-01  08:41:04    User1  Rejected  #00046
2001-01-01  08:42:56    User1   Deleted  #00046
2001-01-02  09:01:11    User1   Created  #00047
2019-10-03  17:23:45    User1  Approved  #72681
>>> df_bad_index.index
Index(['2001-01-01', '2001-01-01', '2001-01-01', '2001-01-01', '2001-01-02',
       '2019-10-03'],
      dtype='object')
>>> df_still_bad_index = pd.read_csv('input.csv', delim_whitespace=True, index_col=None)
>>> df_still_bad_index
                Date Employee Operation   Order
2001-01-01  08:32:17    User1  Approved  #00045
2001-01-01  08:36:23    User1    Edited  #00045
2001-01-01  08:41:04    User1  Rejected  #00046
2001-01-01  08:42:56    User1   Deleted  #00046
2001-01-02  09:01:11    User1   Created  #00047
2019-10-03  17:23:45    User1  Approved  #72681
>>> df_still_bad_index.index
Index(['2001-01-01', '2001-01-01', '2001-01-01', '2001-01-01', '2001-01-02',
       '2019-10-03'],
      dtype='object')
>>> df_good_index = pd.read_csv('input.csv', delim_whitespace=True, index_col=False)
>>> df_good_index
         Date  Employee Operation     Order
0  2001-01-01  08:32:17     User1  Approved
1  2001-01-01  08:36:23     User1    Edited
2  2001-01-01  08:41:04     User1  Rejected
3  2001-01-01  08:42:56     User1   Deleted
4  2001-01-02  09:01:11     User1   Created
5  2019-10-03  17:23:45     User1  Approved
>>> df_good_index.index
RangeIndex(start=0, stop=6, step=1)
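
The session above can be reproduced in miniature. As far as I understand it, when every data row has one more field than the header, read_csv uses the first field as the index. Here the timestamp splits into two whitespace-separated fields, so each row has five fields against four header names. Note also that with index_col=False the values in the output above are shifted one column to the left and the Order values are dropped. The sketch below uses made-up data and column names (a, b, c, key). Passing explicit names is just one possible way to keep every field, not something the session above does.

```python
import io
import pandas as pd

# Hypothetical data: 3 header names, but 4 fields in every data row.
text = "a b c\n1 2 3 4\n5 6 7 8\n"

# Default: the extra leading field is used as the index.
implicit = pd.read_csv(io.StringIO(text), sep=r"\s+")

# index_col=False: no implicit index, but the last field of each row is dropped.
# Recent pandas versions may emit a ParserWarning here.
truncated = pd.read_csv(io.StringIO(text), sep=r"\s+", index_col=False)

# Explicit names (hypothetical) replace the header row and keep all four fields.
full = pd.read_csv(io.StringIO(text), sep=r"\s+", header=0,
                   names=["key", "a", "b", "c"])

print(implicit)
print(truncated)
print(full)
```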


The documentation comparing read_csv and from_csv shows that you can pass index_col=None. I tried the following and it worked:

DataFrame.from_csv('st1.csv', index_col=None)

This assumes that the data is comma-separated.

See the documentation for details:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html
