read csv files pandas slice arrays

Question

I'm a python newbie and am having trouble reading a csv to pandas and working with it. Here is a bit of my csv file:

And my python code:

import numpy as np
import pandas as pd 
from math import exp
from math import sqrt

g = pd.DataFrame.from_csv('test.csv')

a = g.iloc[2:4,1]
print(a)

I get the following error:

IndexError: index 1 is out of bounds for axis 0 with size 1

I've also tried:

a = g.iloc[2:4,'B']

and many other permutations for defining columns and rows.

Also when I print g, I get the following:

             B
A             
2015-05-01  56
2015-05-02  76
2015-05-03  23
2015-05-04  45
2015-05-05  54
2015-05-06  65
2015-05-07  22

I can't understand why A and B are not aligned.

I'm just using this as an example, but in general I'd like to read in large csv files and then perform operations on certain parts of the matrix.

Any help would be appreciated.

EdChum · Accepted Answer · 2017-04-26 09:27:22Z

3

Firstly, although DataFrame.from_csv was still supported when this was written, it's better to use the top-level read_csv instead, as it supports more functionality. (from_csv was deprecated in pandas 0.21 and removed in pandas 1.0.)

So this:

a = g.iloc[2:4,1]

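is valid syntax but fails here: iloc is purely positional, and because column 'A' has become the index, 'B' is the only data column (position 0), so position 1 is out of bounds. Once 'A' is an ordinary column (see the index_col=None example that follows), you want: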

a = g.iloc[2:4]['A']
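
A minimal sketch of why the original call raises IndexError. The question's CSV was not shown, so the inline csv_text is an assumed stand-in rebuilt from the printed frame, and index_col=0 is used to mimic the old from_csv default:

```python
import io
import pandas as pd

# Assumed stand-in for test.csv, rebuilt from the frame printed in the question.
csv_text = """A,B
2015-05-01,56
2015-05-02,76
2015-05-03,23
2015-05-04,45
2015-05-05,54
2015-05-06,65
2015-05-07,22
"""

# index_col=0 mimics the old DataFrame.from_csv default: column A becomes the index.
g = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(g.shape)  # (7, 1): only column B is left as data

# Position 0 is the only valid column position.
a = g.iloc[2:4, 0]
print(a.tolist())  # [23, 45]

# Position 1 does not exist, so iloc raises IndexError.
try:
    g.iloc[2:4, 1]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```

The same applies to g.iloc[2:4,'B']: iloc accepts positions only, so a column label is rejected regardless of how the file was read.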

Secondly, by default DataFrame.from_csv uses the first column as the index, which is why column 'A' is your index. If you pass index_col=None, you get the desired result:

In [6]:
pd.DataFrame.from_csv(file_path)

Out[6]:
    B
A    
1  56
2  76
3  23
4  45
5  54
6  65
7  22

In [7]:
pd.DataFrame.from_csv(file_path, index_col=None)

Out[7]:
   A   B
0  1  56
1  2  76
2  3  23
3  4  45
4  5  54
5  6  65
6  7  22

Correct syntax:

In [9]:   
df.iloc[2:4]['A']

Out[9]:
2    3
3    4
Name: A, dtype: int64

Additionally, with read_csv the default for index_col is None, so your alignment problem would not have happened if you had used read_csv.
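
As a rough illustration of the read_csv default, here is a sketch that reads from an in-memory string rather than a file. The csv_text contents are an assumption based on the frame printed in the question, and by_date is just an illustrative name:

```python
import io
import pandas as pd

# Assumed sample data; the real test.csv was not shown in the question.
csv_text = "A,B\n2015-05-01,56\n2015-05-02,76\n2015-05-03,23\n2015-05-04,45\n"

# Default index_col=None: A stays an ordinary column and a 0..n-1 index is created.
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # ['A', 'B']

# If the dates should be the index, ask for that explicitly.
by_date = pd.read_csv(io.StringIO(csv_text), index_col="A", parse_dates=["A"])
print(list(by_date.columns))  # ['B']
print(by_date.index.name)     # A
```

Making the index choice explicit avoids the surprise in the question, where the header row for the index name ('A') is printed on its own line below the column header ('B').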

Please check the docs on indexing and selecting.

EDIT

As @Jeff suggested (and I always agree with Jeff), ix is the typical method for this kind of selection, but its behaviour differs from iloc: unlike iloc, it includes the end row of the slice:

In [10]:    
df.ix[2:4,'A']

Out[10]:
2    3
3    4
4    5
Name: A, dtype: int64

So I don't know what you wanted row-selection-wise, but be aware of the different semantics.

Update

Note that .ix is deprecated (since pandas 0.20) and was removed in pandas 1.0; you can achieve the same result using .loc:

In [202]:
df.loc[2:4,'A']

Out[202]:
2    3
3    4
4    5
Name: A, dtype: int64
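
To make the difference in slice semantics concrete, here is a small sketch. It builds the frame directly instead of reading a file (an assumption made for self-containment), and it also shows one way to combine positional rows with a named column in a single iloc call:

```python
import pandas as pd

# Same shape as the frame in the transcript: default 0..6 index, columns A and B.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7],
                   "B": [56, 76, 23, 45, 54, 65, 22]})

# iloc slices by position and excludes the end: positions 2 and 3.
by_position = df.iloc[2:4]["A"]
print(by_position.tolist())  # [3, 4]

# loc slices by label and includes the end: labels 2, 3 and 4.
by_label = df.loc[2:4, "A"]
print(by_label.tolist())  # [3, 4, 5]

# Positional rows with a named column in one iloc call:
# translate the column label to its position first.
mixed = df.iloc[2:4, df.columns.get_loc("B")]
print(mixed.tolist())  # [23, 45]
```

With the default integer index, positions and labels coincide, so the only visible difference is whether the end of the slice is included.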

edited Apr 26, 2017 at 9:27

answered May 30, 2015 at 22:32

EdChum


2 Comments

Jeff Over a year ago

this is a use case for ix

EdChum Over a year ago

@Jeff I agree but I wasn't sure what the OP's desired behaviour was for the row selection as .ix row selection will include the end index row which is different to iloc
