I'm a Python newbie and am having trouble reading a CSV file into pandas and working with it. Here is part of my CSV file:

A   B
1   56
2   76
3   23
4   45
5   54
6   65
7   22

And my python code:

import numpy as np
import pandas as pd 
from math import exp
from math import sqrt

g = pd.DataFrame.from_csv('test.csv')

a = g.iloc[2:4,1]
print(a)

I get the following error:

IndexError: index 1 is out of bounds for axis 0 with size 1

I've also tried:

a = g.iloc[2:4,'B']

and many other permutations for defining columns and rows.

Also when I print g, I get the following:

             B
A             
2015-05-01  56
2015-05-02  76
2015-05-03  23
2015-05-04  45
2015-05-05  54
2015-05-06  65
2015-05-07  22

I can't understand why A and B are not aligned.

I'm just using this as an example, but in general I'd like to read in large CSV files and then perform operations on certain parts of the matrix.

Any help would be appreciated.

1 Answer

Firstly, prefer the top-level read_csv over DataFrame.from_csv, as it supports more functionality. (from_csv was still supported when this was written, but it was later deprecated and then removed in pandas 1.0.)

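So this:

a = g.iloc[2:4,1]

is valid syntax, but it fails because g has only one column (B, at position 0) once A has become the index, so column position 1 is out of bounds. iloc also accepts only integer positions, which is why iloc[2:4,'B'] fails too. With A read as a regular column, you want:

a = g.iloc[2:4]['A']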

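Secondly, by default DataFrame.from_csv uses the first column as the index, which is why column 'A' is your index. The index name is printed on its own row below the column headers, which is why A and B look misaligned. If you pass index_col=None, you get the desired result: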

In [6]:
pd.DataFrame.from_csv(file_path)

Out[6]:
    B
A    
1  56
2  76
3  23
4  45
5  54
6  65
7  22

In [7]:
pd.DataFrame.from_csv(file_path, index_col=None)

Out[7]:
   A   B
0  1  56
1  2  76
2  3  23
3  4  45
4  5  54
5  6  65
6  7  22

Correct syntax:

In [9]:   
df.iloc[2:4]['A']

Out[9]:
2    3
3    4
Name: A, dtype: int64
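As a rough, self-contained sketch of the selection above: the CSV text below is an assumption (the question's sample rows written as comma-separated values and read from a string instead of test.csv), and the variable names are illustrative only. It shows the positional form from the question working once A is an ordinary column, and failing when A is the index.

```python
import io

import pandas as pd

# Assumed contents of test.csv: the question's sample rows, comma-separated.
CSV_TEXT = "A,B\n1,56\n2,76\n3,23\n4,45\n5,54\n6,65\n7,22\n"

# A and B are both ordinary columns here (read_csv default).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Rows at positions 2 and 3, column at position 1 (B).
by_position = df.iloc[2:4, 1]

# Same rows, column picked by label after the positional row slice.
by_label = df.iloc[2:4]["A"]

# Single-step equivalent: translate the label to a position first.
single_step = df.iloc[2:4, df.columns.get_loc("A")]

# With A as the index there is only one column left, so position 1 fails.
indexed = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)
try:
    indexed.iloc[2:4, 1]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(by_position.tolist(), by_label.tolist(), out_of_bounds)
```

The get_loc form avoids chaining two indexing operations, which matters mainly if you later want to assign to the selection.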

Additionally, the default for index_col in read_csv is None, so your alignment problem would not have occurred had you used read_csv.
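To illustrate the difference, here is a minimal sketch using read_csv only. It assumes the file is comma-separated and holds the question's sample rows; the text is read from a string so the example runs without test.csv. Passing index_col=0 approximates the old from_csv behaviour of taking the first column as the index.

```python
import io

import pandas as pd

# Assumed contents of test.csv: the question's sample rows, comma-separated.
CSV_TEXT = "A,B\n1,56\n2,76\n3,23\n4,45\n5,54\n6,65\n7,22\n"

# Default (index_col=None): A and B are both columns, with a 0..6 row index.
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# index_col=0: column A becomes the index, leaving B as the only column.
indexed_df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

print(list(default_df.columns))
print(list(indexed_df.columns))
print(indexed_df.index.name)
```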

Please check the docs on indexing and selecting.

EDIT

As @Jeff suggested (and I always agree with Jeff), ix is the typical method for this kind of selection, but its behaviour differs from iloc: ix includes the end row of the slice, whereas iloc does not:

In [10]:    
df.ix[2:4,'A']

Out[10]:
2    3
3    4
4    5
Name: A, dtype: int64

So I don't know which rows you wanted to select, but be aware of the different semantics.

Update

Note that .ix has since been deprecated (and was removed in pandas 1.0); you can achieve the same result using .loc:

In [202]:
df.loc[2:4,'A']

Out[202]:
2    3
3    4
4    5
Name: A, dtype: int64
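The difference in slice semantics can be checked side by side. This is only a sketch: the CSV text is an assumed comma-separated version of the question's sample, and the variable names are made up for illustration. The second half uses A as the index to show that .loc slices by label, not by position.

```python
import io

import pandas as pd

# Assumed contents of test.csv: the question's sample rows, comma-separated.
CSV_TEXT = "A,B\n1,56\n2,76\n3,23\n4,45\n5,54\n6,65\n7,22\n"
df = pd.read_csv(io.StringIO(CSV_TEXT))

# iloc slices by position and excludes the end: positions 2 and 3.
positional = df.iloc[2:4]["A"]

# loc slices by label and includes the end: labels 2, 3 and 4.
label_based = df.loc[2:4, "A"]

# With A as the index, labels no longer coincide with positions.
by_a = df.set_index("A")
loc_on_a = by_a.loc[2:4, "B"]    # index labels 2, 3, 4
iloc_on_a = by_a.iloc[2:4]["B"]  # positions 2, 3 (labels 3, 4)

print(positional.tolist(), label_based.tolist())
print(loc_on_a.tolist(), iloc_on_a.tolist())
```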

2 Comments

this is a use case for ix
@Jeff I agree but I wasn't sure what the OP's desired behaviour was for the row selection as .ix row selection will include the end index row which is different to iloc
