9

I am trying to process some .csv data using pandas, and I am struggling with something that I am sure is a rookie mistake. After spending a lot of time trying to make this work, I need your help.

Essentially, I am trying to find the index of a value within a dataframe I have created.

max = cd_gross_revenue.max()
# max value of cd_gross_revenue

print(max)
# finds the max value, no problem!

maxindex = cd_gross_revenue.idxmax()
print(maxindex)
# finds the index of the max value, what I wanted!

print(max.index)
# ERROR: AttributeError: 'numpy.float64' object has no attribute 'index'

The maxindex variable gets me the answer using idxmax(), but what if I am not looking for the index of the max value? If I want the index of some arbitrary value instead, how would I go about it? Clearly .index does not work for me here, because max is a plain number rather than a pandas object.

Thanks in advance for any help!

3
  • 1
    Does this dataframe have only 1 column, or do you know which column has the max value? If you know the column, then df.loc[df.col == max].index would return the index. Commented Oct 1, 2014 at 20:45
  • Hi EdChum, thanks for your answer. Doing this gives me the following error: Traceback (most recent call last): File "psims2.py", line 81, in <module> print cd_gross_revenue.loc[cd_gross_revenue.col == max].index File "C:\Python27\lib\site-packages\pandas-0.14.1-py2.7-win32.egg\pandas\core\generic.py", line 1843, in __getattr__ (type(self).__name__, name)) AttributeError: 'Series' object has no attribute 'col' Commented Oct 1, 2014 at 20:50
  • I think you misunderstand: col was a generic name for your column of interest, so substitute the actual column name from your df. My question is how many columns this df has: is there only 1, or do you know which column has the max value? If so, substitute col with that name. Commented Oct 1, 2014 at 20:55

3 Answers

4

Use a boolean mask to select the rows where the value equals the one you are looking for. Then use that mask to index the dataframe or series, and read the .index attribute of the result. An example is:

In [9]: s = pd.Series(range(10,20))

In [10]: s
Out[10]:
0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64

In [11]: val_mask = s == 13

In [12]: val_mask
Out[12]:
0    False
1    False
2    False
3     True
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

In [15]: s[val_mask]
Out[15]:
3    13
dtype: int64

In [16]: s[val_mask].index
Out[16]: Int64Index([3], dtype='int64')
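
The mask steps above can be wrapped in a small helper. This is only a sketch: the name index_of_value is made up for illustration and is not part of pandas.

```python
import pandas as pd

def index_of_value(series, value):
    """Return the index labels of every entry in `series` equal to `value`.

    Hypothetical helper, not a pandas function.
    """
    mask = series == value              # boolean Series, True where values match
    return series[mask].index.tolist()  # labels of the matching rows

s = pd.Series(range(10, 20))
labels = index_of_value(s, 13)
print(labels)
```

The result is a list because the value may occur more than once, or not at all, in which case the list is empty.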

4

s[s==13] selects the entries equal to 13; their index labels are s[s==13].index.

For example:

from pandas import Series

s = Series(range(10,20))
s[s==13]

3    13
dtype: int64
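
One caveat, given that the error message in the question shows the data is numpy.float64: exact equality can miss float values that came out of arithmetic. A possible workaround, sketched here with made-up revenue figures, is to build the mask with numpy.isclose instead of ==.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0.1 + 0.2 is stored as 0.30000000000000004, not 0.3
revenue = pd.Series([0.1 + 0.2, 1.5, 2.25], index=["a", "b", "c"])

# Exact comparison finds nothing
exact = revenue[revenue == 0.3].index.tolist()

# Tolerance-based comparison finds the intended row
close = revenue[np.isclose(revenue, 0.3)].index.tolist()

print(exact)
print(close)
```

Whether a tolerance is appropriate depends on the data; for values read straight from a .csv and compared against the same literal, == is usually fine.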

1

When you called idxmax, it returned the key in the index that corresponds to the max value. To get the value itself, pass that key back to the dataframe with .loc.

max_key = cd_gross_revenue.idxmax()
max_value = cd_gross_revenue.loc[max_key]
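
As a rough illustration of that round trip, here is a sketch that assumes cd_gross_revenue is a Series (as the traceback in the comments suggests) and uses invented values and labels:

```python
import pandas as pd

# Hypothetical stand-in for the asker's data
cd_gross_revenue = pd.Series([120.0, 450.5, 300.25], index=["jan", "feb", "mar"])

max_key = cd_gross_revenue.idxmax()         # label of the largest value
max_value = cd_gross_revenue.loc[max_key]   # look the value up by that label

# .max() returns a bare scalar, which carries no index of its own
scalar_has_index = hasattr(cd_gross_revenue.max(), "index")

print(max_key, max_value, scalar_has_index)
```

This is why max.index fails in the question: the label has to be recovered from the Series, not from the number.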

