Iterating over columns and rows in pandas dataframe

Question

I am trying to iterate through a dataframe that I have and use the values inside of the cells, but I need to use the names of the columns and rows that the cells come from. Because of that I am currently doing something like the following:

import pandas

df = pandas.DataFrame(data={"C1": [1, 2, 3, 4, 5], "C2": [1, 2, 3, 4, 5]},
                      index=["R1", "R2", "R3", "R4", "R5"])
# df2 (not shown) is the second dataframe holding the related information
for row in df.index.values:
    for column in df.columns.values:
        if df.loc[row, column] > 3:
            if row in df2[column]:
                print("data is present")

I need to use the row and column names because I am using them to look values up in another dataframe that has related information. I know that for loops take forever in pandas, but I haven't been able to find any examples of how to iterate over both the rows and the columns at the same time. This:

df.applymap()

won't work because it only gives the value in the cell, without keeping a reference to which row and column the cell was in, and this:

df.apply(lambda row: row["column"])

won't work because I need to get the name of the column without knowing it beforehand. Also this:

df.apply(lambda row: someFunction(row))

won't work because apply passes a Series object, which only has the row name rather than both the row and column names.

Any insight would be helpful! I am currently running the for loop version but it takes forever and also hogs CPU cores.

Alexander · Accepted Answer · 2016-02-06 05:12:03Z

import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5], 
                        "C2": [1, 2, 3, 4, 5]}, 
                  index=["R1", "R2", "R3", "R4", "R5"])
df2 = pd.DataFrame({'R3': [1], 'R5': [1], 'R6': [1]})

To get the row labels of df that contain a value greater than 3 and that also appear as columns in df2, you can use a conditional list comprehension:

>>> [idx for idx in df[df.gt(3).any(axis=1)].index if idx in df2]
['R5']

To see how this works:

>>> df.gt(3)
       C1     C2
R1  False  False
R2  False  False
R3  False  False
R4   True   True
R5   True   True

Then we want the index of any row that has a value greater than three:

>>> df.gt(3).any(axis=1)
R1    False
R2    False
R3    False
R4     True
R5     True
dtype: bool

>>> df[df.gt(3).any(axis=1)]
    C1  C2
R4   4   4
R5   5   5

>>> [i for i in df[df.gt(3).any(axis=1)].index]
['R4', 'R5']

>>> [i for i in df[df.gt(3).any(axis=1)].index if i in df2]
['R5']
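The steps above can also be collected into one script without the Python-level loop. This is only a sketch under the same setup as this answer (df2 has the row labels as its columns); the names mask, rows_over_3 and matches are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5],
                        "C2": [1, 2, 3, 4, 5]},
                  index=["R1", "R2", "R3", "R4", "R5"])
df2 = pd.DataFrame({'R3': [1], 'R5': [1], 'R6': [1]})

# True for every row that has at least one value greater than 3
mask = df.gt(3).any(axis=1)

# Row labels of df where the mask is True
rows_over_3 = df.index[mask.to_numpy()]

# Keep only the labels that also occur among the columns of df2
matches = rows_over_3[rows_over_3.isin(df2.columns)]

print(list(rows_over_3))
print(list(matches))
```

Index.isin does the membership test in one call, so the "if i in df2" check no longer runs once per label.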

edited Feb 6, 2016 at 5:12

answered Feb 6, 2016 at 5:05

1 Comment

Bill Greenwald Over a year ago

This works if the data is accessible for entire rows, but I need it at the cell level. As I understand this code, I can't pair up the data cell-wise. For example, if df.loc['R4', 'C1'] < 3 but df.loc['R4', 'C2'] > 3, the above code would tell me ["R4", "R5"]. However, I need to get the information R4 C2, and to know that R4 C1 is wrong.
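One possible way to get the cell-level pairing asked for in this comment is DataFrame.stack(), which reshapes the frame into a Series keyed by (row label, column label). The following is a minimal sketch, not part of the accepted answer. It assumes the same df and df2 as the answer, and the names cells, hits and present are hypothetical. How the lookup against df2 should really work depends on how the real df2 is laid out.

```python
import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5],
                        "C2": [1, 2, 3, 4, 5]},
                  index=["R1", "R2", "R3", "R4", "R5"])
# Assumption: df2 has the row labels of df as its columns, as in the answer
df2 = pd.DataFrame({'R3': [1], 'R5': [1], 'R6': [1]})

# One entry per cell, indexed by a (row label, column label) MultiIndex
cells = df.stack()

# Keep only the cells whose value is greater than 3
hits = cells[cells > 3]

# Each hit still carries its row and column name
for (row, column), value in hits.items():
    print(row, column, value)

# Cell-wise lookup: keep the (row, column) pairs whose row label is in df2
present = [(row, column) for row, column in hits.index if row in df2.columns]
print(present)
```

Because the filter runs on individual cells, a row with one value above 3 and one below would contribute only the qualifying (row, column) pair.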
