I am trying to iterate through a dataframe that I have and use the values inside of the cells, but I need to use the names of the columns and rows that the cells come from. Because of that I am currently doing something like the following:

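import pandas

df = pandas.DataFrame(data={"C1": [1, 2, 3, 4, 5], "C2": [1, 2, 3, 4, 5]},
                      index=["R1", "R2", "R3", "R4", "R5"])
# df2 is the second, related data frame (defined elsewhere).
for row in df.index.values:
    for column in df.columns.values:
        # Label-based lookup: row label first, then column label.
        if df.loc[row, column] > 3:
            # `in` on a Series tests its index labels, not its values.
            if row in df2[column]:
                print("data is present")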

I need to use the row and column names because I am using them to look values up in another data frame that has related information. I know that for loops are slow in pandas, but I haven't been able to find any examples of how to iterate over both the rows and the columns at the same time. This:

df.applymap()

won't work because it only gives the value in the cell, without keeping a reference to which row and column the cell was in, and this:

df.apply(lambda row: row["column"])

won't work because I need to get the name of the column without knowing it beforehand. Also this:

df.apply(lambda row: someFunction(row))

won't work because apply passes a Series object, which only carries the row name rather than both the row and column names.

Any insight would be helpful! I am currently running the for loop version but it takes forever and also hogs CPU cores.

1 Answer

import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5], 
                        "C2": [1, 2, 3, 4, 5]}, 
                  index=["R1", "R2", "R3", "R4", "R5"])
df2 = pd.DataFrame({'R3': [1], 'R5': [1], 'R6': [1]})

To get every row label of df that has a value greater than 3 and also appears as a column in df2, you can use a conditional list comprehension:

>>> [idx for idx in df[df.gt(3).any(axis=1)].index if idx in df2]
['R5']

To see how this works:

>>> df.gt(3)
       C1     C2
R1  False  False
R2  False  False
R3  False  False
R4   True   True
R5   True   True

Then we want the index of any row that has a value greater than three:

>>> df.gt(3).any(axis=1)
R1    False
R2    False
R3    False
R4     True
R5     True
dtype: bool

>>> df[df.gt(3).any(axis=1)]
    C1  C2
R4   4   4
R5   5   5

>>> [i for i in df[df.gt(3).any(axis=1)].index]
['R4', 'R5']

>>> [i for i in df[df.gt(3).any(axis=1)].index if i in df2]
['R5']
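
The same steps can probably be written without a Python-level loop. The following is only a sketch that reuses the df and df2 defined above; the names rows_over_3 and matches are introduced here for illustration. It relies on Index.intersection to keep the row labels that also occur among the columns of df2:

```python
import pandas as pd

df = pd.DataFrame(data={"C1": [1, 2, 3, 4, 5],
                        "C2": [1, 2, 3, 4, 5]},
                  index=["R1", "R2", "R3", "R4", "R5"])
df2 = pd.DataFrame({'R3': [1], 'R5': [1], 'R6': [1]})

# Row labels of df where at least one cell is greater than 3.
rows_over_3 = df.index[df.gt(3).any(axis=1)]

# Keep only the labels that are also column names in df2.
matches = rows_over_3.intersection(df2.columns)

result = list(matches)
print(result)
```

Like the list comprehension, this still works on whole rows, not on individual cells.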

1 Comment

This works if the data can be handled a whole row at a time, but I need it at the cell level. As I understand this code, I can't pair up the data cell by cell. For example, if df.loc['R4', 'C1'] < 3 but df.loc['R4', 'C2'] > 3, the code above would still give me ["R4", "R5"]. However, I need to get R4/C2 specifically, and to know that R4/C1 does not qualify.
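
For the cell-by-cell case raised in this comment, one possible approach is to build a boolean mask and read off the (row, column) labels of the cells that pass. This is a minimal sketch, not a tested solution for the real data: the values in df are changed so that R4/C1 is below 3, and df2 here is a hypothetical lookup frame that is assumed to share df's column names and to be indexed by row label, which matches the `row in df2[column]` test in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: R4/C1 is below the threshold, R4/C2 is above it.
df = pd.DataFrame({"C1": [1, 2, 3, 2, 5], "C2": [1, 2, 3, 4, 5]},
                  index=["R1", "R2", "R3", "R4", "R5"])
# Hypothetical lookup frame: same column names, indexed by row label.
df2 = pd.DataFrame({"C1": [10, 20], "C2": [30, 40]}, index=["R4", "R6"])

# Boolean array with True wherever a cell is greater than 3.
mask = df.gt(3).to_numpy()

# Positions of the True cells, in row-major order.
row_pos, col_pos = np.nonzero(mask)

# Translate positions back into (row label, column label) pairs.
cells = [(df.index[r], df.columns[c]) for r, c in zip(row_pos, col_pos)]

# Keep the pairs whose row and column labels both exist in df2.
present = [(row, col) for row, col in cells
           if col in df2.columns and row in df2.index]

print(cells)
print(present)
```

The comparison itself is vectorized, so the remaining Python loop only runs over the cells that passed the threshold, not over every cell in the frame.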
