How to compare a particular column value with the rest of the values in the same column of the same dataframe?

For example, let the dataframe be df.

df= A  B
    1  1
    2  0
    1  0
    1  1
    2  0

So we first take column A, then pick its values one by one and compare each with the rest of the values in A. For example, I take the first value, 1, and compare it with the remaining values [2,1,1,2]; I find that the 3rd and 4th values of the column are the same. So the result for 1 should be:

A
false
true
true
false

Now pick 2, since it is the second element, and compare it with all the other values [1,1,1,2]. The output will be:

A
false
false
false
true

Basically, compare each element with all the other elements.

The same process should be repeated for columns B, C, D, and so on.

Could anyone suggest how to do this?

  • What is the expected output? Commented Sep 28, 2018 at 11:17

2 Answers

You can use a list comprehension that compares each row with all the other rows; the current row is excluded with drop:

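import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, 1, 2], 'B': [1, 0, 0, 1, 0]})

# drop(i) removes the current row; this relies on the default 0..n-1 index
df1 = pd.concat([df.drop(i) == x for i, x in enumerate(df.values)], keys=df.index)
print (df1)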
         A      B
0 1  False  False
  2   True  False
  3   True   True
  4  False  False
1 0  False  False
  2  False   True
  3  False  False
  4   True   True
2 0   True  False
  1  False   True
  3   True  False
  4  False   True
3 0   True   True
  1  False  False
  2   True  False
  4  False  False
4 0  False  False
  1   True   True
  2  False   True
  3  False  False

Detail:

The list comprehension creates a list of DataFrames:

print ([df.drop(i) == x for i, x in enumerate(df.values)])
[       A      B
1  False  False
2   True  False
3   True   True
4  False  False,        A      B
0  False  False
2  False   True
3  False  False
4   True   True,        A      B
0   True  False
1  False   True
3   True  False
4  False   True,        A      B
0   True   True
1  False  False
2   True  False
4  False  False,        A      B
0  False  False
1   True   True
2  False   True
3  False  False]

These DataFrames are joined together by concat, with the keys parameter creating a MultiIndex if needed, so each small DataFrame can then be selected with loc:

print (df1.loc[0])
       A      B
1  False  False
2   True  False
3   True   True
4  False  False
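
To connect this back to the question, the expected output for column A can be read off df1 by selecting the column first and then the outer index level. The following is only a sketch under a couple of assumptions: the example dataframe is rebuilt from the question, and zip(df.index, df.to_numpy()) is swapped in for enumerate(df.values) so that the row labels passed to drop also match when the index is not the default 0..n-1.

```python
import pandas as pd

# Example data rebuilt from the question
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

# Same idea as the answer, but each row is paired with its own index label
# (a variation for illustration, not the answer's exact code)
df1 = pd.concat(
    [df.drop(i) == x for i, x in zip(df.index, df.to_numpy())],
    keys=df.index,
)

# First value of A (1) compared with the rest of column A
first_a = df1["A"].loc[0]
print(first_a.tolist())   # [False, True, True, False]

# Second value of A (2) compared with all the other values of column A
second_a = df1["A"].loc[1]
print(second_a.tolist())  # [False, False, False, True]
```

Both lists match the outputs asked for in the question.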

import pandas as pd

rows = []

# Iterate over all columns
for column in df.columns:
    # For the current column, iterate over each row position
    for line in range(len(df[column])):

        info = "column: " + str(column) + " - line: " + str(line)
        # Check whether the cells below are equal to the current cell
        answer = df.loc[df.index > line, column] == df.loc[df.index == line, column].values[0]

        # Display the result
        print(info)
        print(answer)

        # Collect the result rows (DataFrame.append was removed in pandas 2.0)
        for idx, check in answer.items():
            rows.append([info, idx, check])

# Build and display the resulting dataframe
df_final = pd.DataFrame(rows, columns=["position", "index", "check"])
print(df_final)
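
Note that the loop above compares each cell only with the cells below it (df.index > line), whereas the question asks for a comparison with all the other elements. A minimal sketch of that variant follows, assuming the example dataframe from the question and a unique index; the result column names ("column", "row", "compared_with", "check") are made up for illustration.

```python
import pandas as pd

# Example data rebuilt from the question
df = pd.DataFrame({"A": [1, 2, 1, 1, 2], "B": [1, 0, 0, 1, 0]})

rows = []
for column in df.columns:
    for label in df.index:
        current = df.at[label, column]
        # Compare against every other row, not only the rows below
        others = df.loc[df.index != label, column]
        for idx, check in (others == current).items():
            rows.append((column, label, idx, bool(check)))

# Column names are illustrative, not taken from the answer above
result = pd.DataFrame(rows, columns=["column", "row", "compared_with", "check"])

# Second value of column A compared with all the other values
second_a = result[(result["column"] == "A") & (result["row"] == 1)]["check"].tolist()
print(second_a)  # [False, False, False, True]
```

Collecting tuples in a list and building the dataframe once at the end avoids growing a dataframe inside the loop.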

1 Comment

Does this fit your needs?
