
I have a DataFrame A as follows, and I want to find the rows with the same values in their first 3 columns.

import pandas as pd
import io
import numpy as np
import datetime
A= """
   c0   c1   c2   c3   c4
0  1    a    d    3    4
1  1    a    c    0    0
2  1    a    d    3    1
3  1    b    c    0    0
4  2    b    d    8    5
5  2    b    d    3    3
    """

df = pd.read_csv(io.StringIO(A), sep=r'\s+')
df2 = df[['c0', 'c1', 'c2']]
# Compare each row only with the row right after it
for i in range(len(df2) - 1):
    row1 = df2.iloc[i]      # irow() was deprecated and later removed; use iloc
    row2 = df2.iloc[i + 1]
    if (row1 == row2).all():
        print(df2.index[i + 1])

I want it to print 2, 5.

Well, this does not work, and even if it did, comparing each row only with the next one could never catch duplicates in rows that are not adjacent to each other (row 2 repeats row 0).
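A minimal sketch of the manual approach, extended so that non-adjacent rows are caught: instead of looking only at the next row, remember every (c0, c1, c2) key seen so far in a set. The helper name repeated_rows is hypothetical, and the DataFrame is rebuilt directly from the question's data rather than parsed from the string.

```python
import pandas as pd

# Same data as in the question, built directly instead of parsed from text
df = pd.DataFrame({
    'c0': [1, 1, 1, 1, 2, 2],
    'c1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'c2': ['d', 'c', 'd', 'c', 'd', 'd'],
    'c3': [3, 0, 3, 0, 8, 3],
    'c4': [4, 0, 1, 0, 5, 3],
})


def repeated_rows(frame, cols):
    """Hypothetical helper: index labels of rows whose `cols` values match any earlier row."""
    seen = set()
    repeats = []
    # itertuples(index=True, name=None) yields plain tuples: (index, col1, col2, ...)
    for idx, *key in frame[cols].itertuples(index=True, name=None):
        key = tuple(key)
        if key in seen:
            repeats.append(idx)
        else:
            seen.add(key)
    return repeats


print(repeated_rows(df, ['c0', 'c1', 'c2']))
```

This runs in a single pass because set lookups are constant time on average, but pandas' built-in duplicated does the same job without a Python-level loop.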

Alternatively, I tried np.unique(df2) to see whether the number of columns is different from df2, which didn't work either.

Any help is appreciated.

  • ...but only row 2 has the same values in c0-c2 as row 0; row 6 does not. Commented Nov 9, 2015 at 16:59
  • @CTZhu, yes, but row 5 has the same values as row 4. Commented Nov 9, 2015 at 17:00

2 Answers


IIUC, use duplicated:

In [132]:
df2.index[df2.duplicated()]

Out[132]:
Int64Index([2, 5], dtype='int64')

This works because duplicated() flags every row whose values repeat an earlier row; since df2 contains only the columns of interest, all of its columns are compared.
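duplicated also takes a keep argument that decides which occurrence counts as the "original". A short sketch on df2 rebuilt from the question's first three columns, comparing the three settings:

```python
import pandas as pd

# First three columns of the question's data
df2 = pd.DataFrame({
    'c0': [1, 1, 1, 1, 2, 2],
    'c1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'c2': ['d', 'c', 'd', 'c', 'd', 'd'],
})

# keep='first' (the default): the first occurrence is not flagged, later repeats are
first = df2.index[df2.duplicated(keep='first')].tolist()
# keep='last': every occurrence except the last one is flagged
last = df2.index[df2.duplicated(keep='last')].tolist()
# keep=False: every row belonging to a repeated combination is flagged
all_dups = df2.index[df2.duplicated(keep=False)].tolist()

print(first, last, all_dups)
```

keep=False is handy when you want to see both members of each duplicate pair, e.g. rows 0 and 2 together.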

EDIT

df2 seems superfluous here; you can pass the columns of interest to subset instead:

In [133]:
df.index[df.duplicated(subset=['c0', 'c1', 'c2'])]

Out[133]:
Int64Index([2, 5], dtype='int64')
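If you want the full rows (including c3 and c4) rather than just their index labels, the same boolean mask can be used to select from df. A minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'c0': [1, 1, 1, 1, 2, 2],
    'c1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'c2': ['d', 'c', 'd', 'c', 'd', 'd'],
    'c3': [3, 0, 3, 0, 8, 3],
    'c4': [4, 0, 1, 0, 5, 3],
})

cols = ['c0', 'c1', 'c2']
# Boolean mask: True for each row whose c0-c2 values appeared in an earlier row
mask = df.duplicated(subset=cols)
# Select the full rows, all five columns
repeats = df[mask]
print(repeats)
```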

2 Comments

Possibly include subset, since only the first 3 columns are needed.
You're right; the OP might want to drop df2 to avoid an unnecessary step and a possible copy of the data.
In [211]: df.groupby(['c0','c1','c2']).indices
Out[211]:
{(1, 'a', 'c'): array([1]),
 (1, 'a', 'd'): array([0, 2]),
 (1, 'b', 'c'): array([3]),
 (2, 'b', 'd'): array([4, 5])}

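This should do the trick: indices maps each (c0, c1, c2) combination to the positions of the rows that share it, so any entry with more than one position is a group of duplicate rows.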
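To turn that dictionary into the answer the question asks for, a small sketch: keep only the groups with more than one row, then take every position after the first in each group. Note that indices holds integer positions, which coincide with the labels here only because df has the default 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({
    'c0': [1, 1, 1, 1, 2, 2],
    'c1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'c2': ['d', 'c', 'd', 'c', 'd', 'd'],
})

groups = df.groupby(['c0', 'c1', 'c2']).indices
# Keep only combinations that occur in more than one row
dup_groups = {key: pos.tolist() for key, pos in groups.items() if len(pos) > 1}
# Every position after the first in each group is a repeat of an earlier row
repeats = sorted(p for pos in dup_groups.values() for p in pos[1:])

print(dup_groups)
print(repeats)
```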

1 Comment

This is great for when you actually care about the groups and want to categorize your data. Thanks.
