0

Input: CSV with 5 columns.

Expected Output: Unique combinations of 'col1', 'col2', 'col3'.

Sample Input:

   col1 col2 col3 col4 col5
0   A    B    C    11   30
1   A    B    C    52   10
2   B    C    A    15   14
3   B    C    A     1   91

Sample Expected Output:

col1 col2 col3
A    B    C
B    C    A

That is all I expect as output. I don't need col4 and col5 in the output, and I don't need any aggregation such as sum, count, or mean. I tried to achieve this with pandas, but had no luck.

My code:

import pandas as pd

input_df = pd.read_csv("input.csv")

output_df = input_df.groupby(['col1', 'col2', 'col3'])

This code returns 'pandas.core.groupby.DataFrameGroupBy object at 0x0000000009134278', but I need a DataFrame like the one above. Any help is much appreciated.

2 Answers

3
df[['col1', 'col2', 'col3']].drop_duplicates()
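
For anyone who wants to try this end to end, here is a minimal sketch. It assumes the sample rows from the question and builds them inline instead of reading input.csv; the names `df` and `unique_df` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (built inline as an assumption,
# in place of pd.read_csv("input.csv")).
df = pd.DataFrame({
    "col1": ["A", "A", "B", "B"],
    "col2": ["B", "B", "C", "C"],
    "col3": ["C", "C", "A", "A"],
    "col4": [11, 52, 15, 1],
    "col5": [30, 10, 14, 91],
})

# Select the three columns first, then drop repeated rows.
# The result is a regular DataFrame, not a GroupBy object.
unique_df = df[["col1", "col2", "col3"]].drop_duplicates()
print(unique_df)
```

The original row labels of the first occurrences are kept, so the index of the result is not renumbered automatically.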

0

First, you can use .drop() to remove col4 and col5, since you said you don't need them.

df = df.drop(['col4', 'col5'], axis=1)

Then you can use .drop_duplicates() to remove the rows that repeat the same combination of col1, col2 and col3.

df = df.drop_duplicates(['col1', 'col2', 'col3'])
df

The output:

col1    col2    col3
0   A   B   C
2   B   C   A

Notice that the index in the output is 0, 2 instead of 0, 1. To fix that, you can do this:

df.index = range(len(df))
df

The output:

col1    col2    col3
0   A   B   C
1   B   C   A
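
As an alternative to assigning a new range to the index by hand, the same renumbering can likely be done with reset_index(drop=True). The sketch below assumes the sample data from the question, built inline; `deduped` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question (built inline as an assumption).
df = pd.DataFrame({
    "col1": ["A", "A", "B", "B"],
    "col2": ["B", "B", "C", "C"],
    "col3": ["C", "C", "A", "A"],
    "col4": [11, 52, 15, 1],
    "col5": [30, 10, 14, 91],
})

# Same steps as above: drop the unwanted columns, then the duplicate rows.
deduped = df.drop(["col4", "col5"], axis=1).drop_duplicates(["col1", "col2", "col3"])

# drop=True discards the old index instead of adding it as a new column.
deduped = deduped.reset_index(drop=True)
print(deduped)
```

Unlike the in-place index assignment, reset_index returns a new DataFrame, so the result has to be assigned back.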
