My dataframe looks like this:

     ID         Class
      0           9
      1           8
      1           6
      2           6
      2           2
      3           15
      3           1
      3           8

What I would like to do is merge rows with the same ID value, as shown below:

    ID       Class1 Class2 Class3
    0           9
    1           8      6
    2           6      2
    3           15     1      8

So for each ID that occurs more than once, I want to create new column(s) and move the values from the rows into those columns. What is the fastest way to do this? I tried using groupby, but it didn't give me appropriate results.

2 Answers

Use set_index with cumcount to number the values within each ID, reshape with unstack, and finally rename the columns with add_prefix:

df = (df.set_index(['ID', df.groupby('ID').cumcount()])['Class']
        .unstack()
        .add_prefix('Class')
        .reset_index())

print (df)
   ID  Class0  Class1  Class2
0   0     9.0     NaN     NaN
1   1     8.0     6.0     NaN
2   2     6.0     2.0     NaN
3   3    15.0     1.0     8.0
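
The question asks for columns numbered from 1 and for integer values, whereas the output above starts at Class0 and shows floats because of the NaN gaps. A minimal self-contained sketch of one way to get closer to the requested layout follows; the names `position` and `wide` are illustrative, and the nullable `Int64` dtype is an assumption about what is acceptable for the missing cells:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 3, 3, 3],
    "Class": [9, 8, 6, 6, 2, 15, 1, 8],
})

# Position of each row within its ID group, shifted so numbering starts at 1.
position = df.groupby("ID").cumcount() + 1

wide = (
    df.set_index(["ID", position])["Class"]
      .unstack()            # one column per position; gaps become NaN
      .astype("Int64")      # nullable integers, so 9 is not shown as 9.0
      .add_prefix("Class")
      .reset_index()
)
print(wide)
```

Missing cells are then shown as `<NA>` rather than `NaN`, and the columns are named Class1, Class2, Class3.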

Another solution is to create a list per group and then build a new DataFrame with the constructor:

s = df.groupby('ID')['Class'].apply(list)
df = (pd.DataFrame(s.values.tolist(), index=s.index)
        .add_prefix('Class')
        .reset_index())
print (df)
   ID  Class0  Class1  Class2
0   0       9     NaN     NaN
1   1       8     6.0     NaN
2   2       6     2.0     NaN
3   3      15     1.0     8.0
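
The two outputs above differ only in dtype: the list-based version keeps Class0 as integers because that column has no gaps. As a rough sanity check, the sketch below builds both results from the question's sample data and compares their values; `by_unstack` and `by_lists` are illustrative names, not part of the original answer:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 3, 3, 3],
    "Class": [9, 8, 6, 6, 2, 15, 1, 8],
})

# Approach 1: number the rows within each ID, then unstack.
by_unstack = (
    df.set_index(["ID", df.groupby("ID").cumcount()])["Class"]
      .unstack()
      .add_prefix("Class")
      .reset_index()
)

# Approach 2: collect a list per ID and let the constructor pad short rows.
lists = df.groupby("ID")["Class"].apply(list)
by_lists = (
    pd.DataFrame(lists.tolist(), index=lists.index)
      .add_prefix("Class")
      .reset_index()
)

# Raises AssertionError if the values differ; dtypes are ignored on purpose.
pd.testing.assert_frame_equal(by_unstack, by_lists, check_dtype=False)
```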

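EDIT: To get one indicator column per class value (Class5 for class 5, and so on), use get_dummies, then reindex to add columns for classes that do not occur in the data: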

df = df.set_index('ID')
# dtype=int keeps the indicators as 0/1 (recent pandas versions default to booleans)
df1 = pd.get_dummies(df['Class'], dtype=int).reindex(columns=range(17), fill_value=0).add_prefix('Class')
df1 = df1.groupby(level=0).max().reset_index()
print (df1)
   ID  Class0  Class1  Class2  Class3  Class4  Class5  Class6  Class7  Class8  \
0   0       0       0       0       0       0       0       0       0       0   
1   1       0       0       0       0       0       0       1       0       1   
2   2       0       0       1       0       0       0       1       0       0   
3   3       0       1       0       0       0       0       0       0       1   

   Class9  Class10  Class11  Class12  Class13  Class14  Class15  Class16  
0       1        0        0        0        0        0        0        0  
1       0        0        0        0        0        0        0        0  
2       0        0        0        0        0        0        0        0  
3       0        0        0        0        0        0        1        0  
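
For readers who want to run the indicator-column approach end to end, here is a minimal self-contained sketch on the question's sample data. The name `n_classes` and its value of 17 are assumptions that mirror the `range(17)` used above; adjust it to the real number of classes:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [0, 1, 1, 2, 2, 3, 3, 3],
    "Class": [9, 8, 6, 6, 2, 15, 1, 8],
})

n_classes = 17  # assumed: class labels run from 0 to 16

indicators = (
    pd.get_dummies(df.set_index("ID")["Class"], dtype=int)  # 0/1 column per class seen
      .reindex(columns=range(n_classes), fill_value=0)      # add columns for unseen classes
      .add_prefix("Class")
      .groupby(level=0).max()                               # collapse rows sharing an ID
      .reset_index()
)
print(indicators[["ID", "Class1", "Class8", "Class15"]])
```

Taking the maximum per ID means a class is flagged 1 if it appears at least once for that ID, regardless of how many times it occurs.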

9 Comments

That works, thanks. What about a different way of moving the values from rows to columns: if the class is, for example, 5, it should appear in a column named Class5. Is it possible to do this in a fairly simple way?
And the same for all column names? Do you need get_dummies?
Yes, thanks, it works fine, but one last issue: how do I create a column for, say, class 8 if it isn't in my dataset? get_dummies creates columns only for existing classes.
Then you need df = df.reindex(columns=range(9), fill_value=0), which adds all the missing columns.
Glad I could help. Good luck!

Or you can build a list per ID and expand it into columns with apply(pd.Series):

df.groupby('ID').Class.apply(lambda x : x.tolist()).to_frame()['Class'].apply(pd.Series).add_prefix('Class_').fillna(' ')
Out[602]: 
    Class_0 Class_1 Class_2
ID                         
0       9.0                
1       8.0       6        
2       6.0       2        
3      15.0       1       8
