My index contains values that repeat in a fixed sequence. I want to take the first value of the index column 'ID' and, whenever that value appears again, move the block of values that starts there into a new column. Here the index values repeat 4 times, so the output has 4 columns, labeled i = 1, 2, 3, 4, ..., up to N for N repeated index blocks.

The starting index value will differ between data sets, but the values always repeat in the same sequence.

Sample Dataset: df

ID,1
7,0.060896109
10,0.384263675
27,0.780060081
43,0.583200572
57,0.139564176
73,0.595220898
91,0.828783841
7,0.39920022
10,0.157306146
27,0.29750421
43,0.742234942
57,0.971849921
73,0.346905033
91,0.996723279
7,0.192197827
10,0.922323942
27,0.033304593
43,0.462253505
57,0.282632609
73,0.553047118
91,0.07678817
7,0.428707324
10,0.250935035
27,0.529861617
43,0.982468147
57,0.473807591
73,0.340980584
91,0.436675534

Expected Output Sample:

ID,1,2,3,4
7,0.060896109,0.39920022,0.192197827,0.428707324
10,0.384263675,0.157306146,0.922323942,0.250935035
27,0.780060081,0.29750421,0.033304593,0.529861617
43,0.583200572,0.742234942,0.462253505,0.982468147
57,0.139564176,0.971849921,0.282632609,0.473807591
73,0.595220898,0.346905033,0.553047118,0.340980584
91,0.828783841,0.996723279,0.07678817,0.436675534

  • Is the first ID the index? Commented Sep 30, 2019 at 10:22
  • @jezrael - Yes, the first ID will be the index. It is 7 in this case and may be some other value in another dataset, but the format will be the same. Commented Sep 30, 2019 at 10:24

2 Answers

Use DataFrame.pivot with a helper column, created by comparing the index with its first value and taking the cumulative sum:

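import numpy as np
# g is the block number: it increases by 1 each time the first index value reappears
df = df.assign(g=np.cumsum(df.index == df.index[0])).pivot(columns='g',values='1')
print(df)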
g          1         2         3         4
ID                                        
7   0.060896  0.399200  0.192198  0.428707
10  0.384264  0.157306  0.922324  0.250935
27  0.780060  0.297504  0.033305  0.529862
43  0.583201  0.742235  0.462254  0.982468
57  0.139564  0.971850  0.282633  0.473808
73  0.595221  0.346905  0.553047  0.340981
91  0.828784  0.996723  0.076788  0.436676
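For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same approach. It assumes 'ID' was read in as the index and that the value column is labeled with the string '1'. The shortened data and the names csv_text, block and wide are made up for illustration only.

```python
import io

import numpy as np
import pandas as pd

# Toy data: three IDs repeating twice (illustrative values, not the full sample).
csv_text = """ID,1
7,0.1
10,0.2
27,0.3
7,0.4
10,0.5
27,0.6
"""

# 'ID' becomes the index; the value column is labeled with the string '1'.
df = pd.read_csv(io.StringIO(csv_text), index_col="ID")

# True wherever the first index value reappears; the running sum of those
# flags numbers the repeated blocks 1, 2, ...
block = np.cumsum(df.index == df.index[0])

# Spread each block into its own column, keeping 'ID' as the row index.
wide = df.assign(g=block).pivot(columns="g", values="1")
print(wide)
```

Because the helper column only counts reappearances of the first index value, this relies on every block starting with that same value, as described in the question.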
4 Comments

For my full dataset, the index values are reordered in the final output: all values starting with 1 (1, 10, 100, 110, etc.) are shown first, then those starting with 2 (2, 21, 203, ...), then the 3's, and so on.
@Axay - Then use df.assign(g=np.cumsum(df.index == df.index[0])).pivot(columns='g',values='1').reindex(df.index.unique())
Or, before applying my solution, convert the index to integers with df.index = df.index.astype(int) or df = df.rename(int)
df.assign(g=np.cumsum(df.index == df.index[0])).pivot(columns='g',values='1').reindex(df.index.unique()) This worked perfectly.
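
The reordering described in these comments is what one would expect if the IDs are stored as strings, since pivot returns a sorted index and strings sort lexicographically; that the full dataset has string IDs is an assumption here, not something stated in the thread. A small sketch with made-up data (csv_text, wide and ordered are illustrative names):

```python
import io

import numpy as np
import pandas as pd

# Toy data with IDs deliberately read as strings (an assumption about the
# full dataset; the values are made up).
csv_text = """ID,1
2,0.1
10,0.2
1,0.3
2,0.4
10,0.5
1,0.6
"""
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str}).set_index("ID")

block = np.cumsum(df.index == df.index[0])
wide = df.assign(g=block).pivot(columns="g", values="1")
print(list(wide.index))     # string IDs come back sorted lexicographically

# Restore the order in which the IDs first appeared in the input.
ordered = wide.reindex(df.index.unique())
print(list(ordered.index))
```

Converting the index to integers first, as suggested above, gives a numerically sorted result instead, which is not necessarily the original input order.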

You can use pd.pivot_table with a little extra work on the columns:

pd.pivot_table(data=df, index='ID', columns=df.groupby('ID').cumcount(), values='1')

           0         1         2         3
ID                                        
7   0.060896  0.399200  0.192198  0.428707
10  0.384264  0.157306  0.922324  0.250935
27  0.780060  0.297504  0.033305  0.529862
43  0.583201  0.742235  0.462254  0.982468
57  0.139564  0.971850  0.282633  0.473808
73  0.595221  0.346905  0.553047  0.340981
91  0.828784  0.996723  0.076788  0.436676
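
A runnable sketch of this variant, assuming 'ID' is an ordinary column rather than the index (call df.reset_index() first otherwise). The toy data and the names csv_text, occurrence and wide are illustrative; adding 1 to the counter is an optional tweak so the labels start at 1 as in the expected output.

```python
import io

import pandas as pd

# Toy data: three IDs repeating twice (illustrative values only).
csv_text = """ID,1
7,0.1
10,0.2
27,0.3
7,0.4
10,0.5
27,0.6
"""

# Here 'ID' stays a regular column.
df = pd.read_csv(io.StringIO(csv_text))

# cumcount numbers each occurrence of an ID starting at 0; add 1 so the
# column labels start at 1.
occurrence = df.groupby("ID").cumcount() + 1

# Each (ID, occurrence) pair is unique, so the default mean aggregation
# of pivot_table leaves the values unchanged.
wide = pd.pivot_table(data=df, index="ID", columns=occurrence, values="1")
print(wide)
```

Unlike the cumulative-sum approach, this does not depend on which ID comes first in each block, only on how many times each ID has been seen so far.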
