0

I have the following scenario: I have a train.csv file like the one below. Each row appears 4 times with the same Index value.

Index    sentence  ending0  ending1  ending2  ending3
0        ABC       DEF      GHI      JKL      MNO
0        ABC       DEF      GHI      JKL      MNO
0        ABC       DEF      GHI      JKL      MNO
0        ABC       DEF      GHI      JKL      MNO
1        LKJ       KJS      AJA      QHW      IUH
...      ...       ...      ...      ...      ...
...
...  
2 
...
...
...     

What I want to get is shown below:

Index   sentence  ending-id  ending
0       ABC       0          DEF
0       ABC       1          GHI
0       ABC       2          JKL
0       ABC       3          MNO
1       LKJ       0          KJS
...     ...       ...        ...
...
...   

4 Answers

1

You could try something like this:

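import pandas as pd
from itertools import cycle

# Make 'Index' the index, then drop the repeated rows
df = df.set_index('Index').drop_duplicates()
newdf = pd.DataFrame(data=df.sentence, columns=['sentence'], index=df.index)
# Collect ending0..ending3 into one list per row, then give each ending its own row
newdf['ending'] = df[df.columns[1:]].values.tolist()
newdf = newdf.explode('ending')
# Label the endings 0, 1, 2, 3 within each Index
ids = cycle([0, 1, 2, 3])
newdf.insert(1, 'endingid', [next(ids) for _ in range(len(newdf))])
print(newdf)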

Output:

      sentence  endingid ending
Index                          
0          ABC         0    DEF
0          ABC         1    GHI
0          ABC         2    JKL
0          ABC         3    MNO
1          LKJ         0    KJS
1          LKJ         1    AJA
1          LKJ         2    QHW
1          LKJ         3    IUH
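
A variation on the same idea, as a rough sketch: instead of cycling through [0, 1, 2, 3] by hand, groupby(level=0).cumcount() numbers the exploded rows within each Index value. The sample frame below is an assumption built from the placeholder values in the question, not the real train.csv.

```python
import pandas as pd

# Assumed sample data mirroring the question: each row is repeated 4 times.
rows = [[0, "ABC", "DEF", "GHI", "JKL", "MNO"]] * 4 + [[1, "LKJ", "KJS", "AJA", "QHW", "IUH"]] * 4
df = pd.DataFrame(rows, columns=["Index", "sentence", "ending0", "ending1", "ending2", "ending3"])

ending_cols = ["ending0", "ending1", "ending2", "ending3"]

# Drop the repeated rows, then pack the four endings into one list per row.
newdf = df.drop_duplicates().set_index("Index")
newdf["ending"] = newdf[ending_cols].values.tolist()
newdf = newdf[["sentence", "ending"]].explode("ending")

# Number the exploded rows 0..3 within each Index value.
newdf.insert(1, "endingid", newdf.groupby(level=0).cumcount().to_numpy())
print(newdf)
```

This should also keep working if the number of ending columns changes, since nothing is hard-coded to four ids.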

2 Comments

Thanks for the help, I got what I needed. :)
Sure, glad it helped. @OverDose, could you maybe accept it? :)
0

I am getting the below result with this code so far.

sentence Index value ending
ABC        0    DEF    0
ABC        0    DEF    0
ABC        0    DEF    0

What I am looking for is a result like the one below:

Index   sentence  ending-id  ending
0       ABC       0          DEF
0       ABC       1          GHI
0       ABC       2          JKL
0       ABC       3          MNO

1 Comment

@warped please have a look at this.
0
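import pandas as pd

# _df is assumed to be the original DataFrame; work on a copy
df = _df.copy()
# Wide to long: one row per ending column
df = pd.melt(df.drop_duplicates(), id_vars=['sentence', 'Index'], value_vars=['ending0', 'ending1', 'ending2', 'ending3'])
# Extract the digit from 'ending0' ... 'ending3' as the ending id
df['ending-id'] = df['variable'].str.extract(r'([0-9]+)', expand=False)
df = df.rename(columns={'value': 'ending'}).drop(columns='variable')
df = df.set_index('Index')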
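
Note that melt stacks the result one ending column at a time, so the rows for a given Index are not adjacent. A minimal sketch of getting the layout from the question, assuming data shaped like the sample there (the frame below is built from the question's placeholder values, and long_df is just an illustrative name):

```python
import pandas as pd

# Assumed sample data mirroring the question: each row is repeated 4 times.
rows = [[0, "ABC", "DEF", "GHI", "JKL", "MNO"]] * 4 + [[1, "LKJ", "KJS", "AJA", "QHW", "IUH"]] * 4
_df = pd.DataFrame(rows, columns=["Index", "sentence", "ending0", "ending1", "ending2", "ending3"])

long_df = pd.melt(
    _df.drop_duplicates(),
    id_vars=["Index", "sentence"],
    value_vars=["ending0", "ending1", "ending2", "ending3"],
    var_name="ending-id",
    value_name="ending",
)
# Turn 'ending0' ... 'ending3' into the integers 0 ... 3.
long_df["ending-id"] = long_df["ending-id"].str.replace("ending", "", regex=False).astype(int)
# Sort so that the four endings of each Index sit together, in id order.
long_df = long_df.sort_values(["Index", "ending-id"]).set_index("Index")
long_df = long_df[["sentence", "ending-id", "ending"]]
print(long_df)
```

Passing var_name and value_name to melt avoids the separate rename and drop steps.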

4 Comments

Your answer is correct to some extent, thanks. But I want a different ending in each row of the same index. For example: Index sentence ending 0 ABC DEF 0 ABC GHI 0 ABC JKL
@OverDose Check my edit. Is this what you meant, or did you want to get rid of duplicates first?
I have answered my question below to demonstrate the end result. I really appreciate your help. :)
Thanks for the help :)
0

@MrNobody33 I am getting the below result with this code so far.

sentence ending ending-id 
ABC        ABC     0     
ABC        DEF     1  
ABC        GHI     2
ABC        JKL     3
ABC        MNO     0   

What I am looking for is a result like the one below:

Index   sentence  ending-id  ending
0       ABC       0          DEF
0       ABC       1          GHI
0       ABC       2          JKL
0       ABC       3          MNO

2 Comments

@MrNobody33 please have a look at this. I really appreciate your help in this regard. We are almost there :)
Yeah, see the last edit of my answer @OverDose :), where I used insert and set_index('Index'). If 'Index' is already your DataFrame's index, remove the set_index('Index') call from the code.
