How to add different column data in each duplicated csv row using python?

Question

I have a train.csv file like the one below. Each row appears 4 times with the same index value.

Index    sentence  ending0  ending1  ending2  ending3
0        ABC       DEF      GHI      JKL      MNO
0        ABC       DEF      GHI      JKL      MNO
0        ABC       DEF      GHI      JKL      MNO
0        ABC       DEF      GHI      JKL      MNO
1        LKJ       KJS      AJA      QHW      IUH
...      ...       ...      ...      ...      ...
...
...  
2 
...
...
...

What I want to get is shown below:

Index  sentence  ending-id  ending
0      ABC       0          DEF
0      ABC       1          GHI
0      ABC       2          JKL
0      ABC       3          MNO
1      LKJ       0          KJS
...    ...       ...        ...
...
...

MrNobody33 · Accepted Answer · 2020-06-23 20:47:58Z

1

You could try something like this:

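from itertools import cycle
import pandas as pd

# df is assumed to be the DataFrame already loaded from train.csv.
# Collapse the four identical copies of each row into one.
df = df.set_index('Index').drop_duplicates()
newdf = pd.DataFrame(data=df.sentence, columns=['sentence'], index=df.index)
# Pack the ending columns into one list per row, then give each ending its own row.
newdf['ending'] = df[df.columns[1:]].values.tolist()
newdf = newdf.explode('ending')
# Number the endings 0..3 within each sentence.
ids = cycle([0, 1, 2, 3])
newdf.insert(1, 'endingid', [next(ids) for _ in range(len(newdf))])
print(newdf)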

Output:

      sentence  endingid ending
Index                          
0          ABC         0    DEF
0          ABC         1    GHI
0          ABC         2    JKL
0          ABC         3    MNO
1          LKJ         0    KJS
1          LKJ         1    AJA
1          LKJ         2    QHW
1          LKJ         3    IUH
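
For anyone who wants to run the approach above end to end, here is a minimal sketch. The sample rows are an assumption built to mirror the layout in the question (they are not the real train.csv), and the variable names `rows` and `unique` are only illustrative.

```python
from itertools import cycle

import pandas as pd

# Hypothetical sample data mirroring the question:
# every row appears four times with the same Index value.
rows = [
    {"Index": 0, "sentence": "ABC", "ending0": "DEF", "ending1": "GHI",
     "ending2": "JKL", "ending3": "MNO"},
    {"Index": 1, "sentence": "LKJ", "ending0": "KJS", "ending1": "AJA",
     "ending2": "QHW", "ending3": "IUH"},
]
df = pd.DataFrame([row for row in rows for _ in range(4)])

# Collapse the repeated copies, keeping Index as the row label.
unique = df.set_index("Index").drop_duplicates()

# One list of four endings per sentence, then one row per ending.
newdf = unique[["sentence"]].copy()
newdf["ending"] = unique[unique.columns[1:]].values.tolist()
newdf = newdf.explode("ending")

# Number the endings 0..3 within each sentence.
ids = cycle(range(4))
newdf.insert(1, "endingid", [next(ids) for _ in range(len(newdf))])

print(newdf)
```

Note that `cycle(range(4))` relies on every sentence having exactly four endings; with a different number of ending columns the counter would need to change accordingly.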

edited Jun 23, 2020 at 20:47

answered Jun 23, 2020 at 20:38


2 Comments

OverDose Over a year ago

Thanks for the help, I got what I needed. :)

MrNobody33 Over a year ago

Sure, glad it helped. @OverDose Could you maybe accept it? :)

OverDose · Accepted Answer · 2020-06-23 20:29:58Z

0

I am getting the below result with this code so far.

sentence  Index  value  ending
ABC       0      DEF    0
ABC       0      DEF    0
ABC       0      DEF    0

while I am looking for a result like the one below:

Index  sentence  ending-id  ending
0      ABC       0          DEF
0      ABC       1          GHI
0      ABC       2          JKL
0      ABC       3          MNO

answered Jun 23, 2020 at 20:29


1 Comment

OverDose Over a year ago

@warped please have a look at this.

warped · Accepted Answer · 2020-06-23 20:39:25Z

0

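import pandas as pd

# _df is assumed to be the original DataFrame loaded from train.csv.
df = _df.copy()
# Drop the repeated rows, then unpivot the four ending columns into rows.
df = pd.melt(df.drop_duplicates(), id_vars=['sentence', 'Index'], value_vars=['ending0', 'ending1', 'ending2', 'ending3'])
# Pull the numeric suffix out of 'ending0'..'ending3' (kept as a string).
df['ending-id'] = df.variable.str.extract('([0-9]+)', expand=False)
df.rename(columns={'value': 'ending'}, inplace=True)
df.drop('variable', axis=1, inplace=True)
df.set_index('Index', inplace=True)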
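
As a runnable illustration of the melt approach above, here is a sketch on made-up sample data shaped like the question's file. The sample rows and the names `ending_cols` and `long_df` are assumptions for illustration. It also adds a sort step, since `pd.melt` stacks the result one ending column at a time rather than one sentence at a time.

```python
import pandas as pd

# Hypothetical sample data mirroring the question:
# every row appears four times with the same Index value.
rows = [
    {"Index": 0, "sentence": "ABC", "ending0": "DEF", "ending1": "GHI",
     "ending2": "JKL", "ending3": "MNO"},
    {"Index": 1, "sentence": "LKJ", "ending0": "KJS", "ending1": "AJA",
     "ending2": "QHW", "ending3": "IUH"},
]
_df = pd.DataFrame([row for row in rows for _ in range(4)])

ending_cols = ["ending0", "ending1", "ending2", "ending3"]

# Unpivot the ending columns, naming the new columns directly.
long_df = pd.melt(
    _df.drop_duplicates(),
    id_vars=["Index", "sentence"],
    value_vars=ending_cols,
    var_name="ending-id",
    value_name="ending",
)

# Turn 'ending0'..'ending3' into the integers 0..3.
long_df["ending-id"] = (
    long_df["ending-id"].str.extract(r"(\d+)", expand=False).astype(int)
)

# Sort so each sentence's endings sit together, in order.
long_df = long_df.sort_values(["Index", "ending-id"]).set_index("Index")

print(long_df)
```

Passing `var_name` and `value_name` to `pd.melt` is a design choice that avoids the separate rename and drop steps.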

edited Jun 23, 2020 at 20:39

answered Jun 23, 2020 at 19:26


4 Comments

OverDose Over a year ago

Your answer is correct to some extent, thanks. But I want a different ending in each row of the same index. For example: Index sentence ending 0 ABC DEF 0 ABC GHI 0 ABC JKL

warped Over a year ago

@OverDose Check my edit. Is this what you meant, or did you want to get rid of duplicates first?

OverDose Over a year ago

I have answered my question below to demonstrate the end result. I really appreciate your help. :)

OverDose Over a year ago

Thanks for the help :)

OverDose · Accepted Answer · 2020-06-23 20:53:46Z

0

@MrNobody33 I am getting the below result with this code so far.

sentence  ending  ending-id
ABC       ABC     0
ABC       DEF     1
ABC       GHI     2
ABC       JKL     3
ABC       MNO     0

while I am looking for a result like the one below:

Index  sentence  ending-id  ending
0      ABC       0          DEF
0      ABC       1          GHI
0      ABC       2          JKL
0      ABC       3          MNO

answered Jun 23, 2020 at 20:53


2 Comments

OverDose Over a year ago

@MrNobody33 please have a look at this. I really appreciate your help in this regard. We are almost there :)

MrNobody33 Over a year ago

Yeah, see the last edit of my answer @OverDose :), where I used insert and set_index('Index'). If the column df['Index'] is already your dataframe.index, remove the set_index('Index') from the second line of code.
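
One plausible reading of the output reported in this answer (the sentence ABC appearing as the first "ending", and the ids wrapping around after five values) is that the positional slice `df.columns[1:]` also picked up the sentence column, which would happen if Index was still an ordinary column at that point. This is a guess, since the exact code that was run is not shown. Under that assumption, a sketch that sets the index only when needed and selects the ending columns by name could look like this; the sample rows are made up to mirror the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the question:
# every row appears four times with the same Index value.
rows = [
    {"Index": 0, "sentence": "ABC", "ending0": "DEF", "ending1": "GHI",
     "ending2": "JKL", "ending3": "MNO"},
    {"Index": 1, "sentence": "LKJ", "ending0": "KJS", "ending1": "AJA",
     "ending2": "QHW", "ending3": "IUH"},
]
df = pd.DataFrame([row for row in rows for _ in range(4)]).drop_duplicates()

# Only set the index if 'Index' is still an ordinary column.
if "Index" in df.columns:
    df = df.set_index("Index")

# Select the ending columns by name rather than by position.
ending_cols = [col for col in df.columns if col.startswith("ending")]

newdf = df[["sentence"]].copy()
newdf["ending"] = df[ending_cols].values.tolist()
newdf = newdf.explode("ending")

# Number each ending by its position within its Index group.
newdf.insert(1, "ending-id", newdf.groupby(level=0).cumcount().to_numpy())

print(newdf)
```

Counting within each group with `cumcount` also means the numbering does not depend on there being exactly four ending columns.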
