Python - How to remove duplicate columns and add as rows in csv

Question

I have the following CSV:

Name  Date      Qty  Date      Qty  Date      Qty
-------------------------------------------------
ABC   Jan 2023  10   Feb 2023  11   Mar 2023  12
XYZ   Jan 2023  20   Feb 2023  21   Mar 2023  22

I want the output as follows, as a CSV or DataFrame:

Name  Date      Qty
-------------------
ABC   Jan 2023  10
ABC   Feb 2023  11
ABC   Mar 2023  12
XYZ   Jan 2023  20
XYZ   Feb 2023  21
XYZ   Mar 2023  22

How can I achieve this result?

Corralien · Accepted Answer · 2023-02-01 22:21:27Z

3

It's a bit complicated, but it does the job. You can run the chain step by step to see each transformation:

>>> (df.melt('Name').assign(row=lambda x: x.groupby('variable').cumcount())
       .pivot(index=['row', 'Name'], columns='variable', values='value')
       .reset_index('Name').rename_axis(index=None, columns=None))

  Name      Date Qty
0  ABC  Jan 2023  10
1  XYZ  Jan 2023  20
2  ABC  Feb 2023  11
3  XYZ  Feb 2023  21
4  ABC  Mar 2023  12
5  XYZ  Mar 2023  22
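To make the "step by step" suggestion concrete, here is a minimal sketch that splits the chain into named intermediate frames. It assumes `df` was built with literally duplicated column names (the question does not show how it was loaded), and it passes `pivot` its arguments by keyword, which pandas 2.0 and later require.

```python
import pandas as pd

# Recreate the question's table. Assigning the duplicate column names by hand
# is an assumption about how df was constructed.
df = pd.DataFrame(
    [["ABC", "Jan 2023", 10, "Feb 2023", 11, "Mar 2023", 12],
     ["XYZ", "Jan 2023", 20, "Feb 2023", 21, "Mar 2023", 22]],
    columns=["Name", "Date", "Qty", "Date", "Qty", "Date", "Qty"],
)

# Step 1: wide to long. Each original cell becomes one row with the columns
# Name, variable ('Date' or 'Qty') and value.
melted = df.melt("Name")

# Step 2: number the occurrences of each variable, so that the n-th 'Date'
# row and the n-th 'Qty' row share the same 'row' value.
numbered = melted.assign(row=melted.groupby("variable").cumcount())

# Step 3: long back to wide, one output row per (row, Name) pair.
wide = numbered.pivot(index=["row", "Name"], columns="variable", values="value")

# Step 4: move Name back to a regular column and drop the leftover axis names.
result = wide.reset_index("Name").rename_axis(index=None, columns=None)
print(result)
```

Printing `melted`, `numbered` and `wide` in turn shows what each step contributes. The rows come out grouped by date rather than by name; appending `.sort_values('Name', kind='stable')` would give the ordering asked for in the question.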


5 Comments

newbi Over a year ago

Actually, I accepted both answers. I really didn't know that only one answer can be accepted.

newbi Over a year ago

One more question: if there is another column besides [Name], say [City], how should the melt call be defined? Thanks.

Corralien Over a year ago

df.melt(id_vars=['Name', 'City'])...pivot(index=['row', 'Name', 'City'], ...)...reset_index(['Name', 'City'])...

Corralien Over a year ago

You can define a variable cols=['Name', 'City'] and replace every occurrence of 'Name' with cols. For pivot, use ['row'] + cols.

newbi Over a year ago

Great, ['row']+cols works. I tried ['row', 'Name', 'City'] earlier and it gave an error. Thanks!
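The `cols` suggestion from the comments can be sketched as a small helper. The function name `unstack_repeated` and the City values below are made up for illustration; they do not come from the thread.

```python
import pandas as pd

def unstack_repeated(df, cols):
    """Hypothetical helper: stack repeated column groups into rows,
    keeping the identifier columns listed in `cols` on every row."""
    return (
        df.melt(id_vars=cols)
          .assign(row=lambda x: x.groupby("variable").cumcount())
          .pivot(index=["row"] + cols, columns="variable", values="value")
          .reset_index(cols)
          .rename_axis(index=None, columns=None)
    )

# Sample data with an extra identifier column (the City values are invented).
df = pd.DataFrame(
    [["ABC", "Paris", "Jan 2023", 10, "Feb 2023", 11],
     ["XYZ", "Rome", "Jan 2023", 20, "Feb 2023", 21]],
    columns=["Name", "City", "Date", "Qty", "Date", "Qty"],
)

out = unstack_repeated(df, ["Name", "City"])
print(out)
```

Keeping the identifier columns in a single list means `melt`, `pivot` and `reset_index` always receive the same names, so adding a third identifier later only changes the call site.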

mitoRibo · Accepted Answer · 2023-02-01 22:30:12Z

1

A less streamlined solution than @Corralien's. It also uses melt and pivot.

import pandas as pd
import io

#-----------------------------------------------#
#Recreate OP's table with duplicate column names#
#-----------------------------------------------#
df = pd.read_csv(io.StringIO("""
ABC       Jan-2023   10    Feb-2023    11    Mar-2023    12
XYZ       Jan-2023   20    Feb-2023    21    Mar-2023    22
"""), header=None, sep=r'\s+')  # sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2

df.columns = ['Name','Date','Qty','Date','Qty','Date','Qty']


#-----------------#
#Start of solution#
#-----------------#
#melt from wide to long (maintains order)
melted_df = df.melt(
    id_vars='Name',
    var_name='col',
    value_name='val',
)

#add a number for Date1/Date2/Date3 to keep track of Qty1/Qty2/Qty3 etc
melted_df['col_number'] = melted_df.groupby(['Name','col']).cumcount()

#pivot back to wide form
wide_df = melted_df.pivot(
    index=['Name','col_number'],
    columns='col',
    values='val',
).reset_index().drop(columns=['col_number'])

wide_df.columns.name = None #remove column index name

#Final output
print(wide_df)

Output

  Name      Date Qty
0  ABC  Jan-2023  10
1  ABC  Feb-2023  11
2  ABC  Mar-2023  12
3  XYZ  Jan-2023  20
4  XYZ  Feb-2023  21
5  XYZ  Mar-2023  22


1 Comment

newbi Over a year ago

Hey MitoRibo, I tried this too and it worked for me. Thank you very much.
