Python cannot reindex from a duplicate axis

Question

I am using groupby to merge rows that share the same TransactionId.

Code:

ldf_object_page_data.groupby('TransactionId')[columns].agg(
                ' '.join).reset_index()

Error: cannot reindex from a duplicate axis

Sample DF:

Transaction_Date    Particulars Others  Others  Cheque Number   Debit   Credit  Balance IsTransactionStart  TransactionId
Date    Remarks Tran Id UTR Number  Instr. ID   Withdrawals Deposits    Balance False   11
01/04/2020  AA1746128   S71737774   -       57000       -4,84,31,253.20 False   11
03/04/2020  TO MADHAV LAAD  AA213003    -   33215031    7000        -4,84,38,253.20 False   11
03/04/2020  TO PANDRINATH GANGRADE  AA214967    -   33215032    13000       -4,84,51,253.20 False   11
03/04/2020  TO NITIN DHANGAR    AA216517    -   33215034    30000       -4,84,81,253.20 False   11
03/04/2020  RTGSO- ELECTRICITY EXP MPPKVVCL UBINH20094172099    S80318780   -   33215033    5,68,499.00     -4,90,49,752.20 True    12
03/04/2020  RTGSO-BHARAT COTTON GINNERS UBINH20094172392    S80321244   -   33215035    3,44,708.00     -4,93,94,460.20 True    13
06/04/2020  OIC153500   DO KHANDWA  S89963710   -   33211781    63407       -4,94,57,867.20 False   13
07/04/2020  RTGS:DHARA AGRO INDUSTRIES ICIC409700372928 S93671963   -           8,93,238.00 -4,85,64,629.20 False   13
08/04/2020  TRF TO JITENDRA SINGH UBEJA AA205798    -   33215036    7,00,000.00     -4,92,64,629.20 True    14

DF in CSV

Are you trying to do something like this: df.groupby('TransactionId')[df.columns].agg(' '.join).reset_index()?
– Eren Han, Commented Oct 12, 2022 at 11:12
@ErenHan I have stored the values of df.columns in the columns variable.
– donny, Commented Oct 12, 2022 at 11:26
Index(['Transaction_Date', 'ValueDate', 'Particulars', '', 'ran type', '', 'cheque details', 'Debit', 'Credit', 'Balance', 'Credit/Debit', '', 'IsTransactionStart', 'TransactionId'], dtype='object')
– donny, Commented Oct 12, 2022 at 11:38
@donny - yes, there are duplicate column names, so you need to deduplicate them first.
– jezrael, Commented Oct 12, 2022 at 11:41
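
To confirm that duplicate column names are the cause, it can help to list them before grouping. The sketch below uses a made-up miniature frame (the data and the variable names dup_mask and duplicated_names are illustrative assumptions, not the asker's data); only the repeated 'Others' header is taken from the sample above.

```python
import pandas as pd

# Hypothetical miniature frame; only the repeated 'Others' header mirrors the question.
df = pd.DataFrame(
    [["01/04/2020", "x", "y", 11], ["03/04/2020", "p", "q", 11]],
    columns=["Transaction_Date", "Others", "Others", "TransactionId"],
)

# Mark every column whose name occurs more than once.
dup_mask = df.columns.duplicated(keep=False)
duplicated_names = sorted(set(df.columns[dup_mask]))

print(df.columns.is_unique)
print(duplicated_names)
```

If is_unique is False, selecting or aggregating by column name can be ambiguous, which is consistent with the error reported in the question.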

jezrael · Accepted Answer · 2022-10-12 11:20:50Z


The problem is the duplicated column names. First deduplicate them, then convert the values to strings and join them:

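import pandas as pd

# Note: _maybe_dedup_names is a private pandas helper (leading underscore),
# so it may not be available in every pandas version.
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)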

df = (df.set_index('TransactionId')
        .astype(str)
        .groupby('TransactionId')
        .agg(' '.join)
        .reset_index())
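
If the private `_maybe_dedup_names` helper is not available in your pandas version, the same idea can be sketched with a few lines of plain Python. The function dedup_columns and the `.1`, `.2` suffix scheme are assumptions made for illustration, not a pandas API, and the sample data is made up. The helper does not guard against a suffixed name colliding with a column that already exists.

```python
import pandas as pd

def dedup_columns(columns):
    """Hypothetical helper: suffix repeated names as name.1, name.2, ..."""
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        # First occurrence keeps its name; later ones get a numeric suffix.
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result

# Made-up data with a repeated 'Others' header, as in the question.
df = pd.DataFrame(
    [["a", "b", 11], ["c", "d", 11], ["e", "f", 12]],
    columns=["Others", "Others", "TransactionId"],
)
df.columns = dedup_columns(df.columns)

# Same pipeline as in the answer: stringify, group, join with spaces.
out = (df.set_index("TransactionId")
         .astype(str)
         .groupby("TransactionId")
         .agg(" ".join)
         .reset_index())
print(out)
```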

If you also need to remove duplicate values within each group:

df = (df.set_index('TransactionId')
        .astype(str)
        .groupby('TransactionId')
        .agg(lambda x: ' '.join(dict.fromkeys(x)))
        .reset_index())
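
The lambda above works because dict.fromkeys keeps only the first occurrence of each value and, in Python 3.7 and later, preserves insertion order. A small illustration on a plain list (the values are made up):

```python
# Made-up values standing in for one group's column entries.
values = ["AA1", "-", "AA2", "-", "AA1"]

# dict.fromkeys drops later repeats and keeps first-seen order.
joined = " ".join(dict.fromkeys(values))
print(joined)  # AA1 - AA2
```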

