  • I am using groupby to merge rows with the same TransactionId.
  • Code
ldf_object_page_data.groupby('TransactionId')[columns].agg(
                ' '.join).reset_index()
  • Error: cannot reindex from a duplicate axis
  • Sample DF
Transaction_Date    Particulars Others  Others  Cheque Number   Debit   Credit  Balance IsTransactionStart  TransactionId
Date    Remarks Tran Id UTR Number  Instr. ID   Withdrawals Deposits    Balance False   11
01/04/2020  AA1746128   S71737774   -       57000       -4,84,31,253.20 False   11
03/04/2020  TO MADHAV LAAD  AA213003    -   33215031    7000        -4,84,38,253.20 False   11
03/04/2020  TO PANDRINATH GANGRADE  AA214967    -   33215032    13000       -4,84,51,253.20 False   11
03/04/2020  TO NITIN DHANGAR    AA216517    -   33215034    30000       -4,84,81,253.20 False   11
03/04/2020  RTGSO- ELECTRICITY EXP MPPKVVCL UBINH20094172099    S80318780   -   33215033    5,68,499.00     -4,90,49,752.20 True    12
03/04/2020  RTGSO-BHARAT COTTON GINNERS UBINH20094172392    S80321244   -   33215035    3,44,708.00     -4,93,94,460.20 True    13
06/04/2020  OIC153500   DO KHANDWA  S89963710   -   33211781    63407       -4,94,57,867.20 False   13
07/04/2020  RTGS:DHARA AGRO INDUSTRIES ICIC409700372928 S93671963   -           8,93,238.00 -4,85,64,629.20 False   13
08/04/2020  TRF TO JITENDRA SINGH UBEJA AA205798    -   33215036    7,00,000.00     -4,92,64,629.20 True    14
  • Are you trying to do something like this: df.groupby('TransactionId')[df.columns].agg(' '.join).reset_index()? Commented Oct 12, 2022 at 11:12
  • @ErenHan I have stored the values of df.columns in the columns variable. Commented Oct 12, 2022 at 11:26
  • What does print(df.columns) show? Commented Oct 12, 2022 at 11:34
  • Index(['Transaction_Date', 'ValueDate', 'Particulars', '', 'ran type', '', 'cheque details', 'Debit', 'Credit', 'Balance', 'Credit/Debit', '', 'IsTransactionStart', 'TransactionId'], dtype='object') Commented Oct 12, 2022 at 11:38
  • @donny - yes, there are duplicate column names, so you need to deduplicate them first. Commented Oct 12, 2022 at 11:41

1 Answer

The problem is the duplicated column names. Deduplicate them first, then convert the values to strings and join them per group:

df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)

df = (df.set_index('TransactionId')
        .astype(str)
        .groupby('TransactionId')
        .agg(' '.join)
        .reset_index())
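
The deduplication line above relies on a private pandas helper (the leading underscore in _maybe_dedup_names), so it may not be available in every pandas version. As a rough sketch that uses only public API, the column names can be renamed by hand before grouping. The dedup_columns helper and the small sample frame below are hypothetical and only illustrate the idea; the helper assumes that no existing column is already named like 'Others.1'.

```python
import pandas as pd


def dedup_columns(columns):
    """Append .1, .2, ... to repeated names (hypothetical helper)."""
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        # First occurrence keeps its name; later ones get a numeric suffix.
        result.append(f"{name}.{count}" if count else name)
        seen[name] = count + 1
    return result


# Made-up frame with a duplicated 'Others' column, similar to the question.
df = pd.DataFrame(
    [["a", "x", "y", 11], ["b", "z", "w", 11], ["c", "q", "r", 12]],
    columns=["Particulars", "Others", "Others", "TransactionId"],
)

df.columns = dedup_columns(df.columns)

# Same grouping as above: cast to str, then join the rows of each group.
merged = (df.set_index("TransactionId")
            .astype(str)
            .groupby("TransactionId")
            .agg(" ".join)
            .reset_index())
print(merged)
```

Once the names are unique, the groupby no longer has to reindex against a duplicated axis.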

If you also need to remove duplicate values within each group:

df = (df.set_index('TransactionId')
        .astype(str)
        .groupby('TransactionId')
        .agg(lambda x: ' '.join(dict.fromkeys(x)))
        .reset_index())
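
The lambda above works because dict.fromkeys keeps only the first occurrence of each value and preserves the original order. A minimal illustration in plain Python, with made-up values standing in for one column of one group:

```python
# Illustrative values only; repeated entries stand in for repeated cells.
values = ["AA1746128", "S71737774", "AA1746128", "-", "-"]

# dict keys are unique and keep insertion order, so this drops repeats
# while leaving the first occurrence of each value in place.
unique_in_order = list(dict.fromkeys(values))

joined = " ".join(unique_in_order)
print(joined)
```

A set would also remove the repeats, but it would not guarantee the original order of the values.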