
I have a dataframe (df) like the one shown below.

Input

ShipID                                                                             CustomerCode  
['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002','USWPR04-20210429-S-00006']    USWPR04
['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002']                               MSLPR04

I need to create a new column, df['LinkID'], that holds a nested array built from the columns above.

Output

df['LinkID']

[{ "shipID": "USWPR04-20210429-S-00001", "customerCode": "USWPR04", "shipNumber": "20210429-S-00001" },
 { "shipID": "USWPR04-20210429-S-00002", "customerCode": "USWPR04", "shipNumber": "20210429-S-00002" },
 { "shipID": "USWPR04-20210429-S-00006", "customerCode": "USWPR04", "shipNumber": "20210429-S-00006" }]

[{ "shipID": "MSLPR04-20210429-S-00001", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00001" },
 { "shipID": "MSLPR04-20210429-S-00002", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00002" }]

Final Dataframe Output

ShipID                                                                             CustomerCode   link
['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002','USWPR04-20210429-S-00006']    USWPR04    [{ "shipID": "USWPR04-20210429-S-00001", "customerCode": "USWPR04", "shipNumber": "20210429-S-00001" },{ "shipID": "USWPR04-20210429-S-00002", "customerCode": "USWPR04", "shipNumber": "20210429-S-00002" },{ "shipID": "USWPR04-20210429-S-00006", "customerCode": "USWPR04", "shipNumber": "20210429-S-00006" }]
['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002']                               MSLPR04    [{ "shipID": "MSLPR04-20210429-S-00001", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00001" },{ "shipID": "MSLPR04-20210429-S-00002", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00002" }]

How can this be done?

  • @Nk03, ShipNumber is the part of ShipID that follows the customer code. For example, if ShipID is USWPR04-20210429-S-00001, then ShipNumber is 20210429-S-00001. Commented May 28, 2021 at 11:01
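
The rule in this comment can be checked on a single ID with plain Python. This is only a sketch: `split_ship_id` is a made-up helper name, and it assumes the customer code itself never contains a hyphen, which holds for the sample data but is not stated in the question.

```python
def split_ship_id(ship_id):
    """Split a ShipID into (customer code, ship number) at the first hyphen.

    Assumption: the customer code contains no hyphen of its own.
    """
    customer_code, ship_number = ship_id.split("-", 1)
    return customer_code, ship_number


print(split_ship_id("USWPR04-20210429-S-00001"))
# ('USWPR04', '20210429-S-00001')
```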

1 Answer


Updated answer:

STEPS:

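  1. Use eval (or the safer ast.literal_eval) if ShipID is stored as strings rather than lists.
  2. Explode the dataframe on ShipID.
  3. Extract the shipNumber using the .str.split method.
  4. Use to_dict('records') to turn each exploded row into a dict and store the dicts in a link column.
  5. Use groupby on the original index and agg with list to collect the dicts back into one list per original row.

# import ast; df['ShipID'] = df['ShipID'].apply(ast.literal_eval)  # only if ShipID holds strings
df2 = df.explode('ShipID')
# Everything after the first hyphen is the shipNumber.
df2['shipNumber'] = df2['ShipID'].str.split('-', n=1).str[-1]
# One dict per exploded row; a plain list is assigned by position, so the repeated index after explode does not misalign it.
df2['link'] = df2.to_dict('records')
# Aggregate only the link column, so the other columns of df stay as they were.
df['link'] = df2.groupby(level=0)['link'].agg(list)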
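
For anyone who wants to try the steps end to end, here is a self-contained sketch using the sample rows from the question. The rename to shipID / customerCode is my own addition so that the dict keys match the expected output; the answer's code keeps the original column names as keys.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ShipID": [
        ["USWPR04-20210429-S-00001", "USWPR04-20210429-S-00002", "USWPR04-20210429-S-00006"],
        ["MSLPR04-20210429-S-00001", "MSLPR04-20210429-S-00002"],
    ],
    "CustomerCode": ["USWPR04", "MSLPR04"],
})

# Work on an exploded copy so df itself keeps its original columns.
# The rename is an assumption made here to match the keys in the expected output.
df2 = df.explode("ShipID").rename(
    columns={"ShipID": "shipID", "CustomerCode": "customerCode"}
)
df2["shipNumber"] = df2["shipID"].str.split("-", n=1).str[-1]

# One dict per exploded row, then one list of dicts per original row.
df2["link"] = df2.to_dict("records")
df["link"] = df2.groupby(level=0)["link"].agg(list)

print(df.loc[1, "link"])
```

Only the link column is copied back, so ShipID stays a list and CustomerCode stays a plain string column.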

6 Comments

It works, but it turns all columns into arrays. I just want df['link'] to be an array.
All your other initial columns are arrays, if I'm not wrong. Can you please post your complete expected output?
Only ShipID is an array. CustomerCode is a normal dataframe column.
@Nk03, I have pasted the final dataframe template.
@aeapen, use another df to perform the operation, then take just the link column and attach it back to your first dataframe. I've not tested the updated code, but it should work.
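
If the goal is just to leave the other columns untouched, one alternative to a second dataframe is to build the link column row by row with a list comprehension. This is a different technique from the explode/groupby answer, offered only as a sketch: `build_links` is a hypothetical helper, and it assumes ShipID already holds real lists and that the customer code contains no hyphen.

```python
import pandas as pd


def build_links(ship_ids, customer_code):
    """Return one dict per ship ID (hypothetical helper, not from the answer)."""
    return [
        {
            "shipID": ship_id,
            "customerCode": customer_code,
            # Everything after the first hyphen is taken as the ship number.
            "shipNumber": ship_id.split("-", 1)[1],
        }
        for ship_id in ship_ids
    ]


# Sample data copied from the question.
df = pd.DataFrame({
    "ShipID": [
        ["USWPR04-20210429-S-00001", "USWPR04-20210429-S-00002", "USWPR04-20210429-S-00006"],
        ["MSLPR04-20210429-S-00001", "MSLPR04-20210429-S-00002"],
    ],
    "CustomerCode": ["USWPR04", "MSLPR04"],
})

# Only the new column is written; ShipID and CustomerCode are read, never modified.
df["link"] = [
    build_links(ids, code) for ids, code in zip(df["ShipID"], df["CustomerCode"])
]

print(df.loc[0, "link"][2])
```

For large frames the explode/groupby route may scale better, since this version loops in Python; I have not measured either.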