Pandas - Extract value from Dataframe based on certain key value not in a sequence

Question

I have a DataFrame in the following format:

id, ref
101, [{'id': '74947', 'type': {'id': '104', 'name': 'Sales', 'inward': 'Sales', 'outward': 'PO'}, 'inwardIssue': {'id': '76560', 'key': 'Prod-A'}}]
102, [{'id': '74948', 'type': {'id': '105', 'name': 'Return', 'inward': 'Return Order', 'outward': 'PO'}, 'inwardIssue': {'id': '76560', 'key': 'Prod-C'}}, 
      {'id': '750001', 'type': {'id': '342', 'name': 'Sales', 'inward': 'Sales', 'outward': 'PO'}, 'inwardIssue': {'id': '76560', 'key': 'Prod-X'}}]
103, [{'id': '74949', 'type': {'id': '106', 'name': 'Sales', 'inward': 'Return Order', 'outward': 'PO'}, 'inwardIssue': {'id': '76560', 'key': 'Prod-B'}}]
104, [{'id': '67543', 'type': {'id': '106', 'name': 'Other', 'inward': 'Return Order', 'outward': 'PO'}, 'inwardIssue': {'id': '76560', 'key': 'Prod-BA'}}]

I am trying to extract the rows that have name = Sales and return the following output:

101, Prod-A
102, Prod-X
103, Prod-B

I am able to extract the required data when the matching key-value pair is in the first dictionary of the list, but not when it appears later in the list, as in the case of id = 102:

import numpy as np

# .str[0] only ever looks at the FIRST dict in each list
df['names'] = df['ref'].str[0].str.get('type').str.get('name')
df['value'] = df['ref'].str[0].str.get('inwardIssue').str.get('key')
df['output'] = np.where(df['names'] == 'Sales', df['value'], 0)

Currently I only get values for id = 101 and 103.
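The attempt above misses id = 102 because .str[0] only inspects the first dictionary in each list. A minimal sketch that keeps the same row-by-row idea but scans every dictionary in the list. Assumptions: first_sales_key is a hypothetical helper name, the sample is a trimmed copy of the question's data (only the fields the code reads), and the first Sales entry in a list is the one wanted.

```python
import pandas as pd

# Trimmed copy of the question's data: only type.name and inwardIssue.key are read.
df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'ref': [
        [{'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-A'}}],
        [{'type': {'name': 'Return'}, 'inwardIssue': {'key': 'Prod-C'}},
         {'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-X'}}],
        [{'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-B'}}],
        [{'type': {'name': 'Other'}, 'inwardIssue': {'key': 'Prod-BA'}}],
    ],
})


def first_sales_key(links):
    """Hypothetical helper: key of the first link whose type name is 'Sales', else None."""
    if not isinstance(links, list):  # e.g. NaN when a row has no ref list
        return None
    for link in links:  # look at every dict, not just links[0]
        if link.get('type', {}).get('name') == 'Sales':
            return link.get('inwardIssue', {}).get('key')
    return None


df['output'] = df['ref'].apply(first_sales_key)
result = df.loc[df['output'].notna(), ['id', 'output']]
print(result)
```

On this sample, result keeps 101/Prod-A, 102/Prod-X and 103/Prod-B, and drops 104, which has no Sales entry. A plain Python loop per row is easy to read but slower than the vectorised answers below on large frames.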

halfer · Accepted Answer · 2020-12-03 20:29:26Z

Let us use explode so that every dictionary in the list gets its own row, then filter on name == 'Sales':

import pandas as pd

e = df['ref'].explode()                      # one row per dict; original row labels repeat
s = pd.DataFrame(e.tolist(), index=e.index)  # keep those labels so join() lines up with df
s = s.loc[s['type'].str.get('name').eq('Sales'), 'inwardIssue'].str.get('key')
dfs = df.join(s, how='right')
    id                                                ref inwardIssue
0  101  [{'id': '74947', 'type': {'id': '104', 'name':...      Prod-A
1  102  [{'id': '74948', 'type': {'id': '105', 'name':...      Prod-X
2  103  [{'id': '74949', 'type': {'id': '106', 'name':...      Prod-B
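The explode approach depends on row labels: Series.explode repeats each original label once per element it unpacks, and join aligns on those labels, so anything that resets the index in between (for example pd.DataFrame(series.tolist()) without index=) pairs keys with the wrong ids. A minimal sketch of the same idea that works on the exploded Series directly, so the labels are never dropped. Assumptions: the trimmed sample data and the output column name key are illustrative choices.

```python
import pandas as pd

# Trimmed copy of the question's data (only the fields the code reads).
df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'ref': [
        [{'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-A'}}],
        [{'type': {'name': 'Return'}, 'inwardIssue': {'key': 'Prod-C'}},
         {'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-X'}}],
        [{'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-B'}}],
        [{'type': {'name': 'Other'}, 'inwardIssue': {'key': 'Prod-BA'}}],
    ],
})

e = df['ref'].explode()  # one row per link dict; labels 0, 1, 1, 2, 3
is_sales = e.str.get('type').str.get('name').eq('Sales')
keys = e[is_sales].str.get('inwardIssue').str.get('key').rename('key')

# join aligns on the repeated labels, so each key lands on its own id
out = df[['id']].join(keys, how='inner')
print(out)
```

how='inner' drops ids with no Sales entry (104 here); how='left' would keep them with NaN instead.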

answered Apr 25, 2020 at 1:14 by BENY; edited Dec 3, 2020 at 20:29 by halfer

Andy L. · Accepted Answer · 2020-04-25 03:01:45Z

If you already have a dataframe in that format, you may convert it to a list of records with to_dict(orient='records') and pass that to pd.json_normalize with 'ref' as the record path. This turns the original df into a flat dataframe that you can then slice/filter.

df1 = pd.json_normalize(df.to_dict(orient='records'), 'ref')

The flat dataframe df1 looks like this:

Out[83]:
       id type.id type.name   type.inward type.outward inwardIssue.id inwardIssue.key
0   74947     104     Sales         Sales           PO          76560          Prod-A
1   74948     105    Return  Return Order           PO          76560          Prod-C
2  750001     342     Sales         Sales           PO          76560          Prod-X
3   74949     106     Sales  Return Order           PO          76560          Prod-B
4   67543     106     Other  Return Order           PO          76560         Prod-BA

Finally, slice df1 to keep only the Sales rows:

df_final = df1.loc[df1['type.name'].eq('Sales'), ['type.id', 'inwardIssue.key']]

Out[88]:
  type.id inwardIssue.key
0     104          Prod-A
2     342          Prod-X
3     106          Prod-B
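The slice above reports type.id (104, 342, 106) rather than the outer id (101, 102, 103) the question asks for, because record_path='ref' flattens only the inner dictionaries. pd.json_normalize (pandas 1.0 or later) can carry the outer field along through its meta argument; since each inner record also has its own 'id' key, the outer one needs a meta_prefix, otherwise json_normalize raises a ValueError about a conflicting metadata name. A sketch on a trimmed copy of the sample data; the prefix 'parent.' is an arbitrary choice.

```python
import pandas as pd

# Trimmed copy of the question's data; the inner 'id' is kept on purpose
# so the name clash with the outer 'id' is real.
df = pd.DataFrame({
    'id': [101, 102, 103, 104],
    'ref': [
        [{'id': '74947', 'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-A'}}],
        [{'id': '74948', 'type': {'name': 'Return'}, 'inwardIssue': {'key': 'Prod-C'}},
         {'id': '750001', 'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-X'}}],
        [{'id': '74949', 'type': {'name': 'Sales'}, 'inwardIssue': {'key': 'Prod-B'}}],
        [{'id': '67543', 'type': {'name': 'Other'}, 'inwardIssue': {'key': 'Prod-BA'}}],
    ],
})

# record_path='ref' flattens each inner dict; meta='id' repeats the outer id
# for every record, renamed to 'parent.id' so it does not clash with the inner 'id'.
flat = pd.json_normalize(
    df.to_dict(orient='records'),
    record_path='ref',
    meta='id',
    meta_prefix='parent.',
)

df_final = flat.loc[flat['type.name'].eq('Sales'), ['parent.id', 'inwardIssue.key']]
print(df_final)
```

Using record_prefix instead (for example 'ref.') would rename the inner columns and leave the outer column as plain id; either way only one side needs a prefix.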
