8

I have a data like this in Pandas dataframe

   id     import_id              investor_id     loan_id      meta
   35736  unremit_loss_100312         Q05         0051765139  {u'total_paid': u'75', u'total_expense': u'75'}
   35737  unremit_loss_100313         Q06         0051765140  {u'total_paid': u'77', u'total_expense': u'78'}
   35739  unremit_loss_100314         Q06         0051765141  {u'total_paid': u'80', u'total_expense': u'65'}

How to sort based on total_expense which is value of json field
ex: total_expense on meta field

Output should be

id     import_id              investor_id     loan_id      meta
35739  unremit_loss_100314         Q06         0051765141  {u'total_paid': u'80', u'total_expense': u'65'}
35736  unremit_loss_100312         Q05         0051765139  {u'total_paid': u'75', u'total_expense': u'75'}
35737  unremit_loss_100313         Q06         0051765140  {u'total_paid': u'77', u'total_expense': u'78'}
4
  • Try df = df.iloc[df['meta'].str.get('total_expense').argsort().values, :] and see if that works. If not, you will need to show us more than 1 row of data. Commented Apr 9, 2019 at 7:47
  • it is not working, all rows are becoming the same values if I use your logic. question modified and added more rows @coldspeed Commented Apr 10, 2019 at 6:39
  • I've provided two options. It seems you need to convert the extracted values to integer before sorting. Take a look and please clearly let me know what the issue is if it didn't work. Thanks. Commented Apr 11, 2019 at 19:04
  • @JameelGrand I have attempted a simple way to achieve it, does it solve your problem? Commented Apr 12, 2019 at 19:52

4 Answers 4

4
+25

Setup and Preprocessing

import ast
import numpy as np

if isinstance(x.at[0, 'meta'], str):
    df['meta'] = df['meta'].map(ast.literal_eval)

str.get with Series.argsort

df.iloc[df['meta'].str.get('total_expense').astype(int).argsort()]

      id            import_id investor_id   loan_id                                         meta
2  35739  unremit_loss_100314         Q06  51765141  {'total_paid': '80', 'total_expense': '65'}
0  35736  unremit_loss_100312         Q05  51765139  {'total_paid': '75', 'total_expense': '75'}
1  35737  unremit_loss_100313         Q06  51765140  {'total_paid': '77', 'total_expense': '78'}

List Comprehension

df.iloc[np.argsort([int(x.get('total_expense', '-1')) for x in df['meta']])]

      id            import_id investor_id   loan_id                                         meta
2  35739  unremit_loss_100314         Q06  51765141  {'total_paid': '80', 'total_expense': '65'}
0  35736  unremit_loss_100312         Q05  51765139  {'total_paid': '75', 'total_expense': '75'}
1  35737  unremit_loss_100313         Q06  51765140  {'total_paid': '77', 'total_expense': '78'}

If you need to handle NaNs/missing data, use

u = [  
  int(x.get('total_expense', '-1')) if isinstance(x, dict) else -1 
  for x in df['meta']
]
df.iloc[np.argsort(u)]

      id            import_id investor_id   loan_id                                         meta
2  35739  unremit_loss_100314         Q06  51765141  {'total_paid': '80', 'total_expense': '65'}
0  35736  unremit_loss_100312         Q05  51765139  {'total_paid': '75', 'total_expense': '75'}
1  35737  unremit_loss_100313         Q06  51765140  {'total_paid': '77', 'total_expense': '78'}
Sign up to request clarification or add additional context in comments.

Comments

2

Use:

print (df)
      id            import_id investor_id   loan_id  \
0  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                               meta  
0   {u'total_paid': u'75', u'total_expense': u'75'}  
1   {u'total_paid': u'75', u'total_expense': u'20'}  
2  {u'total_paid': u'75', u'total_expense': u'100'}  

import ast

df['meta'] = df['meta'].apply(ast.literal_eval)

df = df.iloc[df['meta'].str['total_expense'].astype(int).argsort()]

print (df)
      id            import_id investor_id   loan_id  \
1  35736  unremit_loss_100312         Q05  51765139   
0  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                           meta  
1   {'total_paid': '75', 'total_expense': '20'}  
0   {'total_paid': '75', 'total_expense': '75'}  
2  {'total_paid': '75', 'total_expense': '100'} 

If possible if missing total_expense key for some row convert missing values to some integer lower like all another values, like -1 for first position of these rows:

print (df)
      id            import_id investor_id   loan_id  \
0  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                              meta  
0  {u'total_paid': u'75', u'total_expense': u'75'}  
1  {u'total_paid': u'75', u'total_expense': u'20'}  
2                           {u'total_paid': u'75'} 

df['meta'] = df['meta'].apply(ast.literal_eval)


df = df.iloc[df['meta'].str.get('total_expense').fillna(-1).astype(int).argsort()]
print (df)
      id            import_id investor_id   loan_id  \
2  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
0  35736  unremit_loss_100312         Q05  51765139   

                                          meta  
2                         {'total_paid': '75'}  
1  {'total_paid': '75', 'total_expense': '20'}  
0  {'total_paid': '75', 'total_expense': '75'}  

Another solution:

df['new'] = df['meta'].str.get('total_expense').astype(int)
df = df.sort_values('new').drop('new', axis=1)

5 Comments

Thank you for the response , this is making all my rows to the same value. ie. all my id field values becoming same number, Updated question with more rows.
@JameelGrand - Can you add output after my solution to question?
@JameelGrand - added new solution, can you check it?
no luck :( it is giving unsorted dataframe and df['meta'].str.get('total_expense') is giving all Nan values , not actually taking values
@JameelGrand - do you use df['meta'] = df['meta'].apply(ast.literal_eval) like first step? and then df['new'] = df['meta'].str.get('total_expense').astype(int) ?
1

EDIT 2:

Found a slightly better way to do this without using apply:

from pandas.io.json import json_normalize

df = pd.concat([df, json_normalize(df['meta'])], axis = 1)\
       .sort_values(by = 'total_expense')\
       .drop(columns = ['total_paid', 'total_expense'])

EDIT:

df = pd.concat([df, df['meta'].apply(pd.Series)], axis = 1).sort_values(by = 'total_expense').drop(columns = ['total_paid', 'total_expense'])

If you want it to look like the original, just drop the columns you concat after sorting.

Original:

df = pd.concat([df, df['meta'].apply(pd.Series)], axis = 1).drop(columns ='meta').sort_values(by = 'total_expense')

df['meta'].apply(pd.Series) makes the dicts in the meta column into its own df. We can concat it with its original, drop the meta column (as its redundant), then sort the values by 'total expense'

1 Comment

@JameelGrand - edited my comment to make it look like the original df
0

Using regex:

df = pd.read_clipboard(r'\s\s+')
pattern = r"""u'total_expense': u'([0-9.]+)'"""
df['total_expense'] = df.meta.str.extract(pattern)
df.sort_values('total_expense')

Using apply:

df['total_expense'] = df.meta.apply(eval).apply(
                        lambda x: x.get('total_expense', -1))
df.sort_values('total_expense')

Output:

      id            import_id investor_id   loan_id  \
2  35739  unremit_loss_100314         Q06  51765141   
0  35736  unremit_loss_100312         Q05  51765139   
1  35737  unremit_loss_100313         Q06  51765140   

                                              meta total_expense  
2  {u'total_paid': u'80', u'total_expense': u'65'}            65  
0  {u'total_paid': u'75', u'total_expense': u'75'}            75  
1  {u'total_paid': u'77', u'total_expense': u'78'}            78  

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.