I'm new to Python (using v3). I have a DataFrame column that looks like this:

object
{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Time Training New"}},"objectType":"Activity"}
{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Time Influx"}},"objectType":"Activity"}
{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Social"}},"objectType":"Activity"}
{"id":"http://Demo/2.18","definition":{"name":{"en-US":"Personal"}},"objectType":"Activity"}

I need to extract the activity name, which starts at a variable position and has a variable length. I do not know the activity names in advance. All the questions I've found extract a specific string or pattern, not an unknown one. This line

dataExtract['activity'] = dataExtract['object'].str.find('en-US":"')

gives me the start index, and this one

dataExtract['activity'] = dataExtract['object'].str.rfind('"}}')

gives me the end index. So I tried combining them:

dataExtract['activity'] = dataExtract['object'].str[dataExtract['object'].str.find('en-US":"'):dataExtract['object'].str.rfind('"}}')]

But that just generates "NaN", which is clearly wrong. What syntax should I use, or is there a better way to do it? Thanks

  • How do you convert your dictionaries/objects to a pandas DataFrame? Please provide a small code snippet for better understanding and demonstration. Does "definition" always contain only one dictionary/object? Commented Feb 17, 2020 at 13:53
  • The data is in a CSV file, so: import pandas as pd; dataExtract = pd.read_csv('training.csv') Commented Feb 17, 2020 at 14:07

1 Answer

I suggest converting the values to nested dictionaries and then extracting the name by its nested keys:

# If the column holds strings (e.g. after pd.read_csv), parse them into dicts first:
#import ast
#dataExtract['object'] = dataExtract['object'].apply(ast.literal_eval)

dataExtract['activity'] = dataExtract['object'].apply(lambda x: x['definition']['name']['en-US'])

print(dataExtract)
                                              object           activity
0  {'id': 'http://Demo/1.7', 'definition': {'name...  Time Training New
1  {'id': 'http://Demo/1.7', 'definition': {'name...        Time Influx
2  {'id': 'http://Demo/1.7', 'definition': {'name...             Social
3  {'id': 'http://Demo/2.18', 'definition': {'nam...           Personal
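
Since the comments say the data comes from a CSV, the column most likely holds JSON text rather than dictionaries. Here is a minimal sketch under that assumption. It swaps ast.literal_eval for json.loads (a substitution on my part, not part of the answer above), because json.loads also accepts JSON literals such as true, false and null. The sample rows are copied from the question and stand in for the result of pd.read_csv.

```python
import json

import pandas as pd

# Sample rows copied from the question; in practice they would come from
# pd.read_csv('training.csv'), which yields plain strings in this column.
rows = [
    '{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Time Training New"}},"objectType":"Activity"}',
    '{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Time Influx"}},"objectType":"Activity"}',
    '{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Social"}},"objectType":"Activity"}',
    '{"id":"http://Demo/2.18","definition":{"name":{"en-US":"Personal"}},"objectType":"Activity"}',
]
dataExtract = pd.DataFrame({'object': rows})

# Parse each JSON string into a nested dict, then walk down the keys.
parsed = dataExtract['object'].apply(json.loads)
dataExtract['activity'] = parsed.apply(lambda d: d['definition']['name']['en-US'])

print(dataExtract['activity'].tolist())
```

Keeping the parsed dictionaries in a separate variable leaves the original column untouched, which can help if the raw text is needed again later.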

Details:

print(dataExtract['object'].apply(lambda x: x['definition']))
0    {'name': {'en-US': 'Time Training New'}}
1          {'name': {'en-US': 'Time Influx'}}
2               {'name': {'en-US': 'Social'}}
3             {'name': {'en-US': 'Personal'}}
Name: object, dtype: object

print(dataExtract['object'].apply(lambda x: x['definition']['name']))
0    {'en-US': 'Time Training New'}
1          {'en-US': 'Time Influx'}
2               {'en-US': 'Social'}
3             {'en-US': 'Personal'}
Name: object, dtype: object
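
For completeness, the string-based approach from the question can also work. The .str[...] accessor applies one slice to every row, so it expects plain integers rather than a Series of per-row positions, which is most likely why the combined attempt returned NaN. The sketch below shows two alternatives, assuming the column holds the raw strings shown in the question. The helper name slice_activity and the column name activity_sliced are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
rows = [
    '{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Time Training New"}},"objectType":"Activity"}',
    '{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Time Influx"}},"objectType":"Activity"}',
    '{"id":"http://Demo/1.7","definition":{"name":{"en-US":"Social"}},"objectType":"Activity"}',
    '{"id":"http://Demo/2.18","definition":{"name":{"en-US":"Personal"}},"objectType":"Activity"}',
]
dataExtract = pd.DataFrame({'object': rows})

# Option 1: capture whatever sits between "en-US":" and the next double quote.
dataExtract['activity'] = dataExtract['object'].str.extract(
    r'"en-US":"([^"]*)"', expand=False
)


# Option 2: compute the slice bounds separately for each row.
def slice_activity(text, start_marker='en-US":"', end_marker='"}}'):
    # find() returns where the marker begins, so skip past the marker itself.
    start = text.find(start_marker) + len(start_marker)
    end = text.rfind(end_marker)
    return text[start:end]


dataExtract['activity_sliced'] = dataExtract['object'].apply(slice_activity)

print(dataExtract[['activity', 'activity_sliced']])
```

Both options treat the JSON as plain text, so they would break on names that contain an escaped double quote; parsing into dictionaries avoids that.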