0

This question is similar to this one pandas: split a string column into multiple columns and dynamically name columns. I modified the data as below

df = pd.DataFrame.from_dict({'study_id': {0: 'study1',
  1: 'study2',
  2: 'study3',
  3: 'study4',
  4: 'study5'},
 'fuzzy_market': {0: '[Age: 18-67], [Country of Birth: Austria], [Country of Birth: Germany], [Country: Austria], [Country: Germany], [Language: German]',
  1: '[Country: Germany], [Management experience: Yes]',
  2: '[Country: United Kingdom], [Language: English]',
  3: '[Age: 18-67], [Country of Birth: Austria], [Country of Birth: Germany], [Country: Austria], [Country: Germany], [Language: German]',
  4: '[Age: 48-99]'}})

I would like the output to be as below.

study_id    Age     Country of Birth    Country         Language      Management experience
study1     18-67    Austria             Austria         German        None
study1     18-67    Germany             Germany         German        None
study2     None     None                Germany         None          Yes
study3     None     None                United Kingdom  English       None

How should I modify these codes. Thank you.

p = df['fuzzy_market'].str.findall(r'([^:\[]+): ([^\]]+)')
df[['study_id']].join(pd.DataFrame(map(dict, p)))

1 Answer 1

0

try this:

data = [*df.pop('fuzzy_market').str.findall(r'([^:\[]+): ([^\]]+)').map(dict)]
res = df.join(pd.DataFrame(data, index=df.index))
print(res)
Sign up to request clarification or add additional context in comments.

1 Comment

this does not work as there should be two rows for study1.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.