I have a pandas DataFrame with a catch-all column called "Misc", which holds an optional sequence of key=value entries separated by ";". For example:
Misc
1. xxx=something;yyyblah=somethingelse;xyx=blah
2. xyz=meh;yzxx=random;xyx=meh
I am only interested in 4-5 of these key=value entries. For each of them I would like to add a new column to my DataFrame holding the value, with "." or NaN where the key is absent. So if I were interested in xxx=...; and xyx=...;, my code would produce the following:
Misc | xxx | xyx
1. xxx=something;yyyblah=somethingelse;xyx=blah | something | blah
2. xyz=meh;yzxx=random;xyx=meh | . | meh
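To make the example reproducible, the input and the desired result can be built like this. It is only a sketch: it assumes the leading "1." and "2." are row labels rather than part of the Misc text, and the variable names `df` and `expected` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Input: one catch-all column of ";"-separated key=value entries.
# Assumption: "1." and "2." in the example are row labels, not data.
df = pd.DataFrame(
    {
        "Misc": [
            "xxx=something;yyyblah=somethingelse;xyx=blah",
            "xyz=meh;yzxx=random;xyx=meh",
        ]
    },
    index=[1, 2],
)

# Desired result: one new column per key of interest, NaN where it is absent.
expected = df.assign(
    xxx=["something", np.nan],
    xyx=["blah", "meh"],
)

print(expected)
```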
Every entry in Misc begins with one of a set of 20-30 known strings and ends with ";". I have tried using regexes ...
df['xxx'] = df.Misc.str.extract(r'*(xxx=)*;)$', expand=True)
but that does not seem to work. I also thought about removing every entry I do not care about and then splitting on ";" so that the remaining columns are consistent. Any ideas?
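One direction that might work is a corrected regex with a single capture group per key. This is a minimal sketch, not a tested solution for the real data. It assumes every entry has the form key=value, that values never contain ";", and that the list `keys` (a hypothetical name) holds the keys of interest.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "Misc": [
            "xxx=something;yyyblah=somethingelse;xyx=blah",
            "xyz=meh;yzxx=random;xyx=meh",
        ]
    }
)

# Hypothetical list of the keys worth keeping.
keys = ["xxx", "xyx"]

for key in keys:
    # (?:^|;) requires the key to sit at the start of the string or directly
    # after a ";", so "xxx" cannot match inside a longer key.
    # ([^;]*) captures the value up to the next ";" or the end of the string.
    pattern = rf"(?:^|;){re.escape(key)}=([^;]*)"
    # With one capture group and expand=False, str.extract returns a Series;
    # rows without a match become NaN.
    df[key] = df["Misc"].str.extract(pattern, expand=False)

print(df)
```

If "." is preferred over NaN for missing keys, calling `df[keys].fillna(".")` on the new columns afterwards should do it.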