I am trying to parse text in one column of a pandas DataFrame, where the text contains tags followed by values, and store each tag's value in its own column. For example, suppose I create this DataFrame, df:
import re
import pandas as pd

df = pd.DataFrame([[1,2],['A: this is a value B: this is the b val C: and here is c.','A: and heres another a. C: and another c']])
df = df.T
df.columns = ['col1','col2']
# collect the tag names (the word before each colon) found in col2
df['tags'] = df['col2'].apply(lambda x: re.findall(r'(?:\s|)(\w*)(?::)', x))
# build the unique set of tags across all rows
all_tags = []
for val in df['tags']:
    all_tags = all_tags + val
all_tags = list(set(all_tags))
# add one empty column per tag
for val in all_tags:
    df[val] = ''
df:
   col1                                               col2       tags A C B
0     1  A: this is a value B: this is the b val C: and...  [A, B, C]
1     2           A: and heres another a. C: and another c     [A, C]
How would I populate each of the new "tag" columns with its value from col2, so that I get this df:
   col1                                               col2       tags  \
0     1  A: this is a value B: this is the b val C: and...  [A, B, C]
1     2           A: and heres another a. C: and another c     [A, C]

                      A               C                  B
0       this is a value  and here is c.  this is the b val
1  and heres another a.   and another c
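For reference, here is a minimal sketch of one way this could be done. It assumes that a tag is always a run of word characters immediately followed by a colon, and that a tag's value runs up to the next tag or the end of the string. The names pattern, split_tags, tag_values and result are only illustrative, and the sketch rebuilds the example frame without the tags column so that it runs on its own.

```python
import re
import pandas as pd

# Rebuild the example data (same values as above, without the 'tags' column).
df = pd.DataFrame({
    'col1': [1, 2],
    'col2': ['A: this is a value B: this is the b val C: and here is c.',
             'A: and heres another a. C: and another c'],
})

# Assumption: a tag is word characters followed by ':'; its value is the
# shortest text that ends just before the next "word:" or the end of string.
pattern = re.compile(r'(\w+):\s*(.*?)\s*(?=\w+:|$)')

def split_tags(text):
    """Return a {tag: value} dict for one cell of col2."""
    return dict(pattern.findall(text))

# One dict per row becomes one row of tag columns; tags missing from a row
# come out as NaN and are replaced by empty strings.
tag_values = pd.DataFrame(df['col2'].map(split_tags).tolist(), index=df.index)
tag_values = tag_values.fillna('')

# Attach the tag columns to the original frame.
result = df.join(tag_values)
print(result[['A', 'B', 'C']])
```

Note that under this assumption a value that itself contains a word followed by a colon would be split into an extra tag, so the pattern may need tightening (for example, to single uppercase letters) for real data.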

