I have a dataframe that I'm importing from a text file. In the file, certain columns hold multiple pieces of data separated by commas, so for some rows the column value is effectively a list. However, pandas doesn't read the data that way; it reads each value as a single string that happens to contain commas. (See the example in the MRE below.)
What I ultimately want to do is use df.explode to expand these columns into separate rows, but first I need pandas to treat the data as lists. I could loop through the whole df, but there must be a vectorized solution.
Sample code:
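import pandas as pd

df_data = {
    'Day': ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun'],
    # 'Visit' holds real Python lists: this is the shape I want
    'Visit': ['MK', ['E', 'DAK'], 'MK',
              ['DHS', 'E'], 'E', ['DAK', 'DHS', 'E'], 'MK'],
    # 'Visit2' holds comma-separated strings: this is the shape I actually have
    'Visit2': ['MK', 'E, DAK', 'MK',
               'DHS, E', 'E', 'DAK, DHS, E', 'MK'],
}
df = pd.DataFrame(data=df_data)

print(df.explode('Visit'))   # one row per list element
print(df.explode('Visit2'))  # unchanged: the values are plain strings, not lists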
The data I'm dealing with looks like column Visit2, but what I want is something like Visit, or some other transformation that ends up with the desired result:
     Day Visit
0    Mon    MK
1   Tues     E
1   Tues   DAK
2   Weds    MK
3  Thurs   DHS
3  Thurs     E
4    Fri     E
5    Sat   DAK
5    Sat   DHS
5    Sat     E
6    Sun    MK
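For concreteness, here is a minimal sketch of the kind of vectorized transformation I have in mind. It assumes the separator is always a comma (optionally surrounded by whitespace) and that no individual value contains a comma itself. It splits each string into a list with Series.str.split, explodes the lists, and strips the leftover whitespace. The names result and Visit are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun'],
    'Visit2': ['MK', 'E, DAK', 'MK', 'DHS, E', 'E', 'DAK, DHS, E', 'MK'],
})

result = (
    # Turn each comma-separated string into a list (assumes ',' is the only separator)
    df.assign(Visit=df['Visit2'].str.split(','))
      # One row per list element; the original index label is repeated
      .explode('Visit')
      # Remove the whitespace that followed each comma
      .assign(Visit=lambda d: d['Visit'].str.strip())
      .drop(columns='Visit2')
)
print(result)
```

The index labels repeat for rows that came from the same original row, as in the desired output; chaining .reset_index(drop=True) would renumber them if that is not wanted.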