I have a CSV file where one of the columns looks like a numpy array. The first few lines look like the following
first,second,third
170.0,2,[19 234 376]
170.0,3,[19 23 23]
162.0,4,[1 2 3]
162.0,5,[1 3 4]
When I load the this CSV with pandas data frame and using the following code
data = pd.read_csv('myfile.csv', converters = {'first': np.float64, 'second': np.int64, 'third': np.array})
Now, I want to group by based on the 'first' column and union the 'third' column. So after doing this my dataframe should look like
170.0, [19 23 234 376]
162.0, [1 2 3 4]
How do I achieve this? I tried multiple ways like the following and nothing seems to help achieve this goal.
group_data = data.groupby('first')
group_data['third'].apply(lambda x: np.unique(np.concatenate(x)))
objectas thedtypewhich indicates to me it's in fact a string., can you post the output fromdata.info()and alsodata['second'].iloc[0]data = pd.read_csv('myfile.csv', header=None, names=['first','second'], converters = {'first': np.float64, 'second': np.array})The problem with your code is that your file does not have any header names so your converters will not find a match