I'm doing some operations on a pandas DataFrame. For a certain column, I need to convert each cell to a NumPy array, which is not hard. The end goal is to get a 2D array from the whole column. However, when I perform the following operation, I get a 1D array, and the inner arrays are not recognized as a second dimension.
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['abc', 'def']})
mapping = {v: k for k, v in enumerate('abcdef')}
df['new'] = df['col'].apply(lambda x: list(x))
df['new'].apply(lambda x: np.array([mapping[i] for i in x])).values
This gives:
array([array([0, 1, 2]), array([3, 4, 5])], dtype=object)
and the shape is (2,), meaning the inner arrays are not recognized.
If I call .reshape(2, -1) on that result, I get a shape of (2, 1) instead of (2, 3).
Appreciate any help!
Clarification:
The above is only a toy example. What I am actually doing is preprocessing the IMDB dataset for machine learning. I have to convert each value in a review column to a word embedding, which is a NumPy array. The challenge now is to get all these arrays out as a single 2D array, so that I can use them in my machine learning model.
np.array(df['new'].values.tolist()) or np.stack(df['new'])
tolist() will mean it's no longer an array
Without tolist, you will get an array of type object with a shape of (2,)
tolist() drops it out to a python list, which you're just going to convert back to an array? You could just leave it at .values? Or am I missing something
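To make the difference concrete, here is a minimal sketch using the toy data from the question. The variable names `arrays`, `stacked`, and `via_list` are only illustrative, and it assumes every inner array has the same length; np.stack raises a ValueError otherwise, so variable-length sequences would first need padding or truncation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['abc', 'def']})
mapping = {v: k for k, v in enumerate('abcdef')}

# Each cell holds its own 1D array, so the column is stored with dtype=object.
arrays = df['col'].apply(lambda s: np.array([mapping[ch] for ch in s]))
print(arrays.values.shape)  # (2,)

# np.stack joins the equal-length 1D arrays along a new first axis.
stacked = np.stack(arrays.values)
print(stacked.shape)  # (2, 3)

# Going through a nested Python list produces the same 2D array.
via_list = np.array(arrays.values.tolist())
print(np.array_equal(stacked, via_list))  # True
```

Calling .reshape on the object array cannot help, because NumPy sees only two elements there (each one a reference to an inner array), so the data has to be rebuilt into a single numeric array first.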