
I have a movie dataframe with movie names, their respective genre, and vector representation (numpy arrays).

ID  Year    Title                         Genre             Word Vector
1   2003.0  Dinosaur Planet               Documentary       [-0.55423898, -0.72544044, 0.33189204, -0.1720...
2   2004.0  Isle of Man TT 2004 Review    Sports & Fitness  [-0.373265237, -1.07549703, -0.469254494, -0.4...
3   1997.0  Character                     Foreign           [-1.57682264, -0.91265768, 2.43038678, -0.2114...
4   1994.0  Paula Abdul's Get Up & Dance  Sports & Fitness  [0.3096168, -0.57186663, 0.39008939, 0.2868615...
5   2004.0  The Rise and Fall of ECW      Sports & Fitness  [0.17175879, -2.38005066, -0.45771399, 1.32608...

I'd like to group by genre and get each genre's average vector representation (the component-wise average of all movie vectors in the genre).


I first tried:

movie_df.groupby(['Genre']).mean()

But the built-in mean function isn't able to take the mean of numpy arrays.

I tried creating my own function to do so and then apply it to each group, but I'm not sure this is using apply correctly:

def vector_average(group):
   series_to_array = np.array(group.tolist())
   return np.mean(series_to_array, axis = 0)

movie_df.groupby(['Genre']).apply(vector_average)

Any pointers would be appreciated!

  • Can you please print out df.head(5) and paste it here? Commented Aug 17, 2017 at 4:18
  • Yes, but in the question. Commented Aug 17, 2017 at 4:21
  • I'm unfamiliar with the best way to provide a sample of the dataframe - advice here would be appreciated too! Commented Aug 17, 2017 at 4:22
  • Okay. Your Word Vector is a column of numpy arrays or lists? Commented Aug 17, 2017 at 4:27
  • They're numpy arrays. Commented Aug 17, 2017 at 4:32

2 Answers


If I understand correctly, to get the component-wise averages you can simply apply np.mean to the 'Word Vector' SeriesGroupBy explicitly.

df.groupby('Genre')['Word Vector'].apply(np.mean)

Demo

>>> import numpy as np
>>> import pandas as pd

>>> df = pd.DataFrame({'Title': list('ABCDEFGHIJ'), 
                       'Genre': list('ABCBBDCDED'), 
                       'Word Vector': [np.random.randint(0, 10, 10) 
                                       for _ in range(len('ABCDEFGHIJ'))]})

>>> df

  Genre Title                     Word Vector
0     A     A  [3, 6, 8, 0, 4, 8, 1, 4, 0, 1]
1     B     B  [5, 4, 4, 4, 8, 7, 4, 3, 7, 2]
2     C     C  [1, 7, 6, 7, 3, 3, 8, 1, 8, 1]
3     B     D  [0, 4, 6, 7, 1, 5, 5, 0, 6, 7]
4     B     E  [8, 2, 1, 4, 1, 2, 0, 4, 9, 1]
5     D     F  [7, 9, 7, 8, 8, 7, 2, 9, 1, 3]
6     C     G  [0, 7, 1, 9, 6, 2, 1, 0, 3, 7]
7     D     H  [4, 7, 9, 4, 1, 5, 0, 3, 0, 6]
8     E     I  [5, 1, 5, 1, 8, 1, 1, 4, 5, 6]
9     D     J  [7, 9, 0, 1, 8, 3, 8, 8, 1, 0]

>>> df.groupby('Genre')['Word Vector'].apply(np.mean)

Genre
A    [3.0, 6.0, 8.0, 0.0, 4.0, 8.0, 1.0, 4.0, 0.0, ...
B    [4.33333333333, 3.33333333333, 3.66666666667, ...
C    [0.5, 7.0, 3.5, 8.0, 4.5, 2.5, 4.5, 0.5, 5.5, ...
D    [6.0, 8.33333333333, 5.33333333333, 4.33333333...
E    [5.0, 1.0, 5.0, 1.0, 8.0, 1.0, 1.0, 4.0, 5.0, ...
Name: Word Vector, dtype: object
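
For readers who prefer the stacking step spelled out, here is a minimal sketch of the same component-wise average. It is not the answer's code: the sample values are made up, and vector_average borrows its name from the question but is rewritten here to take a single group's 'Word Vector' Series. It assumes every vector has the same length.

```python
import numpy as np
import pandas as pd

# Small made-up frame (illustrative values, not the asker's data).
df = pd.DataFrame({
    'Genre': ['A', 'B', 'B', 'C'],
    'Word Vector': [np.array([1.0, 2.0]),
                    np.array([0.0, 4.0]),
                    np.array([2.0, 0.0]),
                    np.array([5.0, 5.0])],
})

def vector_average(vectors):
    """Stack one group's vectors into a 2-D array and average over rows."""
    return np.stack(vectors.tolist()).mean(axis=0)

# Iterating a SeriesGroupBy yields (group name, Series) pairs.
genre_means = {genre: vector_average(vectors)
               for genre, vectors in df.groupby('Genre')['Word Vector']}
```

Stacking makes the averaging axis explicit, and np.stack raises an error if the vectors in a group differ in length, which surfaces malformed rows early.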

4 Comments

Thank you, this works! For the sake of completeness, I had also tried movie_df.groupby(['Genre']).apply(np.mean). It produced output for the ID and Year columns but nothing for the vector column?
@perennial_nomad If you try and call np.mean on the entire DataFrame, it will only provide results for columns with numeric datatypes - here, 'Word Vector' is of type object. And you're welcome!
Another follow-up - this returns a pandas Series, and when I try to convert it to a dataframe using to_frame, I only get the Word Vector column with the genres as labels on the side. Is there a way to convert it directly into a 20 x 2 df with 'Genre' and 'Word Vectors' columns?
@perennial_nomad Call .reset_index() on the solution I provided above, maybe :)
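
The .reset_index() suggestion in the last comment can be sketched as follows. The sample data is made up, and the explicit np.stack lambda is an assumption standing in for np.mean so that the stacking is visible; the point is only that the group labels move from the index into an ordinary 'Genre' column.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({
    'Genre': ['A', 'B', 'B', 'C'],
    'Word Vector': [np.array([1.0, 2.0]),
                    np.array([0.0, 4.0]),
                    np.array([2.0, 0.0]),
                    np.array([5.0, 5.0])],
})

# Series indexed by genre, one averaged vector per genre.
means = df.groupby('Genre')['Word Vector'].apply(
    lambda vectors: np.stack(vectors.tolist()).mean(axis=0))

# reset_index() turns the 'Genre' index into a regular column,
# giving a frame with one row per genre and two columns.
result = means.reset_index()
```

With 20 genres in the real data, result would have the 20 x 2 shape asked about, with columns 'Genre' and 'Word Vector'.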

FYI

If your "Word Vector" column holds lists of numbers rather than numpy arrays, convert them to numpy arrays first:

df['Word Vector'] = df['Word Vector'].apply(np.array)
df.groupby('Genre')['Word Vector'].apply(np.mean)  # optionally chain .apply(list) to get lists back
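
The cast matters because adding two Python lists concatenates them instead of adding element-wise. As an alternative sketch (not part of the answer above, with made-up data and illustrative variable names), the lists can be expanded into a plain numeric frame with one column per component, after which the ordinary groupby mean applies. This assumes all lists have the same length.

```python
import numpy as np
import pandas as pd

# Made-up sample data where the vectors are plain Python lists.
df = pd.DataFrame({
    'Genre': ['A', 'B', 'B'],
    'Word Vector': [[1.0, 2.0], [0.0, 4.0], [2.0, 0.0]],
})

# One row per movie, one column per vector component.
matrix = np.array(df['Word Vector'].tolist())
wide = pd.DataFrame(matrix, index=df['Genre'])

# The regular numeric mean now works column by column.
component_means = wide.groupby(level='Genre').mean()
```

Here component_means is a DataFrame indexed by genre; component_means.to_numpy() gives the averaged vectors as rows of a 2-D array.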
