3

I am building a food recommender system and I have a dataframe:

df:
            meat vegetables cheese ketchup egg...
hamburger     3      5        2       2     1   
    pasta     0      0        4       0     1    
     soup     0      2        0       0     0     
      ...

I also have a list of the ingredients that a user does not like:

dislike = ["cheese", "egg"]

So what I am trying to do is create a function that adds a new row "user_name" with a 10 in the ingredients that he/she does not like and a 0 in all the other columns. The output should be:

            meat vegetables cheese ketchup egg...
hamburger     3      5        2       2     1   
    pasta     0      0        4       0     1    
     soup     0      2        0       0     0     
 new_user     0      0       10       0    10
...

I have simplified the dataframe and the list to make them easier to follow, but they are actually much longer.

This is what I have written so far:

def user_pre(df):
    dislike = ["cheese", "egg"]
    for ing in dislike:
        df.loc["new_user"] = pd.Series({ing: 10})
    return df

It "works", but only for the last element in the dislike list. Besides, it does not put a 0 in the other cells but a NaN.

Thank you so much in advance!

4 Answers

3

I am not sure how "healthy" it is to mix users with dishes in a single pandas DataFrame but a function like this should do the work:

def insert_user_dislikes(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    df.loc[user_name] = [10 if col in ingredients else 0 for col in df.columns]

insert_user_dislikes('new_user', df, ['meat', 'egg'])
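For context on why the loop in the question fails: each pass assigns a one-element Series to the whole row, pandas aligns it with the columns, and every column missing from that Series becomes NaN, so only the last ingredient survives. A minimal sketch that builds the full row once instead, assuming the sample data from the question (the helper name `build_dislike_row` is made up for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)

def build_dislike_row(columns, dislike, score=10):
    """Hypothetical helper: one value per column, `score` for disliked ones."""
    row = pd.Series(0, index=columns)          # start with 0 in every column
    row[row.index.isin(dislike)] = score       # overwrite only the disliked ones
    return row

# A single assignment with a complete row leaves no NaN behind
df.loc["new_user"] = build_dislike_row(df.columns, ["cheese", "egg"])
print(df)
```

Because the row already has a value for every column, there is nothing left for the alignment step to fill with NaN.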

Edit 1: I like @Fred's solution as well:

def insert_user_dislikes2(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    df.loc[user_name] = 0
    df.loc[user_name, ingredients] = 10

insert_user_dislikes2('new_user', df, ['meat', 'egg'])

Edit 2: Here is Shubham's solution for performance assessment (it relies on DataFrame.append, which was removed in pandas 2.0, so it needs an older pandas or a switch to pd.concat):

def insert_user_dislikes3(user_name='new_user', df=df, ingredients=['meat', 'egg']):
    s = pd.Series(
        np.where(df.columns.isin(ingredients), 10, 0),
        name=user_name, index=df.columns, dtype='int')
    return df.append(s)

In terms of performance (on a very small dataset), it looks like the list comprehension one is the fastest, though:

import timeit
import numpy as np
import pandas as pd

df = pd.DataFrame([[3, 5, 2, 2, 1],
                   [0, 0, 4, 0, 1]],
                  columns=['meat', 'vegetables', 'cheese', 'ketchup', 'egg'],
                  index=['hamburger', 'pasta'])

print(timeit.timeit(insert_user_dislikes, number=1000))
0.125

print(timeit.timeit(insert_user_dislikes2, number=1000))
0.547

print(timeit.timeit(insert_user_dislikes3, number=1000))
2.153
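The numbers above come from calling each function repeatedly on the same `df`, so the first call adds the row and later calls overwrite it. A hedged sketch of a harness that gives every call a fresh copy, so each variant does the same work on every run; the names `make_df`, `with_comprehension` and `with_two_assignments` are illustrative, and the absolute timings depend on the machine and the pandas version:

```python
import timeit
import pandas as pd

def make_df():
    # Same two-row frame as in the timing setup above
    return pd.DataFrame(
        [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1]],
        columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
        index=["hamburger", "pasta"],
    )

def with_comprehension(df, user, ingredients):
    # Build the whole row in one pass over the columns
    df.loc[user] = [10 if col in ingredients else 0 for col in df.columns]
    return df

def with_two_assignments(df, user, ingredients):
    # Zero the row first, then overwrite the disliked columns
    df.loc[user] = 0
    df.loc[user, ingredients] = 10
    return df

base = make_df()
for fn in (with_comprehension, with_two_assignments):
    # base.copy() keeps `base` untouched, so every call inserts a new row
    seconds = timeit.timeit(
        lambda: fn(base.copy(), "new_user", ["meat", "egg"]), number=200
    )
    print(f"{fn.__name__}: {seconds:.3f} s (includes the cost of base.copy())")
```

The copy adds the same overhead to both variants, so the comparison between them stays fair even though the absolute values are inflated.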

2

I'm not sure how efficient this approach is, but it should work:

dislikes = ["cheese","egg"]
new_user = "Tom"
df.loc[new_user] = 0
for dislike in dislikes:
    if dislike not in df.columns:
        df[dislike] = 0
    df.loc[new_user, dislike] = 10

0

Set the new_user row to zero, then select the disliked columns and set them to 10.

print(df)
          meat  vegetables  cheese  ketchup  egg
hamburger     3           5       2        2    1
pasta         0           0       4        0    1
soup          0           2       0        0    0

Create the new_user row filled with zeros.

df.loc["new_user", :] = 0
print(df)
          meat  vegetables  cheese  ketchup  egg
hamburger   3.0         5.0     2.0      2.0  1.0
pasta       0.0         0.0     4.0      0.0  1.0
soup        0.0         2.0     0.0      0.0  0.0
new_user    0.0         0.0     0.0      0.0  0.0

Then assign again, this time only to the disliked columns, and set them to 10.

dislike = ["cheese", "egg"]

df.loc["new_user", dislike] = 10
print(df)
           meat  vegetables  cheese  ketchup   egg
hamburger   3.0         5.0     2.0      2.0   1.0
pasta       0.0         0.0     4.0      0.0   1.0
soup        0.0         2.0     0.0      0.0   0.0
new_user    0.0         0.0    10.0      0.0  10.0
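In the output above the counts turned into floats (3.0, 10.0) once the row was added. If integer columns are preferred, one option is to cast back afterwards. A small sketch, assuming every cell holds a whole number and nothing is missing:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    [[3, 5, 2, 2, 1], [0, 0, 4, 0, 1], [0, 2, 0, 0, 0]],
    columns=["meat", "vegetables", "cheese", "ketchup", "egg"],
    index=["hamburger", "pasta", "soup"],
)
dislike = ["cheese", "egg"]

df.loc["new_user", :] = 0
df.loc["new_user", dislike] = 10

# Adding a row by label may upcast the columns to float; cast back to int.
# astype(int) raises if any cell is NaN, so only use it on a complete frame.
df = df.astype(int)
print(df.dtypes)
```

Whether the upcast happens at all can depend on the pandas version, so the cast is harmless when the columns are already integers.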

0

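You can use Index.isin to check which column labels of the dataframe are present in the dislike list, build a series s from the result, and add it to the original dataframe df as a new row. DataFrame.append was removed in pandas 2.0, so the code below uses pd.concat instead.

Use:

import numpy as np
import pandas as pd

# 10 where the column is a disliked ingredient, 0 everywhere else
s = pd.Series(
    np.where(df.columns.isin(dislike), 10, 0),
    name='new_user', index=df.columns, dtype='int')

# on pandas < 2.0, df = df.append(s) does the same
df = pd.concat([df, s.to_frame().T])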

The resulting dataframe df will be:

           meat  vegetables  cheese  ketchup  egg                                            
hamburger     3           5       2        2    1
pasta         0           0       4        0    1
soup          0           2       0        0    0
new_user      0           0      10        0   10
