I have this dictionary:

j = {1: {'help': 2},
     2: {'chocolate': 1, 'help': 1},
     3: {'chocolate': 1, 'help': 1}}

and this dataframe:

import pandas as pd

df = pd.DataFrame({'docId': [1, 2, 3, 1, 2, 3],
                   'sent': ['help', 'chocolate', 'chocolate', 'help', 'help', 'help']})

and I want to look up the frequency for each row by docId and term (the sent column), so the result should look like this:

docId  sent        freq
1      help         2
2      chocolate    1
3      chocolate    1
1      help         2
2      help         1
3      help         1

I'm not sure how to accomplish this. I tried using map and apply, but I didn't get anywhere.

3 Answers

6

Remake your dictionary

With tuples as keys, you can map the get method over zipped columns

J = {(x, y): v for x, V in j.items() for y, v in V.items()}

df.assign(freq=[*map(J.get, zip(df.docId, df.sent))])


   docId       sent  freq
0      1       help     2
1      2  chocolate     1
2      3  chocolate     1
3      1       help     2
4      2       help     1
5      3       help     1
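
As a rough sketch of how the tuple-keyed dictionary behaves when a (docId, term) pair is absent: `J.get` returns `None` rather than raising, so the row ends up with a missing value. The frame `df_missing` and its last row are hypothetical and only serve to illustrate the point.

```python
import pandas as pd

j = {1: {'help': 2},
     2: {'chocolate': 1, 'help': 1},
     3: {'chocolate': 1, 'help': 1}}

# Flatten the nested dict so each (docId, term) pair becomes a single key.
J = {(doc, term): freq for doc, terms in j.items() for term, freq in terms.items()}

# Hypothetical frame: the last row (1, 'chocolate') has no entry in j.
df_missing = pd.DataFrame({'docId': [1, 2, 1],
                           'sent': ['help', 'chocolate', 'chocolate']})

# dict.get returns None for an unknown pair instead of raising KeyError.
freqs = [J.get(key) for key in zip(df_missing.docId, df_missing.sent)]

# The unknown pair shows up as a missing value in the new column.
result = df_missing.assign(freq=freqs)
```

Whether a silent missing value or a loud `KeyError` is preferable depends on how much you trust the dictionary to cover every row.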

Or don't

You can use a lambda in map that takes two arguments and pass the iterables that supply the arguments.

df.assign(freq=[*map(lambda x, y: j[x][y], df.docId, df.sent)])

   docId       sent  freq
0      1       help     2
1      2  chocolate     1
2      3  chocolate     1
3      1       help     2
4      2       help     1
5      3       help     1

6

How about a list comprehension? You can chain two dict.get calls (one for each level of nesting).

import numpy as np

df['freq'] = [
    j.get(x, {}).get(y, np.nan) for x, y in df[['docId', 'sent']].values]
df

   docId       sent  freq
0      1       help     2
1      2  chocolate     1
2      3  chocolate     1
3      1       help     2
4      2       help     1
5      3       help     1

If you can guarantee that every (docId, sent) pair exists in j, you can simplify the above to:

df['freq'] = [j[x][y] for x, y in df[['docId', 'sent']].values]
df

   docId       sent  freq
0      1       help     2
1      2  chocolate     1
2      3  chocolate     1
3      1       help     2
4      2       help     1
5      3       help     1
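
If the dictionary has more than two levels, the chained `dict.get` idea can be wrapped in a small helper that walks one key per level. This is only a sketch: the three-level dictionary `nested`, the `corpus` column, and the helper name `nested_get` are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level dictionary: corpus -> docId -> term -> frequency.
nested = {'a': {1: {'help': 2}, 2: {'chocolate': 1}},
          'b': {1: {'help': 5}}}

def nested_get(d, keys, default=np.nan):
    """Follow `keys` through nested dicts, returning `default` on any miss."""
    for key in keys:
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d

df3 = pd.DataFrame({'corpus': ['a', 'a', 'b', 'b'],
                    'docId': [1, 2, 1, 2],
                    'sent': ['help', 'chocolate', 'help', 'help']})

# One key per nesting level, taken row by row; ('b', 2, 'help') is not in nested.
df3['freq'] = [nested_get(nested, keys)
               for keys in zip(df3.corpus, df3.docId, df3.sent)]
```

The helper does the same thing as writing `.get(..., {})` once per level, but it does not need to be rewritten when the depth changes.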

1 Comment

Chaining dict.get worked for me when I had another level in my nested dictionary. I had trouble adapting some of the other answers, so I'm mentioning it in case anyone is working with a more complicated dictionary.
4

IIUC, you could try a different approach using reindex:

s=pd.DataFrame(j).stack().reindex(pd.MultiIndex.from_arrays([df.sent,df.docId])).reset_index()
s
Out[81]: 
        sent  docId    0
0       help      1  2.0
1  chocolate      2  1.0
2  chocolate      3  1.0
3       help      1  2.0
4       help      2  1.0
5       help      3  1.0

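I ended up using the lookup method (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this only runs on older versions):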

df['Freq']=pd.DataFrame(j).lookup(df.sent,df.docId)
df
Out[95]: 
   docId       sent  Freq
0      1       help   2.0
1      2  chocolate   1.0
2      3  chocolate   1.0
3      1       help   2.0
4      2       help   1.0
5      3       help   1.0
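
For pandas versions without `DataFrame.lookup`, a left merge is one possible alternative, along the lines of the merge idea raised in the comments. This is a sketch rather than the answerer's own code; the intermediate name `freq_table` is an assumption.

```python
import pandas as pd

j = {1: {'help': 2},
     2: {'chocolate': 1, 'help': 1},
     3: {'chocolate': 1, 'help': 1}}

df = pd.DataFrame({'docId': [1, 2, 3, 1, 2, 3],
                   'sent': ['help', 'chocolate', 'chocolate', 'help', 'help', 'help']})

# Long-form table with one row per (docId, term) pair found in j.
freq_table = pd.DataFrame(
    [(doc, term, freq) for doc, terms in j.items() for term, freq in terms.items()],
    columns=['docId', 'sent', 'freq'])

# A left merge keeps every row of df in its original order;
# a pair that is missing from j would get NaN in the freq column.
out = df.merge(freq_table, on=['docId', 'sent'], how='left')
```

Because each (docId, term) pair appears only once in `freq_table`, the merge does not duplicate any rows of `df`.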

8 Comments

You beat me to this solution. +1 :)
@ScottBoston maybe you can try merge :-) I would like to see that function as well :-)
df.join(pd.DataFrame(j).unstack().rename_axis([*df]).rename('freq'), on=[*df])
@piRSquared yes, that is nice. I ended up using lookup :-)
OOHH! lookup I like it!