Adding Column in Pandas Dataframe Based on column with dictionary values

Question

I have a dataframe named df_sample with three columns. The first column ('pid') is the identification number of an item. The second column ('tid') identifies where the item is located. The third column ('tid_dict') is a dictionary of the locations the item should have come from, mapped to how many of that item each location had in stock.

I want to check how often (A) the item actually came from one of the locations it should have come from, and (B) the item came from the location that had the largest quantity. Complicating things, sometimes the item isn't listed as available from any location, and other times it comes from somewhere other than expected. The following sets up a sample dataframe:

import pandas as pd 
column_names = ["pid", "tid", "tid_dict"]
data = [['p26CE0DEAC1', 't29', {'t29': 50, 't121': 41, 't140': 33}], ['p5505CB1A96', 't121', {'t156': 48}], ['p1B9E6A73EC', 't256',{}]]

df_sample = pd.DataFrame(data, columns = column_names)

Then I want to add a new column called 'loc_check' that checks whether the value in 'tid' is one of the keys in 'tid_dict', and a second new column named 'inv_check' that checks whether 'tid' was the location with the greatest available inventory.

df_sample['loc_check'] = #Don't know how to do this part - if 'tid_dict' contains 'tid' = True
df_sample['inv_check'] = #Don't know how to do this part - if 'tid' = 'tid_dict' key with greater value = True

So, in the end I want the dataframe to look like this:

column_names = ["pid", "tid", "tid_dict", 'loc_check', 'inv_check']
data = [['p26CE0DEAC1', 't29', {'t29': 50, 't121': 41, 't140': 33}, True, True], ['p5505CB1A96', 't121', {'t156': 48}, False, False], ['p1B9E6A73EC', 't256',{}, False, False]]

df_sample = pd.DataFrame(data, columns = column_names)

Any help is appreciated. Sorry if something isn't clear. I'm a hobbyist who is still beginning to learn Python and pandas.

Follow-up:

column_names = ["pid", "tid", "tid_dict"]
data = [['p26CE0DEAC1', 't121', {'t29': 50, 't121': 50, 't140': 33}], ['p5505CB1A96', 't121', {'t156': 48}], ['p1B9E6A73EC', 't256',{}]]

df_sample = pd.DataFrame(data, columns = column_names)

How can I account for this situation, where the answer below returns False for 'inv_check' even though 't121' has the same number of inventory items available as location 't29'?

Dharman · Accepted Answer · 2021-05-04 04:57:54Z

You can use df.apply with a lambda function and axis=1 for both questions.

Code

df_sample['loc_check'] = df_sample.apply(lambda x: x['tid'] in x['tid_dict'], axis=1)
df_sample['inv_check'] = df_sample.apply(lambda x:x['tid']==max(x['tid_dict'], key=x['tid_dict'].get) if x['tid_dict'] != {} else False, axis=1)

Output:

pid          tid   tid_dict                             loc_check  inv_check
p26CE0DEAC1  t29   {'t29': 50, 't121': 41, 't140': 33}  True       True
p5505CB1A96  t121  {'t156': 48}                         False      False
p1B9E6A73EC  t256  {}                                   False      False

Explanation

df_sample['loc_check'] = df_sample.apply(lambda x: x['tid'] in x['tid_dict'], axis=1)

This part checks, for each row, whether tid is a key in tid_dict and stores the result in the loc_check column.

The next one is a bit more complicated

df_sample['inv_check'] = df_sample.apply(lambda x:x['tid']==max(x['tid_dict'], key=x['tid_dict'].get) if x['tid_dict'] != {} else False, axis=1)

max(x['tid_dict'], key=x['tid_dict'].get) returns the key with the maximum value in tid_dict.
x['tid'] == max(...) then checks whether that key is the same as 'tid'.
The if check prevents an error when the dictionary is empty (as in the third row), since max() raises a ValueError when given an empty dictionary.
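To make the behaviour of max with key=dict.get concrete, here is a minimal standalone sketch in plain Python. The variable names stock, best and empty_error are made up for illustration and are not part of the answer above. It also shows what happens when two locations share the highest quantity: max returns the first such key it encounters, which in a dict is the one inserted first.

```python
# Hypothetical inventory dictionary with a tie between 't29' and 't121'.
stock = {'t29': 50, 't121': 50, 't140': 33}

# Iterating a dict yields its keys; key=stock.get ranks each key by its value.
best = max(stock, key=stock.get)
print(best)  # 't29' (first key holding the maximum value)

# An empty dict gives max() nothing to compare, so it raises ValueError.
empty_error = None
try:
    max({}, key={}.get)
except ValueError as exc:
    empty_error = exc
print(type(empty_error).__name__)  # ValueError
```

Because only one key is returned, comparing x['tid'] against it yields False for any other location that ties for the maximum.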

edited May 4, 2021 at 4:57

Dharman♦

answered May 4, 2021 at 4:51

Shubham Periwal

2 Comments

mgadfly Over a year ago

Thanks for the answer. I've found one bug and I'm trying to figure out how to account for it. If two locations have the same number of inventory items, it sometimes returns a False. For example, if 't121' had 50 available items and 'tid' ended up 't121' it returns a False boolean value when I'd like it to be True even if another site had the same number of inventory items available.

Shubham Periwal Over a year ago

Oh ok, then what you can do is get the max value in the dict (should return 50) and then check whether tid_dict[tid] == 50. This should fix the bug, as long as you only look up tid_dict[tid] when tid is actually a key in tid_dict.
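One way the suggestion in this comment could be written out is sketched below, using the follow-up data from the question. The helper name has_max_inventory is hypothetical, and this is only one possible reading of the comment rather than the answerer's own code. It compares the quantity at 'tid' with the largest quantity in the dictionary, so ties count as True, and it returns False early when 'tid' is not a key (which also covers the empty dictionary).

```python
import pandas as pd

column_names = ["pid", "tid", "tid_dict"]
data = [
    ['p26CE0DEAC1', 't121', {'t29': 50, 't121': 50, 't140': 33}],
    ['p5505CB1A96', 't121', {'t156': 48}],
    ['p1B9E6A73EC', 't256', {}],
]
df_sample = pd.DataFrame(data, columns=column_names)


def has_max_inventory(row):
    """Hypothetical helper: True if row['tid'] holds the largest quantity (ties included)."""
    stock = row['tid_dict']
    tid = row['tid']
    if tid not in stock:
        # Covers both an unexpected location and an empty dictionary.
        return False
    return stock[tid] == max(stock.values())


df_sample['loc_check'] = df_sample.apply(lambda row: row['tid'] in row['tid_dict'], axis=1)
df_sample['inv_check'] = df_sample.apply(has_max_inventory, axis=1)

print(df_sample[['pid', 'tid', 'loc_check', 'inv_check']])
```

With this data the first row gets inv_check True because 't121' ties with 't29' at 50, while the other two rows stay False.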
