I have a dataframe named df_sample with three columns. The first column ('pid') is the identification number of an item. The second column ('tid') identifies where the item is located. The third column ('tid_dict') is a dictionary that maps each location the item should have come from to how many of that item the location had in stock.
I want to check how often (A) the item actually came from one of the locations it should have come from, and (B) the item came from the location that had the largest quantity. Complicating things, sometimes the item isn't listed as available at any location, and other times it comes from somewhere other than expected. The following sets up a sample dataframe:
import pandas as pd
column_names = ["pid", "tid", "tid_dict"]
data = [['p26CE0DEAC1', 't29', {'t29': 50, 't121': 41, 't140': 33}],
        ['p5505CB1A96', 't121', {'t156': 48}],
        ['p1B9E6A73EC', 't256', {}]]
df_sample = pd.DataFrame(data, columns=column_names)
Then I want to add a new column called 'loc_check' that checks whether the value in 'tid' is one of the keys in 'tid_dict', and a second new column called 'inv_check' that checks whether 'tid' was the location with the largest available inventory.
# df_sample['loc_check'] = ???  # Don't know how to do this part: True if 'tid' is a key in 'tid_dict'
# df_sample['inv_check'] = ???  # Don't know how to do this part: True if 'tid' is the 'tid_dict' key with the largest value
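A minimal sketch of one way to fill in the two missing lines, assuming 'tid_dict' always holds a plain Python dict (possibly empty). Because each dictionary is a per-row Python object, it simply loops over the 'tid' and 'tid_dict' columns with `zip` instead of using a vectorized pandas method. Note that `max(d, key=d.get)` returns only the first key among ties.

```python
import pandas as pd

# Sample data from the question
column_names = ["pid", "tid", "tid_dict"]
data = [['p26CE0DEAC1', 't29', {'t29': 50, 't121': 41, 't140': 33}],
        ['p5505CB1A96', 't121', {'t156': 48}],
        ['p1B9E6A73EC', 't256', {}]]
df_sample = pd.DataFrame(data, columns=column_names)

# loc_check: True when the actual location 'tid' is one of the expected locations (the dict keys)
df_sample['loc_check'] = [tid in d for tid, d in zip(df_sample['tid'], df_sample['tid_dict'])]

# inv_check: True when 'tid' is the key with the largest stock.
# An empty dict has no maximum, so bool(d) short-circuits it to False.
# On ties, max(..., key=d.get) picks only the first tied key in dict order.
df_sample['inv_check'] = [
    bool(d) and max(d, key=d.get) == tid
    for tid, d in zip(df_sample['tid'], df_sample['tid_dict'])
]

print(df_sample[['pid', 'tid', 'loc_check', 'inv_check']])
```

The `bool(d) and ...` guard matters: calling `max` on an empty dict raises `ValueError`.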
So, in the end I want the dataframe to look like this:
column_names = ["pid", "tid", "tid_dict", "loc_check", "inv_check"]
data = [['p26CE0DEAC1', 't29', {'t29': 50, 't121': 41, 't140': 33}, True, True],
        ['p5505CB1A96', 't121', {'t156': 48}, False, False],
        ['p1B9E6A73EC', 't256', {}, False, False]]
df_sample = pd.DataFrame(data, columns=column_names)
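Once the two boolean columns exist, the "how often" questions (A) and (B) reduce to averages, because the mean of a boolean column is the fraction of rows that are True. A sketch, assuming the columns are computed with a simple row-by-row loop (one approach among several):

```python
import pandas as pd

column_names = ["pid", "tid", "tid_dict"]
data = [['p26CE0DEAC1', 't29', {'t29': 50, 't121': 41, 't140': 33}],
        ['p5505CB1A96', 't121', {'t156': 48}],
        ['p1B9E6A73EC', 't256', {}]]
df_sample = pd.DataFrame(data, columns=column_names)

# Pair each actual location with its dict of expected locations
pairs = list(zip(df_sample['tid'], df_sample['tid_dict']))
df_sample['loc_check'] = [tid in d for tid, d in pairs]
df_sample['inv_check'] = [bool(d) and max(d, key=d.get) == tid for tid, d in pairs]

# The mean of a boolean column is the share of rows that are True
loc_rate = df_sample['loc_check'].mean()  # (A)
inv_rate = df_sample['inv_check'].mean()  # (B)
print(f"(A) came from an expected location: {loc_rate:.1%}")
print(f"(B) came from the best-stocked location: {inv_rate:.1%}")
```

Using `.sum()` instead of `.mean()` gives the raw counts rather than the fractions.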
Any help is appreciated. Sorry if something isn't clear; I'm a hobbyist who is just beginning to learn Python and pandas.
Follow-up:
column_names = ["pid", "tid", "tid_dict"]
data = [['p26CE0DEAC1', 't121', {'t29': 50, 't121': 50, 't140': 33}],
        ['p5505CB1A96', 't121', {'t156': 48}],
        ['p1B9E6A73EC', 't256', {}]]
df_sample = pd.DataFrame(data, columns=column_names)
How can I handle a tie like this one, where the answer below returns True/False even though 't121' has exactly as much inventory available as location 't29'?
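One way to handle ties, sketched under the assumption that a location should count as "largest" whenever its stock equals the maximum, is to compare the stock at 'tid' with `max(d.values())` instead of picking a single winning key. The helper `has_max_inventory` and the extra `inv_tie` column are hypothetical names used only for illustration; `inv_tie` flags rows where more than one location shares the top quantity, in case you want to tell those cases apart.

```python
import pandas as pd

# Follow-up data: 't121' ties with 't29' at 50
column_names = ["pid", "tid", "tid_dict"]
data = [['p26CE0DEAC1', 't121', {'t29': 50, 't121': 50, 't140': 33}],
        ['p5505CB1A96', 't121', {'t156': 48}],
        ['p1B9E6A73EC', 't256', {}]]
df_sample = pd.DataFrame(data, columns=column_names)

def has_max_inventory(tid, inventory):
    """Hypothetical helper: True if tid's stock equals the largest stock (ties count)."""
    if tid not in inventory:  # covers empty dicts and unexpected locations
        return False
    return inventory[tid] == max(inventory.values())

df_sample['loc_check'] = [tid in d for tid, d in zip(df_sample['tid'], df_sample['tid_dict'])]
df_sample['inv_check'] = [
    has_max_inventory(tid, d) for tid, d in zip(df_sample['tid'], df_sample['tid_dict'])
]

# Hypothetical extra column: True when two or more locations share the maximum stock
df_sample['inv_tie'] = [
    len(d) > 0 and list(d.values()).count(max(d.values())) > 1
    for d in df_sample['tid_dict']
]

print(df_sample[['pid', 'tid', 'loc_check', 'inv_check', 'inv_tie']])
```

Checking `tid not in inventory` first avoids both a `KeyError` for unexpected locations and a `ValueError` from `max` on an empty dict.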