
I am reading a CSV file in which one column contains a nested dict with multiple keys. Here is an example:

import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[{'AUS': {'arv': '10:00', 'vol': 5}, 'DAL': {'arv': '9:00', 'vol': 1}}, {'DAL': {'arv': '10:00', 'vol': 6}, 'NYU': {'arv': '10:00', 'vol': 3}}, {'DAL': {'arv': '8:00', 'vol': 6}, 'DAL': {'arv': '10:00', 'vol': 1}, 'GBD': {'arv': '12:00', 'vol': 1}}]})

I want to query column b of the dataframe above and return the corresponding values, as shown below. Is there a more intuitive and more efficient way to do this kind of operation on a large dataset, without looping through the dict?

#convert column b of df to a dict
df_dict = df.b.to_dict()
print(df_dict)
{0: {'AUS': {'arv': '10:00', 'vol': 5}, 'DAL': {'arv': '9:00', 'vol': 1}}, 1: {'DAL': {'arv': '10:00', 'vol': 6}, 'NYU': {'arv': '10:00', 'vol': 3}}, 2: {'DAL': {'arv': '10:00', 'vol': 1}, 'GBD': {'arv': '12:00', 'vol': 1}}}

def get_value(my_str, my_time):
    total = 0
    for key in df_dict:
        if my_str in df_dict[key].keys():
            if df_dict[key].get(my_str).get('arv') == my_time:
                total = total + df_dict[key].get(my_str).get('vol')
    return total

print("total vol at 10:00 is: ", get_value('DAL', '10:00'))
total vol at 10:00 is:  7

2 Answers

While dukkee's answer works, I think its layout becomes counterintuitive if you want to manipulate the dataframe in other ways. I would also reorganize the dataframe, but like this:

import pandas as pd

input_data = {
    'a':[1,2,3],
    'b':[{'AUS': {'arv': '10:00', 'vol': 5},
         'DAL': {'arv': '9:00', 'vol': 1}
        },
        {'DAL': {'arv': '10:00', 'vol': 6},
         'NYU': {'arv': '10:00', 'vol': 3}
        },
        {'DAL': {'arv': '8:00', 'vol': 6},
         'DAL': {'arv': '10:00', 'vol': 1},
         'GBD': {'arv': '12:00', 'vol': 1}
        }]
}

data_list = [[input_data['a'][i], key, value['arv'], value['vol']]
            for i, dic in enumerate(input_data['b'])
            for key, value in dic.items()]
df = pd.DataFrame(data_list, columns=['a', 'abr', 'arv', 'vol'])

Which results in:

>>> df
   a  abr    arv  vol
0  1  AUS  10:00    5
1  1  DAL   9:00    1
2  2  DAL  10:00    6
3  2  NYU  10:00    3
4  3  DAL  10:00    1
5  3  GBD  12:00    1

I believe that's the way you should organize your data. Having dictionaries as values in a dataframe seems counterintuitive to me. This way you can use loc to solve your problem:

>>> df.loc[(df['arv']=='10:00') & (df['abr']=='DAL')]
   a  abr    arv  vol
2  2  DAL  10:00    6
4  3  DAL  10:00    1
>>> vol_sum = df.loc[(df['arv']=='10:00') & (df['abr']=='DAL'), 'vol'].sum()
>>> print(f"total vol at 10:00 is: {vol_sum}")
total vol at 10:00 is: 7

A small advantage over dukkee's answer: there is no need for collections, and a list comprehension is usually somewhat faster than the equivalent for loop. Note that one of your dictionaries contains the key 'DAL' twice, so the later value overwrites the earlier one and only the 10:00 entry survives.
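
If you need totals for many abr/arv pairs rather than a single one, a groupby on the reorganized frame may be worth trying. This is only a sketch: the frame is rebuilt inline so the snippet runs on its own, and the names totals and dal_10 are made up for illustration.

```python
import pandas as pd

# Tidy frame matching the one built above (one row per dict entry).
df = pd.DataFrame(
    [
        [1, "AUS", "10:00", 5],
        [1, "DAL", "9:00", 1],
        [2, "DAL", "10:00", 6],
        [2, "NYU", "10:00", 3],
        [3, "DAL", "10:00", 1],
        [3, "GBD", "12:00", 1],
    ],
    columns=["a", "abr", "arv", "vol"],
)

# Sum vol for every (abr, arv) pair in one vectorized pass.
totals = df.groupby(["abr", "arv"])["vol"].sum()

# Look up a single pair from the resulting MultiIndex Series.
dal_10 = totals.loc[("DAL", "10:00")]
print(f"total vol at 10:00 is: {dal_10}")
```

The grouping is computed once, so repeated lookups do not rescan the frame. A pair that never occurs in the data is absent from totals, so a lookup for it raises a KeyError instead of returning 0.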

I suggest reorganizing how your data is laid out in the DataFrame:

>>> from collections import defaultdict, Counter
>>> import pandas as pd
>>> input_data = {0: {"AUS": {"arv": "10:00", "vol": 5}, "DAL": {"arv": "9:00", "vol": 1}}, 1: {"DAL": {"arv": "10:00", "vol": 6}, "NYU": {"arv": "10:00", "vol": 3}}, 2: {"DAL": {"arv": "10:00", "vol": 1}, "GBD": {"arv": "12:00", "vol": 1}}}
>>> data = defaultdict(Counter)
>>> for value in input_data.values():
...     for name in value:
...         data[value[name]["arv"]][name] += value[name]["vol"]
... 
>>> data
defaultdict(<class 'collections.Counter'>, {'10:00': Counter({'DAL': 7, 'AUS': 5, 'NYU': 3}), '9:00': Counter({'DAL': 1}), '12:00': Counter({'GBD': 1})})
>>> frame = pd.DataFrame(data).T
>>> frame
       AUS  DAL  NYU  GBD
10:00  5.0  7.0  3.0  NaN
9:00   NaN  1.0  NaN  NaN
12:00  NaN  NaN  NaN  1.0
>>> frame[frame.index == "10:00"]["DAL"]
10:00    7.0
Name: DAL, dtype: float64
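
If you want a plain number rather than a one-row Series, label-based indexing with .loc[row, column] should do it. The sketch below rebuilds the same table by hand so it runs on its own. The variable names filled, dal_10 and gbd_10 are only illustrative, and treating a missing combination as zero volume is my assumption, not something stated in the question.

```python
import pandas as pd

nan = float("nan")

# Same table as `frame` above, rebuilt here so the snippet is self-contained.
frame = pd.DataFrame(
    {
        "AUS": [5.0, nan, nan],
        "DAL": [7.0, 1.0, nan],
        "NYU": [3.0, nan, nan],
        "GBD": [nan, nan, 1.0],
    },
    index=["10:00", "9:00", "12:00"],
)

# .loc[row_label, column_label] returns a scalar instead of a Series.
dal_10 = frame.loc["10:00", "DAL"]

# Assumption: a combination that never occurred means zero volume.
filled = frame.fillna(0).astype(int)
gbd_10 = filled.loc["10:00", "GBD"]

print(dal_10, gbd_10)
```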
