
I have a list of values that I want to match against a column of a pandas DataFrame. From the matches, I would like to build a dictionary whose keys are the list values and whose values come from a different column of the DataFrame.

This is how I have my list:

sample_list = [101,105,112]

My Data Frame:

import pandas as pd

sample_df = pd.DataFrame([[101, "NJ"], [105, "CA"], [111, "MO"], [101, "NJ"], [112, "NB"], [101, "NJ"], [105, "CA"]],
                         columns=["Col1", "Col2"])

It looks like this:

    Col1    Col2
0   101     NJ
1   105     CA
2   111     MO
3   101     NJ
4   112     NB
5   101     NJ
6   105     CA

Now I am trying to iterate over the list values (which are the keys of new_dict) and match them against Col1. Where they match, I would like to use the corresponding Col2 values as the dictionary values. This is the code I have so far:

new_dict = {}
for value in sample_list:
    for i in sample_df['Col1']:
        if value == i:
            new_dict[value] = [i for i in sample_df['Col2']]

However, my new_dict ends up like this:

{101: ['NJ', 'CA', 'MO', 'NJ', 'NB', 'NJ', 'CA'],
 105: ['NJ', 'CA', 'MO', 'NJ', 'NB', 'NJ', 'CA'],
 112: ['NJ', 'CA', 'MO', 'NJ', 'NB', 'NJ', 'CA']}

I need my output to look like this:

{101: ['NJ'],
 105: ['CA'],
 112: ['NB']}

How can I get to my desired output? Any help would be nice.

  • Why do you need the values in new_dict to be lists? Commented Jun 25, 2018 at 21:25
  • @NickD It doesn't have to be a list; I just thought of putting the values into a list. Commented Jun 25, 2018 at 21:40

2 Answers


This will do it:

new_dict = {i: [sample_df[sample_df['Col1']==i]['Col2'].values[0]] for i in sample_list}
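
For context, the loop in the question stores the whole column because the inner comprehension runs over all of sample_df['Col2'], whether or not the row matched. Also note that the one-liner above raises an IndexError if a key has no matching row, since .values[0] is then taken from an empty array. Below is a minimal sketch of a loop-based version that keeps the question's structure; the name `matches` and the choice to skip keys with no match are my own assumptions, not part of the original code.

```python
import pandas as pd

sample_list = [101, 105, 112]
sample_df = pd.DataFrame(
    [[101, "NJ"], [105, "CA"], [111, "MO"], [101, "NJ"],
     [112, "NB"], [101, "NJ"], [105, "CA"]],
    columns=["Col1", "Col2"],
)

new_dict = {}
for value in sample_list:
    # The boolean mask keeps only the rows whose Col1 equals this key.
    matches = sample_df.loc[sample_df["Col1"] == value, "Col2"]
    # Assumption: keys with no matching row are simply left out.
    if not matches.empty:
        # unique() drops repeated values and keeps first-seen order.
        new_dict[value] = matches.unique().tolist()

print(new_dict)
```

Returns: {101: ['NJ'], 105: ['CA'], 112: ['NB']}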

Alt 1

If you insist on lists, here is another solution that should be efficient: use isin() to build a boolean mask that filters out the unwanted rows.

m = sample_df['Col1'].isin(sample_list)
sample_df[m].drop_duplicates().groupby('Col1')['Col2'].apply(list).to_dict()

Returns: {101: ['NJ'], 105: ['CA'], 112: ['NB']}

Note: if a key has more than one distinct Col2 value, all of them will end up in its list. If you only want the first, use: {k:[v] for k,v in sample_df[m].groupby('Col1')['Col2'].first().items()}
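
To make that difference concrete, here is a small sketch on made-up data in which key 101 maps to two different Col2 values. The frame `df`, the list `keys`, and the result names are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical data: 101 appears with both "NJ" and "NY".
df = pd.DataFrame({"Col1": [101, 101, 105, 101],
                   "Col2": ["NJ", "NY", "CA", "NJ"]})
keys = [101, 105]
m = df["Col1"].isin(keys)

# Every distinct (Col1, Col2) pair survives drop_duplicates().
all_unique = df[m].drop_duplicates().groupby("Col1")["Col2"].apply(list).to_dict()

# first() keeps only the first Col2 value seen for each key.
first_only = {k: [v] for k, v in df[m].groupby("Col1")["Col2"].first().items()}

print(all_unique)
print(first_only)
```

The first result is {101: ['NJ', 'NY'], 105: ['CA']}, the second is {101: ['NJ'], 105: ['CA']}.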


Alt 2

If you only want one item per key rather than all of them, why not store the bare values instead of lists?

m = sample_df['Col1'].isin(sample_list)
sample_df[m].set_index('Col1')['Col2'].to_dict()

Returns: {101: 'NJ', 105: 'CA', 112: 'NB'}
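
One caveat with this variant: when a key appears with different Col2 values, Series.to_dict() keeps the last one it sees, because later entries overwrite earlier ones. The sketch below uses made-up data to show this, plus one possible way to keep the first value instead; `df`, `last_wins`, and `first_wins` are hypothetical names.

```python
import pandas as pd

# Hypothetical data: 101 appears with two different Col2 values.
df = pd.DataFrame({"Col1": [101, 105, 101],
                   "Col2": ["NJ", "CA", "NY"]})
m = df["Col1"].isin([101, 105])

# Duplicate index labels collapse in the dict; the last row wins.
last_wins = df[m].set_index("Col1")["Col2"].to_dict()

# Dropping duplicate keys first keeps the first row per key instead.
first_wins = (
    df[m].drop_duplicates(subset="Col1").set_index("Col1")["Col2"].to_dict()
)

print(last_wins)
print(first_wins)
```

Here last_wins is {101: 'NY', 105: 'CA'} and first_wins is {101: 'NJ', 105: 'CA'}.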


Alt 3

Or, if you want all the items:

m = sample_df['Col1'].isin(sample_list)
sample_df[m].groupby('Col1')['Col2'].apply(list).to_dict()

Returns: {101: ['NJ', 'NJ', 'NJ'], 105: ['CA', 'CA'], 112: ['NB']}
