How to merge two lists into a dictionary without using a nested for loop

Question

I have two lists:

a = [0, 0, 0, 1, 1, 1, 1, 1, .... 99999]
b = [24, 53, 88, 32, 45, 24, 88, 53, ...... 1]

I want to merge those two lists into a dictionary like:

{
    0: [24, 53, 88], 
    1: [32, 45, 24, 88, 53], 
    ...... 
    99999: [1]
}

One solution is a for loop, which does not look very elegant:

d = {}
unique_a = list(set(list_a))
for i in range(len(list_a)):
    if list_a[i] in d.keys:
        d[list_a[i]].append(list_b[i])
    else:
        d[list_a] = [list_b[i]]

Though this works, it is inefficient and would take too long when the lists are extremely large. Is there a more elegant way to construct such a dictionary?

Thanks in advance!

DYM if list_a[i] in d.keys and d[list_a[i]] = [list_b[i]]? Please post exactly the code you've tried, preferably using copy+paste (if available on your platform).
– Toby Speight, Commented Nov 1, 2017 at 9:31
If one of the provided answers worked for you, please mark it as accepted. It makes it easier for people coming across your question in the future to know what worked.
– Engineero, Commented Nov 1, 2017 at 15:16
@TobySpeight if means if list_a[i] is already a key in the dictionary, then add list_b[i] into the dictionary under key list_a[i], whereas else means that if not, add list_b[i] to the new key list_a[i] as list. Hope it helps.
– BigD, Commented Nov 1, 2017 at 19:15
@BigD, I thought that's what you meant to write (as I suggested). list_[a] in d.keys just doesn't make sense, and neither does d[list_a] =. I suggest you edit to fix those errors.
– Toby Speight, Commented Nov 2, 2017 at 8:34
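
The corrections discussed in the comments above can be sketched as runnable code: the membership test is made against the dictionary itself, and the else branch indexes list_a[i] rather than list_a. The sample data is shortened from the question and is only illustrative.

```python
# Shortened sample data (an assumption; the question's lists are much longer).
list_a = [0, 0, 0, 1, 1, 1, 1, 1, 99999]
list_b = [24, 53, 88, 32, 45, 24, 88, 53, 1]

d = {}
for i in range(len(list_a)):
    key = list_a[i]
    if key in d:                  # membership test on the dict itself
        d[key].append(list_b[i])
    else:
        d[key] = [list_b[i]]      # index with list_a[i], not the whole list

print(d)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 99999: [1]}
```

Note that this is a single pass over the lists with one dictionary lookup per element, so it is not a nested loop; the answers below mainly make it shorter.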

Ajax1234 · Accepted Answer · 2017-11-01 10:32:32Z

34

You can use a defaultdict:

from collections import defaultdict
d = defaultdict(list)
list_a = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
list_b = [24, 53, 88, 32, 45, 24, 88, 53, 1]
for a, b in zip(list_a, list_b):
    d[a].append(b)

print(dict(d))

Output:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}

edited Nov 1, 2017 at 10:32

answered Oct 31, 2017 at 22:09


3 Comments

cs95 Over a year ago

Really, using a defaultdict is overkill here. See this answer where dict.setdefault can handle the same thing with minimal overhead.

Ajax1234 Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ d[a].append(b) is much cleaner than d.setdefault(x, []).append(y)

cs95 Over a year ago

At the cost of an extra import and a heavier structure ;-)
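
To make the trade-off in this exchange concrete, here is a small sketch that builds the dictionary both ways and checks that the results match. The variable names dd and plain are illustrative only.

```python
from collections import defaultdict

list_a = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
list_b = [24, 53, 88, 32, 45, 24, 88, 53, 1]

# Variant 1: defaultdict creates the empty list on first access to a key.
dd = defaultdict(list)
for a, b in zip(list_a, list_b):
    dd[a].append(b)

# Variant 2: a plain dict; setdefault inserts [] only if the key is missing.
plain = {}
for a, b in zip(list_a, list_b):
    plain.setdefault(a, []).append(b)

print(dict(dd) == plain)  # True
```

One practical difference: merely reading a missing key from a defaultdict, such as dd[42], inserts that key, whereas a plain dict raises KeyError.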

RomanPerekhrest · Accepted Answer · 2017-11-01 08:27:03Z

14

Alternative itertools.groupby() solution:

import itertools

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]

result = { k: [i[1] for i in g] 
           for k,g in itertools.groupby(sorted(zip(a, b)), key=lambda x:x[0]) }
print(result)

The output:

{0: [24, 53, 88], 1: [24, 32, 45, 53, 88], 2: [11, 22, 33, 44, 55, 66], 3: [77]}
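
Note that the output above lists the values for key 1 in sorted order ([24, 32, 45, 53, 88]) rather than in their original order, because sorted(zip(a, b)) compares whole tuples. A possible adjustment, offered as a sketch and not as the answer author's code, is to sort on the key only. Python's sort is stable, so values keep their input order within each group.

```python
import itertools
from operator import itemgetter

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]

# Sort by the key only; ties keep their original relative order (stable sort).
pairs = sorted(zip(a, b), key=itemgetter(0))

result = {
    k: [value for _, value in group]
    for k, group in itertools.groupby(pairs, key=itemgetter(0))
}
print(result)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 2: [11, 22, 33, 44, 55, 66], 3: [77]}
```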

edited Nov 1, 2017 at 8:27

answered Oct 31, 2017 at 22:13


5 Comments

dǝɥɔS ʇoıןןƎ Over a year ago

Sure, I’ve figured out what your code does, but written in that style, it’s not very obvious. For the person new to Python, I think they may find your code hard to understand and then disregard (or not bother to regard) your solution because of it. Just a suggestion, up to you

Brian McCutchon Over a year ago

Might I suggest not writing result in one line? Maybe pull out the result of groupby as a separate variable? That line is way too long...

Daenyth Over a year ago

This seems worse than the other answer because you need to sort, whereas the other answer does not sort, so yours is doing extra work.

RomanPerekhrest Over a year ago

@Daenyth, your information is not new at all. The solution was marked as an "alternative" way at the very beginning.

Padraic Cunningham Over a year ago

If list_a is already ordered, you can remove the n log n sort. Also, the lambda adds unnecessary overhead; itemgetter is always a better option: {k: [i for _, i in g] for k, g in groupby(zip(a, b), key=itemgetter(0))}
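
To illustrate the comment above: when equal keys are already adjacent, the sort can be dropped. The caveat is that groupby only merges consecutive equal keys, so on ungrouped input a dict comprehension silently keeps just the last run for each key. The helper name group_consecutive and the short lists below are made up for this sketch.

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(keys, values):
    """Group values by key, assuming equal keys are adjacent in `keys`."""
    return {k: [v for _, v in g]
            for k, g in groupby(zip(keys, values), key=itemgetter(0))}


# Keys already ordered: no sort needed.
print(group_consecutive([0, 0, 1, 1, 1], [24, 53, 32, 45, 24]))
# {0: [24, 53], 1: [32, 45, 24]}

# Keys not grouped: the second run of 0 overwrites the first.
print(group_consecutive([0, 1, 0], [24, 32, 53]))
# {0: [53], 1: [32]}
```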

cs95 · Accepted Answer · 2017-11-01 13:24:00Z

6

No fancy structures, just a plain ol' dictionary.

d = {}
for x, y in zip(a, b):
    d.setdefault(x, []).append(y)

answered Nov 1, 2017 at 13:24


Engineero · Accepted Answer · 2017-10-31 22:32:22Z

3

You can do this with a dict comprehension:

list_a = [0, 0, 0, 1, 1, 1, 1, 1]
list_b = [24, 53, 88, 32, 45, 24, 88, 53]
my_dict = {key: [] for key in set(list_a)}  # my_dict = {0: [], 1: []}
for a, b in zip(list_a, list_b):
    my_dict[a].append(b)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53]}

Oddly enough, this does not work with dict.fromkeys(set(list_a), []), because every key then maps to the same empty list object:

my_dict = dict.fromkeys(set(list_a), [])  # my_dict = {0: [], 1: []}
my_dict[0].append(1)  # my_dict = {0: [1], 1: [1]}

answered Oct 31, 2017 at 22:32


1 Comment

Padraic Cunningham Over a year ago

lists are mutable, you pass one object/list to fromkeys so you share a reference to the single list/object, it would be the same as a = [] then d = {1: a, 2: a, 3: a}. my_dict = dict.fromkeys(set(list_a), tuple());my_dict[0] += (1,) would show {0: (1,), 1: (), 9999: ()} but add the overhead of creating a new object/tuple with each +=.
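
The shared-reference behaviour described in this comment can be checked directly with an identity test. This is a minimal sketch; the names shared and separate are illustrative.

```python
keys = [0, 1]

shared = dict.fromkeys(keys, [])        # every key refers to the same list
print(shared[0] is shared[1])           # True
shared[0].append(1)
print(shared)                           # {0: [1], 1: [1]}

separate = {key: [] for key in keys}    # a fresh list per key
print(separate[0] is separate[1])       # False
separate[0].append(1)
print(separate)                         # {0: [1], 1: []}
```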

FabienP · Accepted Answer · 2017-10-31 23:11:57Z

A pandas solution:

Setup:

import numpy as np  # pd.np was deprecated and later removed from pandas
import pandas as pd

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4]

b = np.random.randint(0, 100, len(a)).tolist()

>>> b
Out[]: [28, 68, 71, 25, 25, 79, 30, 50, 17, 1, 35, 23, 52, 87, 21]


df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))  # Create a dataframe

>>> df
Out[]:
    Group  Value
0       0     28
1       0     68
2       0     71
3       1     25
4       1     25
5       1     79
6       1     30
7       1     50
8       2     17
9       2      1
10      2     35
11      3     23
12      4     52
13      4     87
14      4     21

Solution:

>>> df.groupby('Group').Value.apply(list).to_dict()
Out[]:
{0: [28, 68, 71],
 1: [25, 25, 79, 30, 50],
 2: [17, 1, 35],
 3: [23],
 4: [52, 87, 21]}

Walkthrough:

create a pd.DataFrame from the input lists, with a as the Group column and b as the Value column
df.groupby('Group') creates groups based on a
.Value.apply(list) collects the values of each group into a list
.to_dict() converts the resulting Series to a dict

Timing:

To get an idea of timings for a test set of 1,000,000 values in 100,000 groups:

import numpy as np  # needed here if not already imported
a = sorted(np.random.randint(0, 100000, 1000000).tolist())
b = np.random.randint(0, 100, len(a)).tolist()
df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))

>>> df.shape
Out[]: (1000000, 2)

%timeit df.groupby('Group').Value.apply(list).to_dict()
4.13 s ± 9.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

But to be honest it is likely less efficient than itertools.groupby suggested by @RomanPerekhrest, or defaultdict suggested by @Ajax1234.
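
For anyone who wants to check that guess on their own machine, here is a rough standard-library timing harness for the non-pandas approaches. It is a sketch under assumptions: the function names are invented here, the data set is smaller than the one above so it finishes quickly, and no particular result is implied.

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def with_defaultdict(keys, values):
    d = defaultdict(list)
    for k, v in zip(keys, values):
        d[k].append(v)
    return dict(d)


def with_setdefault(keys, values):
    d = {}
    for k, v in zip(keys, values):
        d.setdefault(k, []).append(v)
    return d


def with_groupby(keys, values):
    # Stable sort on the key only, so values keep their input order.
    pairs = sorted(zip(keys, values), key=itemgetter(0))
    return {k: [v for _, v in g] for k, g in groupby(pairs, key=itemgetter(0))}


# Smaller than the 1,000,000-value test above; adjust n as needed.
n = 100_000
keys = sorted(random.randrange(10_000) for _ in range(n))
values = [random.randrange(100) for _ in range(n)]

for func in (with_defaultdict, with_setdefault, with_groupby):
    seconds = timeit.timeit(lambda: func(keys, values), number=3)
    print(f"{func.__name__}: {seconds / 3:.4f} s per call")
```

All three functions return equal dictionaries for the same input, so only the timings differ.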

Giorgi Jambazishvili · Accepted Answer · 2017-10-31 22:12:36Z

2

Maybe I'm missing the point, but I will at least try to help. If you have two lists and want to put them in a dict, do the following:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
lists = [a, b] # or directly -> lists = [ [1, 2, 3, 4], [5, 6, 7, 8] ]
new_dict = {}
for idx, sublist in enumerate([a, b]): # or enumerate(lists)
    new_dict[idx] = sublist

hope it helps

answered Oct 31, 2017 at 22:12


1 Comment

Izkata Over a year ago

This isn't even close to what OP wants. a contains the keys for the values in b (with some keys being duplicates), and index isn't used at all. Yours just creates { 0: a, 1: b }, using the index in lists.

U13-Forward · Accepted Answer · 2018-11-06 04:17:48Z

0

Alternatively, build the dictionary with a comprehension first so that every key already maps to an empty list. Then iterate through the zip of the two lists and append each value from the second list under its key from the first list. Because all keys exist up front, no try-except clause or if statement is needed to check whether a key exists:

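l = [0, 0, 0, 1, 1, 1, 1, 1, 9999]  # sample keys, matching the output below
l2 = [24, 53, 88, 32, 45, 24, 88, 53, 1]
d = {k: [] for k in l}  # one empty list per distinct key
for x, y in zip(l, l2):
    d[x].append(y)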

Now:

print(d)

Is:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}

answered Nov 6, 2018 at 4:17
