19

I have two lists:

a = [0, 0, 0, 1, 1, 1, 1, 1, .... 99999]
b = [24, 53, 88, 32, 45, 24, 88, 53, ...... 1]

I want to merge those two lists into a dictionary like:

{
    0: [24, 53, 88], 
    1: [32, 45, 24, 88, 53], 
    ...... 
    99999: [1]
}

A solution might be to use a for loop, which does not look very elegant:

d = {}
for i in range(len(list_a)):
    if list_a[i] in d:
        # key already present: append to its list
        d[list_a[i]].append(list_b[i])
    else:
        # first occurrence of this key: start a new list
        d[list_a[i]] = [list_b[i]]

Though this does work, it is inefficient and would take too much time when the lists are extremely large. Are there more elegant ways to construct such a dictionary?

Thanks in advance!

5
  • 2
    How is that a nested for loop? Commented Nov 1, 2017 at 7:42
  • DYM if list_a[i] in d.keys and d[list_a[i]] = [list_b[i]]? Please post exactly the code you've tried, preferably using copy+paste (if available on your platform). Commented Nov 1, 2017 at 9:31
  • If one of the provided answers worked for you, please mark it as accepted. It makes it easier for people coming across your question in the future to know what worked. Commented Nov 1, 2017 at 15:16
  • @TobySpeight The if means that when list_a[i] is already a key in the dictionary, list_b[i] is appended to the list under that key; the else means that otherwise list_b[i] is stored under the new key list_a[i] as a list. Hope it helps. Commented Nov 1, 2017 at 19:15
  • @BigD, I thought that's what you meant to write (as I suggested). list_[a] in d.keys just doesn't make sense, and neither does d[list_a] =. I suggest you edit to fix those errors. Commented Nov 2, 2017 at 8:34

7 Answers

34

You can use a defaultdict:

from collections import defaultdict
d = defaultdict(list)
list_a = [0, 0, 0, 1, 1, 1, 1, 1, 9999]
list_b = [24, 53, 88, 32, 45, 24, 88, 53, 1]
for a, b in zip(list_a, list_b):
    d[a].append(b)

print(dict(d))

Output:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}
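
This works because a defaultdict(list) calls list() to create the value the first time a missing key is accessed, so no membership test is needed. A minimal sketch that wraps the loop in a helper (the name group_values is made up for illustration, not part of any library):

```python
from collections import defaultdict


def group_values(keys, values):
    """Group values by their corresponding key, preserving input order."""
    grouped = defaultdict(list)
    for key, value in zip(keys, values):
        # A missing key is created with an empty list on first access.
        grouped[key].append(value)
    # Convert to a plain dict so later lookups of missing keys raise KeyError
    # instead of silently inserting empty lists.
    return dict(grouped)


result = group_values([0, 0, 0, 1, 1, 1, 1, 1, 9999],
                      [24, 53, 88, 32, 45, 24, 88, 53, 1])
print(result)
```

Returning dict(grouped) is a design choice: it avoids accidentally adding keys when the result is only being read later on.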

3 Comments

Really, using a defaultdict is overkill here. See this answer where dict.setdefault can handle the same thing with minimal overhead.
@cᴏʟᴅsᴘᴇᴇᴅ d[a].append(b) is much cleaner than d.setdefault(x, []).append(y)
At the cost of an extra import and a heavier structure ;-)
14

Alternative itertools.groupby() solution:

import itertools

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]

result = { k: [i[1] for i in g] 
           for k,g in itertools.groupby(sorted(zip(a, b)), key=lambda x:x[0]) }
print(result)

The output:

{0: [24, 53, 88], 1: [24, 32, 45, 53, 88], 2: [11, 22, 33, 44, 55, 66], 3: [77]}
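
One caveat, visible in the output above: sorting the (key, value) pairs also sorts the values inside each group, so key 1 gives [24, 32, 45, 53, 88] rather than the input order [32, 45, 24, 88, 53]. If the original order matters, a possible variant is to sort on the key only. sorted() is stable, so values keep their relative order within each key. A sketch, using operator.itemgetter in place of the lambda:

```python
from itertools import groupby
from operator import itemgetter

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3]
b = [24, 53, 88, 32, 45, 24, 88, 53, 11, 22, 33, 44, 55, 66, 77]

# Sort by key only; the stable sort keeps values in input order per key.
pairs = sorted(zip(a, b), key=itemgetter(0))

# groupby only merges adjacent items, which is why the sort comes first.
result_ordered = {k: [v for _, v in g]
                  for k, g in groupby(pairs, key=itemgetter(0))}
print(result_ordered)
```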

5 Comments

Sure, I’ve figured out what your code does, but written in that style, it’s not very obvious. For the person new to Python, I think they may find your code hard to understand and then disregard (or not bother to regard) your solution because of it. Just a suggestion, up to you
Might I suggest not writing result in one line? Maybe pull out the result of groupby as a separate variable? That line is way too long...
This seems worse than the other answer because you need to sort, whereas the other answer does not sort, so yours is doing extra work.
@Daenyth, your information is not new at all. The solution was marked as an "alternative" way at the very beginning.
If list_a is already ordered, you can remove the n log n sort, also the lambda adds unnecessary overhead, itemgetter is always a better option. {k: [i for _, i in g] for k, g in groupby(zip(a, b), key=itemgetter(0))}
6

No fancy structures, just a plain ol' dictionary.

d = {}
for x, y in zip(a, b):
    d.setdefault(x, []).append(y)

Comments

3

You can do this with a dict comprehension:

list_a = [0, 0, 0, 1, 1, 1, 1, 1]
list_b = [24, 53, 88, 32, 45, 24, 88, 53]
my_dict = {key: [] for key in set(list_a)}  # my_dict = {0: [], 1: []}
for a, b in zip(list_a, list_b):
    my_dict[a].append(b)
# {0: [24, 53, 88], 1: [32, 45, 24, 88, 53]}

Oddly enough, you cannot make this work using dict.fromkeys(set(list_a), []), as this sets the value of every key to the same empty list:

my_dict = dict.fromkeys(set(list_a), [])  # my_dict = {0: [], 1: []}
my_dict[0].append(1)  # my_dict = {0: [1], 1: [1]}
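
The reason is that fromkeys evaluates its second argument once and stores that same object under every key. A short sketch that makes the sharing visible with `is`, next to the comprehension, which builds a fresh list per key (the variable names here are illustrative):

```python
keys = [0, 1]

shared = dict.fromkeys(keys, [])      # one list object, referenced by every key
separate = {key: [] for key in keys}  # a new list is created for each key

shared[0].append(1)
separate[0].append(1)

print(shared[0] is shared[1])      # True: both keys point at the same list
print(shared)                      # {0: [1], 1: [1]}
print(separate[0] is separate[1])  # False: each key has its own list
print(separate)                    # {0: [1], 1: []}
```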

1 Comment

lists are mutable, you pass one object/list to fromkeys so you share a reference to the single list/object, it would be the same as a = [] then d = {1: a, 2: a, 3: a}. my_dict = dict.fromkeys(set(list_a), tuple());my_dict[0] += (1,) would show {0: (1,), 1: (), 9999: ()} but add the overhead of creating a new object/tuple with each +=.
3

A pandas solution:

Setup:

import numpy as np
import pandas as pd

a = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 4, 4, 4]

b = np.random.randint(0, 100, len(a)).tolist()  # pd.np is no longer available in recent pandas

>>> b
Out[]: [28, 68, 71, 25, 25, 79, 30, 50, 17, 1, 35, 23, 52, 87, 21]


df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))  # Create a dataframe

>>> df
Out[]:
    Group  Value
0       0     28
1       0     68
2       0     71
3       1     25
4       1     25
5       1     79
6       1     30
7       1     50
8       2     17
9       2      1
10      2     35
11      3     23
12      4     52
13      4     87
14      4     21

Solution:

>>> df.groupby('Group').Value.apply(list).to_dict()
Out[]:
{0: [28, 68, 71],
 1: [25, 25, 79, 30, 50],
 2: [17, 1, 35],
 3: [23],
 4: [52, 87, 21]}

Walkthrough:

  1. Create a pd.DataFrame from the input lists; a becomes the Group column and b the Value column
  2. df.groupby('Group') creates groups based on a
  3. .Value.apply(list) gets the values for each group and casts them to a list
  4. .to_dict() converts the resulting Series to a dict

Timing:

To get an idea of timings for a test set of 1,000,000 values in 100,000 groups:

a = sorted(np.random.randint(0, 100000, 1000000).tolist())
b = np.random.randint(0, 100, len(a)).tolist()
df = pd.DataFrame(columns=['Group', 'Value'], data=list(zip(a, b)))

>>> df.shape
Out[]: (1000000, 2)

%timeit df.groupby('Group').Value.apply(list).to_dict()
4.13 s ± 9.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

But to be honest it is likely less efficient than itertools.groupby suggested by @RomanPerekhrest, or defaultdict suggested by @Ajax1234.
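
For a rough comparison of the two pure-Python approaches on your own machine, something like the following could be used. It is only a sketch: the data is smaller than in the test above (an arbitrary choice to keep it quick), the helper names are made up, and no timings are claimed here.

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

random.seed(0)
n = 100_000
keys = sorted(random.randrange(10_000) for _ in range(n))  # sorted, as in the test above
vals = [random.randrange(100) for _ in range(n)]


def with_defaultdict(keys, vals):
    d = defaultdict(list)
    for k, v in zip(keys, vals):
        d[k].append(v)
    return dict(d)


def with_groupby(keys, vals):
    # Correct only because keys is already sorted.
    return {k: [v for _, v in g]
            for k, g in groupby(zip(keys, vals), key=itemgetter(0))}


# Both approaches should produce the same mapping before being timed.
assert with_defaultdict(keys, vals) == with_groupby(keys, vals)

for func in (with_defaultdict, with_groupby):
    seconds = timeit.timeit(lambda: func(keys, vals), number=5)
    print(f"{func.__name__}: {seconds / 5:.4f} s per run")
```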

Comments

2

Maybe I am missing the point, but at least I will try to help. If you have two lists and want to put them in a dict, do the following:

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
lists = [a, b] # or directly -> lists = [ [1, 2, 3, 4], [5, 6, 7, 8] ]
new_dict = {}
for idx, sublist in enumerate([a, b]): # or enumerate(lists)
    new_dict[idx] = sublist

hope it helps

1 Comment

This isn't even close to what OP wants. a contains the keys for the values in b (with some keys being duplicates), and index isn't used at all. Yours just creates { 0: a, 1: b }, using the index in lists.
0

Or build the dictionary with a comprehension beforehand, so that every key is already present with an empty list as its value. Then iterate through the zip of the two lists and append each value from the second list to the entry keyed by the matching value from the first list. Because all keys exist in advance, no try-except clause or if statement is needed to check whether a key exists:

d = {k: [] for k in l}  # l holds the keys, l2 the values
for x, y in zip(l, l2):
    d[x].append(y)

Now:

print(d)

Is:

{0: [24, 53, 88], 1: [32, 45, 24, 88, 53], 9999: [1]}

Comments
