I have two dictionaries. One maps chapter_id to book_id: {99: 7358, 852: 7358, 456: 7358}. That is just one book as an example, but there are many. The other maps the same chapter_id to some information about the chapter: {99: [John Smith, 20, 5], 852: [Clair White, 15, 10], 456: [Daniel Dylan, 25, 10]}. Chapter ids are unique across all books.

I have to combine them so that every book gets the information from all the chapters it contains, something like {7358: [[99, 852, 456], [John Smith, Clair White, Daniel Dylan], [20, 15, 25], [5, 10, 10]]}. I also already have a file with a dictionary in which each book maps to the ids of all the chapters it contains.

I know how to do this by looping over both dictionaries (they used to be lists), but that takes ages, which is why they are now dictionaries; I think it should be possible with just one loop over all the chapters, but in my head I keep coming back to looping over books and over chapters. Any ideas are very much appreciated! I will write the final result to a file, so it is not very important whether it is a nested dictionary or something else. Or at least I think so.
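For reference, here are the structures described above written out as valid Python literals (the names have to be quoted to be valid Python; the variable names d1 and d2 are just placeholders, matching the answers below):

# chapter id -> book id (chapter ids are unique across all books)
d1 = {99: 7358, 852: 7358, 456: 7358}

# chapter id -> [name, number, number]
d2 = {99: ["John Smith", 20, 5],
      852: ["Clair White", 15, 10],
      456: ["Daniel Dylan", 25, 10]}

# desired result: book id -> [[chapter ids], [names], [first numbers], [second numbers]]
# {7358: [[99, 852, 456], ["John Smith", "Clair White", "Daniel Dylan"],
#         [20, 15, 25], [5, 10, 10]]}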
Try zipping the dictionaries together, then loop over the result. Probably still expensive, but worth a try. Actually, it might act lazily via a generator, so it could be quite cheap. – Carcigenicate, Nov 8, 2016 at 14:45
Your first dict is a list of dicts: is that a typo? – brianpck, Nov 8, 2016 at 14:46
@brianpck yes, sorry – student, Nov 8, 2016 at 14:49
3 Answers
If you are open to using other packages, you might want to have a look at pandas, which will let you do this kind of thing easily and fast. Here is an example based on the data you provided:
import pandas as pd

d1 = {99: 7358, 852: 7358, 456: 7358}
df1 = pd.DataFrame.from_dict(d1, orient="index")
df1.reset_index(inplace=True)  # turn the chapter-id index into a regular column

d2 = {99: ["John Smith", 20, 5], 852: ["Clair White", 15, 10], 456: ["Daniel Dylan", 25, 10]}
df2 = pd.DataFrame.from_dict(d2, orient="index")
df2.reset_index(inplace=True)

df = df1.merge(df2, left_on="index", right_on="index")
df.columns = ["a", "b", "c", "d", "e"]  # a = chapter id, b = book id, c = name, d/e = the two numbers

# all data for book 7358 (i.e. subsetting)
df[df.b == 7358]
# all names for that book as a list
list(df[df.b == 7358].c)
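If the end goal is the nested dictionary from the question, one way to build it from the merged frame is to group by the book column and collect each remaining column into a list. This is only a sketch using the column names from the snippet above; the order within each book follows the merged frame, so it may differ from the original insertion order:

# book id -> [[chapter ids], [names], [first numbers], [second numbers]]
result = {
    book: [grp["a"].tolist(), grp["c"].tolist(), grp["d"].tolist(), grp["e"].tolist()]
    for book, grp in df.groupby("b")
}
# e.g. {7358: [[99, 852, 456], ['John Smith', 'Clair White', 'Daniel Dylan'], [20, 15, 25], [5, 10, 10]]}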
2 Comments
That is a great help! 45 seconds instead of 20 hours, I am shocked :) But I am not sure why the reset_index step is necessary. – student
Glad it worked :). I used reset_index to turn the row index into a regular column so the merge can be done on it later. It might be possible to merge on the row indexes directly, but I didn't spend much time in the docs refreshing my memory on how to do that :). For further reading, check out pandas.pydata.org/pandas-docs/stable/generated/… and pandas.pydata.org/pandas-docs/stable/merging.html – John Smith
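Following up on that comment: the reset_index calls can probably be skipped by merging on the row indexes directly. This is an untested sketch, not part of the original answer:

df1 = pd.DataFrame.from_dict(d1, orient="index")
df2 = pd.DataFrame.from_dict(d2, orient="index")
# merge on the row indexes (both are the chapter id); the chapter id then ends up
# in the index of df rather than in a separate column
df = df1.merge(df2, left_index=True, right_index=True)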
from collections import defaultdict

def append_all(l, a):
    # append each element of a to the corresponding sub-list of l
    if len(l) != len(a):
        raise ValueError
    for i in range(len(l)):
        l[i].append(a[i])

final_dict = defaultdict(lambda: [[], [], [], []])
for chapter, book in d1.items():
    final_dict[book][0].append(chapter)  # collect the chapter id
    # the slice copies the outer list, but the three inner lists are shared,
    # so append_all still fills in final_dict[book]
    append_all(final_dict[book][1:], d2[chapter])
You only need to iterate over the chapters. You can replace the append_all function with explicit appends, but it seemed ugly to do it that way. I'm surprised there's not a method for this, but it may just be that I missed a clever way to use zip here.
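For what it's worth, the zip-based variant hinted at above might look like the following sketch (not from the original answer). Unlike append_all, zip silently stops at the shorter sequence instead of raising ValueError on a length mismatch:

from collections import defaultdict

final_dict = defaultdict(lambda: [[], [], [], []])
for chapter, book in d1.items():
    final_dict[book][0].append(chapter)
    # pair each remaining sub-list with the matching value from d2[chapter];
    # the sub-lists are shared with final_dict[book], so the appends persist
    for sub, value in zip(final_dict[book][1:], d2[chapter]):
        sub.append(value)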