
I have a dictionary (mydictionary) with only 4 keys and a list (mynodes), as follows.

    mydictionary = {0: {('B', 'E', 'G'), ('A', 'E', 'G'), ('A', 'E', 'F'), ('A', 'D', 'F'), ('C', 'D', 'F'), ('C', 'E', 'F'), ('A', 'D', 'G'), ('C', 'D', 'G'), ('C', 'E', 'G'), ('B', 'E', 'F')},
                    1: {('A', 'C', 'G'), ('E', 'F', 'G'), ('D', 'E', 'F'), ('A', 'F', 'G'), ('A', 'B', 'G'), ('B', 'D', 'F'), ('C', 'F', 'G'), ('A', 'C', 'E'), ('D', 'E', 'G'), ('B', 'F', 'G'), ('B', 'C', 'G'), ('A', 'C', 'D'), ('A', 'B', 'F'), ('B', 'D', 'G'), ('B', 'C', 'F'), ('A', 'D', 'E'), ('C', 'D', 'E'), ('A', 'C', 'F'), ('A', 'B', 'E'), ('B', 'C', 'E'), ('D', 'F', 'G')},
                    2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
                    3: {('A', 'B', 'C')}}

    mynodes = ['E', 'D', 'G', 'F', 'B', 'A', 'C']

I want to count how many times each node in the mynodes list appears in the tuples under each key of mydictionary. For example, for the dictionary and list above, the output should be:

    {'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
     'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
     'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
     'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
     'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
     'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
     'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}

For example, consider E: it appears 6 times in key 0, 8 times in key 1, once in key 2 and 0 times in key 3.
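To make the definition concrete: the count for one (node, key) pair is the number of triads under that key that contain the node. A minimal sketch for key 2 of the example (the helper name `count_in_key` is hypothetical, and it assumes a node never appears twice in the same triad, which holds for the sample data):

```python
# Key 2 of mydictionary from the example above
key2_triads = {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')}

def count_in_key(node, triads):
    """Hypothetical helper: number of triads that contain `node`.

    Assumes a node appears at most once per triad, so counting triads
    equals counting occurrences.
    """
    return sum(1 for triad in triads if node in triad)

print(count_in_key('E', key2_triads))  # 1
print(count_in_key('D', key2_triads))  # 3
```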

My current code is as follows.

    triad_class_for_nodes = {}

    for node in mynodes:
        temp_list = []
        for key, value in mydictionary.items():
            temp_counting = 0
            # count the triads under this key that contain the node
            for triad in value:
                if node in triad:
                    temp_counting += 1
            temp_list.append((key, temp_counting))
        triad_class_for_nodes[node] = temp_list

    print(triad_class_for_nodes)

This works fine for small dictionaries.

However, in my real dataset each of the 4 keys maps to millions of tuples, so my existing code is very inefficient and takes days to run.

While searching for ways to make this faster, I came across this question (Fastest way to search a list in python), which suggests converting the list of values to a set. I tried this as well, but it still takes days to run.
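A likely reason the set conversion did not help: the cost of the nested loop is not the `in` test itself but the number of tests. Every node triggers a full scan of every triad, so the work grows as len(mynodes) × (number of triads), whereas a single pass that counts each node occurrence grows only as 3 × (number of triads). The sketch below just counts those operations on a toy input; the helper names and the toy data are hypothetical:

```python
def nested_loop_checks(nodes, triads_by_key):
    # One `node in triad` test per (node, triad) pair, as in the loop above
    total_triads = sum(len(triads) for triads in triads_by_key.values())
    return len(nodes) * total_triads

def single_pass_updates(triads_by_key):
    # One counter increment per node occurrence, visiting each triad once
    return sum(len(triad) for triads in triads_by_key.values() for triad in triads)

# Hypothetical toy input with the same shape as mydictionary
toy = {0: {('A', 'B', 'C'), ('A', 'D', 'E')}, 1: {('B', 'C', 'D')}}
toy_nodes = ['A', 'B', 'C', 'D', 'E']

print(nested_loop_checks(toy_nodes, toy))  # 15
print(single_pass_updates(toy))            # 9
```

With only a handful of nodes the gap is small, but with thousands of distinct nodes the nested loop does roughly len(mynodes) / 3 times as much work as a single counting pass.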

I am wondering whether there is a more efficient way to do this in Python. I am happy to transform my existing data into different structures (such as a pandas DataFrame) if that makes things faster.

A small sample of mydictionary and mynodes is available for testing here: https://drive.google.com/drive/folders/15Faa78xlNAYLPvqS3cKM1v8bV1HQzW2W?usp=sharing

  • mydictionary: see triads.txt

        import ast

        with open("triads.txt", "r") as file:
            mydictionary = ast.literal_eval(file.read())

  • mynodes: see nodes.txt

        with open("nodes.txt", "r") as file:
            mynodes = ast.literal_eval(file.read())
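`ast.literal_eval` accepts set literals (Python 3.2+), so a file containing the dictionary's repr loads back as a dict of sets of tuples; note that it needs the string returned by `file.read()`, not the `file.read` method itself, which makes it raise a ValueError. A minimal sketch with an in-memory string standing in for triads.txt (the string is an assumption about the file's format):

```python
import ast

# Stand-in for the contents of triads.txt (assumption: the file holds a dict
# literal whose values are set literals of tuples, like mydictionary)
text = "{2: {('B', 'D', 'E'), ('A', 'B', 'D')}, 3: {('A', 'B', 'C')}}"

loaded = ast.literal_eval(text)   # evaluates literals only, never arbitrary code
print(type(loaded[2]).__name__)   # set
print(loaded[3])                  # {('A', 'B', 'C')}
```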

I am happy to provide more details if needed.

2 Answers


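Since you tagged pandas: first convert your dict to a pandas DataFrame, then stack it and use crosstab.

    import pandas as pd

    # one row per key, one column per triad; shorter rows are padded with NaN,
    # which is dropped explicitly (newer stack implementations keep NaN)
    s = pd.DataFrame.from_dict(mydictionary, orient='index').stack().dropna()

    # split each triad into 3 columns, then stack again: one node per row
    s = pd.DataFrame(s.values.tolist(), index=s.index).stack()
    pd.crosstab(s.index.get_level_values(0), s)

    col_0  A  B  C  D  E   F   G
    row_0
    0      4  2  4  4  6   5   5
    1      9  9  9  8  8  10  10
    2      1  3  1  3  1   0   0
    3      1  1  1  0  0   0   0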

Update: to get the dict of lists of tuples from the question:

    s = pd.crosstab(s.index.get_level_values(0), s).stack().reset_index()

    s[['row_0', 0]].apply(tuple, axis=1).groupby(s['col_0']).agg(list).to_dict()
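A variation on the same crosstab idea, sketched under the assumption that a recent pandas is installed: build the long (key, node) table directly with a list comprehension instead of going through the NaN-padded wide frame, then reindex so the columns follow mynodes and every key appears even if a node never occurs under it. The small `mydictionary` below is a trimmed stand-in for the real data:

```python
import pandas as pd

# Trimmed stand-in for the real data (assumption: same shape, far fewer triads)
mydictionary = {
    2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
    3: {('A', 'B', 'C')},
}
mynodes = ['E', 'D', 'G', 'F', 'B', 'A', 'C']

# Long format: one (key, node) row per node occurrence, no NaN padding
rows = [(key, node)
        for key, triads in mydictionary.items()
        for triad in triads
        for node in triad]
long_df = pd.DataFrame(rows, columns=["key", "node"])

# Rows are dictionary keys, columns are nodes; reindex so every key and every
# node in mynodes is present (missing combinations become 0), in mynodes' order
table = pd.crosstab(long_df["key"], long_df["node"]).reindex(
    index=list(mydictionary), columns=mynodes, fill_value=0)

# Reshape into {node: [(key, count), ...]} as requested in the question
result = {node: list(zip(table.index.tolist(), table[node].tolist()))
          for node in mynodes}
print(result)
```

Materializing one row per occurrence still needs memory proportional to 3 × the number of triads, so for very large inputs a pure-Python counting pass may be lighter.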

5 Comments

Thanks for the answer. Do you think it is more efficient than my existing solution?
@Emi in terms of efficiency, I think it will depend on your data size, but the pd.crosstab output is easier to read than a dict of lists of tuples.
Thanks. I will run your code for my actual dataset and will let you know how it performed :)
Just wondering how I can transform your final output into something like this: {'E': [(0, 6), (1, 8), (2, 1), (3, 0)], 'D': [(0, 4), (1, 8), (2, 3), (3, 0)], 'G': [(0, 5), (1, 10), (2, 0), (3, 0)], 'F': [(0, 5), (1, 10), (2, 0), (3, 0)], 'B': [(0, 2), (1, 9), (2, 3), (3, 1)], 'A': [(0, 4), (1, 9), (2, 1), (3, 1)], 'C': [(0, 4), (1, 9), (2, 1), (3, 1)]} Please let me know your thoughts :)
I did not get a lot of performance gain through this solution. However, it is a clean and nice code and I like it compared to my code :)

If you're not using pandas, you could do this with Counter from collections:

    from collections import Counter, defaultdict
    from itertools import product

    # count every (node, key) pair in a single pass over all triads
    counts = Counter((c, k) for k, v in mydictionary.items() for t in v for c in t)

    result = defaultdict(list)
    for c, k in product(mynodes, mydictionary):
        # a Counter returns 0 for pairs that never occurred
        result[c].append((k, counts[(c, k)]))

    print(dict(result))

    {'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
     'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
     'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
     'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
     'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
     'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
     'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}

Counter tallies every (node, key) combination in a single pass over the triads, instead of rescanning all triads once per node as the original loop does. You can then use these counts to build the expected output.
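A variation on the same Counter approach, offered as a sketch rather than a measured improvement: keep one Counter per key and feed it the flattened stream of node labels via `itertools.chain.from_iterable`. This avoids building a (node, key) tuple for every occurrence, and in CPython Counter's bulk counting of an iterable is implemented in C, so it may be faster on large inputs. The small `mydictionary` is a trimmed stand-in for the real data:

```python
from collections import Counter
from itertools import chain

# Trimmed stand-in for the real data (assumption: same shape, far fewer triads)
mydictionary = {
    2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
    3: {('A', 'B', 'C')},
}
mynodes = ['E', 'D', 'G', 'F', 'B', 'A', 'C']

# One Counter per key: chain.from_iterable flattens that key's triads into a
# single stream of node labels, which Counter tallies in one pass
per_key = {key: Counter(chain.from_iterable(triads))
           for key, triads in mydictionary.items()}

# Indexing a Counter with an unseen node returns 0, so no special case is needed
result = {node: [(key, per_key[key][node]) for key in mydictionary]
          for node in mynodes}
print(result)
```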

EDIT: here is the counts line expanded into explicit loops:

    from collections import Counter

    counts = Counter()                          # initialize Counter() object
    for key, tupleSet in mydictionary.items():  # loop through dictionary
        for tupl in tupleSet:                   # loop through tuple set of each key
            for node in tupl:                   # loop through node character in each tuple
                counts[(node, key)] += 1        # count 1 node/key pair

2 Comments

Hi, thanks a lot for the answer. Do you think that this is more efficient than my code? :) I will also test it on my dataset and let you know.
It would be greatly appreciated if you could help me expand this line: counts = Counter((c,k) for k,v in mydictionary.items() for t in v for c in t)
