1

I have two NumPy arrays:

one named labels and the other component:

labels = array([ 0,  0,  0,  3,  0,  0,  0,  1,  0,  1,  0,  0,  2,  2,  3,  4, -1])

component = array([ 1.05325312,  1.0206622 ,  1.0372349 ,  1.06951778,  0.96379751, 0.98862576,  1.01135931,  0.92633951,  1.09095756,  1.1662432, 1.17794883,  1.23006966,  1.25465147,  1.27054648,  1.18940802, 0.91512676,  0.81926385])

I want the values in labels to be the keys of a dictionary, with the elements of component grouped under their corresponding key.

Both arrays have the same shape, and each position in labels corresponds to the same position in component.

I'm trying to get something like this:

{'-1': [0.81926385],
'0': [1.05325312, 1.0206622, 1.0372349, 0.96379751, 0.98862576, 1.01135931, 1.09095756, 1.17794883, 1.23006966], 
'1': [0.92633951, 1.1662432], 
'2': [1.25465147,  1.27054648], 
'3': [1.06951778, 1.18940802], 
'4': [0.91512676]}

I have tried using zip in several different ways, but I can't figure out how to group the values under their associated key. Can anyone point me in the right direction? This is my latest attempt:

d = dict(zip(labels, component))

3 Answers

3

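You can't use dict(zip(...)) directly, because dictionary keys are unique: each repeated label overwrites the value stored before it, so only the last value per label survives. Adding a membership check solves the problem. The approach below loops over the pairs with an if statement: if the key is not in the dictionary yet, create a new list holding the value; otherwise append the value to the existing list: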

from numpy import array

labels = array([ 0,  0,  0,  3,  0,  0,  0,  1,  0,  1,  0,  0,  2,  2,  3,  4, -1])

component = array([ 1.05325312,  1.0206622 ,  1.0372349 ,  1.06951778,  0.96379751, 0.98862576,  1.01135931,  0.92633951,  1.09095756,  1.1662432, 1.17794883,  1.23006966,  1.25465147,  1.27054648,  1.18940802, 0.91512676,  0.81926385])

dic = {}
for key, value in zip(labels, component):
    if key not in dic:
        dic[key] = [value]
    else:
        dic[key].append(value)
print(dic)

If you want something more concise, consider collections.defaultdict. Plain dicts preserve insertion order since Python 3.7, so OrderedDict is only needed on older versions.
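
As one possible shorter variant of the loop above, dict.setdefault folds the membership check and the append into a single line. This is only a sketch: the names last_only and grouped are made up for illustration, and .tolist() is used so that keys and values are plain Python ints and floats rather than NumPy scalars.

```python
import numpy as np

labels = np.array([0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 3, 4, -1])
component = np.array([1.05325312, 1.0206622, 1.0372349, 1.06951778, 0.96379751,
                      0.98862576, 1.01135931, 0.92633951, 1.09095756, 1.1662432,
                      1.17794883, 1.23006966, 1.25465147, 1.27054648, 1.18940802,
                      0.91512676, 0.81926385])

# dict(zip(...)) keeps only the last value seen for each repeated key.
last_only = dict(zip(labels.tolist(), component.tolist()))

# setdefault returns the existing list for a key, or stores and returns
# a new empty list the first time the key is seen.
grouped = {}
for key, value in zip(labels.tolist(), component.tolist()):
    grouped.setdefault(key, []).append(value)

print(last_only[0])  # a single float: the final value labelled 0
print(grouped[0])    # every value labelled 0, in original order
```

The trade-off is that setdefault builds a throwaway empty list on every iteration, which is usually negligible but is one reason defaultdict is often preferred.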


1

This is a perfect use case for collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)

for label, value in zip(labels, component):
    d[label].append(value)

d = dict(d)

output:

{0: [1.05325312, 1.0206622, 1.0372349, 0.96379751, 0.98862576, 1.01135931, 1.09095756, 1.17794883, 1.23006966],
 3: [1.06951778, 1.18940802],
 1: [0.92633951, 1.1662432],
 2: [1.25465147, 1.27054648],
 4: [0.91512676],
-1: [0.81926385]}
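
The question shows string keys in ascending order, while the output above has integer keys in first-seen order. If that exact shape matters, a sketch along these lines may help. The name result is hypothetical, and .tolist() is an added step to convert NumPy scalars into plain Python numbers before grouping.

```python
from collections import defaultdict

import numpy as np

labels = np.array([0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 3, 4, -1])
component = np.array([1.05325312, 1.0206622, 1.0372349, 1.06951778, 0.96379751,
                      0.98862576, 1.01135931, 0.92633951, 1.09095756, 1.1662432,
                      1.17794883, 1.23006966, 1.25465147, 1.27054648, 1.18940802,
                      0.91512676, 0.81926385])

d = defaultdict(list)
for label, value in zip(labels.tolist(), component.tolist()):
    d[label].append(value)

# Sort by the integer label first, then convert each key to a string,
# so that '-1' comes before '0' and '10' would come after '9'.
result = {str(label): d[label] for label in sorted(d)}

print(list(result))
```

Sorting before the str() conversion matters: sorting the string keys directly would order them lexicographically instead of numerically.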


0

Another alternative I have thought of is to use pandas.

Put both arrays into their own columns and group them by labels.

I was hoping to keep things in numpy for speed but if needs must.

import pandas as pd
df = pd.DataFrame({'Levels': component.ravel(), 'Zone': labels})  # component: the values array from the question
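
The grouping step itself could look roughly like the sketch below. It assumes the values array is the component array from the question, reuses the 'Levels' and 'Zone' column names, and introduces the hypothetical names df and grouped.

```python
import numpy as np
import pandas as pd

labels = np.array([0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 3, 4, -1])
component = np.array([1.05325312, 1.0206622, 1.0372349, 1.06951778, 0.96379751,
                      0.98862576, 1.01135931, 0.92633951, 1.09095756, 1.1662432,
                      1.17794883, 1.23006966, 1.25465147, 1.27054648, 1.18940802,
                      0.91512676, 0.81926385])

# One column per array; building from a dict keeps the column names explicit.
df = pd.DataFrame({"Levels": component, "Zone": labels})

# Group the values by label, collect each group into a list,
# and convert the resulting Series into a plain dict.
grouped = df.groupby("Zone")["Levels"].apply(list).to_dict()

print(grouped)
```

groupby sorts the group keys by default, so the dictionary comes out ordered from -1 to 4.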

2 Comments

Did you try itertools?
@adirabargil It's a massive dataset, so it would be best to avoid for loops where possible.
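
For a large dataset, one way to stay in NumPy and avoid a Python loop over every element is to sort by label and split the sorted values at the group boundaries. This is a sketch rather than a benchmarked solution; the names order, keys, starts, groups and result are illustrative. The only remaining Python-level loop runs once per unique label, not once per element.

```python
import numpy as np

labels = np.array([0, 0, 0, 3, 0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 3, 4, -1])
component = np.array([1.05325312, 1.0206622, 1.0372349, 1.06951778, 0.96379751,
                      0.98862576, 1.01135931, 0.92633951, 1.09095756, 1.1662432,
                      1.17794883, 1.23006966, 1.25465147, 1.27054648, 1.18940802,
                      0.91512676, 0.81926385])

# A stable sort keeps values with the same label in their original order.
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]
sorted_values = component[order]

# Unique labels, plus the index at which each label's run begins.
keys, starts = np.unique(sorted_labels, return_index=True)

# Split at every run start except the first one (which is index 0).
groups = np.split(sorted_values, starts[1:])

# Drop the .tolist() call to keep each group as a NumPy array.
result = {int(k): g.tolist() for k, g in zip(keys, groups)}

print(result)
```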
