2

1 term_map tracks which term is in which position.

In [256]: term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])

In [257]: term_map
Out[257]: array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])

2 term_scores tracks the weight of each term at each position.

In [258]: term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])

In [259]: term_scores
Out[259]: array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])

3 Get the unique values and the inverse indices.

In [260]: unqID, idx = np.unique(term_map, return_inverse=True)

In [261]: unqID
Out[261]: array([0, 2, 3, 4])

4 Compute the scores for the unique values.

In [262]: value_sums = np.bincount(idx, term_scores)

In [263]: value_sums
Out[263]: array([  4.,  16.,   9.,  21.])
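
For readers following along, here is a minimal sketch of what steps 3 and 4 compute, using the same arrays as above. The printed values are worked out from this sample data only; passing term_scores via the weights keyword is just a more explicit spelling of the positional call.

```python
import numpy as np

term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])

# unqID holds the sorted distinct term IDs; idx[i] is the position of
# term_map[i] inside unqID, so unqID[idx] rebuilds term_map.
unqID, idx = np.unique(term_map, return_inverse=True)
print(idx)  # [1 1 2 3 3 3 1 0 0 0]
assert np.array_equal(unqID[idx], term_map)

# bincount with weights adds up term_scores within each group of idx.
value_sums = np.bincount(idx, weights=term_scores)
print(value_sums)  # [ 4. 16.  9. 21.]
```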

5 Initialize the array to update. Its indices correspond to the values in term_map.

In [254]: vocab = np.zeros(13)

In [255]: vocab
Out[255]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

6 DESIRED: Insert the sums from step 4 into vocab at the positions given by the unique IDs from step 3.

In [255]: updated_vocab
Out[255]: array([ 4.,  0.,  16.,  9.,  21.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

How do I produce the array shown in step 6?

  • So, the question is? Commented Feb 16, 2017 at 19:16
  • @Divakar He's unable to do stage 6 I guess. Commented Feb 16, 2017 at 19:17
  • yeah @TonyTannous that's correct. Sorry for not making it more clear. Commented Feb 16, 2017 at 19:18
  • I would think that's trivial given the 5 more difficult hurdles covered before that. Commented Feb 16, 2017 at 19:18
  • I'd like to not use a for loop if possible. This is pretty close - stackoverflow.com/questions/8373079/… Commented Feb 16, 2017 at 19:19

2 Answers

3

As it turns out, we can skip the np.unique step entirely. Pass term_map and term_scores (as the weights) straight to np.bincount, and set the length of the output array with its optional minlength argument.

So we can simply do:

final_output = np.bincount(term_map, term_scores, minlength=13)

Sample run:

In [142]: term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
     ...: term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
     ...: 

In [143]: np.bincount(term_map, term_scores, minlength=13)
Out[143]: 
array([  4.,   0.,  16.,   9.,  21.,   0.,   0.,   0.,   0.,   0.,   0.,
         0.,   0.])
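
As a quick sanity check, the sketch below compares the one-call version with the np.unique route from the question. The name vocab_size is hypothetical and stands in for the length of vocab (13 here). It also illustrates one caveat worth keeping in mind: minlength is only a lower bound, so the output is longer if term_map contains an index at or beyond it.

```python
import numpy as np

term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
vocab_size = 13  # hypothetical name: length of the vocab array

# One call: term IDs are used directly as bin indices.
direct = np.bincount(term_map, weights=term_scores, minlength=vocab_size)

# Two-step route from the question, scattered into a zero array.
unqID, idx = np.unique(term_map, return_inverse=True)
two_step = np.zeros(vocab_size)
two_step[unqID] = np.bincount(idx, weights=term_scores)

assert np.array_equal(direct, two_step)

# minlength is a minimum only: the largest ID here is 4, so asking for
# 3 bins still yields 5.
short = np.bincount(term_map, weights=term_scores, minlength=3)
print(short.shape)  # (5,)
```

Note that np.bincount expects non-negative integer IDs, which holds for the sample data.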

2
import numpy as np

term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])
unqID, idx = np.unique(term_map, return_inverse=True)
value_sums = np.bincount(idx, term_scores)

vocab = np.zeros(13)
vocab[unqID] = value_sums
print(vocab)

OUT: [ 4. 0. 16. 9. 21. 0. 0. 0. 0. 0. 0. 0. 0.]
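
If vocab already holds scores that should be kept, the assignment above would overwrite them. A possible variant, sketched here under the assumption that accumulation is what is wanted (the question does not say so), is np.add.at, which adds in place and handles repeated indices. The starting value at index 2 is made up for illustration.

```python
import numpy as np

term_map = np.array([2, 2, 3, 4, 4, 4, 2, 0, 0, 0])
term_scores = np.array([5, 6, 9, 8, 9, 4, 5, 1, 2, 1])

vocab = np.zeros(13)
vocab[2] = 10.0  # made-up earlier score, for illustration only

# Unbuffered in-place add: every occurrence of a repeated index counts,
# unlike vocab[term_map] += term_scores, which keeps only one of them.
np.add.at(vocab, term_map, term_scores)
print(vocab[:5])  # [ 4.  0. 26.  9. 21.]
```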
