I have an array as follows:

import numpy as np
strArray = np.array(['ab','abc','ab','bca','ab','m-2','bca'])

This example is a short array of short strings, but the real strings and array are much longer, with many repetitions, and take up too much space.

Is there a function that takes this array and returns two outputs, a dictionary of the unique strings and a copy of strArray in which each string is replaced by its integer identifier, like this:

keyArray, intArray = some_function(strArray)
print(keyArray) # output: { 0:'ab', 1:'abc', 2:'bca', 3:'m-2' }
print(intArray) # output: [ 0, 1, 0, 2, 0, 3, 2 ]

Alternatively, I would settle for just intArray, so that I have a smaller array that is easier to work with. The original strings would be useful, but not at the cost of size, speed, or ease of use.

1 Answer

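We can use np.unique with the return_inverse argument. It returns the sorted unique strings together with, for each element of the input, its index into that array of unique strings: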

In [16]: unq,tags = np.unique(strArray, return_inverse=True)

In [17]: dict(zip(range(len(unq)),unq))
Out[17]: {0: 'ab', 1: 'abc', 2: 'bca', 3: 'm-2'}

In [18]: tags
Out[18]: array([0, 1, 0, 2, 0, 3, 2])
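
For reuse, the same steps could be wrapped in a small function. This is only a sketch: the name encode_strings and the downcast of the integer codes with np.min_scalar_type are my own additions for illustration, not part of the question or of NumPy's np.unique itself. It assumes a 1-D input array.

```python
import numpy as np

def encode_strings(str_array):
    """Hypothetical helper: replace each string with an integer code."""
    # unq holds the sorted unique strings; tags holds, for every input
    # element, the index of its string within unq.
    unq, tags = np.unique(str_array, return_inverse=True)
    # Assumption: downcast the codes to the smallest integer dtype that
    # can hold the largest code, to save memory on long arrays.
    tags = tags.astype(np.min_scalar_type(max(len(unq) - 1, 0)))
    # Build the {code: string} dictionary requested in the question.
    key_map = dict(enumerate(unq.tolist()))
    return key_map, unq, tags

strArray = np.array(['ab', 'abc', 'ab', 'bca', 'ab', 'm-2', 'bca'])
key_map, unq, tags = encode_strings(strArray)

# Indexing the unique strings with the codes rebuilds the original array.
restored = unq[tags]
```

Keeping unq as an array alongside the dictionary makes the round trip back to strings a single indexing operation, so the original strings remain available without storing the full string array.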

1 Comment

That's perfect. Thank you
