
I have an array like:

import numpy as np
array = np.array(['A0','A1','A2','A3','A4','B0','B1','C0'])

and want to obtain an array that is True for values consisting of an A followed by a digit from 0 to 2.

So far, this is the way I do it:

selection = np.where((array == 'A0') | (array == 'A1') | (array == 'A2'), 1, 0)

But is there a more elegant way to do this, e.g. using a regular expression like:

selection = np.where(array == 'A[0-2]', 1, 0)

6 Answers


If using pandas is an option:

import numpy as np
import pandas as pd

a = np.array(['A0','A1','A2','A3','A4','B0','B1','C0'])
pd.Series(a).str.match(r'A[0-2]')
# 0     True
# 1     True
# 2     True
# 3    False
# 4    False
# 5    False
# 6    False
# 7    False
# dtype: bool

1 Comment

That was my first guess, but I need a purely numpy approach. Thank you.
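Since a purely NumPy approach was asked for, here is a minimal sketch that avoids both pandas and `re`. It assumes the values are a 1-D NumPy unicode string array (dtype `<Un`, with n at least 2) and that the pattern is exactly "A" followed by one digit from 0 to 2. It views each fixed-width string as a row of single characters and tests character positions directly. The function name `a_followed_by_0_to_2` is hypothetical.

```python
import numpy as np

def a_followed_by_0_to_2(arr):
    """Boolean mask: True where an element is exactly 'A' + one digit in 0..2.

    Sketch only: assumes a 1-D array of NumPy unicode strings (dtype '<Un', n >= 2).
    """
    arr = np.ascontiguousarray(arr)                            # .view() needs contiguous data
    width = arr.dtype.itemsize // np.dtype('<U1').itemsize     # characters per element
    chars = arr.view('<U1').reshape(len(arr), width)           # one column per character
    return (
        (np.char.str_len(arr) == 2)                # exactly two characters
        & (chars[:, 0] == 'A')                     # first character is 'A'
        & np.isin(chars[:, 1], ['0', '1', '2'])    # second character is 0, 1 or 2
    )

a = np.array(['A0', 'A1', 'A2', 'A3', 'A4', 'B0', 'B1', 'C0'])
mask = a_followed_by_0_to_2(a)
print(mask)
print(np.where(mask, 1, 0))
```

This only handles fixed-position patterns; anything needing real regex features (alternation, repetition) is better served by `re`.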

I don't think NumPy is your best tool here. You can accomplish this with built-in Python tools such as map.

import re
import numpy as np

array = ['A0','A1','A2','A3','A4','B0','B1','C0']
p = r'A[0-2]'

list(map(lambda x: bool(re.match(p, x)), array))
# returns:
# [True, True, True, False, False, False, False, False]

# to get an array:
np.array(list(map(lambda x: bool(re.match(p, x)), array)))
# returns:
# array([ True,  True,  True, False, False, False, False, False])

1 Comment

You can pass it through the numpy array constructor. See the update.
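One caveat for the `re.match` answers here: `re.match` anchors only at the start of the string, so a value such as `'A12'` (a hypothetical extra input, not in the question) also counts as a match because its prefix `'A1'` matches. If the whole value must match, `Pattern.fullmatch` (Python 3.4+) is stricter. A minimal sketch comparing the two:

```python
import re
import numpy as np

# 'A12' is a hypothetical extra value added to show the difference.
array = ['A0', 'A1', 'A2', 'A3', 'A4', 'B0', 'B1', 'C0', 'A12']
p = re.compile(r'A[0-2]')

# match() only anchors at the start, so 'A12' counts (its prefix 'A1' matches).
prefix_hits = np.array([p.match(s) is not None for s in array])

# fullmatch() requires the entire string to match the pattern.
exact_hits = np.array([p.fullmatch(s) is not None for s in array])

print(prefix_hits)
print(exact_hits)
```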

If it's not more complicated than A0, A1 and A2, you can use

import numpy as np

a = np.array(['A0','A1','A2','A3','A4','B0','B1','C0'])
np.isin(a, ['A0', 'A1', 'A2'])   # np.in1d does the same but is deprecated since NumPy 2.0
# array([ True,  True,  True, False, False, False, False, False])

1 Comment

It is more complicated than just that. I tried to post a simple example... but I need to set other conditions involving other numpy arrays.
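On the point about other conditions: the boolean mask from `np.isin` combines element-wise with masks built from other arrays of the same length, using `&` (and) and `|` (or). A minimal sketch, where `values` is a hypothetical second array standing in for those other conditions and the allowed labels are generated rather than typed out:

```python
import numpy as np

a = np.array(['A0', 'A1', 'A2', 'A3', 'A4', 'B0', 'B1', 'C0'])
# Hypothetical second array of the same length, standing in for "other conditions".
values = np.array([5, -1, 3, 8, 2, 7, 0, 4])

allowed = [f'A{i}' for i in range(3)]                # ['A0', 'A1', 'A2'], built programmatically
name_ok = np.isin(a, allowed)                        # element-wise membership test
selection = np.where(name_ok & (values > 0), 1, 0)   # both conditions must hold

print(selection)
```

Each condition needs its own parentheses when written inline, because `&` binds more tightly than comparisons such as `>`.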

Try vectorizing a function that uses re:

import re
import numpy as np

array = ['A0','A1','A2','A3','A4','B0','B1','C0']

y = np.vectorize(lambda y, x: bool(re.compile(x).match(y)))
selection = np.where(y(array, 'A[0-2]'), 1, 0)
print(selection)

# output:
# [1 1 1 0 0 0 0 0]

4 Comments

You don't need to use a ternary expression, you can just use bool(re.compile(x).match(y))
Also, np.vectorize() is only for convenience, it doesn't provide any speedup over a for loop.
@NilsWerner I used it for its literal use, i.e. to vectorize a function.
@Vicrobot The same variable name and lambda argument name y confused me
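A variant of this answer with distinct names (`pattern` and `is_match` are illustrative): it compiles the pattern once instead of on every call, and passes `otypes=[bool]` so the output dtype is fixed up front; without `otypes`, `np.vectorize` calls the function on the first element to infer it. As noted above, this is still a Python-level loop and gives no speedup.

```python
import re
import numpy as np

array = ['A0', 'A1', 'A2', 'A3', 'A4', 'B0', 'B1', 'C0']
pattern = re.compile(r'A[0-2]')   # compiled once, reused for every element

# otypes=[bool] declares the output dtype, so no trial call is needed.
is_match = np.vectorize(lambda s: pattern.match(s) is not None, otypes=[bool])

selection = np.where(is_match(array), 1, 0)
print(selection)
```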

You can also use a list comprehension:

import re
import numpy as np

array = ['A0','A1','A2','A3','A4','B0','B1','C0']
r = re.compile('A[0-2]')
selection = np.array([1 if r.match(i) else 0 for i in array])
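If the goal is also to skip the intermediate list, `np.fromiter` can consume a generator directly. A minimal sketch using the same compiled pattern; passing `count` lets NumPy allocate the output once:

```python
import re
import numpy as np

array = ['A0', 'A1', 'A2', 'A3', 'A4', 'B0', 'B1', 'C0']
r = re.compile(r'A[0-2]')

# Build the integer array straight from a generator, no intermediate list.
selection = np.fromiter((1 if r.match(s) else 0 for s in array),
                        dtype=int, count=len(array))
print(selection)
```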


You don't even need a lambda to vectorize re.match():

import re
import numpy as np

array = ['A0','A1','A2','A3','A4','B0','B1','C0']
# != None compares element-wise here, giving a boolean array
selection = np.vectorize(re.match, excluded={0})(r'A[0-2]', array) != None

It's not any faster than a for loop, but it's very simple. excluded={0} tells np.vectorize not to vectorize the first argument, the regex string.
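To check the speed claim on your own data, here is a rough `timeit` sketch comparing this approach with a plain list comprehension. The repeated input list is a hypothetical larger test case, and timings vary by machine and NumPy version, so no numbers are given here:

```python
import re
import timeit
import numpy as np

# Hypothetical larger input: the question's list repeated 10,000 times.
array = ['A0', 'A1', 'A2', 'A3', 'A4', 'B0', 'B1', 'C0'] * 10_000
pattern = r'A[0-2]'
compiled = re.compile(pattern)

def with_vectorize():
    # This answer's approach; `!= None` is intentional, it compares element-wise.
    return np.vectorize(re.match, excluded={0})(pattern, array) != None

def with_loop():
    # Plain list comprehension over the same data.
    return np.array([compiled.match(s) is not None for s in array])

assert (with_vectorize() == with_loop()).all()   # both give the same mask
for f in (with_vectorize, with_loop):
    print(f.__name__, timeit.timeit(f, number=5))
```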
