
I have a table of values stored in a list of lists like:

A = [[a[1], b[1], c[1]],
     [a[2], b[2], c[2]],
     ...
     [a[m], b[m], c[m]]]

with
a[i] < b[i]
b[i] < a[i+1]
0 < c[i] < 1
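These constraints imply that the flattened bounds a[1], b[1], a[2], b[2], ... form a strictly increasing sequence, so the intervals are sorted and never overlap. A minimal sketch for checking that assumption up front (the helper name `check_intervals` is hypothetical, and it assumes A is a list of [a, b, c] rows):

```python
import numpy as np

def check_intervals(A):
    """Hypothetical helper: True if the rows [a, b, c] of A satisfy
    a[i] < b[i] < a[i+1] and 0 < c[i] < 1."""
    A_arr = np.asarray(A, dtype=float)
    # Flatten the first two columns into a1, b1, a2, b2, ...
    bounds = A_arr[:, :2].ravel()
    # Strictly increasing bounds <=> sorted, non-overlapping intervals
    increasing = bool(np.all(np.diff(bounds) > 0))
    c = A_arr[:, 2]
    c_in_range = bool(np.all((c > 0) & (c < 1)))
    return increasing and c_in_range
```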

and a numpy array such as:

 X = [x[1], x[2], ..., x[n]]

I need to create an array

 Y = [y[1], y[2], ..., y[n]]

where each value of Y is defined by

for i in [1, 2, ..., n]:
  y[i] = 1
  for k in [1, 2, ..., m]:
     if a[k] < x[i] < b[k]:
         y[i] = c[k]
         break
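For reference, a minimal plain-Python sketch of the intended lookup: y[i] defaults to 1 and is replaced by c[k] only when some interval strictly contains x[i]. The name `lookup_loop` is hypothetical and A is assumed to be a list of [a, b, c] rows. This is the slow baseline that faster versions can be checked against:

```python
def lookup_loop(A, X):
    """Hypothetical slow baseline: y = c[k] if a[k] < x < b[k] for some k, else 1."""
    Y = []
    for x in X:
        y = 1.0                      # default when no interval contains x
        for a, b, c in A:
            if a < x < b:
                y = c
                break                # intervals don't overlap, so the first hit is the only one
        Y.append(y)
    return Y
```

Because the intervals do not overlap, the inner loop can stop at the first match, but the worst case is still O(n*m) comparisons in pure Python.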

Note that X and Y have the same length (n), but A has a different number of rows (m). Each y[i] takes the value c[k] from the third column of A for the k (k = 1, 2, ..., m) that satisfies a[k] < x[i] < b[k], and 1 if no such k exists.

In the actual cases I am working on, n = 6789 and m = 6172.

I could do the check with nested for loops, but that is really slow. What is the fastest way to do this? And what if X and Y were 2D numpy arrays?

SAMPLE DATA:

a = [10, 20, 30, 40, 50, 60, 70, 80, 90]
b = [11, 21, 31, 41, 51, 61, 71, 81, 91]
c = [ 0.917,  0.572,  0.993 ,  0.131,  0.44, 0.252 ,  0.005,  0.375,  0.341]

A = [[d, e, f] for d, e, f in zip(a, b, c)]

X = [1, 4, 10.2, 20.5, 25, 32, 41.3, 50.5, 73]

EXPECTED RESULTS:

Y = [1, 1, 0.917, 0.572, 1, 1, 1, 0.44, 1]
  • Why would you do zip([1,2, ..., n],[1,2, ..., m])? It seems likely that that doesn't do what you think it does. Commented May 28, 2015 at 18:42
  • @user2357112: you are indeed correct; I have updated the question. Thanks. Commented May 28, 2015 at 18:47
  • The new version still looks wrong. Each y[i] value gets overwritten over and over. Commented May 28, 2015 at 18:47
  • @jorgehumberto Would the posted solution work for you? Commented Jun 3, 2015 at 18:54
  • @Divakar: Perfect, thanks! It took 3.5 s to create the array (after expanding X and Y to 2D arrays), instead of the few minutes it would take when I iterated over all elements. Commented Jun 8, 2015 at 19:08

2 Answers


Approach #1: Using brute-force comparison with broadcasting -

import numpy as np

# Convert to float numpy arrays (so Y can hold the values from column 3)
A_arr = np.array(A, dtype=float)
X_arr = np.array(X, dtype=float)

# Mask that represents "if a[k] < x[i] < b[k]:" for all k (rows) and i (columns)
mask = (A_arr[:,None,0]<X_arr) & (X_arr<A_arr[:,None,1])

# Get row (k) and column (i) indices where the conditionals were satisfied
R,C = np.where(mask)

# Setup output numpy array (default 1) and set values in it from the
# third column of A at the rows that satisfied the conditionals
Y = np.ones_like(X_arr)
Y[C] = A_arr[R,2]
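With n = 6789 and m = 6172, the full (m, n) boolean mask in Approach #1 has about 42 million entries (roughly 42 MB at one byte per bool). If memory is tight, the same broadcasting comparison can be run on slices of X. A sketch under that assumption; `lookup_broadcast_chunked` and its `chunk` parameter are hypothetical names, not part of the answer above:

```python
import numpy as np

def lookup_broadcast_chunked(A, X, chunk=1024):
    """Hypothetical chunked variant of the broadcasting approach.
    Only an (m, chunk) mask is held in memory at a time."""
    A_arr = np.asarray(A, dtype=float)
    X_arr = np.asarray(X, dtype=float).ravel()     # works for 1D or 2D X
    # a and b as (m, 1) columns so they broadcast against a 1D slice of X
    a, b, c = A_arr[:, 0:1], A_arr[:, 1:2], A_arr[:, 2]
    Y = np.ones(X_arr.shape)
    for start in range(0, X_arr.size, chunk):
        xs = X_arr[start:start + chunk]
        mask = (a < xs) & (xs < b)                 # shape (m, len(xs))
        R, C = np.nonzero(mask)                    # R: row of A, C: position in xs
        Y[start + C] = c[R]
    return Y.reshape(np.shape(X))                  # restore X's original shape
```

The chunk size trades memory for per-iteration overhead; any value works as long as each (m, chunk) mask fits comfortably in memory.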

Approach #2: Based on binning with np.searchsorted -

import numpy as np

# Convert A and X to float numpy arrays
A_arr = np.asarray(A, dtype=float)
X_arr = np.asarray(X, dtype=float)

# Setup intervals for binning later on: a1, b1, a2, b2, ... (sorted, given the constraints)
intv = A_arr[:,:2].ravel()

# Perform binning & get interval & grouped indices for each X
intv_idx = np.searchsorted(intv, X_arr, side='right')
grp_intv_idx = intv_idx // 2

# Get mask of valid indices, i.e. X elements within grouped intervals
# (an odd bin index means a[k] <= x < b[k])
mask = intv_idx % 2 == 1

# Setup output array
Y = np.ones(X_arr.shape)

# Extract col-3 elements with grouped indices and valid ones from mask
Y[mask] = A_arr[:,2][grp_intv_idx[mask]]

# Remove (set to 1's) elements that fall exactly on bin boundaries
Y[np.isin(X_arr,intv)] = 1
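To see why the parity of the searchsorted index identifies the matches, here is a small worked example with two made-up intervals (10, 11) and (20, 21). With side='right', an odd index means a[k] <= x < b[k] for k = index // 2; an x exactly equal to some a[k] still gets an odd index, which is why the separate boundary check is needed:

```python
import numpy as np

# Two made-up intervals (10, 11) and (20, 21), flattened to sorted bounds
intv = np.array([10, 11, 20, 21])
X = np.array([5, 10, 10.5, 11, 15, 20.5, 30])

idx = np.searchsorted(intv, X, side='right')   # [0, 1, 1, 2, 2, 3, 4]
inside = idx % 2 == 1                          # a[k] <= x < b[k]
on_boundary = np.isin(X, intv)                 # x equals some a[k] or b[k]
valid = inside & ~on_boundary                  # strictly a[k] < x < b[k]
k = idx // 2                                   # interval index for each x

print(valid)      # [False False  True False False  True False]
print(k[valid])   # [0 1]
```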

If you need the output as a list, convert the numpy array with Y.tolist().
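On the 2D question: np.searchsorted and np.isin both return arrays with the same shape as their input, so the binning approach can handle a 2D X directly as long as the output is allocated with X's shape rather than len(X). A sketch under that assumption (`lookup_searchsorted` is a hypothetical name; A must satisfy the sorted, non-overlapping interval constraints from the question):

```python
import numpy as np

def lookup_searchsorted(A, X):
    """Hypothetical shape-preserving version of the searchsorted approach.
    X may be 1D, 2D, or higher-dimensional; Y has the same shape."""
    A_arr = np.asarray(A, dtype=float)
    X_arr = np.asarray(X, dtype=float)
    intv = A_arr[:, :2].ravel()                          # a1, b1, a2, b2, ... (sorted)
    idx = np.searchsorted(intv, X_arr, side='right')     # same shape as X_arr
    # odd index -> a[k] <= x < b[k]; exclude x equal to any bound
    mask = (idx % 2 == 1) & ~np.isin(X_arr, intv)
    Y = np.ones(X_arr.shape)
    Y[mask] = A_arr[idx[mask] // 2, 2]                   # c[k] with k = idx // 2
    return Y
```

Usage with the question's sample data reshaped to 3x3: `lookup_searchsorted(A, np.array(X).reshape(3, 3))` returns a 3x3 float array.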


Sample run -

In [480]: A
Out[480]: 
[[139.0, 355.0, 0.5047342078960846],
 [419.0, 476.0, 0.3593886192040009],
 [580.0, 733.0, 0.3137694021600973]]

In [481]: X
Out[481]: [555, 689, 387, 617, 151, 149, 452]

In [482]: Y
Out[482]: 
array([ 1.        ,  0.3137694 ,  1.        ,  0.3137694 ,  0.50473421,
        0.50473421,  0.35938862])
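One way to gain confidence in either approach is to compare them on random data that satisfies the interval constraints. This sketch builds hypothetical random intervals from distinct sorted integers (so a[k] < b[k] < a[k+1] holds) and integer X values, which deliberately include exact boundary hits; the function names are made up for the example:

```python
import numpy as np

def by_broadcast(A_arr, X_arr):
    # Approach #1: full (m, n) mask, rows R index A, columns C index X
    mask = (A_arr[:, None, 0] < X_arr) & (X_arr < A_arr[:, None, 1])
    R, C = np.nonzero(mask)
    Y = np.ones(X_arr.shape)
    Y[C] = A_arr[R, 2]
    return Y

def by_searchsorted(A_arr, X_arr):
    # Approach #2: binning against the flattened, sorted bounds
    intv = A_arr[:, :2].ravel()
    idx = np.searchsorted(intv, X_arr, side='right')
    mask = (idx % 2 == 1) & ~np.isin(X_arr, intv)
    Y = np.ones(X_arr.shape)
    Y[mask] = A_arr[idx[mask] // 2, 2]
    return Y

rng = np.random.default_rng(0)
m, n = 50, 2000
# 2*m distinct sorted integers -> rows [a_k, b_k] with a_k < b_k < a_{k+1}
bounds = np.sort(rng.choice(1000, size=2 * m, replace=False)).astype(float)
A_arr = np.column_stack([bounds.reshape(m, 2), rng.random(m)])
X_arr = rng.integers(0, 1000, size=n).astype(float)

same = np.array_equal(by_broadcast(A_arr, X_arr), by_searchsorted(A_arr, X_arr))
print(same)
```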

With 1-d arrays, it's not too bad:

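import numpy as np

# Assumes A has exactly one row per element of x (row k is compared with x[k])
a,b,c = np.array(A, dtype=float).T
x = np.asarray(X, dtype=float)
mask = (a<x) & (x<b)
y = np.ones_like(x)
y[mask] = c[mask]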

If x and y are higher-dimensional, then your A matrix will also need to be bigger. The basic concept works the same, though.

1 Comment

I am sorry, you made me realize that my explanation was incorrect. A has length "m", but X and Y have length "n". Y can take any value in the third "column" of A. I have updated the question to make it clearer.
