I'm new to Python and have been learning about arrays. I'm stuck on a seemingly simple problem. I have two arrays:

a = [2.0, 5.1, 6.2, 7.9, 23.0]     # always increasing
b = [5.1, 5.5, 5.7, 6.2, 00.0]     # also always increasing

and I want the resultant array to be:

c = [0.0, 5.1, 6.2, 0.0, 0.0]      # 5.5, 5.7, 00.0 from 'b' were dropped and rearranged such that position of equivalent elements as in 'a' are maintained

I compared 'a' and 'b' using NumPy:

y = np.isclose(a, b)
print(y)
# [False False False False False]

Alternatively, I tried something like this, which I don't think is the right way:

c = np.zeros(len(a))
for i in range (len(a)):
    for j in range (len(a)):
        err = abs(a[i]-b[j])
        if err == 0.0 or err < abs(1):
            print (err, a[i], b[j], i, j)
        else:
            print (err, a[i], b[j], i, j)

How do I proceed from here towards obtaining 'c'?

  • Try y = np.isclose(a, b, atol=0.05). Commented Feb 28, 2016 at 12:40
  • It isn't affecting the result. Putting atol=0.5 gives [False True True False False] which is c bool-wise. Commented Feb 28, 2016 at 12:47
  • Do I simply copy element values from a at position of True values? Or is there a better way of doing it? Commented Feb 28, 2016 at 12:49
  • Do equivalent values have to be at equal indices? Or would a=[5,6,7]; b=[0,0,5] give c=[5,0,0]? (Your comment in your 2nd code snippet is not clear to me.) Commented Feb 28, 2016 at 13:22
  • Yes equal values at equal indices, but since the values are in ascending order (always increasing), sorting will be easier. 2nd snippet is an alternate way. Commented Feb 28, 2016 at 13:30

5 Answers 5

4

These solutions work even when the arrays are of different size.

Simple version

c = []

for i in a:
    if any(np.isclose(i, b)):
        c.append(i)
    else:
        c.append(0.0)

Numpy version

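import numpy as np

a = np.asarray(a)                # a.shape below needs an ndarray, not a list
b = np.asarray(b)
aa = np.tile(a, (len(b), 1))     # shape (len(b), len(a))
bb = np.tile(b, (len(a), 1))     # shape (len(a), len(b))
cc = np.isclose(aa, bb.T)        # cc[j, i] is True where a[i] is close to b[j]
c = np.zeros(shape=a.shape)
result = np.where(np.any(cc, 0), a, c)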

Explained:

The idea is to compare every element of a with every element of b in one matrix operation. First, tile each array into a matrix, using the length of the other array as the number of rows. That way aa and the transpose of bb have the same shape:

aa = np.tile(a, (len(b), 1))
bb = np.tile(b, (len(a), 1))

They look like this:

# aa
array([[  2. ,   5.1,   6.2,   7.9,  23. ],
       [  2. ,   5.1,   6.2,   7.9,  23. ],
       [  2. ,   5.1,   6.2,   7.9,  23. ],
       [  2. ,   5.1,   6.2,   7.9,  23. ],
       [  2. ,   5.1,   6.2,   7.9,  23. ]])

# bb
array([[ 5.1,  5.5,  5.7,  6.2,  0. ],
       [ 5.1,  5.5,  5.7,  6.2,  0. ],
       [ 5.1,  5.5,  5.7,  6.2,  0. ],
       [ 5.1,  5.5,  5.7,  6.2,  0. ],
       [ 5.1,  5.5,  5.7,  6.2,  0. ]])

Then compare them. Note that bb is transposed:

cc = np.isclose(aa, bb.T)

And you get:

array([[False,  True, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False, False,  True, False, False],
       [False, False, False, False, False]], dtype=bool)

Each column of cc corresponds to one element of a. Reduce along axis 0 to find out whether that element matched any element of b:

np.any(cc, 0)

which returns

array([False,  True,  True, False, False], dtype=bool)

Now create array c:

c = np.zeros(shape=a.shape)

And select appropriate value, either from a or c:

np.where(np.any(cc, 0), a, c)

And the result:

array([ 0. ,  5.1,  6.2,  0. ,  0. ])
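
The same all-pairs comparison can also be sketched with broadcasting instead of np.tile. This is an alternative formulation rather than the code above: the function name match_or_zero is made up for illustration, and it assumes the default np.isclose tolerances are acceptable.

```python
import numpy as np

def match_or_zero(a, b):
    """Keep each element of a that is close to any element of b; zero the rest."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # a[:, None] has shape (len(a), 1) and b[None, :] has shape (1, len(b)),
    # so the comparison broadcasts to a (len(a), len(b)) boolean matrix.
    close = np.isclose(a[:, None], b[None, :])
    # A row contains True when that element of a matched something in b.
    return np.where(close.any(axis=1), a, 0.0)

a = [2.0, 5.1, 6.2, 7.9, 23.0]
b = [5.1, 5.5, 5.7, 6.2, 0.0]
print(match_or_zero(a, b))
```

Like the tiled version, this builds a len(a) by len(b) matrix, so memory use grows with the product of the two lengths.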

4 Comments

I think this is what the OP wants. But he added a comment now saying that there may be trailing junk values at the end of the b array which must be ignored. So one would need to trim that array before using it, or adapt the algorithm. (Applies to all current answers, I think.)
Yes, I've added a comment about the assumption. The remainder can be np.appended if necessary.
Very thorough explanation, I must say. One thing, though: if c is created with NumPy as an array of zeros, won't np.append throw errors, as opposed to assigning with c[i] = b[j]?
I've edited a fix for different array sizes (just exchanged len(a) and len(b)). Basically - all that you now have to do is decide where bad data starts and remove it so it will not interfere with comparisons.
1

np.isclose already returns a boolean array that is True wherever the elements at the same index of a and b are close. You can use that mask to set all other elements to zero. Note that the code below overwrites a in place.

import numpy as np
a = np.array([2.0, 5.1, 6.2, 7.9, 23.0])     # always increasing
b = np.array([5.1, 5.5, 5.7, 6.2, 00.0])     # also always increasing
a[~np.isclose(a,b, atol=0.5)] = 0
a

This returns array([ 0. , 5.1, 6.2, 0. , 0. ]).

The goal is to zero every element that is not close, which is why the mask is inverted with ~ before indexing.
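
If a should stay untouched, a variant along these lines builds a new array instead. It is only a sketch: it keeps the atol=0.5 from above, and like the original it compares elements at the same index only, so it does not search b for a match at a different position.

```python
import numpy as np

a = np.array([2.0, 5.1, 6.2, 7.9, 23.0])
b = np.array([5.1, 5.5, 5.7, 6.2, 0.0])

# True where a[i] and b[i] (same index) are within the tolerance.
mask = np.isclose(a, b, atol=0.5)
# Take the value from a where the mask is True, and 0.0 elsewhere.
c = np.where(mask, a, 0.0)
print(c)
```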

1

Try to better explain what your program should do in a more general way. Only giving arrays a, b and c does not tell what it should do. It is as if someone said "If A=5 and B=7, write a program so that C=20".

From what you tried, I'd guess that the task is "each element of c should be equal to the corresponding element of a if its value is near (difference of 0.5 or less) to the corresponding value in b. It should be zero if not."

Also, do you really need to use numpy? Try using only loops and list methods. You may also have a look at "Generator expressions and list comprehensions"

Finally, your title says "(...) and modifying 2nd array". There should not be a third array named c. The result should appear in a modified version of array b.


Edited: if the specification was really this, then the code could be

a = [2.0, 5.1, 6.2, 7.9, 23.0]
b = [5.1, 5.5, 5.7, 6.2, 0.0]
c = []
for x, y in zip(a, b):
    c.append(x if abs(x - y) <= 0.5 else 0.0)
print(c)

Which gives the following answer

[0.0, 5.1, 6.2, 0.0, 0.0]
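
Since list comprehensions were mentioned earlier, the same loop could be sketched as one. The name tol is introduced here only for illustration, and the 0.5 tolerance is the guessed value from above, not something the question states.

```python
a = [2.0, 5.1, 6.2, 7.9, 23.0]
b = [5.1, 5.5, 5.7, 6.2, 0.0]
tol = 0.5  # hypothetical name for the guessed tolerance

# Keep x when the element of b at the same position is within tol, else 0.0.
c = [x if abs(x - y) <= tol else 0.0 for x, y in zip(a, b)]
print(c)
```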

BTW, if this is for a course, you could still get a bad grade for not following the specification ("...and modifying the 2nd array").

5 Comments

You guessed the task right. I tried alternately using loops instead of Numpy (see 2nd snippet). And the modifying 2nd array is what I need to do, but I would settle for third array c as it helps in keeping things simple for the time being.
Hint: if you are beginning in python, you should stick to the base language (i.e. "Language reference" and "Library reference" in the python documentation). Don't try using external libraries (e.g. Numpy) yet.
It isn't for a course btw, I am a data science enthusiast and encountered this problem while working with different data sets. The arrays actually represent columns of a much larger data set.
You could also modify b directly using array indices: for i in range(len(a)): b[i] = a[i] if abs(a[i]-b[i])<=0.5 else 0.0
I didn't really think it was for a course: people who do that usually paste the homework assignment word for word. I wrote that to emphasize the fact that the first step when writing a program (whatever language you program in) starts by having a clear idea of the task to accomplish.
1

It seems that you want to keep elements of a that are also in b.

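A linear-time solution with a plain Python loop:

import numpy as np

c = np.zeros_like(a, dtype=float)

j = 0
n = len(c)
for i in range(n):
    # advance j past every b[j] that is too small to match a[i]
    while j < n and b[j] < a[i] - .1:
        j += 1
    if j == n:
        break
    if abs(a[i] - b[j]) < .1:
        c[i] = a[i]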

And a NumPy solution for exact matching:

a * np.in1d(a, b)

np.in1d(a, b) marks the elements of a that also occur in b: here it is [False, True, True, False, False].

Since True is 1 and False is 0, a * np.in1d(a, b) is [0., 5.1, 6.2, 0., 0.]. Because in1d sorts a and b, its complexity is O(n log n), but in practice it is generally faster than the Python loop. If approximate equality is required, one option is to round the arrays first (np.round(a, 1)).
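
The rounding idea might look like the sketch below. It uses np.isin, the newer NumPy spelling of the same membership test, in place of in1d. Rounding to one decimal is an assumption about how close "close" needs to be.

```python
import numpy as np

a = np.array([2.0, 5.1, 6.2, 7.9, 23.0])
b = np.array([5.1, 5.5, 5.7, 6.2, 0.0])

# Round both arrays to one decimal so nearly equal values compare as equal.
# Caveat: two close values on opposite sides of a rounding boundary
# (e.g. 5.14 and 5.16) still round to different numbers and will not match.
mask = np.isin(np.round(a, 1), np.round(b, 1))
c = a * mask  # True counts as 1 and False as 0
print(c)
```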

2 Comments

I arrived with a solution pretty similar to yours using nested for loops only
Yes, but that is a quadratic algorithm: it doesn't use the fact that the arrays are sorted. It will be inefficient on big arrays.
0

This is the alternate way I was able to obtain the required arrangement for c.

import numpy as np

a = [2.0, 5.1, 6.2, 7.9, 23.0]  # always increasing
b = [5.1, 5.5, 5.7, 6.2, 00.0]  # also always increasing
c = np.zeros(len(a))

for i in range (len(a)):
    for j in range (len(a)):
        err = abs(a[i]-b[j])
        if err == 0.0 or err < abs(0.1):
            c[i] = b[j]

print(c)
#[ 0.   5.1  6.2  0.   0. ]

3 Comments

Thanks. Now your aim is more clear. I give more efficient solutions in my post.
In the if statement, abs(0.1) could be written without the abs function (abs(0.1) is 0.1). Also, err == 0.0 already implies err < 0.1, so the first condition is redundant. You could just write if err < 0.1:.
abs(0.1) is redundant in this case but when there are close enough values, err might be negative in some cases. So err == 0 and err < abs(0.1) would be required. I tried using different set of values just for testing the code and this statement worked like a charm.
