
Suppose I have two arrays:

a = np.array(
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7]])

b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5]])

As you can see, one array is simply a shuffle of the other. I need to combine these two arrays to form a third array, c, such that:

  • the first part of array c (until a random index i) consists of elements from the first part of array a (until index i). Therefore, c[:i] == a[:i] must return True.
  • the rest of the array c is filled by values from array b, that are not already inside array c, in the exact same order they appear in.

With index i set to 2, the desired output for the arrays a and b above is:

> c
[[0, 1],
 [2, 3],
 [6, 7],
 [4, 5]]

Array c must be the same length as both array a and array b, and it is possible for two elements within either array a or array b to be the same. Array c must also consist of the same elements that are in a and b (i.e. it behaves somewhat like a shuffle).

I've tried multiple solutions, but none give the desired result. The closest was this:

import copy
import numpy as np

a = np.arange(10).reshape(5, 2)
np.random.shuffle(a)

b = np.arange(10).reshape(5, 2)
b_part = b[:4]

temp = []

for part in a:
    if part in b_part:
        continue
    else:
        temp.append(part)

temp = np.array(temp)

c = copy.deepcopy(np.vstack((b_part, temp)))

However, it sometimes results in array c being shorter than arrays a and b, because the elements in either array can sometimes repeat.

  • If elements are not unique, your rules imply that c can be shorter. Example: a = [(0,1),(2,3),(2,3),(4,5)], b = [(2,3),(4,5),(2,3),(0,1)], i = 2. You'd pick a[:i], which is [(0,1),(2,3)], and from b whatever has not occurred yet, which is (4,5). So c would be [(0,1),(2,3),(4,5)], which is shorter. Commented Mar 29, 2019 at 21:39
  • @PaulPanzer I understand that is what's causing the issue, but I don't know how I can address it myself (which is why I'm asking the question) Commented Mar 29, 2019 at 21:43
  • The first thing would be to decide what your desired answer would be in this case. Commented Mar 29, 2019 at 21:46
  • IMHO your code works (just create a_part instead of b_part, swap the arrays in the loops, and use vstack((a_part, temp))). The issue with the size of c lies in your problem definition when a, which is the basis for c, contains duplicates. Imagine splitting a at index i so that c takes just one copy of a duplicated value from a: c then cannot have the same size as a or b, because you cannot add the other duplicate from b. Commented Mar 29, 2019 at 22:07
  • @KomronAripov using your pic example, try index [:4] instead of [:2] and give us the result. Commented Mar 29, 2019 at 22:37

4 Answers


The following should handle duplicates alright.

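import numpy as np

def mix(a, b, i):
    # Indirect row-wise sort orders of a and b
    sa, sb = map(np.lexsort, (a.T, b.T))
    mb = np.empty(len(a), '?')
    # Mask over a's rows (i Falses, then Trues), mapped onto b via the sort orders
    mb[sb] = np.arange(2, dtype='?').repeat((i, len(a)-i))[sa]
    return np.concatenate([a[:i], b[mb]], 0)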

It

  • indirectly sorts a and b
  • creates a mask which is True at the positions not taken from a, i.e. has i Falses and then len(a)-i Trues.
  • uses the sort orders to map that mask to b
  • filters b with the mask and appends to a[:i]
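
To make the four steps concrete, here is a rough trace of the same logic on the small arrays from the question, with each intermediate given a name. The names order_a, order_b, from_b and take are mine and not part of the function above; treat this as a sketch of the idea rather than a replacement for mix.

```python
import numpy as np

a = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])
b = np.array([[2, 3], [6, 7], [0, 1], [4, 5]])
i = 2

# Step 1: indirect sort of the rows of a and b (np.lexsort is stable)
order_a = np.lexsort(a.T)
order_b = np.lexsort(b.T)

# Step 2: mask over a's rows: False for the i rows kept from a, True for the rest
from_b = np.arange(len(a)) >= i

# Step 3: the k-th smallest row of b receives the flag of the k-th smallest row of a
take = np.empty(len(b), dtype=bool)
take[order_b] = from_b[order_a]

# Step 4: keep a[:i], then the flagged rows of b in their original order
c = np.concatenate([a[:i], b[take]], axis=0)
print(c)
```

For these inputs, take comes out as [False, True, False, True], so c is a[:2] followed by [6, 7] and [4, 5], which is the output asked for in the question.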

Example (transposed to save space):

a.T
# array([[2, 2, 0, 2, 3, 0, 2, 0, 0, 1],
#        [0, 1, 2, 0, 1, 0, 3, 0, 0, 0]])
b.T
# array([[0, 0, 2, 1, 0, 0, 2, 2, 2, 3],
#        [0, 0, 0, 0, 2, 0, 1, 3, 0, 1]])
mix(a, b, 6).T
# array([[2, 2, 0, 2, 3, 0, 0, 1, 0, 2],
#        [0, 1, 2, 0, 1, 0, 0, 0, 0, 3]])
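
If you want to convince yourself on other inputs, a quick property check along these lines may help. It assumes b is a row shuffle of a, as in the question; sorted_rows is a hypothetical helper I introduce only for the comparison, and the random data is made up.

```python
import numpy as np

def mix(a, b, i):
    # Same function as in the answer, repeated so this block runs on its own
    sa, sb = map(np.lexsort, (a.T, b.T))
    mb = np.empty(len(a), '?')
    mb[sb] = np.arange(2, dtype='?').repeat((i, len(a) - i))[sa]
    return np.concatenate([a[:i], b[mb]], 0)

def sorted_rows(x):
    # Rows in a canonical order, so two arrays can be compared as multisets
    return x[np.lexsort(x.T)]

rng = np.random.default_rng(0)
a = rng.integers(0, 3, size=(10, 2))   # small value range forces duplicate rows
b = rng.permutation(a)                 # b is a row shuffle of a
i = 6

c = mix(a, b, i)
same_head = np.array_equal(c[:i], a[:i])                    # c starts with a[:i]
same_rows = np.array_equal(sorted_rows(c), sorted_rows(a))  # c has exactly a's rows
print(same_head, same_rows)
```

Both checks should hold whenever b contains exactly the same rows as a, duplicates included.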

3 Comments

If a = np.array([[0, 1], [2, 3], [0, 0], [2, 3], [4, 5], [6, 7]]), b = np.array([[2, 3], [6, 7], [0, 1], [4, 5], [2, 3], [0, 0]]), then for i=3, this solution gives c = [[0, 1], [2, 3], [0, 0], [6, 7], [4, 5], [2, 3]]. The last [2, 3] probably violates OP's requirement, since it is already picked up from a, and must not be picked up again from b.
@fountainhead No, that's the other [2, 3] ;-] -- More seriously, OP's clarifications as to how to handle dupes are spread out across several comments.
Several comments and several external links too. :-)

Here's one solution:

full_len = len(a)

b_not_in_a_part = ~np.all(np.isin(b,a[:i+1]),axis=1)         # Get boolean mask, to apply on b
b_part_len = full_len-i-1                                    # Length of b part of c

c = np.concatenate((a[:i+1], b[b_not_in_a_part,:]), axis=0)  # Construct c, using the mask for the b part.

Testing it out:

import numpy as np
a = np.array(
[[0, 1],
 [2, 3],
 [0, 0],
 [2, 3],
 [4, 5],
 [6, 7]])
b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5],
 [2, 3],
 [0, 0]])

i = 2

print ("a is:\n", a)
print ("b is:\n", b)

full_len = len(a)

b_not_in_a_part = ~np.all(np.isin(b,a[:i+1]),axis=1)         # Get boolean mask, to apply on b
b_part_len = full_len-i-1                                    # Length of b part of c

c = np.concatenate((a[:i+1], b[b_not_in_a_part,:]), axis=0)  # Construct c, using the mask for the b part.
print ("c is:\n", c)

Output:

a is:
 [[0 1]
 [2 3]
 [0 0]
 [2 3]
 [4 5]
 [6 7]]
b is:
 [[2 3]
 [6 7]
 [0 1]
 [4 5]
 [2 3]
 [0 0]]
c is:
 [[0 1]
 [2 3]
 [0 0]
 [6 7]
 [4 5]]

Note: For this example, c has a length of only 5, even though a and b have a length of 6. Because of the heavy duplication in b, there aren't enough values left in b that are eligible to be used for c.
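
One caveat with a mask built from np.isin: it tests each number on its own, not whole rows, so a row of b whose values all happen to appear somewhere in the chosen part of a is treated as already present. A possible row-wise alternative is sketched below; rows_in is a hypothetical helper, and the arrays b and head are invented to show the difference.

```python
import numpy as np

def rows_in(rows, pool):
    """Boolean mask: True where a row of `rows` equals some whole row of `pool`."""
    # Broadcast to shape (len(rows), len(pool), ncols), require every column
    # to match, then ask whether any row of pool matched.
    return (rows[:, None, :] == pool[None, :, :]).all(axis=2).any(axis=1)

b = np.array([[2, 3], [0, 3], [6, 7]])
head = np.array([[0, 1], [2, 3]])          # stands in for the part taken from a

row_wise = rows_in(b, head)                        # True, False, False
element_wise = np.all(np.isin(b, head), axis=1)    # True, True, False
print(row_wise, element_wise)
```

The row [0, 3] is not a row of head, but its values 0 and 3 both occur in head, so only the element-wise test flags it.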

3 Comments

@KomronAripov: This error is occurring because, due to high duplication in b, there aren't enough values in b that are not already used from a. I can fix it if you tell me what your requirement is for this scenario.
In such a scenario, is it ok if c has smaller length than a or b? (Since there are not enough values to match the full length)
@KomronAripov, Fixed the error, making the assumption that it's ok for c to have a smaller length, if high duplication in b leaves us with not enough values eligible to go into c

Just use numpy.concatenate() and make sure the slice's stop index is one past the last element you want (NumPy slicing goes up to, but does not include, the stop index; see below). (Edit: it seems you modified your a, b and c arrays, so I'll change my code below to accommodate.)

import numpy as np

a = np.array(
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7]])

b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5]])


i = 2
c = a[0:i]
for k in b:
    # Compare whole rows: `k not in c` is True-ish on any single matching element
    if not (c == k).all(axis=1).any():
        c = np.concatenate((c, [k]))

print(c)

Output:

[[0 1]
 [2 3]
 [6 7]
 [4 5]]
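
If a and b can contain duplicate rows, one possible reading of the requirement is to treat the rows as a multiset: each copy taken from a[:i] cancels exactly one copy in b, and whatever remains of b is appended in order. That reading is my assumption, since the question leaves the duplicate case open. A minimal sketch, with combine as a hypothetical name:

```python
from collections import Counter

import numpy as np

def combine(a, b, i):
    # How many copies of each row have already been taken from a[:i]
    used = Counter(map(tuple, a[:i]))
    tail = []
    for row in b:
        key = tuple(row)
        if used[key] > 0:
            used[key] -= 1        # this copy of the row is already in the head
        else:
            tail.append(row)      # not yet accounted for: keep it, in b's order
    return np.vstack([a[:i], *tail])

a = np.array([[0, 1], [2, 3], [2, 3], [4, 5]])
b = np.array([[2, 3], [4, 5], [2, 3], [0, 1]])
c = combine(a, b, 2)
print(c)
```

With these inputs c has four rows: [0, 1], [2, 3], [4, 5], [2, 3]. The second [2, 3] survives because only one copy was taken from a[:2].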

5 Comments

after a slight modification to array b (now updated in the answer), the output is not as expected.
this results in duplicates (even though there were none to begin with), which is undesired...
Ok I get ya, how about now?
this still does not address the issue where there is a possibility that either array a or array b can contain duplicates.
can you provide an example of what c should look like if a and b contains the duplicates as you have mentioned?
  1. For i=2, get your first part of the result:

    c = a[:i]
    
  2. Get "uncommon" elements between b and c:

    diff = np.array([x for x in b if not (c == x).all(axis=1).any()])
    
  3. Select random elements from diff and concatenate to the original array:

    s = len(a) - i
    np.concatenate([c, diff[np.random.choice(diff.shape[0], size=s, replace=False), :]], axis=0)
    

OUTPUT:

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])
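
Put together as one script, the three steps might look like the sketch below. It assumes the arrays from the question, takes the first part as a[:i], and uses a seeded np.random.default_rng generator (my choice, for repeatability) in place of np.random.choice. Note that the remaining rows are drawn at random, so they are not necessarily in the order they appear in b.

```python
import numpy as np

a = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])
b = np.array([[2, 3], [6, 7], [0, 1], [4, 5]])
i = 2

rng = np.random.default_rng(0)   # seeded only so repeated runs agree

# Step 1: first part of the result
c = a[:i]

# Step 2: rows of b that are not already rows of c (whole-row comparison)
diff = np.array([x for x in b if not (c == x).all(axis=1).any()])

# Step 3: draw the missing rows from diff without replacement and append them
s = len(a) - i
picked = rng.choice(diff.shape[0], size=s, replace=False)
result = np.concatenate([c, diff[picked]], axis=0)
print(result)
```

The draw in step 3 raises a ValueError when diff has fewer than s rows, which can happen once the inputs contain duplicate rows.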

6 Comments

You can add replace=False when you select random rows from b. Updated the code.
TypeError: randint() got an unexpected keyword argument 'replace'
@KomronAripov Sorry, my bad. You are also supposed to change randint to choice. Try now?
ValueError: Cannot take a larger sample than population when 'replace=False'
This is probably your data issue where you are trying to select a larger number of elements than your parent set. It works with the data you have provided in the question.
