numpy 2d array combination

Question

Suppose I have two arrays:

a = np.array(
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7]])

b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5]])

As you can see, one array is simply a shuffle of the other. I need to combine these two arrays to form a third array, c, such that:

the first part of c (up to a random index i) consists of the elements from the first part of a (up to index i). Therefore, (c[:i] == a[:i]).all() must be True.
the rest of c is filled with the values from b that are not already in c, in the same order they appear in b.

Given that index i is set to 2, the desired output for above arrays a and b in the code should be:

> c
[[0, 1],
 [2, 3],
 [6, 7],
 [4, 5]]

Array c must be the same length as both a and b, and either a or b may contain duplicate rows. Array c must also consist of the same elements as a and b (i.e. it behaves somewhat like a shuffle).

I've tried multiple solutions, but none give the desired result. The closest was this:

import copy

import numpy as np

a = np.arange(10).reshape(5, 2)
np.random.shuffle(a)

b = np.arange(10).reshape(5, 2)
b_part = b[:4]

temp = []

for part in a:
    if part in b_part:
        continue
    else:
        temp.append(part)

temp = np.array(temp)

c = copy.deepcopy(np.vstack((b_part, temp)))

However, it sometimes results in array c being smaller than arrays a and b, because the elements in either list can sometimes repeat.
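A related pitfall in the loop above: on a NumPy array, `part in b_part` evaluates `(b_part == part).any()`, so it is True as soon as any single element matches in the same column, not only when a whole row matches. A minimal sketch of a row-wise test follows; `row_in` is a hypothetical helper name, not something from the original code.

```python
import numpy as np


def row_in(row, rows):
    """Return True only if `row` equals some complete row of `rows`."""
    return bool((rows == row).all(axis=1).any())


rows = np.array([[0, 1], [2, 3]])
probe = np.array([0, 3])  # shares elements with both rows, equals neither

print(probe in rows)        # True: element-wise match is enough for `in`
print(row_in(probe, rows))  # False: no complete row matches
```

This only makes the membership test row-wise; it does not by itself decide how duplicate rows should be handled.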

If elements are not unique, your rules imply that c can be shorter. Example: a = [(0,1),(2,3),(2,3),(4,5)], b = [(2,3),(4,5),(2,3),(0,1)], i = 2. So you'd pick a[:i], which is [(0,1),(2,3)], and from b what has not occurred yet, which is (4,5). This c would be [(0,1),(2,3),(4,5)], which is shorter.
– Paul Panzer, Commented Mar 29, 2019 at 21:39
@PaulPanzer I understand that is what's causing the issue, but I don't know how I can address it myself (which is why I'm asking the question).
– Sergey Ronin, Commented Mar 29, 2019 at 21:43
The first thing would be to decide what your desired answer would be in this case.
– Paul Panzer, Commented Mar 29, 2019 at 21:46
imho your code is working though (just create a_part instead of b_part, swap the arrays in both loops and use vstack((a_part, temp))), but your issue with the size of c comes from your problem definition when there are duplicates inside a, which is the basis for c. Imagine you split a at index i so that c takes just one of a pair of duplicate values inside a; then c cannot have the same size as a or b, because you cannot add the other duplicate from b.
– vldbnc, Commented Mar 29, 2019 at 22:07
@KomronAripov using your pic example, try index [:4] instead of [:2] and give us the result.
– vldbnc, Commented Mar 29, 2019 at 22:37

Paul Panzer · Accepted Answer · 2019-03-29 23:02:38Z

2

The following should handle duplicates alright.

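def mix(a, b, i):
    # Indirect (stable) sort orders of the rows of a and b.
    sa, sb = map(np.lexsort, (a.T, b.T))
    mb = np.empty(len(a), '?')
    # i Falses then len(a)-i Trues, carried through a's sort order onto b.
    mb[sb] = np.arange(2, dtype='?').repeat((i, len(a)-i))[sa]
    return np.concatenate([a[:i], b[mb]], 0)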

It

indirectly sorts a and b
creates a mask which is True at the positions not taken from a, i.e. has i Falses and then len(a)-i Trues.
uses the sort orders to map that mask to b
filters b with the mask and appends to a[:i]

Example (transposed to save space):

a.T
# array([[2, 2, 0, 2, 3, 0, 2, 0, 0, 1],
#        [0, 1, 2, 0, 1, 0, 3, 0, 0, 0]])
b.T
# array([[0, 0, 2, 1, 0, 0, 2, 2, 2, 3],
#        [0, 0, 0, 0, 2, 0, 1, 3, 0, 1]])
mix(a, b, 6).T
# array([[2, 2, 0, 2, 3, 0, 0, 1, 0, 2],
#        [0, 1, 2, 0, 1, 0, 0, 0, 0, 3]])
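As a cross-check of the steps above, the same pairing of duplicates can be sketched with a plain counter: every row taken from a[:i] cancels one matching row of b, earliest occurrence first. This is an illustrative, non-vectorised sketch rather than the answer's method; `mix_by_counting` is a hypothetical name, and it assumes b contains the same rows as a.

```python
from collections import Counter

import numpy as np


def mix_by_counting(a, b, i):
    """Keep a[:i], then append the rows of b not cancelled by a row of a[:i]."""
    # How many copies of each row have already been taken from a.
    remaining = Counter(map(tuple, a[:i].tolist()))
    keep = np.zeros(len(b), dtype=bool)
    for pos, row in enumerate(map(tuple, b.tolist())):
        if remaining[row] > 0:
            remaining[row] -= 1  # this copy is already in c, skip it
        else:
            keep[pos] = True     # not yet used, take it from b
    return np.concatenate([a[:i], b[keep]], axis=0)


a = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])
b = np.array([[2, 3], [6, 7], [0, 1], [4, 5]])
print(mix_by_counting(a, b, 2))
```

For the arrays from the question and i = 2 this prints the rows [0 1], [2 3], [6 7], [4 5]. Because each row of a[:i] removes exactly one row of b, the result keeps the full length even when rows repeat.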

answered Mar 29, 2019 at 23:02

Paul Panzer


3 Comments

fountainhead Over a year ago

If a = np.array([[0, 1], [2, 3], [0, 0], [2, 3], [4, 5], [6, 7]]) and b = np.array([[2, 3], [6, 7], [0, 1], [4, 5], [2, 3], [0, 0]]), then for i=3, this solution gives c = [[0, 1], [2, 3], [0, 0], [6, 7], [4, 5], [2, 3]]. The last [2, 3] probably violates OP's requirement, since it is already picked up from a, and must not be picked up again from b.

Paul Panzer Over a year ago

@fountainhead No, that's the other [2, 3] ;-] -- More seriously, OP's clarifications as to how to handle dupes are spread out across several comments.

fountainhead Over a year ago

Several comments and several external links too. :-)

fountainhead · Accepted Answer · 2019-03-29 23:38:43Z

2

Here's one solution:

full_len = len(a)

b_not_in_a_part = ~np.all(np.isin(b,a[:i+1]),axis=1)         # Get boolean mask, to apply on b
b_part_len = full_len-i-1                                    # Length of b part of c

c = np.concatenate((a[:i+1], b[b_not_in_a_part,:]), axis=0)  # Construct c, using the mask for the b part.

Testing it out:

import numpy as np
a = np.array(
[[0, 1],
 [2, 3],
 [0, 0],
 [2, 3],
 [4, 5],
 [6, 7]])
b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5],
 [2, 3],
 [0, 0]])

i = 2

print ("a is:\n", a)
print ("b is:\n", b)

full_len = len(a)

b_not_in_a_part = ~np.all(np.isin(b,a[:i+1]),axis=1)         # Get boolean mask, to apply on b
b_part_len = full_len-i-1                                    # Length of b part of c

c = np.concatenate((a[:i+1], b[b_not_in_a_part,:]), axis=0)  # Construct c, using the mask for the b part.
print ("c is:\n", c)

Output:

a is:
 [[0 1]
 [2 3]
 [0 0]
 [2 3]
 [4 5]
 [6 7]]
b is:
 [[2 3]
 [6 7]
 [0 1]
 [4 5]
 [2 3]
 [0 0]]
c is:
 [[0 1]
 [2 3]
 [0 0]
 [6 7]
 [4 5]]

Note: For this example, c has a length of only 5, even though a and b have a length of 6. This is because, due to high duplication in b, there aren't enough values left in b, that are eligible to be used for c.
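One caveat about the mask used above: `np.isin` tests each element on its own, so a row of b counts as "in a" whenever all of its elements occur anywhere in the a part, even if no complete row matches. A minimal sketch contrasting this with a row-wise mask built by broadcasting follows; both function names are hypothetical and the sample arrays are made up for illustration.

```python
import numpy as np


def elementwise_mask(b, a_part):
    """True where every element of the b row occurs somewhere in a_part."""
    return np.all(np.isin(b, a_part), axis=1)


def rowwise_mask(b, a_part):
    """True where the b row equals some complete row of a_part."""
    # Compare every row of b with every row of a_part: shape (len(b), len(a_part), ncols).
    equal = b[:, None, :] == a_part[None, :, :]
    return equal.all(axis=-1).any(axis=-1)


a_part = np.array([[0, 1], [2, 3]])
b = np.array([[0, 3], [2, 3], [6, 7]])

print(elementwise_mask(b, a_part))  # [ True  True False]
print(rowwise_mask(b, a_part))      # [False  True False]
```

The row [0, 3] is flagged by the element-wise mask although it is not a row of `a_part`. The broadcasted comparison uses memory proportional to len(b) * len(a_part), so it suits small arrays.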

edited Mar 29, 2019 at 23:38

answered Mar 29, 2019 at 22:39

fountainhead


3 Comments

fountainhead Over a year ago

@KomronAripov: This error is occurring because, due to high duplication in b, there aren't enough values in b that are not already used from a. I can fix it if you tell me what your requirement is for this scenario.

fountainhead Over a year ago

In such a scenario, is it ok if c has smaller length than a or b? (Since there are not enough values to match the full length)

fountainhead Over a year ago

@KomronAripov, Fixed the error, making the assumption that it's ok for c to have a smaller length, if high duplication in b leaves us with not enough values eligible to go into c

BigH · Accepted Answer · 2019-03-29 22:26:04Z

0

Just use numpy.concatenate() and make sure the slice stops at your index plus 1 if that row should be included (NumPy slicing goes up to, but does not include, the stop index; see below). (Edit: it seems you modified your a, b and c arrays, so I'll change my code below to accommodate.)

import numpy as np

a = np.array(
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7]])

b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5]])


i = 2
c = a[0:i]
for k in b:
    if k not in c:
        c = np.concatenate((c, [k]))

print(c)

Output:

[[0 1]
 [2 3]
 [6 7]
 [4 5]]

edited Mar 29, 2019 at 22:26

answered Mar 29, 2019 at 21:46

BigH


5 Comments

Sergey Ronin Over a year ago

after a slight modification to array b (now updated in the answer), the output is not as expected.

Sergey Ronin Over a year ago

this results in duplicates (even though there were none to begin with), which is undesired...

BigH Over a year ago

Ok I get ya, how about now?

Sergey Ronin Over a year ago

this still does not address the issue where there is a possibility that either array a or array b can contain duplicates.

BigH Over a year ago

can you provide an example of what c should look like if a and b contains the duplicates as you have mentioned?

panktijk · Accepted Answer · 2019-03-29 22:27:28Z

0

For i=2, get your first part of the result:
```
c = a[:i]
```

Get "uncommon" elements between b and c:

diff = np.array([x for x in b if x not in c])

Select random elements from diff and concatenate to the original array:

s = len(a) - i
np.concatenate([c, diff[np.random.choice(diff.shape[0], size=s, replace=False), :]], axis=0)

OUTPUT:

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

edited Mar 29, 2019 at 22:27

answered Mar 29, 2019 at 21:57

panktijk


6 Comments

panktijk Over a year ago

You can add replace=False when you select random rows from b. Updated the code.

Sergey Ronin Over a year ago

TypeError: randint() got an unexpected keyword argument 'replace'

panktijk Over a year ago

@KomronAripov Sorry, my bad. You are also supposed to change randint to choice. Try now?

Sergey Ronin Over a year ago

ValueError: Cannot take a larger sample than population when 'replace=False'

panktijk Over a year ago

This is probably your data issue where you are trying to select a larger number of elements than your parent set. It works with the data you have provided in the question.
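The ValueError reported above is what `np.random.choice` raises when `size` exceeds the number of available rows and `replace=False`. A minimal sketch of one way to guard against it, assuming a shorter result is acceptable in that case, is to cap the sample size; `sample_rows` is a hypothetical helper, not part of the answer.

```python
import numpy as np


def sample_rows(diff, s):
    """Pick up to s distinct rows of diff at random."""
    k = min(s, diff.shape[0])  # never ask for more rows than exist
    idx = np.random.choice(diff.shape[0], size=k, replace=False)
    return diff[idx]


diff = np.array([[6, 7]])  # only one candidate row left

try:
    np.random.choice(diff.shape[0], size=2, replace=False)
except ValueError as exc:
    print("ValueError:", exc)

print(sample_rows(diff, 2))  # [[6 7]]
```

Whether a shorter c is acceptable is exactly the open question from the comments, so the cap is a workaround rather than a fix for the underlying requirement.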
