4

I have the following 3 NumPy arrays:

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

I would like to join them column-wise, but have them maintain separation between items coming from each array, like so:

Output:
array([[['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']],
       [['d', 'e', 'f'], ['l', 'm', 'n', 'o', 'p'], ['t', 'u']]], dtype='<U1')

However, I cannot find a NumPy function that achieves this. The closest I got was a plain np.concatenate(), but its output does not retain the separation I want:

Input: np.concatenate([arr1, arr2, arr3], axis = 1)
Output:
array([['a', 'b', 'c', 'g', 'h', 'i', 'j', 'k', 'r', 's'],
       ['d', 'e', 'f', 'l', 'm', 'n', 'o', 'p', 't', 'u']], dtype='<U1')

Any suggestions on how I can achieve the desired effect?

UPDATE: Thank you for some great answers. As an added level of difficulty, I would also like the solution to account for a possible variable number of input arrays, which would still share the same number of rows. Therefore, sometimes there would be 3, other times e.g. 6 etc.

3
  • Can you state what your expected resulting shape should be? NumPy cannot have jagged arrays, i.e. arrays whose rows differ in length. Or, to clarify, you can, but your outer array would be of type object, not <U1. Commented Jan 22, 2020 at 5:19
  • In my mind, with the given example arrays, the shape would be (2, 3). However, each of these 3 columns per row would hold an array of its own with number of items equal to the col. count of the original array they came from. Commented Jan 22, 2020 at 5:23
  • You show a (2,3) object dtype array. Why not use lists? The fast numpy stuff is for numeric values in rectangular layouts. Commented Jan 22, 2020 at 5:54

4 Answers 4

3

You could try:

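# dtype=object is required on recent NumPy versions because the inner lists differ in length
print(np.array([[x, y, z] for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())], dtype=object))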

Or if you want the inner rows as arrays as well:

# each inner row becomes a 1-D object array holding three lists
print(np.array([np.array([x, y, z], dtype=object) for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())]))

Output:

[[['a', 'b', 'c'] ['g', 'h', 'i', 'j', 'k'] ['r', 's']]
 [['d', 'e', 'f'] ['l', 'm', 'n', 'o', 'p'] ['t', 'u']]]

And the shape is (2, 3) as expected.
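To check the shape claim and see what the cells actually hold, here is a minimal sketch built from the question's arrays. The explicit dtype=object is an addition that is not in the one-liner as posted; recent NumPy releases refuse to build an array from ragged nested lists without it.

```python
import numpy as np

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

# dtype=object is passed explicitly because the inner lists differ in length.
out = np.array(
    [[x, y, z] for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())],
    dtype=object,
)

print(out.shape)        # (2, 3)
print(out.dtype)        # object
print(type(out[0, 1]))  # <class 'list'>
print(out[0, 1])        # ['g', 'h', 'i', 'j', 'k']
```

Each cell is a plain Python list, not an array, so elementwise NumPy operations do not reach inside the cells.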

Edit:

As you mentioned in the comment, try:

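l = [arr1, arr2, arr3]  # list of the arrays; it can hold any number of them
# iterate over whole rows instead of unpacking exactly three names
print(np.array([np.array(row, dtype=object) for row in zip(*[i.tolist() for i in l])]))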

3 Comments

This is a great answer, but I need a slight variation: let's assume the number of arrays is not fixed, but they still share the same number of rows (e.g. sometimes we have arrays 1-3, sometimes arrays 1-6). How could we modify your script to zip a variable number of arrays?
the inner elements are still python lists. you have to call np.array on each element (np.array(x)), not on the [x,y,z]
It does! Excellent, that was exactly what I was after. Many thanks!
1

This may be a long way to do it, but it works:

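arrays = [arr1, arr2, arr3]
arr_all = []
for i in range(arr1.shape[0]):
    # collect row i of every input array as one entry of the output row
    arr_all.append([arr[i, :] for arr in arrays])
# dtype=object is needed because the inner arrays differ in length;
# the result already has shape (2, 3), so no reshape is required
arr_all = np.array(arr_all, dtype=object)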

2 Comments

Thanks for the answer, but I really want to avoid use of loops if possible - this part of code for me needs to work as fast as possible.
Everyone else is using loops - list comprehension and zip
1

I think this should give you the desired output. It's a modification of the answer given by @U10-Forward-ReinstateMonica where the inner elements were python lists

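# dtype=object is required on recent NumPy versions because the inner arrays differ in length
print(np.array([[np.array(x), np.array(y), np.array(z)] for x, y, z in zip(arr1.tolist(), arr2.tolist(), arr3.tolist())], dtype=object))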

0
In [13]: arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3) 
    ...: arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5) 
    ...: arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)                                     

If I try to make an object dtype array from these arrays, I get an error:

In [22]: np.array([arr1, arr2, arr3])                                                            
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-155b98609c5b> in <module>
----> 1 np.array([arr1, arr2, arr3])

ValueError: could not broadcast input array from shape (2,3) into shape (2)

If they differed in number of rows, this would work, but with a common row number the result is an error. In such a case, I usually recommend defining an object array of the right size, and filling that:

In [14]: arr = np.empty((2,3), object)                                                           
In [15]: arr                                                                                     
Out[15]: 
array([[None, None, None],
       [None, None, None]], dtype=object)

But if I try to assign the first column, I get the same error:

In [17]: arr[:,0] = arr1                                                                         
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-9894797aa09e> in <module>
----> 1 arr[:,0] = arr1

ValueError: could not broadcast input array from shape (2,3) into shape (2)

I can instead assign row by row:

In [18]: arr[0,0] = arr1[0]                                                                      
In [19]: arr[1,0] = arr1[1]                                                                      
In [20]: arr[0,1] = arr2[0] 
...                                                                     
In [21]: arr                                                                                     
Out[21]: 
array([[array(['a', 'b', 'c'], dtype='<U1'),
        array(['g', 'h', 'i', 'j', 'k'], dtype='<U1'), None],
       [array(['d', 'e', 'f'], dtype='<U1'), None, None]], dtype=object)
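The cell-by-cell assignment above extends to any number of input arrays. The following is a sketch only: join_rows_as_objects is a hypothetical helper name, not a NumPy function, and it assumes every input is 2-D with the same number of rows.

```python
import numpy as np

def join_rows_as_objects(arrays):
    """Hypothetical helper: build an (n_rows, n_arrays) object array
    whose cell [i, j] holds row i of arrays[j]."""
    n_rows = arrays[0].shape[0]
    if any(a.shape[0] != n_rows for a in arrays):
        raise ValueError("all input arrays must have the same number of rows")
    out = np.empty((n_rows, len(arrays)), dtype=object)
    for j, a in enumerate(arrays):
        for i in range(n_rows):
            # Scalar indexing on the left stores the 1-D row as a single object.
            # a[i] is a view into the input array, not a copy.
            out[i, j] = a[i]
    return out

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

result = join_rows_as_objects([arr1, arr2, arr3])
print(result.shape)   # (2, 3)
print(result[1, 1])   # ['l' 'm' 'n' 'o' 'p']
```

This still loops in Python, so it is not expected to be faster than the list-comprehension answers; its main merit is that it never relies on np.array guessing the nesting depth.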

Alternatively, we can assign nested lists to the columns, without the broadcast error. This is effectively what the accepted answer is doing:

In [23]: arr[:,0] = arr1.tolist()                                                                
In [24]: arr[:,1] = arr2.tolist()                                                                
In [25]: arr[:,2] = arr3.tolist()                                                                
In [26]: arr                                                                                     
Out[26]: 
array([[list(['a', 'b', 'c']), list(['g', 'h', 'i', 'j', 'k']),
        list(['r', 's'])],
       [list(['d', 'e', 'f']), list(['l', 'm', 'n', 'o', 'p']),
        list(['t', 'u'])]], dtype=object)

These difficulties in creating the desired array are a good indicator that this is not, NOT, a good numpy array structure. If it's hard to make, it probably will also be hard to use, or at least slow. Iteration on an object dtype array is slower than iteration on a list. About its only advantage compared to a list is that it is easy to reshape.
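If a list turns out to be enough, the same grouping can be built without an object array at all. A minimal sketch, assuming the question's three arrays; the list called arrays can hold any number of inputs with a shared row count:

```python
import numpy as np

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

arrays = [arr1, arr2, arr3]

# zip(*...) pairs up row 0 of every array, then row 1, and so on.
rows = [list(cells) for cells in zip(*(a.tolist() for a in arrays))]

print(len(rows), len(rows[0]))  # 2 3
print(rows[0])  # [['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']]
```

The result is an ordinary nested list, so it has no shape attribute and cannot be reshaped the way an object array can.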

====

np.array does work if the inputs are lists instead of arrays:

In [33]: np.array([arr1.tolist(), arr2.tolist(), arr3.tolist()])                                 
Out[33]: 
array([[list(['a', 'b', 'c']), list(['d', 'e', 'f'])],
       [list(['g', 'h', 'i', 'j', 'k']), list(['l', 'm', 'n', 'o', 'p'])],
       [list(['r', 's']), list(['t', 'u'])]], dtype=object)

or convert to a list to give a 'cleaner' display:

In [34]: _.tolist()                                                                              
Out[34]: 
[[['a', 'b', 'c'], ['d', 'e', 'f']],
 [['g', 'h', 'i', 'j', 'k'], ['l', 'm', 'n', 'o', 'p']],
 [['r', 's'], ['t', 'u']]]

and a transpose of that (3,2) array does give the desired (2,3) layout:

In [35]: _33.T.tolist()                                                                          
Out[35]: 
[[['a', 'b', 'c'], ['g', 'h', 'i', 'j', 'k'], ['r', 's']],
 [['d', 'e', 'f'], ['l', 'm', 'n', 'o', 'p'], ['t', 'u']]]
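The list-then-transpose route also handles a variable number of inputs. This is a sketch under two assumptions: dtype=object is passed explicitly (recent NumPy versions require it for ragged input), and the inputs do not all share the same column count, since in that case np.array would build a 3-D array instead of a 2-D array of lists.

```python
import numpy as np

arr1 = np.array(['a', 'b', 'c', 'd', 'e', 'f']).reshape(2, 3)
arr2 = np.array(['g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']).reshape(2, 5)
arr3 = np.array(['r', 's', 't', 'u']).reshape(2, 2)

arrays = [arr1, arr2, arr3]  # any number of arrays with the same row count

# One outer row per input array, one list per original row.
stacked = np.array([a.tolist() for a in arrays], dtype=object)
# Transposing puts the original rows first and the source arrays second.
joined = stacked.T

print(stacked.shape)  # (3, 2)
print(joined.shape)   # (2, 3)
print(joined[1, 0])   # ['d', 'e', 'f']
```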
