Numpy Array: Efficiently find matching indices, but with possible missing corresponding values

Question

I have a question very similar to this post. I start with two 2D arrays (possibly of different widths), each with a number of rows in which the leftmost column acts as an index, and I would like to combine the two arrays. Unlike in the original post, we can assume the leftmost column is already in ascending order.

a = np.array([[1,2], [5,0], [6,4]]) 
b = np.array([[1,10], [5,20], [6,30]])

would be merged into this

[[1  2 10]
[5  0 20]
[6  4 30]]

As in the original post. However, there are two new things I would like to do. First, I'd like to match the two arrays by the leftmost value, deleting any rows that don't have a matching value in the other array. As an example,

a = np.array([[1,2],[3,2], [5,0], [6,4]])
b = np.array([[1,10],[6,30], [5,20], [7,80]])

would still be

[[1  2 10]
[5  0 20]
[6  4 30]]

because [3,2] from array a and [7,80] from array b would be ignored. Second, as a separate function, I'd like to join these two arrays similarly, but whenever a matching value cannot be found I'd like to create a new row with np.nan (or some other unique non-numerical filler):

[[1  2      10]
 [3  2      np.nan]
 [5  0      20]
 [6  4      30]
 [7  np.nan 80]]

I have two programs that do these things but they are not efficient, as they iterate over each row of the input arrays (of possibly different width), effectively 'zipping' the rows together by case.

Are there good efficient ways to do this with builtin numpy functions?

A data manipulation package such as pandas is a much better choice for this (merge/join) operation than NumPy.
– Quang Hoang, Commented Sep 25, 2023 at 16:33
Good to know. Let's say I turn it into a pandas DataFrame; what would be the best way to do this then?
– Alosapien, Commented Sep 25, 2023 at 17:38
Actually, I just looked up the pandas concat functions. I think you are right. I'll give it a try.
– Alosapien, Commented Sep 25, 2023 at 17:59
The idea of an index column is not inherent to any builtin NumPy function.
– hpaulj, Commented Sep 25, 2023 at 18:26

Andrej Kesely · Accepted Answer · 2023-09-25 19:11:42Z

Here is an example of how you can do it with pandas:

import numpy as np
import pandas as pd

a = np.array([[1, 2], [3, 2], [5, 0], [6, 4]])
b = np.array([[1, 10], [6, 30], [5, 20], [7, 80]])

out = pd.DataFrame(a).merge(pd.DataFrame(b), on=[0], how="inner").to_numpy()
print(out)

Prints:

[[ 1  2 10]
 [ 5  0 20]
 [ 6  4 30]]

For the second example, choose how="outer":

out = pd.DataFrame(a).merge(pd.DataFrame(b), on=[0], how="outer").to_numpy()
print(out)

Prints:

[[ 1.  2. 10.]
 [ 3.  2. nan]
 [ 5.  0. 20.]
 [ 6.  4. 30.]
 [ 7. nan 80.]]
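
Since the question asks about builtin NumPy functions, here is a rough NumPy-only sketch of the same two joins. It is not part of the answer above. The helper names inner_join and outer_join are made up for illustration, and the sketch assumes that keys in the leftmost column are unique within each array and that a float result is acceptable for the outer join (np.nan forces a float dtype).

```python
import numpy as np

def inner_join(a, b):
    """Hypothetical helper: keep rows whose leftmost key is in both arrays."""
    # return_indices=True also returns where each common key sits in a and b
    _, ia, ib = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)
    return np.hstack([a[ia], b[ib, 1:]])

def outer_join(a, b):
    """Hypothetical helper: keep every key, filling gaps with np.nan."""
    keys = np.union1d(a[:, 0], b[:, 0])  # sorted, unique keys from both arrays
    out = np.full((keys.size, a.shape[1] + b.shape[1] - 1), np.nan)
    out[:, 0] = keys
    # keys is sorted and contains every key, so searchsorted finds exact rows
    out[np.searchsorted(keys, a[:, 0]), 1:a.shape[1]] = a[:, 1:]
    out[np.searchsorted(keys, b[:, 0]), a.shape[1]:] = b[:, 1:]
    return out

a = np.array([[1, 2], [3, 2], [5, 0], [6, 4]])
b = np.array([[1, 10], [6, 30], [5, 20], [7, 80]])

inner = inner_join(a, b)
outer = outer_join(a, b)
print(inner)
print(outer)
```

Both helpers return rows sorted by key, because np.intersect1d and np.union1d sort their results. If keys can repeat within an array, the pandas merge above is the safer choice.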


4 Comments

Alosapien Over a year ago

alright, this looks awesome. Much better than what I started with. I'll give this a shot.

Alosapien Over a year ago

yup, this was it, and from it I've been able to do the left and right components of the symmetric difference. I'm sure this will be much faster

Andrej Kesely Over a year ago

@Alosapien Here is the documentation for pd.merge. You can use various parameters for how="", etc.

Alosapien Over a year ago

also, as I'm kind of new, any improvement on the title of the problem is appreciated.
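
The comments mention extracting the left and right components of the symmetric difference but do not show how. One possible way, sketched here as an assumption rather than as what was actually done, is the indicator=True option of merge, which adds a "_merge" column marking each row as "left_only", "right_only", or "both". The column names key, a_val, and b_val are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2], [3, 2], [5, 0], [6, 4]])
b = np.array([[1, 10], [6, 30], [5, 20], [7, 80]])

# Illustrative column names; they are not from the original answer
df_a = pd.DataFrame(a, columns=["key", "a_val"])
df_b = pd.DataFrame(b, columns=["key", "b_val"])

# indicator=True adds a "_merge" column describing where each row came from
merged = df_a.merge(df_b, on="key", how="outer", indicator=True)

# Rows whose key appears only in a, and rows whose key appears only in b
left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge").to_numpy()
right_only = merged[merged["_merge"] == "right_only"].drop(columns="_merge").to_numpy()

print(left_only)
print(right_only)
```

Each result still carries a NaN in the column that came from the other array; drop that column if only the unmatched rows themselves are needed.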
