I have a question very similar to this post. Essentially, I start with two 2D arrays (possibly of different widths), each with a number of rows whose leftmost column acts as an effective index, and I would like to combine the two arrays. (Unlike in the original post, we can assume the leftmost column is already in ascending order.)

a = np.array([[1,2], [5,0], [6,4]]) 
b = np.array([[1,10], [5,20], [6,30]])

would be merged into this

[[1  2 10]
[5  0 20]
[6  4 30]]

As in the original post. However, there are two new things I would like to do. First, I'd like to match the two arrays by the leftmost value, deleting any rows that don't have a matching value in the other array. As an example,

a = np.array([[1,2],[3,2], [5,0], [6,4]])
b = np.array([[1,10],[6,30], [5,20], [7,80]])

would still be

[[1  2 10]
[5  0 20]
[6  4 30]]

as [3,2] from array a and [7,80] from array b would be ignored. Second, as a separate function, I'd like to join these two arrays similarly, but whenever a matching value cannot be found I'd like to create a new row filled with np.nan (or some other unique non-numerical filler):

[[1  2      10]
 [3  2      np.nan]
 [5  0      20]
 [6  4      30]
 [7  np.nan 80]]

I have two programs that do these things, but they are not efficient: they iterate over each row of the input arrays, effectively 'zipping' the rows together case by case.

Are there good efficient ways to do this with builtin numpy functions?

  • A data manipulation package, e.g. pandas, is a much better choice for this (merge/join) operation than NumPy. Commented Sep 25, 2023 at 16:33
  • Good to know. Let's say I turn it into a pandas DataFrame; what would be the best way to do this then? Commented Sep 25, 2023 at 17:38
  • Actually, I just looked up the pandas concat functions. I think you are right. I'll give it a try. Commented Sep 25, 2023 at 17:59
  • The idea of an index column is not inherent to any builtin NumPy function. Commented Sep 25, 2023 at 18:26

1 Answer


Here is an example of how you can do it with pandas:

import numpy as np
import pandas as pd

a = np.array([[1, 2], [3, 2], [5, 0], [6, 4]])
b = np.array([[1, 10], [6, 30], [5, 20], [7, 80]])

out = pd.DataFrame(a).merge(pd.DataFrame(b), on=[0], how="inner").to_numpy()
print(out)

Prints:

[[ 1  2 10]
 [ 5  0 20]
 [ 6  4 30]]

For the second example, choose how="outer":

out = pd.DataFrame(a).merge(pd.DataFrame(b), on=[0], how="outer").to_numpy()
print(out)

Prints:

[[ 1.  2. 10.]
 [ 3.  2. nan]
 [ 5.  0. 20.]
 [ 6.  4. 30.]
 [ 7. nan 80.]]
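
If you would rather stay in NumPy, as the question originally asked, the same two joins can be sketched with np.intersect1d, np.union1d and np.searchsorted. This is only a rough sketch, not a drop-in replacement for merge: the helper names inner_join and outer_join are made up for illustration, and it assumes the keys in the leftmost column are unique within each array.

```python
import numpy as np

def inner_join(a, b):
    """Keep only rows whose leftmost value occurs in both arrays."""
    # Assumption: keys are unique within each array.
    # ia / ib are the row positions of the shared keys in a and b.
    _, ia, ib = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)
    return np.hstack([a[ia], b[ib, 1:]])

def outer_join(a, b, fill=np.nan):
    """Keep every key from either array, padding missing cells with `fill`."""
    keys = np.union1d(a[:, 0], b[:, 0])          # sorted, unique keys
    wa, wb = a.shape[1] - 1, b.shape[1] - 1      # value columns per array
    out = np.full((keys.size, 1 + wa + wb), fill, dtype=float)
    out[:, 0] = keys
    # searchsorted maps each key to its row in the sorted key list.
    out[np.searchsorted(keys, a[:, 0]), 1:1 + wa] = a[:, 1:]
    out[np.searchsorted(keys, b[:, 0]), 1 + wa:] = b[:, 1:]
    return out

a = np.array([[1, 2], [3, 2], [5, 0], [6, 4]])
b = np.array([[1, 10], [6, 30], [5, 20], [7, 80]])

inner = inner_join(a, b)
outer = outer_join(a, b)
print(inner)
print(outer)
```

The outer result is a float array because np.nan cannot be stored in an integer array, which matches the float output of the pandas version above.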

4 Comments

Alright, this looks awesome. Much better than what I started with. I'll give this a shot.
Yup, this was it, and from it I've been able to do the left and right components of the symmetric difference. I'm sure this will be much faster.
@Alosapien Here is the documentation for pd.merge. You can use various parameters for how="", etc.
Also, as I'm kind of new, any improvement to the title of the question is appreciated.
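
For the left and right parts of the symmetric difference mentioned in the comments above, one possible approach (not necessarily what the commenter did) is an outer merge with indicator=True, which adds a _merge column saying where each row came from. The variable names below are made up for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2], [3, 2], [5, 0], [6, 4]])
b = np.array([[1, 10], [6, 30], [5, 20], [7, 80]])

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both" for every row.
merged = pd.DataFrame(a).merge(
    pd.DataFrame(b), on=[0], how="outer", indicator=True
)

# Rows whose key exists only in a, and rows whose key exists only in b.
left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge").to_numpy()
right_only = merged[merged["_merge"] == "right_only"].drop(columns="_merge").to_numpy()

print(left_only)
print(right_only)
```

Here left_only holds the row for key 3 and right_only the row for key 7, each padded with nan in the columns that came from the other array.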
