I have two arrays: one contains all data points and the other contains a sample of data points. I would like to get boolean arrays (to be used later as indices) that indicate whether each sample point is contained in the original array of all data points. I want an approach that works regardless of the array dimensions. I have a working solution, but would like a simpler (i.e., vectorized, without a for-loop) approach. A brief example is below:
import numpy as np
## ALL DATA POINTS: (xi, yi, zi)
p1 = np.array([1, 2, 3])
p2 = np.array([4, 5, 6])
p3 = np.array([2, 3, 4])
p4 = np.array([7, 8, 5])
points = np.array([p1, p2, p3, p4])
## SAMPLE OF DATA (xi, yi, zi)
s1 = np.array([1, 2, 3])
s2 = np.array([4, 6, 5])
s3 = np.array([7, 8, 9])
samples = np.array([s1, s2, s3])
So the data looks like:
print("\nDATA POINTS ({}):\n{}\n".format(points.shape, points))
print("\nSAMPLE POINTS ({}):\n{}\n".format(samples.shape, samples))
DATA POINTS ((4, 3)):
[[1 2 3]
 [4 5 6]
 [2 3 4]
 [7 8 5]]
SAMPLE POINTS ((3, 3)):
[[1 2 3]
 [4 6 5]
 [7 8 9]]
So the point (1, 2, 3) is both the first data point and the first sample point; the other two sample points do not appear in the data. The function below uses a for-loop (a list comprehension) to compare each sample point against every data point.
f2 = lambda points, samples : np.array([sample == points for sample in samples])
ans2 = f2(points, samples)
The resulting boolean array looks like this:
for sample, arr in zip(samples, ans2):
    print("\n-- SAMPLE POINT: {}\n".format(sample))
    print("\n .. CONTAINMENT ARRAY ({}):\n{}\n".format(arr.shape, arr))
    res = np.all(arr, axis=1)
    print("\n .. POINTS CONTAINED ({}):\n{}\n".format(res.shape, res))
-- SAMPLE POINT: [1 2 3]
.. CONTAINMENT ARRAY ((4, 3)):
[[ True True True]
[False False False]
[False False False]
[False False False]]
.. POINTS CONTAINED ((4,)):
[ True False False False]
-- SAMPLE POINT: [4 6 5]
.. CONTAINMENT ARRAY ((4, 3)):
[[False False False]
[ True False False]
[False False False]
[False False True]]
.. POINTS CONTAINED ((4,)):
[False False False False]
-- SAMPLE POINT: [7 8 9]
.. CONTAINMENT ARRAY ((4, 3)):
[[False False False]
[False False False]
[False False False]
[ True True False]]
.. POINTS CONTAINED ((4,)):
[False False False False]
This result is correct.
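For reference, the per-sample reduction printed above can also be applied to the whole stacked result at once. This is a minimal sketch that reuses the arrays from the example; the names `matches` and `contained` are only illustrative.

```python
import numpy as np

points = np.array([[1, 2, 3], [4, 5, 6], [2, 3, 4], [7, 8, 5]])
samples = np.array([[1, 2, 3], [4, 6, 5], [7, 8, 9]])

# Same loop-based comparison as f2: shape (n_samples, n_points, n_coords)
ans2 = np.array([sample == points for sample in samples])

# A sample matches a point only if every coordinate agrees
matches = ans2.all(axis=2)        # shape (n_samples, n_points)

# A sample is contained if it matches at least one point
contained = matches.any(axis=1)   # shape (n_samples,)

print(contained)  # [ True False False]
```

This still builds `ans2` with a Python-level loop; only the reduction step is vectorized.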
However, I think there must be a simpler method to achieve this result. I have looked at numpy.isin; however, the results are not identical. Below is my attempt:
f1 = lambda points, samples : np.isin(samples, points)
ans1 = f1(points, samples)
This result looks like:
print("\n*- ANS 1 ({}):\n{}\n".format(ans1.shape, ans1))
*- ANS 1 ((3, 3)):
[[ True True True]
[ True True True]
[ True True False]]
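From this result, I can see that np.isin tests each element of samples individually against all values in points, without regard for the row or position a value came from. That is why True is returned for every element of the second row: 4, 6, and 5 each occur somewhere in points, even though [4, 6, 5] is not a data point.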
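To make that behaviour concrete, here is a small sketch showing that `np.isin` on these arrays gives the same answer as checking each scalar against the set of all values in `points`. The names `values` and `manual` are illustrative only.

```python
import numpy as np

points = np.array([[1, 2, 3], [4, 5, 6], [2, 3, 4], [7, 8, 5]])
samples = np.array([[1, 2, 3], [4, 6, 5], [7, 8, 9]])

# Element-wise membership test; the result has the shape of `samples`
elementwise = np.isin(samples, points)

# Equivalent manual check: flatten `points` into a set of scalars,
# then test every scalar in `samples` against that set
values = set(points.ravel().tolist())
manual = np.array([[v in values for v in row] for row in samples.tolist()])

print(np.array_equal(elementwise, manual))  # True
```

Because row structure is discarded in this test, `np.isin` alone cannot tell `[4, 6, 5]` apart from `[4, 5, 6]`.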
How can I modify this approach or start anew to check if each sub-array of sample points is contained in the array of all data points in a simpler way?
For points and samples in the first example (at the top), I would like the output to be [True, False, False] - True for sample point s1, and False for sample points s2 and s3.
(samples[:,None]==points).all(2).any(1). For perf - stackoverflow.com/questions/54791950/…
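The broadcasting expression suggested in the comment above can be wrapped in a small function. This is a minimal sketch, assuming both inputs are 2-D arrays with the same number of columns; `rows_in` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def rows_in(samples, points):
    """Hypothetical helper: True for each row of `samples` equal to some row of `points`."""
    samples = np.asarray(samples)
    points = np.asarray(points)
    # (n_samples, 1, n_coords) == (n_points, n_coords)
    # broadcasts to (n_samples, n_points, n_coords)
    equal = samples[:, None] == points
    # All coordinates must agree, for at least one point
    return equal.all(axis=-1).any(axis=1)

points = np.array([[1, 2, 3], [4, 5, 6], [2, 3, 4], [7, 8, 5]])
samples = np.array([[1, 2, 3], [4, 6, 5], [7, 8, 9]])

print(rows_in(samples, points))  # [ True False False]

# Reducing over the sample axis instead marks which data points were sampled
sampled = (samples[:, None] == points).all(axis=-1).any(axis=0)
print(sampled)  # [ True False False False]
```

The intermediate array has n_samples × n_points × n_coords elements, so memory use grows quickly for large inputs. Exact equality is also used here, which may not be appropriate for floating-point coordinates.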