I have three separate one-dimensional NumPy arrays of equal length that I pass as the x, y and c inputs to matplotlib's `scatter` function without a problem. Some of the (x, y) coordinate pairs are duplicated. Where a pair is duplicated, I would like to plot a single point whose c value is the sum of all the data values for that pair.
Is there a built-in matplotlib way of doing this? If not, I think I need to remove the duplicated coordinate pairs from the x and y arrays, along with their entries in the data array. Before removing them, though, their data values must be added to the data entry of the one remaining pair.
Below is a trivial example in which the duplicated pairs (3, 9) and (87, 8) have been removed and their data values added to the one remaining occurrence of each pair.
Before:

```python
import numpy as np

x = np.array([3, 7, 12, 3, 56, 4, 2, 3, 65, 87, 12, 3, 9, 7, 87])
y = np.array([7, 24, 87, 9, 65, 43, 54, 9, 3, 8, 34, 9, 23, 6, 8])
data = np.array([6, 45, 4, 25, 7, 45, 78, 4, 82, 3, 9, 43, 32, 5, 9])
```

After:

```python
x = np.array([3, 7, 12, 3, 56, 4, 2, 65, 87, 12, 9, 7])
y = np.array([7, 24, 87, 9, 65, 43, 54, 3, 8, 34, 23, 6])
data = np.array([6, 45, 4, 72, 7, 45, 78, 82, 12, 9, 32, 5])
```
I found an approach on Stack Overflow that removes the duplicate coordinates from the x and y arrays in seconds using Python's `zip` and a `set`. However, my attempt to extend it to the data array took an hour to run, and I don't have the experience to improve on it. The arrays are typically 600,000 elements long.
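One way the zip-and-set idea might extend to the data array without any per-element searching is to swap the `set` for a `dict` keyed on the (x, y) pair, adding each data value to that pair's running total in a single pass. This is a minimal sketch, assuming the coordinates are integers (or otherwise compare exactly); `sum_duplicates_dict` is a hypothetical helper name, not an existing library function. It is still a Python-level loop, so it will be slower than a vectorized NumPy approach, but it does only one dictionary lookup per element.

```python
import numpy as np


def sum_duplicates_dict(x, y, data):
    """Sum the data values that share an (x, y) pair (hypothetical helper).

    Dicts keep insertion order (Python 3.7+), so each pair ends up at the
    position of its first occurrence, as in the "After" arrays.
    """
    totals = {}
    # .tolist() yields plain Python numbers, which iterate faster than NumPy scalars
    for xi, yi, di in zip(x.tolist(), y.tolist(), data.tolist()):
        key = (xi, yi)
        totals[key] = totals.get(key, 0) + di
    # reshape keeps a (0, 2) shape if the inputs are empty
    coords = np.array(list(totals.keys())).reshape(-1, 2)
    sums = np.array(list(totals.values()))
    return coords[:, 0], coords[:, 1], sums


x = np.array([3, 7, 12, 3, 56, 4, 2, 3, 65, 87, 12, 3, 9, 7, 87])
y = np.array([7, 24, 87, 9, 65, 43, 54, 9, 3, 8, 34, 9, 23, 6, 8])
data = np.array([6, 45, 4, 25, 7, 45, 78, 4, 82, 3, 9, 43, 32, 5, 9])

ux, uy, sums = sum_duplicates_dict(x, y, data)
```

The three returned arrays can then go straight to `plt.scatter(ux, uy, c=sums)`.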
I have also considered `np.unique()`, but I am not sure how fast it is compared to `set()`. To build the summed data array, the only approach I can think of is a loop that finds the indices of each repeated coordinate pair, looks up the corresponding data values and adds them up.
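The summing loop can probably be avoided altogether by combining `np.unique()` with `np.bincount()`. This vectorized sketch assumes NumPy 1.13 or later (for the `axis` argument of `np.unique`): x and y are stacked into rows, `np.unique` finds the distinct rows and maps every input row to its group through `return_inverse`, and `np.bincount` with `weights` sums the data values per group. `sum_duplicates_unique` is a hypothetical name. `np.unique` is sort-based, so this is O(n log n) but runs in compiled code rather than a Python loop.

```python
import numpy as np


def sum_duplicates_unique(x, y, data):
    """Vectorized sum of data values per distinct (x, y) pair (hypothetical helper)."""
    coords = np.column_stack((x, y))
    # uniq: distinct rows (sorted lexicographically)
    # first_idx: index of each distinct row's first occurrence in coords
    # inverse: for every input row, the index of its row in uniq
    uniq, first_idx, inverse = np.unique(
        coords, axis=0, return_index=True, return_inverse=True
    )
    inverse = inverse.reshape(-1)  # make sure it is 1-D for np.bincount
    # add up the data values of each group; weighted bincount returns float64
    sums = np.bincount(inverse, weights=data, minlength=len(uniq))
    # optional: restore first-occurrence order to match the "After" arrays
    order = np.argsort(first_idx)
    return uniq[order, 0], uniq[order, 1], sums[order]


x = np.array([3, 7, 12, 3, 56, 4, 2, 3, 65, 87, 12, 3, 9, 7, 87])
y = np.array([7, 24, 87, 9, 65, 43, 54, 9, 3, 8, 34, 9, 23, 6, 8])
data = np.array([6, 45, 4, 25, 7, 45, 78, 4, 82, 3, 9, 43, 32, 5, 9])

ux, uy, sums = sum_duplicates_unique(x, y, data)
```

The reordering step only exists to reproduce the "After" layout; for plotting with `plt.scatter(ux, uy, c=sums)` the sorted order from `np.unique` should work just as well. If integer sums are needed, `sums.astype(data.dtype)` converts them back, which is exact as long as the totals stay below 2**53.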