
I have three separate one-dimensional NumPy arrays of equal length that I am using as the x, y and c parameter inputs to the matplotlib scatter function without a problem. Some of the plot coordinates contained in the x and y arrays are duplicated. Where coordinates are duplicated, I would like to plot the sum of all the related c parameter (data) values.

Is there a built-in matplotlib way of doing this? Alternatively, I think I need to remove all the duplicated coordinates from the x and y arrays, along with the associated values from the data array. But before doing this, those data values must be added to the data value of the one remaining coordinate pair.

A trivial example is shown below, where the duplicated coordinates have been removed and their data values added to the one remaining coordinate pair.

Before
x =    np.array([3, 7, 12, 3, 56, 4, 2, 3, 65, 87, 12, 3, 9, 7, 87])
y =    np.array([7, 24, 87, 9, 65, 43, 54, 9, 3, 8, 34, 9, 23, 6, 8])
data = np.array([6, 45, 4, 25, 7, 45, 78, 4, 82, 3, 9, 43, 32, 5, 9])

After
x =    np.array([3, 7, 12, 3, 56, 4, 2, 65, 87, 12, 9, 7])
y =    np.array([7, 24, 87, 9, 65, 43, 54, 3, 8, 34, 23, 6])
data = np.array([6, 45, 4, 72, 7, 45, 78, 82, 12, 9, 32, 5])

I have found an algorithm on Stackoverflow that removes the duplicate coordinates from the x and y arrays in seconds using Python zip and a set. However, my attempt to extend this to the data array took an hour to execute and I don't have the experience to improve on this. The arrays are typically 600,000 elements long.
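
Roughly, the kind of extension I mean (an illustrative sketch, not my exact code): once the unique pairs are known, it scans the full arrays once per unique pair, which is what makes it so slow on 600,000 elements.

# the fast zip/set step that removes duplicate coordinates
unique_pairs = set(zip(x, y))

# a per-pair scan like this touches every element once per unique
# pair, so the total work grows quadratically with the array size
data_summed = [data[(x == ux) & (y == uy)].sum() for ux, uy in unique_pairs]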

2 Comments
  • To get the unique elements of a NumPy array, you can use np.unique(); I am not sure how fast this is compared to set(). For building the data array where you sum the values of the repeating elements, I can only think of using a loop to find the indices of the repeating pairs and summing the corresponding data values (a vectorized sketch of this idea follows these comments). Commented Oct 7, 2024 at 20:19
  • It would be helpful if you could add the code you tried, plus a link to the Stack Overflow thread you mentioned. Commented Oct 8, 2024 at 9:13
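
Following up on the np.unique() suggestion above, a fully vectorized sketch that both de-duplicates and sums, using return_inverse together with np.bincount (shown on the question's example arrays; note the pairs come back lexicographically sorted rather than in first-appearance order):

import numpy as np

x = np.array([3, 7, 12, 3, 56, 4, 2, 3, 65, 87, 12, 3, 9, 7, 87])
y = np.array([7, 24, 87, 9, 65, 43, 54, 9, 3, 8, 34, 9, 23, 6, 8])
data = np.array([6, 45, 4, 25, 7, 45, 78, 4, 82, 3, 9, 43, 32, 5, 9])

# treat each (x, y) pair as a row and find the unique rows;
# inverse[i] gives the row in unique_pairs that pair i maps to
unique_pairs, inverse = np.unique(np.column_stack((x, y)), axis=0, return_inverse=True)

# bincount with weights sums the data values sharing an inverse index
data_after = np.bincount(inverse.ravel(), weights=data)

x_after = unique_pairs[:, 0]
y_after = unique_pairs[:, 1]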

1 Answer


The following attempt is pretty fast even for much larger datasets than the one you are dealing with. I tested a size of 6,000,000 for x, y and data, and it still finished within about 10 s, not using a particularly powerful machine.

What is time-consuming, though, is printing the arrays once they reach a certain size.

import numpy as np

# generate some test data
x = np.random.randint(0, 100_000, 600_000)
y = np.random.randint(0, 100_000, 600_000)
data = np.random.randint(0, 10_000, 600_000)

# initialize the result dict;
# set(zip()) makes sure we are dealing only with unique x/y pairs
data_tmp = {key: 0 for key in set(zip(x, y))}

# accumulate the sum for each unique x/y pair
for key, val in zip(zip(x, y), data):
    data_tmp[key] += val

# translate the dict back into your cleaned-up arrays
x_after = np.array([a for a, _ in data_tmp.keys()])
y_after = np.array([b for _, b in data_tmp.keys()])
data_after = np.array(list(data_tmp.values()))
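
For completeness, the cleaned-up arrays can then be fed to scatter exactly as before; a minimal sketch:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# c now carries the summed value for each unique coordinate pair
ax.scatter(x_after, y_after, c=data_after)
plt.show()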

As a side note: checking the code against your example, I noticed the data in your original "After" array was wrong; the second 4 needed to be 82 (the example above reflects the corrected value).


1 Comment

The typo was an error on my part. I have tried the suggested code and can report that it is extremely fast: under a second for three 600,000-element arrays. I have also used it to filter and plot geographic information on a basemap, with the expected results.
