
I have an array (M x N) of air pressure data (gridded model data). There are also two arrays (also M x N) for latitudes and longitudes. To build a GeoJSON of isobars (lines of equal pressure) I need to find clusters of pressure values with a given step (1 Pa, 0.5 Pa). My general plan was:

  1. Build a list of objects: [{ lat, lon, pressure },..] to keep lat and lon data linked to a pressure;
  2. Sort objects by pressure;
  3. For each object in the list: compare its pressure value and move it to a dedicated list;
  4. Create GeoJSON features.

But step 3 is not yet clear to me: how do I find the clusters in a smart way? Which algorithm should I look for? Can I do that with the scipy.cluster package?

  • Is your isobar grid fixed? Then it's as easy as isobar_bucket_no = trunc(pressure / 0.5), where 0.5 is your grid step. You don't even need to sort. If you need to calculate the range dynamically, find the min and max pressure, then choose a grid step that gives a reasonable number of isobars. Commented Sep 8, 2014 at 16:50
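A minimal NumPy sketch of the comment's suggestion might look like this. The function names, the candidate step list, and the `max_isobars` limit are illustrative assumptions, not part of the original comment. It uses `np.floor`, which gives the same result as `trunc` for positive pressures.

```python
import math
import numpy as np

def bucket_index(pressure, step):
    """Return the integer bucket number of every pressure value."""
    # floor equals trunc for positive pressures, as in the comment's formula
    return np.floor(np.asarray(pressure, dtype=float) / step).astype(int)

def pick_step(pressure, max_isobars=20, candidates=(0.5, 1, 2, 4, 5, 10)):
    """Pick the smallest candidate step that yields at most max_isobars buckets.

    The candidate list and the limit are hypothetical defaults.
    """
    p_min = float(np.min(pressure))
    p_max = float(np.max(pressure))
    for step in candidates:
        n_buckets = math.floor(p_max / step) - math.floor(p_min / step) + 1
        if n_buckets <= max_isobars:
            return step
    return candidates[-1]  # fall back to the coarsest step

pressure = np.array([[1000.2, 1000.7],
                     [1001.4, 999.9]])
buckets = bucket_index(pressure, 0.5)
step = pick_step(pressure, max_isobars=3)
```

The result of `bucket_index` has the same M x N shape as the input, so it can be used directly as a mask against the latitude and longitude arrays.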

1 Answer


I don't think you are looking for clustering at all.

Apparently the isobar ranges are given. So split your data set on them; you do not need to sort for this - just find the minimum and maximum to get all buckets, then select data according to each bucket separately. This breaks the problem down nicely into smaller chunks.
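As a rough sketch of that split, assuming NumPy arrays named `lat`, `lon` and `pressure` of equal shape, one could group the grid points by bucket and emit one GeoJSON `MultiPoint` feature per bucket. The function names and the `pressure_from` property are made up for illustration.

```python
import json
import numpy as np

def split_by_bucket(lat, lon, pressure, step):
    """Map each bucket's lower pressure bound to an (n, 2) array of (lon, lat)."""
    lat, lon, pressure = (np.asarray(a, dtype=float).ravel()
                          for a in (lat, lon, pressure))
    idx = np.floor(pressure / step).astype(int)
    buckets = {}
    # The minimum and maximum are enough to enumerate all buckets; no sorting.
    for b in range(idx.min(), idx.max() + 1):
        mask = idx == b
        if mask.any():
            buckets[b * step] = np.column_stack((lon[mask], lat[mask]))
    return buckets

def to_feature_collection(buckets):
    """Build a GeoJSON FeatureCollection (GeoJSON positions are lon, lat)."""
    features = [
        {
            "type": "Feature",
            "properties": {"pressure_from": float(lower)},  # hypothetical name
            "geometry": {"type": "MultiPoint", "coordinates": pts.tolist()},
        }
        for lower, pts in sorted(buckets.items())
    ]
    return {"type": "FeatureCollection", "features": features}

lat = [[10.0, 10.0], [11.0, 11.0]]
lon = [[20.0, 21.0], [20.0, 21.0]]
pressure = [[1000.2, 1000.3], [1000.7, 1001.9]]
buckets = split_by_bucket(lat, lon, pressure, step=0.5)
geojson_text = json.dumps(to_feature_collection(buckets))
```

Each bucket can then be processed on its own, for example to replace the raw points with an outline.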

I guess your problem is largely a visualization one. You want to display areas of similar pressure instead of points, right?

Instead of looking at statistical methods such as least-squares optimization (k-means), which require you to predefine the parameter k, consider looking at visualization techniques such as Alpha Shapes (closely related to convex hulls, but they also allow non-convex shapes). If you compute alpha shapes for each of your pressure domains, you should get a nice visualization of these regions.
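One common way to approximate an alpha shape is to take the Delaunay triangulation, drop every triangle whose circumradius exceeds 1/alpha, and keep the edges that belong to exactly one surviving triangle. The sketch below does that with `scipy.spatial.Delaunay`. The function name is hypothetical, and it treats the coordinates as planar, which is only an approximation for latitude and longitude over small areas.

```python
from collections import Counter

import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_edges(points, alpha):
    """Return the outline of the alpha shape as pairs of point indices."""
    points = np.asarray(points, dtype=float)
    triangulation = Delaunay(points)
    edge_count = Counter()
    for ia, ib, ic in triangulation.simplices:
        a, b, c = points[ia], points[ib], points[ic]
        side_a = np.linalg.norm(b - c)
        side_b = np.linalg.norm(a - c)
        side_c = np.linalg.norm(a - b)
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                         - (b[1] - a[1]) * (c[0] - a[0]))
        if area == 0.0:
            continue  # skip degenerate (collinear) triangles
        circumradius = side_a * side_b * side_c / (4.0 * area)
        if circumradius < 1.0 / alpha:
            for i, j in ((ia, ib), (ib, ic), (ic, ia)):
                edge_count[tuple(sorted((int(i), int(j))))] += 1
    # Interior edges are shared by two triangles; outline edges by one.
    return [edge for edge, count in edge_count.items() if count == 1]

# Four corners of a unit square plus its centre, and one distant outlier.
pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (5, 0.5)]
outline = sorted(alpha_shape_edges(pts, alpha=1.0))
```

Smaller values of `alpha` keep larger triangles and move the result toward the convex hull; larger values carve out more concavities and can eventually drop everything, so the parameter needs tuning against the grid resolution.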

If you insist on using clustering, have a look at DBSCAN, mainly because it allows non-convex clusters and can work with latitude and longitude through a great-circle distance (k-means can't, since it relies on Euclidean means). Even HAC (hierarchical agglomerative clustering) may give you good results, since you can define your cut threshold based on your data resolution (e.g. merge any points in the same pressure bucket if they are less than 1 km apart).
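Since the question asks about `scipy.cluster`, here is a hedged sketch of the HAC variant for the points of a single pressure bucket. It assumes single linkage (so that "less than 1 km apart" chains points together) and a haversine distance on a spherical Earth of radius 6371 km; the function names and the `max_gap_km` parameter are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([p[0], p[1], q[0], q[1]])
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))

def cluster_bucket(latlon, max_gap_km=1.0):
    """Label the points of one pressure bucket; labels start at 1."""
    latlon = np.asarray(latlon, dtype=float)
    if len(latlon) < 2:
        return np.ones(len(latlon), dtype=int)
    distances = pdist(latlon, metric=haversine_km)  # condensed distance matrix
    tree = linkage(distances, method="single")
    # Cut the dendrogram where the merge distance exceeds max_gap_km.
    return fcluster(tree, t=max_gap_km, criterion="distance")

# Two points about 0.56 km apart and a third roughly 36 km away.
points = [(50.0, 10.0), (50.005, 10.0), (50.0, 10.5)]
labels = cluster_bucket(points, max_gap_km=1.0)
```

The pairwise distance matrix grows quadratically with the number of points, so for a large grid this is only practical per bucket or on a subsample.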


1 Comment

Thank you, I got the idea. You are right: it is more about visualization, that's my fault. It looks like a Concave Hull will solve my problem, so I am going to continue in that direction.
