
I have 2 datasets:

df1=

    id_first    latitude    longitude
0   403         45.0714     7.6187
1   403         45.0739     7.6195
2   1249        45.0745     7.6152
3   1249        45.1067     7.6451
4   1249        45.1062     7.6482
5   1531        45.1088     7.6528
6   1531        45.1005     7.6155
7   14318       45.1047     7.6056

df2 =

    id_now  cluster_group
0   403     0
1   1249    1
2   1531    3
3   14318   3

I can't work out how to create a loop (or something else) to:

  • take a value in df2, e.g. 403, which belongs to exactly one cluster_group (0), go to df1, find all the points related to 403 (here 2 points, i.e. 2 latitude values and 2 longitude values) and plot them.

  • repeat this for all of df1 and df2, plotting everything in one graph with a different color for every cluster. I can probably manage this part myself, but if you can suggest something, please do.

P.S. In df2, 1531 and 14318 belong to the same cluster, so I want their points plotted in the same color (or on the same map).
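A quick way to check the id-to-cluster mapping described above is to join the two tables so that every point carries its cluster_group. This is a minimal sketch using pandas `merge`. It assumes the sample data is typed in directly instead of being read from a file:

```python
import pandas as pd

# Sample data copied from the question (typed in, not read from a file)
df1 = pd.DataFrame({
    "id_first": [403, 403, 1249, 1249, 1249, 1531, 1531, 14318],
    "latitude": [45.0714, 45.0739, 45.0745, 45.1067, 45.1062, 45.1088, 45.1005, 45.1047],
    "longitude": [7.6187, 7.6195, 7.6152, 7.6451, 7.6482, 7.6528, 7.6155, 7.6056],
})
df2 = pd.DataFrame({
    "id_now": [403, 1249, 1531, 14318],
    "cluster_group": [0, 1, 3, 3],
})

# Attach each point's cluster_group by matching df1.id_first against df2.id_now
points = df1.merge(df2, left_on="id_first", right_on="id_now", how="left")

# Points per cluster: the points of 1531 and 14318 both end up in cluster 3
print(points.groupby("cluster_group").size())
```

Once each point has a cluster_group column, "same cluster, same color" just means grouping on that column.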

What I tried:

n_clusters = 46

for k in range(0, n_clusters):
    ...
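One way to fill in the loop above while keeping the `range(n_clusters)` idea is as follows. For each cluster number k, look up its ids in df2, select their points in df1, and skip cluster numbers that have no points. This is a sketch, not the answer's code. It assumes the sample data is built inline, and it uses `scatter` (separate points, not connected lines) with longitude on the x axis. The output file name `clusters.png` is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to a file, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "id_first": [403, 403, 1249, 1249, 1249, 1531, 1531, 14318],
    "latitude": [45.0714, 45.0739, 45.0745, 45.1067, 45.1062, 45.1088, 45.1005, 45.1047],
    "longitude": [7.6187, 7.6195, 7.6152, 7.6451, 7.6482, 7.6528, 7.6155, 7.6056],
})
df2 = pd.DataFrame({"id_now": [403, 1249, 1531, 14318], "cluster_group": [0, 1, 3, 3]})

n_clusters = 46  # from the question; the sample data only uses clusters 0, 1 and 3

fig, ax = plt.subplots()
plotted = []  # cluster numbers that actually had points
for k in range(n_clusters):
    # ids in df2 that belong to cluster k (both 1531 and 14318 for k = 3)
    ids = df2.loc[df2["cluster_group"] == k, "id_now"]
    # every point in df1 whose id is one of those ids
    pts = df1[df1["id_first"].isin(ids)]
    if pts.empty:
        continue  # no ids use this cluster number, so there is nothing to plot
    # one scatter call per cluster, so matplotlib gives each cluster its own color
    ax.scatter(pts["longitude"], pts["latitude"], label=f"Cluster {k}")
    plotted.append(k)

ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.legend()
fig.savefig("clusters.png")  # hypothetical output file name
```

Matplotlib's default color cycle has 10 colors, so with 46 clusters some colors will repeat. Passing an explicit color for each k (for example, taken from a colormap) is one way around that.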

[Image: plot sample]

Every color represents a cluster_group.

  • If you already have a solution... what's your current solution and why are you looking for an alternative? If you explain these points, you can hope for a better answer. Commented Aug 30, 2019 at 16:50
  • @Valentino I don't have a solution. It's a random photo from the internet. Commented Aug 30, 2019 at 17:02
  • Ah, ok. Because you said "i can manage this" so I thought you had a solution. Commented Aug 30, 2019 at 17:04

1 Answer


Here is how you can do it using pandas and matplotlib.pyplot.

import pandas as pd
import matplotlib.pyplot as plt

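# here I read the dataframes from files; read them in whatever way you prefer
# (raw strings r'\s+' avoid the invalid escape sequence warning for '\s')
df1 = pd.read_csv('data.txt', sep=r'\s+')
df2 = pd.read_csv('data2.txt', sep=r'\s+')

# the important piece of code is here:
# one plt.plot call per cluster_group, so each cluster gets its own color
for g, gdf in df2.groupby('cluster_group'):
    # rows of df1 whose id_first appears among this cluster's id_now values
    df1_to_plot = df1.loc[df1['id_first'].isin(gdf['id_now'])]
    plt.plot(df1_to_plot['latitude'], df1_to_plot['longitude'], label='Cluster {:d}'.format(g))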

plt.legend()
plt.show()

Some explanation if you are not familiar with groupby and isin:

  1. df2.groupby('cluster_group') returns a GroupBy object. Iterating over it yields pairs (g, gdf), where g is a value of the 'cluster_group' column and gdf is the subset of df2 made of all the rows with that value.
  2. For each subset gdf, I select the rows of df1 whose value in the 'id_first' column is contained in gdf['id_now']. This is done with the isin method. The selection is stored in the dataframe df1_to_plot, which contains the data to be plotted.
  3. Now I can use plt.plot to actually plot the data. Matplotlib automatically cycles through its default colors, so each call (each cluster) gets a different color. The label parameter is used by the legend method when creating the legend.
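To see what these two steps produce on their own, here is a small sketch that skips plotting. It only prints each group's ids and the df1 ids that `isin` selects. It assumes the sample ids from the question are typed in directly:

```python
import pandas as pd

# Only the id columns from the question's sample data are needed here
df1 = pd.DataFrame({"id_first": [403, 403, 1249, 1249, 1249, 1531, 1531, 14318]})
df2 = pd.DataFrame({"id_now": [403, 1249, 1531, 14318], "cluster_group": [0, 1, 3, 3]})

selected = {}
for g, gdf in df2.groupby("cluster_group"):
    # gdf holds the rows of df2 whose cluster_group equals g
    print(g, gdf["id_now"].tolist())
    # isin builds a boolean mask over df1: True where id_first is one of gdf's ids
    mask = df1["id_first"].isin(gdf["id_now"])
    selected[g] = df1.loc[mask, "id_first"].tolist()

print(selected)
```

Cluster 3 picks up the rows of both 1531 and 14318, which is why they end up in the same color in the answer's plot.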

Using the sample data you provided, this code produces the following image (x axis is latitude, y axis is longitude):

[Image: sample plot]
