
I have a dataframe called df that looks like this, although the real one has 9147 rows and 3 columns:

indexID  RngUni[m]  PowUni[dB]
157203   1.292283      132
157201   1.271878      132
157016   1.285481      134
157404   1.305886      136
157500   1.353496      136
157524   1.251474      136
157227   1.292283      132
157543   1.339893      136
157903   1.353496      138
156928   1.299084      134
157373   1.299084      136
156937   1.414709      134
157461   1.353496      136
157718   1.360297      138
157815   1.326290      138
157806   1.271878      134
156899   1.360298      134
157486   1.414709      138
157628   1.271878      136
157405   1.299084      134
157244   1.299084      134
157522   1.258275      136
157515   1.367099      138
157086   1.305886      136
157602   1.251474      134
157131   1.265077      132
157170   1.380702      138
156904   1.360297      134
157209   1.401106      138
157018   1.265077      134

What I am trying to do is select a subset of the rows in this table.

df.plot(x = 'RngUni[m]', y = 'PowUni[dB]', kind = 'scatter') gives: [scatter plot image]

Assuming that the main group is the area where most of the data points cluster, what I need to do is to pick 80% of the points that are in the main group and 20% of points that are outside the main group.

I need the indexID of all the selected points as a list. How can I do this?

Here is an example of the clustering required. I would like to pick 80% of the points inside the circle and 20% of the points outside the circle. [image: scatter plot with the main group circled]


1 Answer


Here is how I would go about this task:

from io import StringIO
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

s = '''indexID  RngUni[m]  PowUni[dB]
157203   1.292283      132
157201   1.271878      132
157016   1.285481      134
157404   1.305886      136
157500   1.353496      136
157524   1.251474      136
157227   1.292283      132
157543   1.339893      136
157903   1.353496      138
156928   1.299084      134
157373   1.299084      136
156937   1.414709      134
157461   1.353496      136
157718   1.360297      138
157815   1.326290      138
157806   1.271878      134
156899   1.360298      134
157486   1.414709      138
157628   1.271878      136
157405   1.299084      134
157244   1.299084      134
157522   1.258275      136
157515   1.367099      138
157086   1.305886      136
157602   1.251474      134
157131   1.265077      132
157170   1.380702      138
156904   1.360297      134
157209   1.401106      138
157018   1.265077      134'''

ss = StringIO(s)
df = pd.read_csv(ss, sep=r"\s+")

# Cluster on the two measurement columns only (indexID is just an identifier)
kmeans = KMeans(n_clusters=2, random_state=0).fit(df[['RngUni[m]', 'PowUni[dB]']])
df['labels'] = kmeans.labels_

# Map each cluster label to a plotting color
colors = df['labels'].map({0: 'blue', 1: 'red'})

plt.scatter(x=df['RngUni[m]'], y=df['PowUni[dB]'], c=colors)

The output: [scatter plot image, points colored by cluster label]
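Once every row carries a cluster label, the list of indexID values asked for in the question can be built with plain pandas. The following is only a rough sketch: the small frame and its label values are made up for illustration, and treating the largest cluster as the "main group" is an assumption, not something the data guarantees.

```python
import pandas as pd

# Toy frame standing in for df after clustering. The 'labels' values are
# invented for illustration; in practice they would come from kmeans.labels_.
df = pd.DataFrame({
    "indexID": [157203, 157201, 157016, 157404, 157500,
                157524, 157227, 157543, 157903, 156928],
    "labels":  [0, 0, 0, 1, 0, 1, 0, 0, 1, 0],
})

# indexID values grouped by cluster label
ids_by_cluster = df.groupby("labels")["indexID"].apply(list).to_dict()

# Assumption: the "main group" is the cluster with the most points
main_label = df["labels"].value_counts().idxmax()
inside = df[df["labels"] == main_label]
outside = df[df["labels"] != main_label]

# Draw 80% of the main group and 20% of the rest.
# random_state only makes the draw repeatable.
picked = pd.concat([
    inside.sample(frac=0.8, random_state=0),
    outside.sample(frac=0.2, random_state=0),
])
index_ids = picked["indexID"].tolist()
print(index_ids)
```

With the real data, replace the toy frame with your own df and keep the rest as is.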

Just change the clustering algorithm and play with the parameters to get the desired clusters and colors.
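If the main group is better described by a round region than by a KMeans split, one possible alternative (not what the code above does) is to flag points by their distance from the center of the data. This is a minimal sketch with several assumptions: the center is taken as the per-column median, both columns are standardized so that metres and dB are comparable, and the 0.75 quantile used as the radius is an arbitrary value you would need to tune.

```python
import numpy as np
import pandas as pd

# A few rows from the question, used only as sample input
df = pd.DataFrame({
    "indexID":    [157203, 157201, 157016, 157404, 157500, 157524, 156937, 157486],
    "RngUni[m]":  [1.292283, 1.271878, 1.285481, 1.305886,
                   1.353496, 1.251474, 1.414709, 1.414709],
    "PowUni[dB]": [132, 132, 134, 136, 136, 136, 134, 138],
})

cols = ["RngUni[m]", "PowUni[dB]"]

# Center on the median and scale by the standard deviation so that
# both axes contribute comparably to the distance
z = (df[cols] - df[cols].median()) / df[cols].std()

# Distance of each point from the center, in standardized units
df["dist"] = np.sqrt((z ** 2).sum(axis=1))

# Hypothetical threshold: the closest 75% of points form the "main group"
radius = df["dist"].quantile(0.75)
df["in_main"] = df["dist"] <= radius

main_ids = df.loc[df["in_main"], "indexID"].tolist()
print(main_ids)
```

The boolean in_main column can then play the role of the cluster label when sampling, for example with DataFrame.sample(frac=...).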

Hope it helps.


5 Comments

Hi! Thank you for your answer. I was just wondering what the 2 different colors in your graph represent? And how would I output a list of the indexes of the selected points?
I just read through this link about KMeans (scikit-learn.org/stable/modules/generated/…) and I still don't quite get how I can output a list of the points that have been clustered together.
KMeans seems to split the data along a decision boundary line. However, what I need might not be a split by a line; it could be a split by a shape. I have edited my question to include this.
@RuvenGuna If you post the complete data, I can show how you can change the clustering algorithm and get the desired cluster. I used KMeans just to demonstrate the idea.
Unfortunately, I am unable to do so, as I am not allowed to upload the entire data. I have managed to solve the problem though. Thanks for your help!
