Split data using timeframe into groups in python pandas

Question

I have a dataframe called df that looks like this, but the full frame is [9147 rows x 3 columns]:

indexID  RngUni[m]  PowUni[dB]
157203   1.292283      132
157201   1.271878      132
157016   1.285481      134
157404   1.305886      136
157500   1.353496      136
157524   1.251474      136
157227   1.292283      132
157543   1.339893      136
157903   1.353496      138
156928   1.299084      134
157373   1.299084      136
156937   1.414709      134
157461   1.353496      136
157718   1.360297      138
157815   1.326290      138
157806   1.271878      134
156899   1.360298      134
157486   1.414709      138
157628   1.271878      136
157405   1.299084      134
157244   1.299084      134
157522   1.258275      136
157515   1.367099      138
157086   1.305886      136
157602   1.251474      134
157131   1.265077      132
157170   1.380702      138
156904   1.360297      134
157209   1.401106      138
157018   1.265077      134

What I am trying to do is to pick certain values of the data in the table.

df.plot(x = 'RngUni[m]', y = 'PowUni[dB]', kind = 'scatter') gives:

Assuming that the main group is the area where most of the data points cluster, what I need to do is to pick 80% of the points that are in the main group and 20% of points that are outside the main group.

I need the indexID of all the points outputted as a list. How can I do this?

An example of the clustering required. What I would like to do is to pick 80% of the points in the circle and 20% of the points outside the circle.

Community · Accepted Answer · 2020-06-20 09:12:55Z

Here is how I would go about this task:

from io import StringIO
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

s = '''indexID  RngUni[m]  PowUni[dB]
157203   1.292283      132
157201   1.271878      132
157016   1.285481      134
157404   1.305886      136
157500   1.353496      136
157524   1.251474      136
157227   1.292283      132
157543   1.339893      136
157903   1.353496      138
156928   1.299084      134
157373   1.299084      136
156937   1.414709      134
157461   1.353496      136
157718   1.360297      138
157815   1.326290      138
157806   1.271878      134
156899   1.360298      134
157486   1.414709      138
157628   1.271878      136
157405   1.299084      134
157244   1.299084      134
157522   1.258275      136
157515   1.367099      138
157086   1.305886      136
157602   1.251474      134
157131   1.265077      132
157170   1.380702      138
156904   1.360297      134
157209   1.401106      138
157018   1.265077      134'''

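# Parse the whitespace-separated sample into a DataFrame
ss = StringIO(s)
df = pd.read_csv(ss, sep=r"\s+")
# Cluster on the two measurement columns (range and power)
kmeans = KMeans(n_clusters=2, random_state=0).fit(df[['RngUni[m]', 'PowUni[dB]']].values)
df['labels'] = kmeans.labels_
# Map each cluster label to a plot color
colors = df['labels'].map({0: 'blue', 1: 'red'})

plt.scatter(x=df['RngUni[m]'], y=df['PowUni[dB]'], c=colors)
plt.show()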

The output:

Just change the clustering algorithm and play with the parameters to get the desired clusters and colors.
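Once every row has a cluster label, the 80%/20% selection asked for in the question can be done with DataFrame.sample. The following is only a sketch under assumptions: the helper name sample_index_ids is made up for illustration, the "main group" is taken to be the cluster with the most rows, and the 'labels' column in the small frame is a hand-written stand-in for kmeans.labels_.

```python
import pandas as pd

# A few rows from the sample data. The 'labels' values are a hand-written
# stand-in for kmeans.labels_ (an assumption, not real clustering output).
df = pd.DataFrame({
    'indexID': [157203, 157201, 157016, 156928, 157405, 157903, 157718, 157486],
    'RngUni[m]': [1.292283, 1.271878, 1.285481, 1.299084, 1.299084,
                  1.353496, 1.360297, 1.414709],
    'PowUni[dB]': [132, 132, 134, 134, 134, 138, 138, 138],
    'labels': [0, 0, 0, 0, 0, 1, 1, 1],
})


def sample_index_ids(df, label_col='labels', main_frac=0.8, other_frac=0.2, seed=0):
    """Return a list of indexID values: main_frac of the largest cluster
    plus other_frac of all remaining rows (hypothetical helper)."""
    # Treat the most populated cluster as the "main group"
    main_label = df[label_col].value_counts().idxmax()
    in_main = df[label_col] == main_label
    # Randomly sample each part without replacement
    picked_main = df[in_main].sample(frac=main_frac, random_state=seed)
    picked_other = df[~in_main].sample(frac=other_frac, random_state=seed)
    return pd.concat([picked_main, picked_other])['indexID'].tolist()


ids = sample_index_ids(df)
print(ids)
```

The result is a plain Python list of indexID values. Note that sample rounds the requested fraction to a whole number of rows, so small groups will not hit 80% or 20% exactly.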

Hope it helps.

edited Jun 20, 2020 at 9:12

answered Oct 4, 2018 at 8:59

quest

5 Comments

Ruven Guna Over a year ago

Hi! Thank you for your answer. I was just wondering what the two different colors in your graph represent. And how would I output a list of the indexes of the selected points?

Ruven Guna Over a year ago

I just read through this link about KMeans (scikit-learn.org/stable/modules/generated/…) and I still don't quite get how I can output a list of the points that have been clustered together.

Ruven Guna Over a year ago

KMeans seems to split the data along a decision boundary line. However, what I need might not be a split by a line; it could be a split by a shape. I have edited my question to include this.

quest Over a year ago

@RuvenGuna If you post the complete data, I can show how you can change the clustering algorithm and get the desired cluster. I used KMeans just to demonstrate the idea.

Ruven Guna Over a year ago

Unfortunately, I am unable to do so, as I am not allowed to upload the entire data. I have managed to solve the problem though. Thanks for your help!
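The thread does not record how the problem was eventually solved. For the shape-based split raised in the comments, one possible approach (an assumption, not the asker's method) is to mark points as inside the main group when they fall within an ellipse around the center of the data. In this sketch the helper name inside_ellipse, the use of the median as the center, and the radius of 1.5 scaled units are all illustrative choices to be tuned against the real plot.

```python
import numpy as np
import pandas as pd


def inside_ellipse(df, cols=('RngUni[m]', 'PowUni[dB]'), radius=1.5):
    """Boolean mask: True where a point lies within `radius` standard
    deviations of the per-column median (hypothetical helper)."""
    xy = df[list(cols)].to_numpy(dtype=float)
    center = np.median(xy, axis=0)
    # Scale each axis by its spread so range and power are comparable
    scale = xy.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant columns
    dist = np.sqrt((((xy - center) / scale) ** 2).sum(axis=1))
    return pd.Series(dist <= radius, index=df.index)


# A few rows taken from the sample data in the question
df = pd.DataFrame({
    'indexID': [157203, 157404, 157500, 157373, 156937, 157209],
    'RngUni[m]': [1.292283, 1.305886, 1.353496, 1.299084, 1.414709, 1.401106],
    'PowUni[dB]': [132, 136, 136, 136, 134, 138],
})

mask = inside_ellipse(df)
print('inside: ', df.loc[mask, 'indexID'].tolist())
print('outside:', df.loc[~mask, 'indexID'].tolist())
```

The resulting mask can play the same role as the cluster labels: sample 80% of the rows where it is True and 20% of the rows where it is False, then collect the indexID column as a list.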
