Machine Learning clustering with n-dimensional data in Python

Question

I'm trying to figure out a procedure to perform clustering on a set of data with 52 dimensions. This is purely for my own learning, so I'm using a data set with known fields. The data comes from the retrosheet.org Gamelogs, using the World Series data set. I'm attempting to use only columns 25-77, which hold integers, and to ignore the string data.

This is my first attempt at unsupervised learning and while I understand the concepts, I'm struggling to implement a solution in Python. I've been using scipy and numpy. If anyone knows a good place to start or some suggestions on tackling this problem, I'd appreciate it.

Perhaps add some example code so we can see what you've already tried?
– Drazisil, Commented Mar 17, 2015 at 7:03

user4322779 · Accepted Answer · 2015-03-17 08:09:53Z


scikit-learn is a good place to start for clustering in Python. See http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py for a demo, with code, of k-means clustering on data with 64 features. Start with the tutorial at http://scikit-learn.org/stable/tutorial/basic/tutorial.html, apply what you learn there to your dataset, and then move on to k-means clustering.
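As a rough illustration of the k-means approach, here is a minimal sketch using scikit-learn's `KMeans`. It does not load the Retrosheet file: the array `X` is randomly generated stand-in data with 52 integer features, and the choice of 3 clusters, the 200 rows, and the scaling step are all assumptions made for the example, not something taken from the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 200 rows of 52 integer features.
# In practice this would be the numeric columns read from the game log file.
rng = np.random.default_rng(0)
X = rng.integers(0, 20, size=(200, 52)).astype(float)

# Put every feature on a comparable scale so that columns with large
# values do not dominate the Euclidean distances used by k-means.
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=3 is an arbitrary choice for illustration.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print(labels.shape)                   # one cluster label per row
print(kmeans.cluster_centers_.shape)  # one 52-dimensional centre per cluster
```

With real data, the number of clusters usually has to be chosen by trying several values and comparing the results, for example with `kmeans.inertia_` or a silhouette score.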

