I'm trying to figure out a procedure for clustering a data set with 52 dimensions. This is purely for my own learning, so I'm using a data set with known fields: the World Series game logs from retrosheet.org. I'm attempting to use only columns 25-77, which hold the integer fields, and to ignore the string data.

This is my first attempt at unsupervised learning, and while I understand the concepts, I'm struggling to implement a solution in Python. I've been using scipy and numpy. If anyone knows a good place to start or has suggestions for tackling this problem, I'd appreciate it.

  • Perhaps add some example code so we can see what you've already tried? Commented Mar 17, 2015 at 7:03

1 Answer

scikit-learn is the way to go for clustering in Python. See http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py for a demo, with code, of k-means clustering on 64 features. A good way to start is to work through the tutorial at http://scikit-learn.org/stable/tutorial/basic/tutorial.html, apply what you learn there to your dataset, and then move on to k-means clustering.
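To make that concrete, here is a minimal sketch of k-means with scikit-learn on a purely numeric matrix. It is not taken from the linked demo: the random placeholder data, the helper name `cluster_games`, the choice of 3 clusters, and the standardization step are all assumptions for illustration. Your real input would be the integer columns from the game logs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_games(X, n_clusters=3, seed=0):
    """Standardize the columns of X, then run k-means on the result.

    `cluster_games` is a hypothetical helper name, not a scikit-learn API.
    """
    # k-means uses Euclidean distance, so columns with large ranges would
    # otherwise dominate; scaling each column is an assumption that often helps.
    X_scaled = StandardScaler().fit_transform(X)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X_scaled)
    return labels, model


# Placeholder data standing in for the integer game-log columns
# (200 rows, 52 features); replace with the real matrix.
rng = np.random.default_rng(0)
X = rng.integers(0, 20, size=(200, 52)).astype(float)

labels, model = cluster_games(X, n_clusters=3)
print(labels.shape)                  # one cluster label per row
print(model.cluster_centers_.shape)  # one center per cluster, in scaled space
```

For the real file, something like `pandas.read_csv` with `header=None` and a `usecols` range could produce `X`; the exact column offsets depend on whether the fields are counted from 0 or 1, so check them against the field description. The number of clusters is a guess here, and it is worth trying several values and comparing `model.inertia_` or a silhouette score.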
