
I am working on a Spark project in Scala. I want to train a model that could be k-means, Gaussian mixture, logistic regression, naive Bayes, etc., but I cannot define a generic return type for the trained model, since these algorithms produce different types such as GaussianMixtureModel and KMeansModel. I cannot find any sensible way to return the trained model.

Here is a piece of code from the project:

model.model_algorithm match {
  case "k_means" =>
    val model_k_means = k_means(data, parameters)
  case "gaussian_mixture" =>
    val model_gaussian_mixture = gaussian_mixture(data, parameters)
  case "logistic_regression" =>
    val model_logistic_regression = logistic_regression(data, parameters)
}

So is there a way to return this trained model or to define a generic model that accepts all types?

  • What is it that you want to do with the trained model? These classes all extend org.apache.spark.mllib.util.Saveable, AnyRef, and Any, so your method can return any of these types, but that won't necessarily help you. If you want to perform some action X on these results later, you might want to create a trait ModelResult with a method X, make this pattern matching return ModelResult, and have three implementations of that trait, each handling a different model. Commented Apr 16, 2016 at 14:30
  • I tried making them of type Any, but the predict() method cannot be used in that case. Can you please explain how I can implement the pattern matching in this case? Thank you for your answer. Commented Apr 16, 2016 at 17:34
  • So you have actually instantiated three models and use pattern matching to know which one runs. If that is the case, it's bad practice. Commented Apr 16, 2016 at 18:49
  • I need to return one of the machine learning models from a function, and then use that model to make predictions on sample data. I know defining each model this way is not the right approach; however, I cannot find a solution, since I cannot return a model without knowing its type explicitly at run time. Commented Apr 17, 2016 at 15:30
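To illustrate the first comment's suggestion, here is a minimal, self-contained sketch of the ModelResult idea in plain Scala. The Dummy* classes are stand-ins for Spark's model types (they are not Spark APIs); the point is that the pattern match can return the common trait:

```scala
// A common trait exposing the one operation needed later.
trait ModelResult {
  def predict(features: Array[Double]): Double
}

// Stand-ins for Spark's model classes, purely for illustration.
class DummyKMeansModel { def predict(v: Array[Double]): Int = v.length % 2 }
class DummyLogRegModel { def predict(v: Array[Double]): Double = v.sum }

// One implementation of the trait per underlying model type.
class KMeansResult(model: DummyKMeansModel) extends ModelResult {
  def predict(features: Array[Double]): Double = model.predict(features).toDouble
}
class LogRegResult(model: DummyLogRegModel) extends ModelResult {
  def predict(features: Array[Double]): Double = model.predict(features)
}

// The pattern match now has a single, concrete return type.
def train(algorithm: String): ModelResult = algorithm match {
  case "k_means"             => new KMeansResult(new DummyKMeansModel)
  case "logistic_regression" => new LogRegResult(new DummyLogRegModel)
  case other                 => sys.error(s"unknown algorithm: $other")
}
```

The caller only ever sees ModelResult, so it can call predict without knowing which algorithm was trained.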

1 Answer


You can create a common interface that wraps the internal training and prediction logic and exposes a simple API to be reused.

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

trait AlgorithmInterface extends Serializable {
  def train(data: RDD[LabeledPoint]): Unit
  def predict(record: Vector): Double
}

And have the algorithms implemented in classes like

import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}

class LogisticRegressionAlgorithm extends AlgorithmInterface {
  var model: LogisticRegressionModel = null
  override def train(data: RDD[LabeledPoint]): Unit = {
    model = new LogisticRegressionWithLBFGS()
      .setNumClasses(10)
      .run(data)
  }
  override def predict(record: Vector): Double = model.predict(record)
}

import org.apache.spark.mllib.clustering.{GaussianMixture, GaussianMixtureModel}

class GaussianMixtureAlgorithm extends AlgorithmInterface {
  var model: GaussianMixtureModel = null
  override def train(data: RDD[LabeledPoint]): Unit = {
    model = new GaussianMixture().setK(2).run(data.map(_.features))
  }
  // GaussianMixtureModel.predict returns the cluster index as an Int
  override def predict(record: Vector): Double = model.predict(record).toDouble
}

Using it:

// Assigning the models to an Array[AlgorithmInterface]
val models: Array[AlgorithmInterface] = Array(
  new LogisticRegressionAlgorithm(),
  new GaussianMixtureAlgorithm()
)
// Training the models through the common interface
models.foreach(_.train(data))
// Predicting a value
models.foreach(model => println(model.predict(vectorData)))