
I am working on a Spark project in Scala. I want to train a model that could be k-means, Gaussian mixture, logistic regression, naive Bayes, etc., but I cannot define a generic return type for the trained model, since these algorithms produce different types such as GaussianMixtureModel and KMeansModel. I cannot find any sensible way to return the trained model.

Here is a piece of code from the project:

model.model_algorithm match {
  case "k_means" =>
    val model_k_means = k_means(data, parameters)
  case "gaussian_mixture" =>
    val model_gaussian_mixture = gaussian_mixture(data, parameters)
  case "logistic_regression" =>
    val model_logistic_regression = logistic_regression(data, parameters)
}

So is there a way to return this trained model or to define a generic model that accepts all types?

  • What is it that you want to do with the trained model? These classes all extend org.apache.spark.mllib.util.Saveable, AnyRef, and Any, so your method can return any of these types, but that won't necessarily help you. If you want to perform some action X on these results later, you might want to create a trait ModelResult with a method X, make this pattern matching return ModelResult, and have three implementations of that trait, each handling a different model. Commented Apr 16, 2016 at 14:30
  • I tried making them of type Any, but the predict() method cannot be used in that case. Can you please explain how I can implement the pattern matching in this case? Thank you for your answer. Commented Apr 16, 2016 at 17:34
  • So you have actually instantiated three models and use pattern matching to know which one runs. If that is the case, it's bad practice. Commented Apr 16, 2016 at 18:49
  • I need to return one of the machine learning models from a function, and then use that model to make predictions on sample data. I know defining each model this way is not the right approach; however, I cannot find a solution, since I cannot return a model without knowing its type explicitly at run time. Commented Apr 17, 2016 at 15:30
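To illustrate the first comment's suggestion, here is a minimal, self-contained sketch of the ModelResult idea in plain Scala. The Dummy* classes are stand-ins for Spark's model types (they are not Spark APIs); the point is that the pattern match can return the common trait:

```scala
// A common trait exposing the one operation needed later.
trait ModelResult {
  def predict(features: Array[Double]): Double
}

// Stand-ins for Spark's model classes, purely for illustration.
class DummyKMeansModel { def predict(v: Array[Double]): Int = v.length % 2 }
class DummyLogRegModel { def predict(v: Array[Double]): Double = v.sum }

// One implementation of the trait per underlying model type.
class KMeansResult(model: DummyKMeansModel) extends ModelResult {
  def predict(features: Array[Double]): Double = model.predict(features).toDouble
}
class LogRegResult(model: DummyLogRegModel) extends ModelResult {
  def predict(features: Array[Double]): Double = model.predict(features)
}

// The pattern match now has a single, concrete return type.
def train(algorithm: String): ModelResult = algorithm match {
  case "k_means"             => new KMeansResult(new DummyKMeansModel)
  case "logistic_regression" => new LogRegResult(new DummyLogRegModel)
  case other                 => sys.error(s"unknown algorithm: $other")
}
```

The caller only ever sees ModelResult, so it can call predict without knowing which algorithm was trained.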

1 Answer


You can create a common interface that wraps the internal training and prediction logic and exposes a simple API to be reused.

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

trait AlgorithmInterface extends Serializable {
  def train(data: RDD[LabeledPoint]): Unit
  def predict(record: Vector): Double
}

And have the algorithms implemented in classes like

import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}

class LogisticRegressionAlgorithm extends AlgorithmInterface {
  var model: LogisticRegressionModel = null
  override def train(data: RDD[LabeledPoint]): Unit = {
    model = new LogisticRegressionWithLBFGS()
      .setNumClasses(10)
      .run(data)
  }
  override def predict(record: Vector): Double = model.predict(record)
}

import org.apache.spark.mllib.clustering.{GaussianMixture, GaussianMixtureModel}

class GaussianMixtureAlgorithm extends AlgorithmInterface {
  var model: GaussianMixtureModel = null
  override def train(data: RDD[LabeledPoint]): Unit = {
    model = new GaussianMixture().setK(2).run(data.map(_.features))
  }
  // GaussianMixtureModel.predict returns the cluster index as an Int
  override def predict(record: Vector): Double = model.predict(record).toDouble
}

Using it:

// Assigning the models to an Array[AlgorithmInterface]
val models: Array[AlgorithmInterface] = Array(
  new LogisticRegressionAlgorithm(),
  new GaussianMixtureAlgorithm()
)
// Training the models through the common interface
models.foreach(_.train(data))
// Predicting a value
models.foreach(model => println(model.predict(vectorData)))