
I'm new to Scala and I'm having difficulty writing a Spark SQL application that dynamically loads user classes and maps RDDs to them.

    rdd.map(line => {
      val cls = Class.forName("UserClass")
      val constructor = cls.getConstructor(classOf[String], classOf[String])
      Tuple1(constructor.newInstance(line._1, line._2)).asInstanceOf[cls.type]
    }).toDF()

The problem is converting the object to its declared class: cls.type resolves to java.lang.Class[_], which is not what I expect. At runtime, the following exception is thrown:

java.lang.UnsupportedOperationException: Schema for type java.lang.class[_] is not supported

BTW, I'm using Scala 2.10 and Spark 1.6.1.
Any suggestions and comments would be appreciated! Thanks!

  • What happens when you compile and/or run this code? And what do you expect to happen? Commented Dec 17, 2016 at 7:59
  • Thanks for your notice, I've added the exception message. I just expect the object to be of its declared class, not Any or Class[T]. Commented Dec 17, 2016 at 8:53
  • I really wonder what kind of problem you are trying to solve using this approach. Could you explain your requirements? Commented Dec 17, 2016 at 9:57
  • We have logs with different schemas in Kafka (maybe more in the future), and they share a common field, user_id. Now we want to apply a 'select distinct user_id' operation to each log, so the data processing has to suit each log's schema. That's why I'm trying to dynamically load the schema for the RDD. Commented Dec 17, 2016 at 15:32

1 Answer


I don't really have a solution, but I can tell you some things you're doing wrong.

You wrap an object in a Tuple1 and then try to cast the tuple, rather than the object itself, to a different type.

cls.type is not the type that the Class cls represents. It is the type of the variable cls, which in this case happens to be java.lang.Class[_].
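To make the distinction concrete, here is a small Spark-free sketch (java.lang.String is just an illustrative class name):

```scala
// `cls` is a variable of type Class[_] holding a runtime Class object.
val cls: Class[_] = Class.forName("java.lang.String")

// `cls.type` is the *singleton type* of the variable `cls` itself --
// a subtype of Class[_] -- not the type that the Class object represents.
val sameRef: cls.type = cls   // compiles: the same stable identifier

// The type the object represents only exists as a runtime value:
println(cls.getName)          // prints "java.lang.String"
```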

Casting is mainly a compile-time thing, so you can only cast to types that are known at compile time. You say you are loading classes dynamically, so I assume they are not known to the compiler.
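If the goal (per the comments) is just to pull a field like user_id out of records whose schemas are only known at runtime, one Spark 1.6 pattern is to skip user classes entirely: build the schema dynamically as a StructType and call sqlContext.createDataFrame on an RDD[Row]. A sketch under assumptions from the question (two string fields per record; the field names are placeholders):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Schema assembled at runtime, e.g. from per-topic configuration.
val fieldNames = Seq("user_id", "payload")   // assumption: two string fields
val schema = StructType(fieldNames.map(StructField(_, StringType, nullable = true)))

// `rdd` is the RDD[(String, String)] from the question.
val rowRdd = rdd.map { case (userId, payload) => Row(userId, payload) }

val df = sqlContext.createDataFrame(rowRdd, schema)
df.select("user_id").distinct()              // the 'select distinct user_id' step
```

Because the schema is an ordinary runtime value, no cast to a dynamically loaded class is needed at any point.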


2 Comments

If runtime casting is not supported, I'm afraid I was wrong in the first place. Thank you all the same.
The thing is, I think the runtime type doesn't matter here. The exception seems to indicate that Spark uses the compile-time type to determine whether an operation is supported. If it were the runtime type that mattered, you would not need to cast at all. But I'm no Spark expert.
