After some operations, I have an RDD (like the following one) of Array[Any], where all the values are of type Int except 3, 8 and 13, which are of type String.

Array[Array[Any]] = Array(Array(1, 2, 3, 4, 5), Array(6, 7, 8, 9, 10), Array(11, 12, 13, 14, 15))

Use the following code for reference:

var exp = sc.parallelize(Array(Array(1,2,"3",4,5),Array(6,7,"8",9,10),Array(11,12,"13",14,15)))

Now I am trying to create a DataFrame from this array using a case class, where the column names and the case class are as follows:

case class specialchar(alpha:Int,beta:Int,gamma:String,theta:Int,zeta:Int) 

I need help with how to iterate through the RDD of Array[Array[Any]] and store the result in a DataFrame. Thanks in advance.

1 Answer

Helper functions to handle Any.

def toInt(x: Any): Option[Int] = x match {
  case i: Int => Some(i)
  case _ => None
}

def toStr(x: Any): Option[String] = x match {
  case i: String => Some(i)
  case _ => None
}
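
Note that these helpers match on the runtime type only, so toInt("3") gives None rather than Some(3). If a numeric string should also count as an integer, a lenient variant might look like the sketch below. The name toIntLenient is hypothetical and not part of the original answer.

```scala
import scala.util.Try

// Hypothetical lenient variant of toInt: accepts an Int, or a String that parses as an Int.
def toIntLenient(x: Any): Option[Int] = x match {
  case i: Int    => Some(i)
  case s: String => Try(s.trim.toInt).toOption // None if the string is not a valid Int
  case _         => None
}

// Small usage example on plain values (no Spark needed).
val fromInt    = toIntLenient(4)
val fromString = toIntLenient("13")
val fromBad    = toIntLenient("abc")
```

Returning Option keeps the choice of a fallback value (or of dropping the row) with the caller, in the same way as the getOrElse calls in the rest of this answer.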

Case class and converting the Array to a DataFrame.

var exp = sc.parallelize(Array(Array(1,2,"3",4,5),Array(6,7,"8",9,10),Array(11,12,"13",14,15)))
case class specialchar(alpha:Int,beta:Int,gamma:String,theta:Int,zeta:Int)  

var specialCharDf = Seq.empty[specialchar].toDF

exp.collect().foreach(x => {
    val a:Int = toInt(x(0)).getOrElse(1)
    val b:Int = toInt(x(1)).getOrElse(1)
    val c:String = toStr(x(2)).getOrElse("1")
    val d:Int = toInt(x(3)).getOrElse(1)
    val e:Int = toInt(x(4)).getOrElse(1)

    println(a, b, c, d, e)

    val specialcharTempDf =  Seq(specialchar(a,b,c,d,e)).toDF
    specialCharDf = specialcharTempDf.union(specialCharDf)
})

specialCharDf.printSchema() // matches the desired schema

EDIT: Akhil mentioned that, in the end, all the columns should be integers. The new solution is below:

var exp = sc.parallelize(Array(Array(1,2,"3",4,5),Array(6,7,"8",9,10),Array(11,12,"13",14,15)))
case class specialchar(alpha:Int,beta:Int,gamma:Int,theta:Int,zeta:Int)

var specialCharDf = Seq.empty[specialchar].toDF

exp.collect().foreach(x => {
    val a:Int = toInt(x(0)).getOrElse(1)
    val b:Int = toInt(x(1)).getOrElse(1)
    val c:String = toStr(x(2)).getOrElse("1")
    val f = c.toInt
    val d:Int = toInt(x(3)).getOrElse(1)
    val e:Int = toInt(x(4)).getOrElse(1)

    println(a, b, f, d, e)

    val specialcharTempDf =  Seq(specialchar(a,b,f,d,e)).toDF
    specialCharDf = specialcharTempDf.union(specialCharDf)
})

specialCharDf.printSchema() // matches the desired schema
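
The loop above collects every row to the driver and builds one single-row DataFrame per record. As an alternative, the per-row conversion can be written as a plain function, sketched below under the assumption that each row has exactly five elements with a numeric string in the third position. The name toSpecialChar is hypothetical, and the local rows array only stands in for the RDD contents.

```scala
import scala.util.Try

// Redeclared here so the sketch is self-contained; same shape as the edited case class.
case class specialchar(alpha: Int, beta: Int, gamma: Int, theta: Int, zeta: Int)

// Hypothetical helper: converts one untyped row, or returns None if the row
// does not have the expected shape or the third element is not a numeric string.
def toSpecialChar(row: Array[Any]): Option[specialchar] = row match {
  case Array(a: Int, b: Int, c: String, d: Int, e: Int) =>
    Try(c.trim.toInt).toOption.map(g => specialchar(a, b, g, d, e))
  case _ => None
}

// Local stand-in for the RDD contents from the question.
val rows: Array[Array[Any]] = Array(
  Array(1, 2, "3", 4, 5),
  Array(6, 7, "8", 9, 10),
  Array(11, 12, "13", 14, 15))

// Rows that fail the conversion are dropped rather than replaced by a default.
val converted: Array[specialchar] = rows.flatMap(r => toSpecialChar(r).toList)
```

In spark-shell, where the implicits are already in scope, the same function could presumably be applied to the RDD directly with exp.flatMap(r => toSpecialChar(r).toList).toDF(), which avoids both collect() and the repeated union.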
7 Comments

What if, instead of the above situation, the numbers 3, 8 and 13, which are in string format, should be converted to integers?
@Akhil Instead of doing val c:String = toStr(x(2)).getOrElse("1") you would just do val c:Int = toInt(x(2)).getOrElse(1). You would also need to change the case class to gamma:Int.
I tried your suggested change in the original code, but the final output is (1,2,1,4,5)(6,7,1,9,10)(11,12,1,14,15). The gamma column is getting its values from the .getOrElse(1) on the c variable. I think this is because toInt() does not handle the string values in the RDD array.
@Akhil check the new solution. I think I misunderstood you above.
The code var specialCharDf = Seq.empty[specialchar].toDF runs fine in the REPL but throws an error in Eclipse's Scala IDE. The error reads: value toDF is not a member of Seq[specialchar]. Any suggestions?