
I have to determine the schema from the values (not the keys) of a Map[String, Object].

Sample map:

val myMap = Map("k1" -> 1, "k2" -> "", "k3"->  new Timestamp(new Date().getTime), "k4" -> 2.0 )

Currently I have created a schema from the keys like below:

// I have created a schema using the keys
val schema = StructType(myMap.keys.toSeq.map {
  StructField(_, StringType) // StringType is wrong since the Object values can be of any datatype
})

// I have created an RDD and a DataFrame like below
// (createDataFrame lives on SparkSession/SQLContext, not SparkContext)
val rdd = sc.parallelize(Seq(Row.fromSeq(myMap.values.toSeq)))
val df = spark.createDataFrame(rdd, schema)

But now my problem is that a value can be a Double, a Date, a Timestamp, or anything else, so the schema built with StringType as described above is wrong.

Any ideas on creating a schema from Map values that are Objects?

  • @shaido: any ideas? Commented Nov 6, 2018 at 3:04
  • @ramesh-maharjan: in fact I followed one of your posts related to this question, which worked for normal types, but do you have any suggestions in this case? Commented Nov 6, 2018 at 6:07

1 Answer


Reference: this is based on the idea behind dataTypeFor in Spark's ScalaReflection.

You can map each value to its DataType like this:

import org.apache.spark.sql.types._

/**
  * Map a value to its Spark DataType based on its runtime class.
  * Note: since the values are Objects, numeric values arrive boxed,
  * so the match is against the java.lang wrapper classes.
  *
  * @param myObject the value to inspect
  * @return the matching [[DataType]]
  */
def createStruct(myObject: Object): DataType = myObject match {
  case _: String             => StringType
  case _: java.lang.Long     => LongType
  case _: java.lang.Integer  => IntegerType
  case _: java.lang.Float    => FloatType
  case _: java.lang.Double   => DoubleType
  case _: java.sql.Timestamp => TimestampType
  case other =>
    throw new IllegalArgumentException(s"Unsupported type: ${other.getClass}")
}
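If you want to sanity-check this runtime-type dispatch without a Spark session on hand, the same idea can be exercised in plain Scala. The `typeNameFor` helper below is a hypothetical stand-in for `createStruct` that returns type names as strings instead of Spark `DataType`s:

```scala
// Spark-free sketch of the runtime-type dispatch used by createStruct.
// typeNameFor is a hypothetical stand-in returning type-name strings.
object TypeDispatchSketch {
  def typeNameFor(value: Object): String = value match {
    case _: String             => "StringType"
    case _: java.lang.Long     => "LongType"
    case _: java.lang.Integer  => "IntegerType"
    case _: java.lang.Float    => "FloatType"
    case _: java.lang.Double   => "DoubleType"
    case _: java.sql.Timestamp => "TimestampType"
    case other =>
      throw new IllegalArgumentException(s"Unsupported type: ${other.getClass}")
  }

  def main(args: Array[String]): Unit = {
    // Same sample map as in the question; numeric values are boxed explicitly
    // because the map's value type is Object.
    val myMap: Map[String, Object] = Map(
      "k1" -> Int.box(1),
      "k2" -> "",
      "k3" -> new java.sql.Timestamp(new java.util.Date().getTime),
      "k4" -> Double.box(2.0)
    )
    myMap.foreach { case (k, v) => println(s"$k -> ${typeNameFor(v)}") }
  }
}
```

The key point is that pattern matching on an `Object` sees the boxed wrapper classes, which is why the match targets `java.lang.Integer` rather than the primitive `Int`.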

Below is a sample snippet that calls the function above:

import scala.collection.mutable.ListBuffer

val pairs: Seq[(String, Object)] = myMap.keys.toList.zip(myMap.values.toList)
logger.info(pairs.toString)

val list = ListBuffer.empty[StructField]
pairs.foreach { case (key, value) =>
  list += StructField(key, createStruct(value), nullable = false)
}

val schema = StructType(list.toList)
println(schema.treeString)

val df = sparkSession.createDataFrame(rdd, schema)
df.printSchema()
df.show()
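As a side note, the zip-plus-ListBuffer construction can be collapsed into a single map over the entries, since each key/value pair produces exactly one field. Sketched here without Spark; `fieldFor` is a hypothetical stand-in that pairs each key with an inferred type name, the way `StructField(key, createStruct(value))` would:

```scala
// Spark-free sketch: build (name, type) pairs for a schema in one pass,
// mirroring StructType(myMap.toSeq.map { case (k, v) =>
//   StructField(k, createStruct(v)) }).
object SchemaSketch {
  // Hypothetical stand-in for createStruct, returning a type-name string.
  def fieldFor(value: Object): String = value match {
    case _: String             => "StringType"
    case _: java.lang.Integer  => "IntegerType"
    case _: java.lang.Double   => "DoubleType"
    case _: java.sql.Timestamp => "TimestampType"
    case other                 => s"Unsupported(${other.getClass.getSimpleName})"
  }

  // One pass over the map replaces the explicit zip + mutable buffer.
  def schemaPairs(m: Map[String, Object]): Seq[(String, String)] =
    m.toSeq.map { case (k, v) => (k, fieldFor(v)) }
}
```

This keeps the schema construction immutable and avoids the intermediate `ListBuffer`.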