Spark version: 1.3

I have a requirement where I'm dealing with BigInteger-type data. The bean class (POJO) uses a few BigInteger fields. Parsing the data and creating the JavaRDD works fine, but when creating a DataFrame, which takes the JavaRDD and the bean class as parameters, Spark throws the exception below.

scala.MatchError: class java.math.BigInteger (of class java.lang.Class)
        at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1182)
        at org.apache.spark.sql.SQLContext$$anonfun$getSchema$1.apply(SQLContext.scala:1181)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.sql.SQLContext.getSchema(SQLContext.scala:1181)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:419)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:447)
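
A stripped-down version of the setup (class and field names here are illustrative, and the JavaSparkContext/SQLContext are assumed to already exist) looks like this:

    import java.math.BigInteger;
    import java.util.Arrays;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.DataFrame;

    // Illustrative bean; the real POJO has several BigInteger fields
    public class AccountBean implements java.io.Serializable {
        private BigInteger balance;

        public BigInteger getBalance() { return balance; }
        public void setBalance(BigInteger balance) { this.balance = balance; }
    }

    // Building the RDD works fine...
    AccountBean bean = new AccountBean();
    bean.setBalance(new BigInteger("190753000000000000000"));
    JavaRDD<AccountBean> beans = javaSparkContext.parallelize(Arrays.asList(bean));

    // ...but this line throws the scala.MatchError shown above
    DataFrame df = sqlContext.createDataFrame(beans, AccountBean.class);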

Using the Spark shell, I can see that Scala itself handles BigInt without any problem:

scala> val x = BigInt("190753000000000000000");

x: scala.math.BigInt = 190753000000000000000

I'm not sure what the reason for the exception is. Any help would be appreciated.

3 Answers

Following up on @zero232's answer, your exception may be resolved by using BigDecimal instead of BigInt in the variable declaration:

scala> val x = BigDecimal("190753000000000000000");
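
For the Java bean in the question, that translates to declaring the field as java.math.BigDecimal rather than java.math.BigInteger (a sketch reusing the illustrative names from the question):

    import java.math.BigDecimal;

    public class AccountBean implements java.io.Serializable {
        // java.math.BigDecimal maps to Spark SQL's DecimalType; java.math.BigInteger has no mapping
        private BigDecimal balance;

        public BigDecimal getBalance() { return balance; }
        public void setBalance(BigDecimal balance) { this.balance = balance; }
    }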

You get the exception because BigInt is not a supported data type for Spark DataFrames. The only Big* type supported is BigDecimal.

You'll find the full list of supported types and mappings in the Data Types section of the Spark SQL and DataFrames Guide.
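
If the parsed values really do arrive as BigInteger, one workaround (a sketch only, reusing the illustrative bean and field name from the question and assuming an existing sqlContext) is to convert them to BigDecimal and build the DataFrame from Rows with an explicit DecimalType schema:

    import java.math.BigDecimal;
    import java.util.Arrays;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    // beans is the JavaRDD<AccountBean> whose BigInteger field trips up getSchema
    JavaRDD<Row> rows = beans.map(new Function<AccountBean, Row>() {
        @Override
        public Row call(AccountBean b) {
            // Convert the unsupported BigInteger into a supported BigDecimal
            return RowFactory.create(new BigDecimal(b.getBalance()));
        }
    });

    StructType schema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("balance", DataTypes.createDecimalType(38, 0), false)));

    DataFrame df = sqlContext.createDataFrame(rows, schema);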

You can refer to memSQL's BigInt UserDefinedType:

    package com.memsql.spark.connector.dataframe

    import org.apache.spark.sql.types._


    @SQLUserDefinedType(udt = classOf[BigIntUnsignedType])
    class BigIntUnsignedValue(val value: Long) extends Serializable {
      override def toString: String = value.toString
    }

    /**
     * Spark SQL [[org.apache.spark.sql.types.UserDefinedType]] for MemSQL's `BIGINT UNSIGNED` column type.
     */
    class BigIntUnsignedType private() extends UserDefinedType[BigIntUnsignedValue] {
      override def sqlType: DataType = LongType

      // Converts an in-memory value (wrapper, String, or Long) to the underlying Long
      override def serialize(obj: Any): Long = {
        obj match {
          case x: BigIntUnsignedValue => x.value
          case x: String              => x.toLong
          case x: Long                => x
        }
      }

      // Rebuilds the wrapper type from the stored Long (or a String representation)
      override def deserialize(datum: Any): BigIntUnsignedValue = {
        datum match {
          case x: String => new BigIntUnsignedValue(x.toLong)
          case x: Long   => new BigIntUnsignedValue(x)
        }
      }

      override def userClass: Class[BigIntUnsignedValue] = classOf[BigIntUnsignedValue]

      override def asNullable: BigIntUnsignedType = this

      override def typeName: String = "bigint unsigned"
    }

    case object BigIntUnsignedType extends BigIntUnsignedType
