I have the following schema:

geometry: struct (nullable = true)
    -- coordinates: array (nullable = true)
        -- element: array (containsNull = true)  
            -- element: array (containsNull = true)
                -- element: double (containsNull = true)

In Java, how can I access an individual double element from a Spark SQL Row?

The furthest I can get is row.getStruct(0).getList(0).

Thanks!

2 Answers

2

This works in Scala; I leave the translation to Java to you:

import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.WrappedArray


object Demo {

  case class MyStruct(coordinates:Array[Array[Array[Double]]])
  case class MyRow(struct:MyStruct)

  def main(args: Array[String]): Unit = {

    val sc = new SparkContext(new SparkConf().setAppName("Demo").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val data = MyRow(MyStruct(Array(Array(Array(1.0)))))
    val df= sc.parallelize(Seq(data)).toDF()

    // get first entry (row)
    val row = df.collect()(0)

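    // column 0 is the struct, field 0 of the struct is the nested array;
    // if this cast throws a ClassCastException (see the comments), try Seq[Seq[Seq[Double]]] instead
    val arr = row.getAs[Row](0).getAs[WrappedArray[WrappedArray[WrappedArray[Double]]]](0)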

    //access an element
    val res = arr(0)(0)(0)

    println(res) // 1.0

  }
}
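
A rough Java translation of the Scala code above might look like the sketch below. It assumes Spark (and therefore the Scala library) is on the classpath, and that the struct is column 0 and coordinates is field 0, as in the question. The class name NestedRowAccess, the method firstCoordinate, and the one-row sample built with spark.sql are made up for illustration. Row.getList only converts the outermost array to a java.util.List; the inner arrays remain Scala sequences, so they are read with apply(int), which matches what the comments below ended up doing.

```java
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

class NestedRowAccess {

    // Reads coordinates[0][0][0], assuming column 0 is the geometry struct
    // and field 0 of that struct is the coordinates array.
    static double firstCoordinate(Row row) {
        Row geometry = row.getStruct(0);
        // Only the outer array becomes a java.util.List; the nested arrays
        // are still Scala sequences, typed here as scala.collection.Seq.
        List<scala.collection.Seq<scala.collection.Seq<Double>>> coordinates =
                geometry.getList(0);
        return coordinates.get(0).apply(0).apply(0);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("NestedRowAccess")
                .master("local[*]")
                .getOrCreate();
        try {
            // Hypothetical one-row sample with the same shape as the question's schema.
            Row row = spark.sql(
                    "SELECT named_struct('coordinates', array(array(array(1.0D)))) AS geometry")
                    .first();
            System.out.println(firstCoordinate(row));
        } finally {
            spark.stop();
        }
    }
}
```

Typing the inner levels as scala.collection.Seq rather than WrappedArray avoids depending on the concrete collection class Spark happens to return.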

4 Comments

Unfortunately, I am not seeing any way to get this line to work in Java: val res = arr(0)(0)(0)
@MichaelJ.Perry arr(0) means nothing more than getting the element at index 0, maybe this line works like this in Java: arr.get(0).get(0).get(0) ?
Finally got it working, had to do row.apply(0).apply(0).apply(0).doubleValue(). Thanks for the help!!
This gave me a java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to scala.collection.mutable.WrappedArray, but I changed the WrappedArray types to Seq, as in .map(row => { row.getAs[Row](0).getAs[Seq[Seq[Seq[Double]]]](0) }), and it worked.
1

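It is best to avoid accessing the Row directly. Because coordinates is a field of the geometry struct, you can select the nested element as a column:

df.selectExpr("geometry.coordinates[0][0][0]")

or

df.select(col("geometry.coordinates").getItem(0).getItem(0).getItem(0))

and use the result.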
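
In Java, the column-based approach could be sketched as follows. It assumes the schema from the question, with the array stored under geometry.coordinates. The class name NestedColumnAccess, the helper names, the first_coordinate alias, and the sample row are illustrative only. Both helpers project the nested element into a single double column and read it with Row.getDouble.

```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

class NestedColumnAccess {

    // SQL-expression form: index into the array field of the struct.
    static double viaSelectExpr(Dataset<Row> df) {
        return df.selectExpr("geometry.coordinates[0][0][0] AS first_coordinate")
                .first()
                .getDouble(0);
    }

    // Column API form of the same projection.
    static double viaColumn(Dataset<Row> df) {
        return df.select(col("geometry.coordinates")
                        .getItem(0).getItem(0).getItem(0)
                        .alias("first_coordinate"))
                .first()
                .getDouble(0);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("NestedColumnAccess")
                .master("local[*]")
                .getOrCreate();
        try {
            // Hypothetical one-row sample with the same shape as the question's schema.
            Dataset<Row> df = spark.sql(
                    "SELECT named_struct('coordinates', array(array(array(1.5D, 2.5D)))) AS geometry");
            System.out.println(viaSelectExpr(df));
            System.out.println(viaColumn(df));
        } finally {
            spark.stop();
        }
    }
}
```

Since every level of the schema is nullable, the projected value may be null, in which case getDouble throws; checking isNullAt(0) on the row first is a reasonable precaution.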
