
I have a DataFrame doubleSeq whose structure is as below:

res274: org.apache.spark.sql.DataFrame = [finalFeatures: vector]

The first record of the column is as follows:

res281: org.apache.spark.sql.Row = [[3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]]

I want to extract the double array

[3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]

from this row. Trying

doubleSeq.head(1)(0)(0)

gives

Any = [3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]

This does not solve my problem.
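As far as I can tell, the result is statically typed as Any, so I can't call any array or vector methods on it:

val x = doubleSeq.head(1)(0)(0) // x: Any
// x.toArray                    // does not compile: value toArray is not a member of Any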

Scala Spark - split vector column into separate columns in a Spark DataFrame

does not solve my issue either, but it's a pointer in the right direction.

1 Answer

So you want to extract a Vector from a Row, and turn it into an array of doubles.

The problem with your code is that the get method (and the implicit apply method you are using) returns an object of type Any. Indeed, a Row is a generic, unparameterized container, so there is no way to know at compile time what types it holds. It's a bit like Lists in Java 1.4 and earlier. To solve this in Spark, you can use the getAs method, which you can parameterize with a type of your choosing.
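For illustration, here is the difference between the untyped accessors and getAs (a minimal sketch, using the same df as in the code below):

import org.apache.spark.ml.linalg.Vector

val row = df.head
row(0)                      // returns Any: no Vector methods available on it
row(0).asInstanceOf[Vector] // manual cast works, but you repeat it everywhere
row.getAs[Vector](0)        // getAs does that cast for you, with the type stated once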

In your situation, you seem to have a DataFrame containing a vector (org.apache.spark.ml.linalg.Vector).

import org.apache.spark.ml.linalg._

val firstRow = df.head(1)(0) // or simply df.head
val vect: Vector = firstRow.getAs[Vector](0)
// or all in one: df.head.getAs[Vector](0)

// to transform it into a regular array of doubles
val array: Array[Double] = vect.toArray

Note also that you can access columns by name like this:

val vect: Vector = firstRow.getAs[Vector]("finalFeatures")
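Putting it together on the data from the question (a REPL-style sketch; the row shown there has 29 values):

val array = df.head.getAs[Vector]("finalFeatures").toArray
array.length // 29 for the row in the question
array(0)     // 3.0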

17 Comments

How do I convert from a wrapped array to a normal double array?
.toArray, I edited the answer to make it clearer.
value getAs is not a member of org.apache.spark.sql.DataFrame -> which import statement are you using?
It's a method of Row, not DataFrame.
I am getting an error: java.lang.ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to scala.collection.Seq ... 54 elided
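That ClassCastException typically means the element was read as a Scala Seq (e.g. with getAs[Seq[Double]]): a Spark Vector is not a scala.collection.Seq. A sketch of the failing and working variants, assuming the ml Vector from the answer:

import org.apache.spark.ml.linalg.Vector

// Fails at runtime: DenseVector cannot be cast to scala.collection.Seq
// val wrong = df.head.getAs[Seq[Double]]("finalFeatures")

// Works: read it as a Vector first, then convert
val values: Array[Double] = df.head.getAs[Vector]("finalFeatures").toArray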
