
I have a DataFrame doubleSeq whose structure is as below:

res274: org.apache.spark.sql.DataFrame = [finalFeatures: vector]

The first record of the column is as follows:

res281: org.apache.spark.sql.Row = [[3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]]

I want to extract the double array

[3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]

from this row. Trying

doubleSeq.head(1)(0)(0)

gives

Any = [3.0,6.0,-0.7876947819954485,-0.21757635218517163,0.9731844373162398,-0.6641741696340383,-0.6860072219935377,-0.2990737363481845,-0.7075863760365155,0.8188108975549018,-0.8468559840943759,-0.04349947247406488,-0.45236764452589984,1.0333959313820456,0.6097566070878347,-0.7106619551471779,-0.7750330808435969,-0.08097610412658443,-0.45338437108038904,-0.2952869863393396,-0.30959772365257004,0.6988768123463287,0.17049117199049213,3.2674649019757385,-0.8333373234944124,1.8462942520757128,-0.49441222531240125,-0.44187299748074166,-0.300810826687287]

This does not solve my problem.
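As far as I can tell, the result is statically typed as Any, so I can't call any array or vector methods on it:

val x = doubleSeq.head(1)(0)(0) // x: Any
// x.toArray                    // does not compile: value toArray is not a member of Any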

Scala Spark - split vector column into separate columns in a Spark DataFrame

does not solve my issue either, but it's a pointer in the right direction.

1 Answer

So you want to extract a Vector from a Row, and turn it into an array of doubles.

The problem with your code is that the get method (and the implicit apply method you are using) returns an object of type Any. Indeed, a Row is a generic, unparameterized container, so there is no way to know at compile time what types it holds. It's a bit like Lists in Java 1.4 and earlier. To solve this in Spark, you can use the getAs method, which you can parameterize with a type of your choosing.
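For illustration, here is the difference between the untyped accessors and getAs (a minimal sketch, using the same df as in the code below):

import org.apache.spark.ml.linalg.Vector

val row = df.head
row(0)                      // returns Any: no Vector methods available on it
row(0).asInstanceOf[Vector] // manual cast works, but you repeat it everywhere
row.getAs[Vector](0)        // getAs does that cast for you, with the type stated once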

In your situation, you seem to have a DataFrame containing a vector (org.apache.spark.ml.linalg.Vector).

import org.apache.spark.ml.linalg._

val firstRow = df.head(1)(0) // or simply df.head
val vect: Vector = firstRow.getAs[Vector](0)
// or all in one: df.head.getAs[Vector](0)

// to transform it into a regular array of doubles
val array: Array[Double] = vect.toArray

Note also that you can access columns by name like this:

val vect: Vector = firstRow.getAs[Vector]("finalFeatures")
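Putting it together on the data from the question (a REPL-style sketch; the row shown there has 29 values):

val array = df.head.getAs[Vector]("finalFeatures").toArray
array.length // 29 for the row in the question
array(0)     // 3.0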

17 Comments

How do I convert from a wrapped array to a normal double array?
.toArray, I edited the answer to make it clearer.
value getAs is not a member of org.apache.spark.sql.DataFrame -> which import statement are you using?
It's a method of Row, not DataFrame.
I am getting an error: java.lang.ClassCastException: org.apache.spark.ml.linalg.DenseVector cannot be cast to scala.collection.Seq ... 54 elided
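That ClassCastException typically means the element was read as a Scala Seq (e.g. with getAs[Seq[Double]]): a Spark Vector is not a scala.collection.Seq. A sketch of the failing and working variants, assuming the ml Vector from the answer:

import org.apache.spark.ml.linalg.Vector

// Fails at runtime: DenseVector cannot be cast to scala.collection.Seq
// val wrong = df.head.getAs[Seq[Double]]("finalFeatures")

// Works: read it as a Vector first, then convert
val values: Array[Double] = df.head.getAs[Vector]("finalFeatures").toArray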
