I can reproduce the error by trying to extract an element that doesn't exist, i.e. passing an index larger than the length of the grouped sequence:
val myDF = Seq(Seq(1.0, 2.0, 3.0, 4.0), Seq(4.0, 3.0, 2.0, 1.0)).toDF("vector")
myDF: org.apache.spark.sql.DataFrame = [vector: array<double>]
def extract(index: Integer) = udf((v: Seq[Double]) => v.grouped(2).toSeq(index))
// extract: (index: Integer)org.apache.spark.sql.expressions.UserDefinedFunction
val i = 2
myDF.withColumn("measurement_" + i, extract(i)($"vector")).show
Gives this error:
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$extract$1: (array<double>) => array<double>)
Most likely you have the same problem: `toSeq(index)` throws when the index is out of bounds. Try `toSeq.lift(index)` instead, which returns `None` when the index is out of bounds:
def extract(index: Integer) = udf((v: Seq[Double]) => v.grouped(2).toSeq.lift(index))
// extract: (index: Integer)org.apache.spark.sql.expressions.UserDefinedFunction
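To see the difference outside any UDF, here is the same lookup in plain Scala. `lift` comes from `PartialFunction` and turns a positional access that can throw into one that returns an `Option`:

```scala
// grouped(2) splits the vector into pairs; lift gives a safe lookup
val groups = Seq(1.0, 2.0, 3.0, 4.0).grouped(2).toSeq  // Seq(Seq(1.0, 2.0), Seq(3.0, 4.0))
val ok  = groups.lift(1)  // Some(Seq(3.0, 4.0))
val oob = groups.lift(2)  // None, where groups(2) would throw IndexOutOfBoundsException
```

When the UDF returns an `Option`, Spark encodes `None` as `null` in the result column, which is what the output below shows.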
Valid index:
val i = 1
myDF.withColumn("measurement_" + i, extract(i)($"vector")).show
+--------------------+-------------+
| vector|measurement_1|
+--------------------+-------------+
|[1.0, 2.0, 3.0, 4.0]| [3.0, 4.0]|
|[4.0, 3.0, 2.0, 1.0]| [2.0, 1.0]|
+--------------------+-------------+
Index out of bounds:
val i = 2
myDF.withColumn("measurement_" + i, extract(i)($"vector")).show
+--------------------+-------------+
| vector|measurement_2|
+--------------------+-------------+
|[1.0, 2.0, 3.0, 4.0]| null|
|[4.0, 3.0, 2.0, 1.0]| null|
+--------------------+-------------+
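If `null` in the result column is undesirable, the `Option` from `lift` composes with `getOrElse`. A minimal sketch in plain Scala (the empty-seq default is my choice, not something from the question):

```scala
val groups = Seq(1.0, 2.0, 3.0, 4.0).grouped(2).toSeq
// getOrElse replaces the None from an out-of-bounds lift with a default
val fallback = groups.lift(5).getOrElse(Seq.empty[Double])
```

The same `.getOrElse(Seq.empty[Double])` can be appended inside the UDF body, making the column non-nullable in practice.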