I have a column in a dataframe that is an array [always of a single item], that looks like this:

root
 |-- emdaNo: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- _value: string (nullable = true)
 |    |    |-- id: string (nullable = true)

I can't for the life of me work out how to get the _value out of it as a plain string.

Assuming x is the DataFrame, I've tried:

x.select($"emdaNo._value") // Yields ["myStringHere"]

and

x.select($"emdaNo[0]._value") // Errors

How do I get the value held in _value out as a nice string, please?

1 Answer

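import org.apache.spark.sql.functions.element_at
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell
case class Element(_value: String, id: String)
val df = Seq(Array(Element("foo", "bar"))).toDF("emdaNo")
// element_at uses 1-based indexing, so 1 selects the first element
df.select(element_at($"emdaNo._value", 1) as "_value").show()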

Output:

+------+
|_value|
+------+
|   foo|
+------+

Alternatively (and on versions before Spark 2.4, which lack element_at), use 0-based indexing:

df.select($"emdaNo._value"(0))

or

df.select($"emdaNo._value".getItem(0))
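
The snippets above all return a DataFrame with a single string column. If the goal is an actual Scala String on the driver, one more step is needed. The following is a minimal sketch, assuming a local SparkSession and the same hypothetical Element case class; the object name ExtractValue and the helper firstValue are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, element_at, expr}

// Hypothetical case class mirroring the struct in the question's schema.
case class Element(_value: String, id: String)

object ExtractValue {
  // Illustrative helper: takes the first row's emdaNo[first]._value as a String.
  // Assumes the DataFrame has at least one row and a non-empty array.
  def firstValue(df: DataFrame): String =
    df.select(element_at(col("emdaNo._value"), 1)) // 1-based index
      .head()                                      // first Row, pulled to the driver
      .getString(0)                                // first column of that Row

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("extract-value")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Array(Element("foo", "bar"))).toDF("emdaNo")

    // Equivalent column expressions; getItem and SQL brackets are 0-based.
    val viaGetItem = df.select($"emdaNo._value".getItem(0).as("_value"))
    val viaExpr    = df.select(expr("emdaNo[0]._value").as("_value"))

    println(firstValue(df))              // plain String from element_at
    println(viaGetItem.as[String].head()) // typed Dataset route
    println(viaExpr.as[String].head())    // SQL-expression route

    spark.stop()
  }
}
```

The expr variant is likely the closest working form of the bracket syntax tried in the question: the $"..." interpolator treats its argument as a column name rather than a SQL expression, whereas expr parses it as SQL.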

2 Comments

Thanks. Interesting that you mention "before Spark 2.4", but that syntax looks much more readable than element_at.
Agreed, the latter forms are arguably more readable for programmers with a Scala or Java background. The element_at function was introduced in Spark 2.4, and it may look more familiar to people with a strong SQL background, where indexing starts from 1.
