
I'm writing a Scala script for Spark and I have a "specialArray" as follows:

 specialArray = ...
 specialArray.show(6)
 __________________________ console __________________________________

 specialArray: org.apache.spark.sql.DataFrame = [_VALUE: array<string>]
 +--------------+
 |        _VALUE|
 +--------------+
 |    [fullForm]|
 |    [fullForm]|
 |    [fullForm]|
 |    [fullForm]|
 |    [fullForm]|
 |    [fullForm]|
 +--------------+
 only showing top 6 rows

But I would like to see the content of those "fullForm" sub-arrays. How would you do this, please?

Thank you very much in advance!

I have already tried to get the first value this way:

val resultTest = specialArray.map(s => s.toString).toDF().collect()(0)
__________________________ console __________________________________
resultTest: org.apache.spark.sql.Row = [[WrappedArray(fullForm)]]

So I don't know how to deal with that, and I haven't found anything "effective" in the doc: https://www.scala-lang.org/api/current/scala/collection/mutable/WrappedArray.html.
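(For reference, this is roughly the kind of extraction I was guessing at; the getAs call is just my assumption about how to pull the array back out of the Row, not something I know works:)

// collect the first row and try to read the _VALUE column back as a Scala Seq
val firstRow = specialArray.collect()(0)
val values = firstRow.getAs[Seq[String]]("_VALUE") // assuming the elements really are plain strings
values.foreach(println)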

If you have any ideas or any questions for me, feel free to leave a message. Thanks :).

  • Do you just want to see the content inside _VALUE, or perform some other operation as well? Commented Apr 17, 2018 at 12:48
  • First of all, see what is inside, because I don't actually know what values to expect. And then, maybe, perform some other operations. Commented Apr 17, 2018 at 14:24
  • @Shankar Koirala: Thank you very much for all your answers. They haven't helped me yet only because I haven't had time to work on this again in the meantime, but rest assured I will come back to it soon and accept the answer if appropriate ;). Have a nice day! Commented Apr 26, 2018 at 7:59

1 Answer


Here specialArray is a DataFrame, so to see its schema you can use specialArray.printSchema, which shows the datatypes of the columns inside the dataframe.
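For instance, on the DataFrame from the question it would print something like the following (only the array<string> structure is taken from the question; the nullability flags below are my guess):

specialArray.printSchema
// root
//  |-- _VALUE: array (nullable = true)
//  |    |-- element: string (containsNull = true)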

If you just want to see the data inside the dataframe, you can use specialArray.show(6, false); the second parameter, false, disables truncation of long values in the output.
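A tiny self-contained illustration of that flag (the sample data here is made up for the demo; it is not the asker's data):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// toy DataFrame with the same shape as in the question: a single array<string> column
val demo = Seq(Seq("fullForm", "someRatherLongValueThatWouldBeCutOff")).toDF("_VALUE")
demo.show(6)         // default: cell contents longer than 20 characters are truncated
demo.show(6, false)  // false turns that truncation off, so the full array is printed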

Next, you can use select or withColumn to convert the WrappedArray into a comma-separated (or any other separator) String:

import org.apache.spark.sql.functions._

// turn the array column into a comma-separated string, either as a new projection ...
specialArray.select(concat_ws(",", $"_VALUE")).show(false)
// ... or by overwriting the _VALUE column in place
specialArray.withColumn("_VALUE", concat_ws(",", $"_VALUE")).show(false)
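If the goal is just to eyeball every element of every array, another common option (my addition, not part of the original answer) is to explode the array so each element lands on its own row:

// one output row per array element, so the individual strings become visible
specialArray.select(explode($"_VALUE").as("element")).show(false)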

Hope this helps!


Comments

Thanks for answering, first of all! I finally tried your solution, but it doesn't seem to display what I would like to see. Or maybe the content of the array really is a lot of Strings that just say "fullForm". Indeed, I tried:
import org.apache.spark.sql.functions._
specialArray.printSchema
specialArray.select(concat_ws(",", $"_VALUE")).show(false)
specialArray.withColumn("_VALUE", concat_ws(",", $"_VALUE")).show(false)
specialArray.map(s => s.toString).toDF().collect()(0)(0)
Here is a screenshot of what I did in Spark, it's clearer: link to my image (photobox.co.uk). Sorry for the multiple edits. Would you have any other ideas about what I could still try? Or do you think the content of the array is just the String 'fullForm' or 'abbreviation'?
can you share the screenshot?
Yes, of course, here you have the image: i.imgur.com/IkWl1X0.png
