I am a bit stuck trying to convert an RDD[Array[Byte]] into an Array[Byte]. I first extract the RDD[Array[Byte]] from a previously defined RDD of tuples, of type RDD[(String, Array[Byte])]:

val extractArrayFromRDD: RDD[Array[Byte]] = rdd.map(t => t._2)

I can then only get the first element of the RDD, as follows:

val rddToBytes: Array[Byte] = extractArrayFromRDD.first()

However, I need the entire byte array returned, and I cannot find a way to do it. Any ideas?

Thank you

1 Answer

Not sure I understood you correctly, but if you want to collect your RDD you can obtain it as an Array using

rdd.collect()

Called on your RDD[Array[Byte]] (extractArrayFromRDD in the question), this should return an Array[Array[Byte]]. If you want the arrays combined into a single Array[Byte], you can call flatten on the result, or use whatever suits your needs.
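To make that concrete, here is a minimal sketch of collecting and then flattening. It assumes Spark is on the classpath and runs against a local SparkContext; the object name CollectBytes, the helper collectBytes, and the sample data are made up for illustration and are not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CollectBytes {
  // Hypothetical helper: pulls every Array[Byte] value to the driver
  // and concatenates them in RDD order.
  def collectBytes(rdd: RDD[(String, Array[Byte])]): Array[Byte] = {
    // collect() on an RDD[Array[Byte]] yields an Array[Array[Byte]] on the driver.
    val arrays: Array[Array[Byte]] = rdd.map(_._2).collect()
    // flatten (no parentheses) merges the inner arrays into one Array[Byte].
    arrays.flatten
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying the sketch out.
    val conf = new SparkConf().setAppName("collect-bytes").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data shaped like the RDD in the question.
      val rdd = sc.parallelize(Seq(
        ("a", Array[Byte](1, 2)),
        ("b", Array[Byte](3, 4, 5))
      ))
      val combined: Array[Byte] = collectBytes(rdd)
      println(combined.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Note that flatten is written without parentheses here; on an Array it takes implicit parameters, so calling it with an empty argument list would not compile.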

2 Comments

Thank you for your comment. Yeah, I was not sure how to move from Array[Array[Byte]] to the desired data structure, Array[Byte]. I am indeed fairly new to RDD manipulation and Scala programming in general.
Glad you found it helpful. Note, however, the implications of collecting: you lose the benefits of distributing the data, and you have to make sure the RDD does not hold more data than fits in memory on the driver (master) once it is collected into an array. As a consequence, you should generally perform as much of the work as possible before collecting.
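One way to act on that advice, sketched under assumptions: measure the total payload on the executors first, and only collect when it is below a limit. The names GuardedCollect, collectIfSmall, and MaxBytes, as well as the 64 MiB threshold, are hypothetical and only for illustration; pick a limit that matches your driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GuardedCollect {
  // Hypothetical limit on how many bytes we are willing to pull to the driver.
  val MaxBytes: Long = 64L * 1024 * 1024

  // Returns Some(bytes) if the combined payload fits under maxBytes, else None.
  def collectIfSmall(
      rdd: RDD[(String, Array[Byte])],
      maxBytes: Long = MaxBytes
  ): Option[Array[Byte]] = {
    val values: RDD[Array[Byte]] = rdd.map(_._2)
    // Computed on the executors; only a single Long travels to the driver.
    val totalBytes: Long = values.map(_.length.toLong).fold(0L)(_ + _)
    if (totalBytes > maxBytes) None
    // Flatten on the executors, so collect() returns an Array[Byte] directly.
    else Some(values.flatMap(_.toSeq).collect())
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just for trying the sketch out.
    val conf = new SparkConf().setAppName("guarded-collect").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(
        ("a", Array[Byte](1, 2)),
        ("b", Array[Byte](3, 4, 5))
      ))
      collectIfSmall(rdd) match {
        case Some(bytes) => println(s"collected ${bytes.length} bytes")
        case None        => println("too large to collect safely")
      }
    } finally {
      sc.stop()
    }
  }
}
```

This runs two jobs over the same data (the size check and the collect), so if the RDD is expensive to recompute it may be worth caching it first.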
