
I'm trying to convert an RDD[String] to a DataFrame. Each string is comma-separated, so I would like to get one column for each value between the commas. To do so, I've tried these steps:

val allNewData_split = allNewData.map(e => e.split(",")) //RDD[Array[String]]
val df_newData = allNewData_split.toDF()  //DataFrame

But I'm getting this:

+--------------------+
|               value|
+--------------------+
|[0.0, 0.170716979...|
|[0.0, 0.272535901...|
|[0.0, 0.232002948...|
+--------------------+
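The single column happens because `toDF()` on an `RDD[Array[String]]` produces one `ArrayType` column (named `value` by default), not one column per element. A minimal sketch to confirm this, assuming a `SparkSession` named `spark` is in scope (not runnable outside a Spark session):

```scala
import spark.implicits._

// toDF() with no names yields a single column called "value"
// whose type is array<string>, which is what the output above shows.
allNewData_split.toDF().printSchema()
// root
//  |-- value: array (nullable = true)
//  |    |-- element: string (containsNull = true)
```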

It is not a duplicate of this post (How to convert rdd object to dataframe in spark) because I'm asking about RDD[String] instead of RDD[Row].

And it also isn't a duplicate of Spark - load CSV file as DataFrame? because this question isn't about reading a CSV file as DataFrame.

  • Possible duplicate of Spark - load CSV file as DataFrame? (stackoverflow.com/q/29704333/9613318) Commented May 11, 2018 at 14:17
  • I was looking for the answer in that link, @Yogesh, but they use RDD[Row]. Commented May 11, 2018 at 14:21

1 Answer

If all your arrays have the same size, you can turn the array elements into columns by using apply on Column, like this:

val df = Seq(
  Array(1,2,3),
  Array(4,5,6)
).toDF("arr")

df.show()

+---------+
|      arr|
+---------+
|[1, 2, 3]|
|[4, 5, 6]|
+---------+

val ncols = 3

val selectCols = (0 until ncols).map(i => $"arr"(i).as(s"col_$i"))

df
  .select(selectCols:_*)
  .show()

+-----+-----+-----+
|col_0|col_1|col_2|
+-----+-----+-----+
|    1|    2|    3|
|    4|    5|    6|
+-----+-----+-----+
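Applied to the RDD[String] from the question, the same idea might look like the sketch below. It assumes a `SparkSession` named `spark` and the `allNewData` RDD from the question are in scope, that the RDD is non-empty, and that every row splits into the same number of fields (not runnable outside a Spark session):

```scala
import spark.implicits._

val allNewData_split = allNewData.map(_.split(","))  // RDD[Array[String]]
val df_newData = allNewData_split.toDF("arr")        // single array<string> column

// Infer the column count from the first row (assumes equal-length rows).
val ncols = df_newData.head.getSeq[String](0).length

// One column per array index, named col_0, col_1, ...
val selectCols = (0 until ncols).map(i => $"arr"(i).as(s"col_$i"))

df_newData.select(selectCols: _*).show()
```

If the values are numeric, each `$"arr"(i)` can additionally be cast, e.g. `$"arr"(i).cast("double")`, before aliasing.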