I used rdd.collect() to create an Array and now I want to use this Array[Strings] to create a DataFrame. My test file is in the following format(separated by a pipe |).
TimeStamp
IdC
Name
FileName
Start-0f-fields
column01
column02
column03
column04
column05
column06
column07
column08
column010
column11
End-of-fields
Start-of-data
G0002B|0|13|IS|LS|Xys|Xyz|12|23|48|
G0002A|0|13|IS|LS|Xys|Xyz|12|23|45|
G0002x|0|13|IS|LS|Xys|Xyz|12|23|48|
G0002C|0|13|IS|LS|Xys|Xyz|12|23|48|
End-of-data
document
the column name are in between Start-of-field and End-of-Field. I want to store "| " pipe separated in different columns of Dataframe.
like below example:
column01 column02 column03 column04 column05 column06 column07 column08 column010 column11
G0002C 0 13 IS LS Xys Xyz 12 23 48
G0002x 0 13 LS MS Xys Xyz 14 300 400
my code :
val rdd = sc.textFile("the above text file")
val columns = rdd.collect.slice(5,16).mkString(",") // it will hold columnnames
val data = rdd.collect.slice(5,16)
val rdd1 = sc.parallelize(rdd.collect())
val df = rdd1.toDf(columns)
but this is not giving me the above desired dataframe
