I'm working on a new Spark project in Java. I need to read data from CSV files in which one column holds an array of floats, and I don't know how to load that column as an array in my Dataset.
I'm reading from this CSV:
[CSV data image](https://imgur.com/a/PdrMhev)
I'm currently loading the data like this:
Dataset<Row> typedTrainingData = sparkSession.sql("SELECT CAST(IDp as String) IDp, CAST(Instt as String) Instt, CAST(dataVector as String) dataVector FROM TRAINING_DATA");
And I get this:
root
|-- IDp: string (nullable = true)
|-- Instt: string (nullable = true)
|-- dataVector: string (nullable = true)
+-------+-------------+-----------------+
| IDp| Instt| dataVector|
+-------+-------------+-----------------+
| p01| V11apps|-0.41,-0.04,0.1..|
| p02| V21apps|-1.50,-1.50,-1...|
+-------+-------------+-----------------+
As the schema shows, dataVector is read as a String, but I want it as an array. Any recommendations?
I want to run some MLlib machine learning algorithms on the loaded data, which is why I need this column as an array.
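To make clear what I'm after, here is a minimal plain-Java sketch of the per-row conversion I have in mind. The class name `VectorParsing`, the method `parseVector`, and the sample string are made up for illustration and are not part of my project. It assumes the cell is a comma-separated list of numbers, and it parses to `double` rather than `float` because, as far as I understand, MLlib vectors hold doubles. What I'm missing is the Spark way to apply this to the whole dataVector column.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only: not part of Spark or of my project.
final class VectorParsing {

    // Turns a cell such as "1.0,2.5,3" into a double[] holding the parsed values.
    // Assumes a comma-separated list; a null or blank cell yields an empty array.
    static double[] parseVector(String csvCell) {
        if (csvCell == null || csvCell.trim().isEmpty()) {
            return new double[0];
        }
        return Arrays.stream(csvCell.split(","))
                .map(String::trim)                 // tolerate spaces around the commas
                .mapToDouble(Double::parseDouble)  // throws NumberFormatException on bad input
                .toArray();
    }

    public static void main(String[] args) {
        // Made-up sample value shaped like the dataVector cells shown above.
        double[] vector = parseVector("-0.41, -0.04, 0.1");
        System.out.println(Arrays.toString(vector)); // prints [-0.41, -0.04, 0.1]
    }
}
```

So for each row I would like to end up with the equivalent of that `double[]` instead of the raw String.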
Thank you!