I'm trying to open a text file, process each line and store the result in a multidimensional array.
My input file contains:
1 1 3 2
2 2.2 3 1.8
3 3 1.2 2.5
and I want to create a 3x4 array like this:
(1, 1, 3, 2)
(2, 2.2, 3, 1.8)
etc
My code is:
for (line <- Source.fromFile(inputFile).getLines) {
  var counters = line.split("\\s+")
  sc.parallelize(counters).saveAsTextFile(outputFile)
}
I am trying to save the results to a text file, but at runtime I first get this exception:
apache.hadoop.mapred.FileAlreadyExistsException:
Output directory file:/home/user/Desktop/output.txt already exists
I guess this is caused by parallelize, but that was the only way I found to save an array.
Also, what is stored is not what I want. The output contains two partition files with the following content:
part1:
1
1
part2:
3
2
How can I create a multidimensional array from one-dimensional arrays, and how can I save it to a text file?
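
For reference, here is a minimal sketch of one way this could be done in plain Scala, without Spark. It assumes every token in the input parses as a Double and that a single local output file is acceptable. The names MatrixFromFile, parseMatrix and formatRow are hypothetical, and the two paths are taken from the command line purely for illustration. Each line becomes one row, the rows are collected into an Array[Array[Double]], and the file is written once after all lines have been read.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import scala.io.Source

// Hypothetical helper object, not part of the original code.
object MatrixFromFile {

  // Turn each non-empty line into one row of Doubles (assumes numeric tokens).
  def parseMatrix(lines: Iterator[String]): Array[Array[Double]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+").map(_.toDouble))
      .toArray

  // Render one row in the "(a, b, c, d)" style used in the question.
  def formatRow(row: Array[Double]): String =
    row.mkString("(", ", ", ")")

  def main(args: Array[String]): Unit = {
    val inputFile  = args(0) // assumed: path to the input text file
    val outputFile = args(1) // assumed: path to the output text file

    val source = Source.fromFile(inputFile)
    val matrix =
      try parseMatrix(source.getLines())
      finally source.close()

    // Write all rows in a single call; Files.write replaces an existing file.
    val text = matrix.map(formatRow).mkString("\n")
    Files.write(Paths.get(outputFile), text.getBytes(StandardCharsets.UTF_8))
  }
}
```

Two observations about the original loop may also help. saveAsTextFile is called once per input line with the same outputFile, and Spark refuses to write to an output directory that already exists, which would explain the exception from the second line onward (or on any rerun). In addition, parallelize(counters) treats each array element as a separate record, so every number ends up on its own line, split across part files. If Spark is required, something like sc.textFile(inputFile).map(...).saveAsTextFile(outputDir), called once on a directory that does not yet exist, is probably closer to the intended result.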