I have an RDD that contains the lines of a file. I want each partition to contain not the individual lines but a single string formed by concatenating them. For example:
Partition 1    Partition 2
line 1         line n/2+1
line 2         line n/2+2
.              .
.              .
.              .
line n/2       line n
Figure 1 above shows my RDD, as produced by the sc.textFile() method. I want to go from Figure 1 to Figure 2 below:
Partition 1                    Partition 2
concatenatedLinesFrom1toN/2    concatenatedLinesFromN/2+1toN
Is there a way to map over the partitions so that the RDD in Figure 1 is converted to the one in Figure 2?
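
For reference, here is a minimal sketch of the kind of transformation I am after, assuming PySpark and its RDD.mapPartitions method. The helper names concat_partition and concat_lines_per_partition are hypothetical, and the newline separator is my own assumption (an empty string would work the same way).

```python
from typing import Iterable, Iterator


def concat_partition(lines: Iterable[str]) -> Iterator[str]:
    """Collapse all lines of one partition into a single string."""
    # Yield exactly one element per partition.
    # Assumption: lines are joined with "\n"; an empty partition yields "".
    yield "\n".join(lines)


def concat_lines_per_partition(rdd):
    """Return an RDD with one concatenated string per partition.

    `rdd` is assumed to be a PySpark RDD of strings, for example the
    result of sc.textFile(...).
    """
    # mapPartitions calls the function once per partition, passing an
    # iterator over that partition's elements.
    return rdd.mapPartitions(concat_partition)
```

With a real SparkContext this might be called as concat_lines_per_partition(sc.textFile(path)), where path is a placeholder, and the result inspected with glom().collect(). Note that each partition's full text has to fit in memory as one string.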