I'm sure this is something very simple, but I couldn't find anything related to it.
My code is simple:
...
stream = stream.map(mapper)
stream = stream.reduceByKey(reducer)
...
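In case it's relevant: the real mapper and reducer are more involved, but I think the important part is that the reducer combines two values at a time. A simplified sketch of what I mean (the names are illustrative, not my actual code):

# simplified, hypothetical stand-ins for my real functions:
# the mapper emits (key, value) pairs, and the reducer just
# pairs two values up into a list
mapper = lambda record: (record[0], record[1])
reducer = lambda a, b: [a, b]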
Nothing extraordinary. The output looks like this:
...
key1 value1
key2 [value2, value3]
key3 [[value4, value5], value6]
...
And so on. Sometimes I get a flat value (when there is only one), and sometimes nested lists that can be really, really deep (on my simple test data they were three levels deep).
I tried searching through the sources for something like 'flat', but found only the flatMap method, which is (as I understand it) not what I need.
I don't know why those lists are nested. My guess is that they were handled by different processes (workers?) and then joined together without flattening.
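Just to illustrate that guess: with a pairwise reducer like the sketch above, repeated application already produces this nesting in plain Python:

from functools import reduce

# hypothetical pairwise reducer, as in the sketch above
pair = lambda a, b: [a, b]

print(reduce(pair, ['value4', 'value5', 'value6']))
# prints [['value4', 'value5'], 'value6'] - the exact shape I see for key3

And since (as far as I know) reduceByKey also merges partial results from different partitions through the same reducer, the nesting depth would depend on how the data happens to be split.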
Of course, I can write code in Python that unfolds and flattens such a list. But I believe this is not a normal situation - surely almost everybody needs flat output.
itertools.chain stops unfolding at the first non-iterable value it finds. In other words, it still needs some custom code (see the previous paragraph), along the lines of the sketch below.
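For concreteness, the kind of helper I mean is a recursive one like this (hypothetical code I'd rather not have to write and maintain):

# rough flatten helper: recurse into nested lists,
# keep non-list values as they are
def flatten(values):
    result = []
    for v in values:
        if isinstance(v, list):
            result.extend(flatten(v))
        else:
            result.append(v)
    return result

print(flatten([['value4', 'value5'], 'value6']))
# prints ['value4', 'value5', 'value6']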
So - how can I flatten these lists using PySpark's native methods?
Thanks