I have the following dataframe, where mmsi acts as a unique ID.
I would like to combine the lat and lon arrays into a single list for each mmsi:
+---------+--------------------+--------------------+
| mmsi| lat| lon|
+---------+--------------------+--------------------+
|255801480|[47.1018366666666...|[-5.3017783333333...|
|304182000|[44.6343033333333...|[-63.564803333333...|
|304682000|[41.1936, 41.1715...|[-8.7716, -8.7514...|
|305930000|[49.5221333333333...|[-3.6310166666666...|
|306216000|[42.8185133333333...|[-29.853155, -29....|
|477514400|[47.17205, 47.165...|[-58.6317, -58.60...|
+---------+--------------------+--------------------+
Specifically, I would like to combine the lat and lon arrays along axis = 1, so that each row ends up with a list of [lat, lon] pairs in a separate column, like:
[[47.1018366666666, -5.3017783333333], ... ]
How can this be done in a PySpark dataframe? I have tried concat, but that returns a single flat list:
[47.1018366666666, 44.6343033333333, ..., -5.3017783333333, -63.564803333333, ...]
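To make the intent concrete, here is a minimal plain-Python sketch (not PySpark) of the difference for a single row. It uses only the first two values visible in the mmsi 304682000 row, since the rest is truncated in the table. The variable names are just for illustration.

```python
# First two visible values from the mmsi 304682000 row;
# the remaining values are truncated in the table above.
lat = [41.1936, 41.1715]
lon = [-8.7716, -8.7514]

# What concatenation gives: one flat list, all lats followed by all lons.
concatenated = lat + lon

# What I am after: element-wise pairing, one [lat, lon] per position.
paired = [[la, lo] for la, lo in zip(lat, lon)]

print(concatenated)  # [41.1936, 41.1715, -8.7716, -8.7514]
print(paired)        # [[41.1936, -8.7716], [41.1715, -8.7514]]
```

So what I am looking for is the PySpark equivalent of the `paired` result, computed per row and stored in a new column.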
Any help is much appreciated!