In PySpark, how can I split the strings in all columns into a list of strings?
a = [('a|q|e','d|r|y'),('j|l|f','m|g|j')]
df = spark.createDataFrame(a, ['col1', 'col2'])
+-----+-----+
| col1| col2|
+-----+-----+
|a|q|e|d|r|y|
|j|l|f|m|g|j|
+-----+-----+
Output expected:
+---------+---------+
| col1| col2|
+---------+---------+
|[a, q, e]|[d, r, y]|
|[j, l, f]|[m, g, j]|
+---------+---------+
I can handle one column at a time with withColumn, but that is not an appealing solution when the number of columns is dynamic.
from pyspark.sql.functions import col, split
outDF = df.withColumn("col1", split(col("col1"), "\\|"))