Here's an approach in PySpark.
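(The original snippet doesn't show data_ls or the imports; assuming func and wd are the usual aliases for the functions module and Window, a setup along these lines reproduces the input shown below.)

# assumed imports (not shown in the snippet)
from pyspark.sql import functions as func
from pyspark.sql.window import Window as wd

# sample rows reconstructed from the input printed below; each tuple wraps one array
data_ls = [
    (['/level1/level2', '/level2/level3/level4', '/level1/level2'],),
    (['/level1/level2', '/level2/level3'],),
    (['/level1/level2', '/level1/level2/level3', '/level1/level2'],),
    (['/level1/level2', '/level1/level2/level3', '/level1/level2'],),
]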
# input data
data_sdf = spark.sparkContext.parallelize(data_ls).toDF(['arr_col'])
# +-------------------------------------------------------+
# |arr_col |
# +-------------------------------------------------------+
# |[/level1/level2, /level2/level3/level4, /level1/level2]|
# |[/level1/level2, /level2/level3] |
# |[/level1/level2, /level1/level2/level3, /level1/level2]|
# |[/level1/level2, /level1/level2/level3, /level1/level2]|
# +-------------------------------------------------------+
data_sdf. \
    withColumn('rn', func.row_number().over(wd.orderBy(func.lit(1)))). \
    withColumn('arr_col_exploded', func.explode('arr_col')). \
    withColumn('arr_col_exp_split', func.split('arr_col_exploded', '/')). \
    withColumn('first_element', func.col('arr_col_exp_split')[1]). \
    groupBy('rn', 'arr_col'). \
    agg(func.collect_list('first_element').alias('first_element_arr')). \
    show(truncate=False)
# +---+-------------------------------------------------------+------------------------+
# |rn |arr_col |first_element_arr |
# +---+-------------------------------------------------------+------------------------+
# |1 |[/level1/level2, /level2/level3/level4, /level1/level2]|[level1, level2, level1]|
# |2 |[/level1/level2, /level2/level3] |[level1, level2] |
# |3 |[/level1/level2, /level1/level2/level3, /level1/level2]|[level1, level1, level1]|
# |4 |[/level1/level2, /level1/level2/level3, /level1/level2]|[level1, level1, level1]|
# +---+-------------------------------------------------------+------------------------+
The rn column helps with grouping in case there are duplicate input arrays. The idea is to explode the input array and then split each exploded element on '/', which creates an array of the path components. Once split, we can pull out the second element (which is really the first path component), because the first element is an empty string (due to the leading '/'). Finally, collect_list gathers those first components back into an array per row.
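If the explode/groupBy round trip feels heavy, the same result can be sketched with a higher-order function that rewrites each array element in place, so no rn column is needed. This assumes Spark 2.4+ (for the transform SQL function) and is an alternative, not the approach above:

# alternative sketch: transform each element in place (requires Spark 2.4+)
data_sdf. \
    withColumn('first_element_arr', func.expr("transform(arr_col, x -> split(x, '/')[1])")). \
    show(truncate=False)

This produces the same first_element_arr values as shown above, keeping the original row order since nothing is exploded or re-aggregated.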