0

I'm trying to convert my struct array by columns of my array, where I would like to separate the fields by ;

My data is this way

root
 |-- id_payment: string (nullable = true)
 |-- contents_json: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- id_seller: string (nullable = true)
 |    |    |-- tot_product: string (nullable = true)
 |    |    |-- tot_unit: long (nullable = true)
 |    |    |-- prizes: string (nullable = true)

+--------------------------------------------+
|contents_json                               |
+--------------------------------------------+
|[{7, 1, 0, 0}, {5, 1, 0, 0}, {5, 1, 0, 0}]  |
|[{7, 1, 0, 0}]                              |
+--------------------------------------------+

I would like to leave it like this

+-----------------------------------------------+
| id_seller| tot_product | tot_unit  | prizes   |
+-----------------------------------------------+
| 7;5;7    | 1;12;5      | 0;1;0     | 0;0;1    |
| 10       | 1           | 0         | 0        |
+-----------------------------------------------+

But I couldn't do this with ws_concat

I tried with this code, but it didn't work. What am I doing wrong

I tried this code, but instead of concatenating it broke into two lines. What I'm trying to do is to concatenate in the same cell these lines val1;val2;val3 and so on. How can I fix this?

sparkdf2.selectExpr("explode(contents_json) AS structCol").select(F.expr("concat_ws(',', structCol.id_campaign)").alias("id_campaign")).show()

enter image description here

1
  • where are all these numbers coming from ? contents_json and expected ouput do not seem to contain the same data. how do you go from one to another ? please explain mapping logic Commented Oct 20, 2021 at 13:09

1 Answer 1

2

well, if you just select the columns, isn't it enough ?

df.show()
+----------+------------------------------------------+
|id_payment|contents_json                             |
+----------+------------------------------------------+
|1         |[[7, 1, 0, 0], [5, 1, 0, 0], [5, 1, 0, 0]]|
|2         |[[7, 1, 0, 0]]                            |
+----------+------------------------------------------+

df.select(
    "id_payment",
    "contents_json.id_seller",
    "contents_json.tot_product",
    "contents_json.tot_unit",
    "contents_json.prizes",
).show()
+----------+---------+-----------+---------+---------+
|id_payment|id_seller|tot_product| tot_unit|   prizes|
+----------+---------+-----------+---------+---------+
|         1|[7, 5, 5]|  [1, 1, 1]|[0, 0, 0]|[0, 0, 0]|
|         2|      [7]|        [1]|      [0]|      [0]|
+----------+---------+-----------+---------+---------+

You can use concat_ws if needed :

df.select(
    "id_payment",
    F.concat_ws(";","contents_json.id_seller").alias("id_seller"),
    "contents_json.tot_product",
    "contents_json.tot_unit",
    "contents_json.prizes",
).show()
+----------+---------+-----------+---------+---------+
|id_payment|id_seller|tot_product| tot_unit|   prizes|
+----------+---------+-----------+---------+---------+
|         1|    7;5;5|  [1, 1, 1]|[0, 0, 0]|[0, 0, 0]|
|         2|        7|        [1]|      [0]|      [0]|
+----------+---------+-----------+---------+---------+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.