
I can use neither PySpark nor Scala; I can only write SQL code. I have a table with two columns, item_id and name.

item_id  name
1        name1
1        name2
1        name3
2        name4
2        name5

I want to generate a result where the names for each item_id are concatenated:

item_id  names
1        name1-name2-name3
2        name4-name5

How do I create such a table with Spark SQL?
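
For reference, the sample data can be reproduced as a temporary view; the name tableName below is just a placeholder:

-- Set up the sample data as a temp view (the view name is illustrative)
CREATE OR REPLACE TEMPORARY VIEW tableName AS
SELECT * FROM VALUES
  (1, 'name1'),
  (1, 'name2'),
  (1, 'name3'),
  (2, 'name4'),
  (2, 'name5')
AS t(item_id, name);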


3 Answers


The beauty of Spark SQL is that once you have a solution in any of the supported languages (Scala, Java, Python, R or SQL), you can usually figure out the other variants.

The following SQL statement seems to do what you ask for:

SELECT item_id, array_join(collect_list(name), '-') as names 
FROM tableName
GROUP BY item_id

In spark-shell it gives the following result:

scala> sql("select item_id, array_join(collect_list(name), '-') as names from so group by item_id").show
+-------+-----------------+
|item_id|            names|
+-------+-----------------+
|      1|name1-name2-name3|
|      2|      name4-name5|
+-------+-----------------+
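
One caveat worth noting: collect_list does not guarantee the order of the collected values, so the names may come back in any order once the data has been shuffled across partitions. If a deterministic (here alphabetical) order matters, sort_array can sort the array before joining; a sketch against the same table:

SELECT item_id, array_join(sort_array(collect_list(name)), '-') AS names
FROM tableName
GROUP BY item_id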

You can try the below (PySpark):

from pyspark.sql.functions import array_join, collect_list

(
    df.orderBy('name')  # pre-sort; note this order is not guaranteed to survive the shuffle
    .groupBy('item_id')
    .agg(
        # collect the names per item_id and join them with '-'
        array_join(collect_list('name'), '-').alias('names')
    )
)

1 Comment

I can only use SQL

You can use the Spark DataFrame's groupBy and agg methods together with the concat_ws function:

import org.apache.spark.sql.functions._
import spark.implicits._  // for the $"..." column syntax

df.groupBy($"item_id").agg(concat_ws("-", collect_list($"name")).alias("names")).show()

This groups the rows by item_id and aggregates the name values in each group by concatenating them with a - separator.
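
Since the question is restricted to SQL, the same concat_ws approach should also carry over to a plain Spark SQL statement (again assuming the table is named tableName):

SELECT item_id, concat_ws('-', collect_list(name)) AS names
FROM tableName
GROUP BY item_id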

1 Comment

I can only use SQL
