(Hive, SQL) - How to sort a list of strings inside a column?

Question

I have a big data problem in Hive (SQL). I run the following query:

SELECT genre, COUNT(*) AS unique_count
FROM table_name
GROUP BY genre

which gives a result like:

genre                   | unique_count
--------------------------------------
Romance,Crime,Drama,Law | 1560
Crime,Drama,Law,Romance | 895
Law,Romance,Crime,Drama | 942
Adventure,Action        | 3250
Action,Adventure        | 910

What I want is to sort the elements within each genre value (ASC or DESC), so that the same combination of genres is counted once regardless of order, and get results like:

genre                   | unique_count
--------------------------------------
Crime,Drama,Law,Romance | 3397
Action,Adventure        | 4160

I could do this in Python, but I have over 200 million rows of data, and I'm not aware of any reasonable way to move that much data. So how can I achieve this in Hive?

Fix your data structure. Storing lists of things in comma-delimited strings just causes problems. If you didn't know this already, you have now learned why this is a bad idea.
– Gordon Linoff, Commented Feb 10, 2017 at 20:17
Gordon, I actually do know that, but when you join a company with existing data, there's little you can do other than massage the messy data, right?
– Afloz, Commented Feb 10, 2017 at 20:24

David דודו Markovitz · Accepted Answer · 2017-02-10 21:06:11Z

11

select      concat_ws(',',sort_array(split(genre,','))) as genre
           ,count(*)                                    as unique_count

from        table_name

group by    concat_ws(',',sort_array(split(genre,',')))
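
To see why the query above works, it may help to trace the expression on a single value. The following is only a sketch: the literal is taken from the question's sample output, the aliases (parts, sorted_parts, normalized, normalized_genre, t) are made up for illustration, and the first statement assumes a Hive version that accepts SELECT without a FROM clause. The second statement is an equivalent form of the aggregation that writes the normalization once in a subquery instead of repeating it in GROUP BY.

```sql
-- Trace the three steps on one sample value from the question.
SELECT
    split('Romance,Crime,Drama,Law', ',')                             AS parts,        -- string -> array<string>
    sort_array(split('Romance,Crime,Drama,Law', ','))                 AS sorted_parts, -- elements sorted ascending
    concat_ws(',', sort_array(split('Romance,Crime,Drama,Law', ','))) AS normalized;   -- 'Crime,Drama,Law,Romance'

-- Same aggregation, with the normalization computed once in a subquery.
-- table_name and genre come from the question; the other names are illustrative.
SELECT   normalized_genre AS genre,
         COUNT(*)         AS unique_count
FROM (
    SELECT concat_ws(',', sort_array(split(genre, ','))) AS normalized_genre
    FROM   table_name
) t
GROUP BY normalized_genre;
```

Because sort_array orders the elements ascending, every permutation of the same genres collapses to one string, so the GROUP BY merges their counts. This assumes the values contain no whitespace around the commas; if they do, the elements would need trimming first, which is not handled here.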

edited Feb 10, 2017 at 21:06

answered Feb 10, 2017 at 20:58

Perfect. Thank you!
– Afloz, Commented over a year ago
