I'm not sure how to remove empty elements from Hive arrays. For context, I have a table named tbl1 with six columns (key1, v_1, v_2, v_3, v_4, v_5). When I run the query below:

SELECT
        key1,
        array( nvl(v_1,""),
                nvl(v_2,""),
                nvl(v_3,""),
                nvl(v_4,""),
                nvl(v_5,"")) v_array

FROM
    tbl1;

This produces:

key1, v_array
1, ["a","b","c","d",""]
2, ["a","b","c","",""]
3, ["a","b","","",""]

However, I want the result to look like this:

key1, v_array
1, ["a","b","c","d"]
2, ["a","b","c"]
3, ["a","b"]

1 Answer

You could write a custom UDF, but it might be easier to do a LATERAL VIEW explode, followed by a collect aggregation grouped by key1:

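SELECT key1,
       -- collect_list: built-in aggregate (Hive 0.13+) that keeps duplicates
       collect_list( val ) AS v_array
FROM
 ( SELECT key1,
          v.val AS val
   FROM tbl1
   -- one output row per array element; FROM must precede LATERAL VIEW
   LATERAL VIEW EXPLODE( array( v_1, v_2, v_3, v_4, v_5 ) ) v AS val
   WHERE v.val IS NOT NULL
 ) lve
GROUP BY key1;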

2 Comments

Thanks, it works, but I had to move the FROM clause before LATERAL VIEW EXPLODE. One question: is there any way to sort the array elements? I ran it against my real table and the elements are not in the expected order; for example, the v1 values end up last.
If you add DISTRIBUTE BY and SORT BY clauses to the inner SELECT, the rows should be ordered according to those keys (i.e. DISTRIBUTE BY key1 SORT BY key1, val). You might need to use a timestamp or another field to get the right order, though.
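
The DISTRIBUTE BY / SORT BY suggestion can be sketched roughly as follows. This is only a sketch under assumptions: it uses the question's column names (key1, v_1 to v_5), it assumes Hive 0.13 or later, and it swaps explode for posexplode so that each element's original position (the alias pos, introduced here for illustration) can serve as the sort key instead of the value itself. Hive does not formally guarantee that collect_list preserves input order, so treat this as a commonly used pattern rather than a guarantee.

```sql
-- Sketch only: assumes Hive 0.13+ and the table/columns from the question.
SELECT key1,
       collect_list(val) AS v_array
FROM (
    SELECT key1,
           v.pos AS pos,   -- 0-based index of the element in the array
           v.val AS val
    FROM tbl1
    LATERAL VIEW posexplode(array(v_1, v_2, v_3, v_4, v_5)) v AS pos, val
    WHERE v.val IS NOT NULL
    -- Route all rows for a key to the same reducer,
    -- ordered by the element's original column position.
    DISTRIBUTE BY key1
    SORT BY key1, pos
) lve
GROUP BY key1;
```

If alphabetical order of the values is acceptable, wrapping the aggregate as sort_array(collect_list(val)) may be simpler, since it sorts the finished array and does not depend on row order.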
