I have a Hive column whose value is stored as a string:

[[1,2],[3,4,8],[5,6,7,9]]

I need to find the length of each inner array. How should I go about it?

Basically, I need a query that sums the sizes of the inner arrays. Had this column been stored as an array of arrays, I would do something like this:

select sum(size(innerArray)) from myTab lateral view explode (mycol) arr as innerArray;

But when I try the above on the string column, I get:

FAILED: UDFArgumentException explode() takes an array or a map as a parameter

1 Answer

Because your column is not a real array but a string, you need to parse it and then explode it:

with mytable as(
select '[[1,2],[3,4,8],[5,6,7,9]]' as mycol
)

select mycol as original_string,
       innerArray_str, 
       --split inner array and get size
       size(split(innerArray_str,',')) inner_array_size
from mytable
    --explode the outer array:
    --replace `],` (allowing spaces before the comma) with `,,,`, remove all `[` and `]`,
    --then split using `,,,` as the delimiter
    lateral view outer explode(split(regexp_replace(regexp_replace(mycol,'\\] *,',',,,'),'\\[|\\]',''),',,,')) e as innerArray_str

Result:

original_string             innerarray_str  inner_array_size
[[1,2],[3,4,8],[5,6,7,9]]   1,2             2
[[1,2],[3,4,8],[5,6,7,9]]   3,4,8           3
[[1,2],[3,4,8],[5,6,7,9]]   5,6,7,9         4

Now you can add sum() and group by.
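
As a minimal sketch of that last step, the same parsing can be wrapped in sum() and grouped by the original string. It reuses the sample CTE from above; the alias total_size is only an illustrative name, and with real data you would likely group by a key column rather than by the string itself.

```sql
with mytable as (
select '[[1,2],[3,4,8],[5,6,7,9]]' as mycol
)

select mycol as original_string,
       --sum the sizes of all inner arrays for each original string
       sum(size(split(innerArray_str,','))) as total_size
from mytable
     --same parsing as above: turn `],` into `,,,`, strip brackets, split on `,,,`
     lateral view outer explode(split(regexp_replace(regexp_replace(mycol,'\\] *,',',,,'),'\\[|\\]',''),',,,')) e as innerArray_str
group by mycol;
```

For the sample string this should return 9 (2 + 3 + 4). One caveat, stated as an assumption about the data: an empty inner array such as `[]` would probably be counted as 1, because splitting an empty string still yields one (empty) element, so such rows may need to be filtered out first.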
