
I am reading this question: Parse JSON Array and load into hive table.

The nested JSON contains multiple } and { characters, yet the regex pattern (?<=\\}),(?=\\{) still separates the JSON elements correctly. Could anyone please explain how this split function works?

-- substr(..., 2) drops the leading '[' before splitting
select 
split(substr('[{"a":{"c":"sss"},"w":123},{"b":2},{"r":{"c":"sss"},"w":555}]',2),'(?<=\\}),(?=\\{)')[0],
split(substr('[{"a":{"c":"sss"},"w":123},{"b":2},{"r":{"c":"sss"},"w":555}]',2),'(?<=\\}),(?=\\{)')[1],
split(substr('[{"a":{"c":"sss"},"w":123},{"b":2},{"r":{"c":"sss"},"w":555}]',2),'(?<=\\}),(?=\\{)')[2];

The result is:

{"a":{"c":"sss"},"w":123}   {"b":2}    {"r":{"c":"sss"},"w":555}]
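Hive is written in Java and its regex functions use Java regular expressions, so the same split can be reproduced outside Hive. The following is a minimal sketch, assuming Hive's split() behaves like java.util.regex.Pattern.split for this input; the class and method names are hypothetical, chosen only for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class: reproduces the Hive split() call with java.util.regex.
public class HiveSplitDemo {
    // Same regex as in the Hive query; the Java literal "\\}" reaches the regex engine as \}.
    static final Pattern TOP_LEVEL_COMMA = Pattern.compile("(?<=\\}),(?=\\{)");

    // Hypothetical helper mirroring split(substr(json, 2), '(?<=\\}),(?=\\{)').
    // Hive's substr is 1-based, so substr(json, 2) corresponds to json.substring(1).
    static String[] splitLikeHive(String json) {
        return TOP_LEVEL_COMMA.split(json.substring(1));
    }

    public static void main(String[] args) {
        String json = "[{\"a\":{\"c\":\"sss\"},\"w\":123},{\"b\":2},{\"r\":{\"c\":\"sss\"},\"w\":555}]";
        String[] parts = splitLikeHive(json);
        for (int i = 0; i < parts.length; i++) {
            System.out.println("element " + i + " is " + parts[i]);
        }
    }
}
```

Running main should print the same three elements as the Hive query, including the stray ] at the end of the last one.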

Also, in that answer a string without the leading [, such as {"a":1},{"b":2}], is passed to json_tuple. That is not a JSON array at all, so why does json_tuple work with it?


1 Answer


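The regex '(?<=\\}),(?=\\{)' matches only a comma that sits between } and {; the curly brackets themselves are not part of the match. (The doubled backslashes are Hive string-literal escaping, so the regex engine receives (?<=\}),(?=\{), where \} and \{ are literal braces.)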
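To see exactly what the pattern consumes, one can list the match offsets with java.util.regex. This is a sketch on a shortened, made-up input, assuming Hive uses the same regex semantics; the class and method names are hypothetical. Compared with a plain \},\{ pattern, only the comma is matched:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows which characters each regex actually matches.
public class CommaMatchDemo {
    // Hypothetical helper: returns "start-end:text" for every match of regex in input.
    static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "{\"b\":2},{\"r\":1}";
        // Lookaround version: only the comma (one character) is matched.
        System.out.println(describeMatches("(?<=\\}),(?=\\{)", s)); // [7-8:,]
        // Plain version: the braces are matched too, so splitting on it would drop them.
        System.out.println(describeMatches("\\},\\{", s)); // [6-9:},{]
    }
}
```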

(?<=\\}) is a zero-width positive lookbehind assertion: it asserts that the character immediately before the current position is }.

(?=\\{) is a zero-width positive lookahead assertion: it asserts that the character immediately after the current position is {.
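"Zero-width" means each assertion matches an empty string at a position and consumes no characters. A small sketch with java.util.regex (made-up input, hypothetical class and method names) prints the positions where each lookaround succeeds on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: each lookaround alone matches an empty string.
public class LookaroundDemo {
    // Hypothetical helper: offsets where regex matches; for a pure assertion start == end.
    static List<Integer> matchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String s = "{\"b\":2},{\"r\":1}";
        // Positions right after a '}' (the braces sit at indexes 6 and 14).
        System.out.println(matchPositions("(?<=\\})", s)); // [7, 15]
        // Positions right before a '{' (the braces sit at indexes 0 and 8).
        System.out.println(matchPositions("(?=\\{)", s)); // [0, 8]
    }
}
```

Only a comma at index 7 has a } position on its left and a { position on its right, which is why the combined pattern matches that single comma.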

So split() breaks the string into an array at each comma between } and {. Because the lookarounds are zero-width, the brackets are not consumed and stay in the elements. The result is an array with these elements:

element 0 is {"a":{"c":"sss"},"w":123}
element 1 is {"b":2}
element 2 is {"r":{"c":"sss"},"w":555}]
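Note that the pattern only inspects the neighbouring characters; it does not track nesting depth. It works for the input in the question because no nested object there is directly followed by ,{. A sketch with a made-up input whose first element contains an array of objects shows the limitation (java.util.regex semantics assumed to match Hive's; class and method names are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the split pattern does not track nesting depth.
public class NestedArrayCaveat {
    static final Pattern TOP_LEVEL_COMMA = Pattern.compile("(?<=\\}),(?=\\{)");

    // Hypothetical helper, same as split(substr(json, 2), ...) in the Hive query.
    static String[] splitLikeHive(String json) {
        return TOP_LEVEL_COMMA.split(json.substring(1));
    }

    public static void main(String[] args) {
        // Made-up input: the first element contains an array of objects.
        String json = "[{\"x\":[{\"a\":1},{\"b\":2}]},{\"c\":3}]";
        for (String part : splitLikeHive(json)) {
            System.out.println(part);
        }
        // Prints three pieces instead of two:
        // {"x":[{"a":1}
        // {"b":2}]}
        // {"c":3}]
    }
}
```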

In the linked answer, explode() is applied to this array, producing one row per array element.

json_tuple therefore receives a single element (the array has already been exploded), not the array. Because the element starts with { and not [, json_tuple treats it as a JSON object (a struct), not as an array. And yes, the last element still carries an extra ]; it is better to remove that as well.
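One way to get rid of the trailing ] is to strip both outer brackets before splitting. A minimal sketch in Java (hypothetical class and method names); the Hive counterpart should be substr(s, 2, length(s) - 2), since Hive's substr takes a 1-based start position and a length:

```java
import java.util.regex.Pattern;

// Hypothetical demo class: strip both outer brackets, then split.
public class StripBracketsDemo {
    static final Pattern TOP_LEVEL_COMMA = Pattern.compile("(?<=\\}),(?=\\{)");

    // Hypothetical helper: removes the leading '[' and the trailing ']' and then splits.
    // Assumed Hive equivalent: split(substr(s, 2, length(s) - 2), '(?<=\\}),(?=\\{)').
    static String[] splitJsonArray(String json) {
        String s = json.trim();
        return TOP_LEVEL_COMMA.split(s.substring(1, s.length() - 1));
    }

    public static void main(String[] args) {
        String json = "[{\"a\":{\"c\":\"sss\"},\"w\":123},{\"b\":2},{\"r\":{\"c\":\"sss\"},\"w\":555}]";
        for (String element : splitJsonArray(json)) {
            // Every element now starts with '{' and ends with '}', i.e. it is a plain JSON object.
            System.out.println(element);
        }
    }
}
```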
