
I have a hive table emp_test as below:

'name' as string
'testing' as array<struct<code:string,tests:array<struct<testtype:string,errorline:string>>>>

The column values are "name" = "JOHN" and "testing" =

[{"code":"cod1234","tests":[{"testtype":"java","errorline":"100"},{"testtype":"C++","errorline":"10000"}]},
 {"code":"cod6790","tests":[{"testtype":"hive","errorline":"10"},{"testtype":"pig","errorline":"978"},{"testtype":"spark","errorline":"35"}]}
]

How can I select these values and store them in another table, emp_test_detail(name, code, testtype, errorline), like this?

JOHN cod1234 java       100
JOHN cod1234 C++        10000
JOHN cod6790 hive       10
JOHN cod6790 pig        978
JOHN cod6790 spark      35

I have tried the query below, but it fails with an error:

insert into emp_test_detail select
        emp_tasting.code,
        emp_tasting.emp_tests.testtype,
        emp_tasting.emp_tests.errorline from emp_test
lateral view explode(testing) mytest as emp_tasting
lateral view explode(testing[0].tests) mytest as emp_tasting;

I also don't know the exact length of the testing array, so how should I reference the array fields?

Please help me with this.

1 Answer

In your example query the error is likely related to using emp_tasting, the same column alias for both lateral view explode lines. They need to have different aliases.

To un-nest an array two levels deep, you need to explode the first array, then refer to the alias of that exploded array when exploding the nested array.

For example, you wanted name, code, testtype, errorline

name is available directly in the table
code is available from the first explode
testtype and errorline are available from the nested explode.

Note that I am working from your schema rather than the data you've listed, because it's easier for me to reason about.

This query should do what you want:

SELECT
  name,
  testingelement.code,
  test.testtype, 
  test.errorline 
FROM emp_test 
LATERAL VIEW explode(testing) testingarray as testingelement
LATERAL VIEW explode(testingelement.tests) testsarray as test;
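Since the question asks to store the result in emp_test_detail, the same SELECT can feed an INSERT. The following is a minimal sketch, not a tested statement: the question only names the four target columns, so the CREATE TABLE and the STRING types for all of them are assumptions.

```sql
-- Assumed target schema: the question lists only the column names,
-- so the STRING types here are a guess.
CREATE TABLE IF NOT EXISTS emp_test_detail (
  name      STRING,
  code      STRING,
  testtype  STRING,
  errorline STRING
);

-- One output row per (testing element, nested test) pair.
INSERT INTO TABLE emp_test_detail
SELECT
  name,
  testingelement.code,
  test.testtype,
  test.errorline
FROM emp_test
LATERAL VIEW explode(testing) testingarray AS testingelement
LATERAL VIEW explode(testingelement.tests) testsarray AS test;
```

Hive matches the SELECT columns to the target table by position, so the order name, code, testtype, errorline has to follow the table definition.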

Table and column aliases

Note that explode has two aliases after it: the first is for the table expression it generates, and the second is for the column(s).

So in this example:

LATERAL VIEW explode(testing) testingarray as testingelement

testingarray is the table alias
testingelement is the array column alias you need to reference to extract the fields within the struct.

Skipping the first explode

If you only wanted fields directly from the table and from the nested array, you could shorten the query to a single LATERAL VIEW explode:

LATERAL VIEW explode(testing.tests) testsarray as test

The problem with that is it will also explode empty arrays, and you can't use * star expansion; you have to refer to field names explicitly. That's not a bad thing.

What is a bad thing is having to use array indexes in a query. As soon as you start writing field[0] then something smells funky. That would only ever get the first element of the array, and as you've said it relies on knowing the size of the array beforehand which would have very limited use cases.
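If the position of an element does matter, a hard-coded index is still not needed. As a sketch, assuming a Hive version that ships the posexplode UDTF (0.13 or later), the position can be produced as an ordinary column. The alias testingpos is made up for illustration.

```sql
-- posexplode emits (position, element) pairs, so the index arrives as data
-- rather than as a literal like testing[0] written into the query.
SELECT
  name,
  testingpos,            -- 0-based position of the element within testing
  testingelement.code,
  test.testtype,
  test.errorline
FROM emp_test
LATERAL VIEW posexplode(testing) testingarray AS testingpos, testingelement
LATERAL VIEW explode(testingelement.tests) testsarray AS test;
```

This works for arrays of any length, and a WHERE clause on testingpos can pick out a specific element when that is really what is wanted.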
