
I have the following nested JSON data stored in syslog, which I need to query using HiveQL and convert into a CSV file (to be used to display a graph):

"logAggregate": {"name-1":{"time":"74","count":"1"},"name-2":{"time":"2","count":"1"},"name-3":{"time":"2","count":"5"},"name-4":{"time":"22","count":"1"},
 .
 .
 . // and so on
 .}

The output I am looking for is something like this:

name-1 time
name-2 time
name-3 time
.
.
. // so on

I am relatively new to Hive, so I'm not sure how to parse this JSON data. I tried using LATERAL VIEW with json_tuple, but in vain.

Any help would be much appreciated!

  • You can use a JSON SerDe to define the table and load the data. The details are here. Try this and update the question if you face any issues. Commented Jun 18, 2014 at 8:29

1 Answer


Take a look at this blog entry ( http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/ ), which describes using the JSON UDFs provided in Brickhouse ( http://github.com/klout/brickhouse ).

For your specific case, you probably want to parse the data as a map and then explode the map, which yields one row per key/value entry.

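-- Assumes my_table.json holds an object with a top-level "logAggregate" key,
-- and that Brickhouse's from_json UDF has been registered (see the blog above).
SELECT k AS name,
       v['time'] AS time_value
FROM my_table
LATERAL VIEW explode( from_json( get_json_object(json, '$.logAggregate'),
                                 'map<string,map<string,string>>') ) kv1 AS k, v;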
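To check the expected output outside Hive, here is a minimal Python sketch of the same idea: parse the logAggregate object as a map of maps, "explode" it into one row per name, keep only the time field, and write the rows as CSV. This is a local illustration, not Hive code. It assumes each record is a JSON object with a top-level logAggregate key, and the function names flatten_log_aggregate and to_csv are hypothetical.

```python
import csv
import io
import json
from typing import Iterable, List, Tuple


def flatten_log_aggregate(record: str) -> List[Tuple[str, str]]:
    """Parse one record and 'explode' its logAggregate map into (name, time) rows."""
    # Assumption: the record is a JSON object with a top-level "logAggregate" key,
    # whose value maps each name to an object such as {"time": "74", "count": "1"}.
    aggregate = json.loads(record)["logAggregate"]
    # Like explode() on a map in Hive: one output row per (key, value) entry,
    # keeping only the "time" field of each value (the v['time'] lookup).
    return [(name, stats["time"]) for name, stats in aggregate.items()]


def to_csv(rows: Iterable[Tuple[str, str]]) -> str:
    """Render (name, time) rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "time"])
    writer.writerows(rows)
    return buf.getvalue()


# The four entries shown in the question (the real data continues beyond these).
sample = (
    '{"logAggregate": {"name-1":{"time":"74","count":"1"},'
    '"name-2":{"time":"2","count":"1"},'
    '"name-3":{"time":"2","count":"5"},'
    '"name-4":{"time":"22","count":"1"}}}'
)

rows = flatten_log_aggregate(sample)
print(to_csv(rows), end="")
```

This sketch keeps the JSON key order; in Hive, add an ORDER BY if the graph needs the rows in a stable order.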