I have a table in which some rows contain normal JSON and others contain escaped values (backslashes before the quotes) in the JSON field:

| id | obj |
| --- | --- |
| 1 | {"is_from_shopping_bag":true,"products":[{"price":{"amount":"18.00","currency":"USD","offset":100,"amount_with_offset":"1800"},"product_id":"1234","quantity":1}],"source":"cart"} |
| 2 | {"is_from_shopping_bag":"","products":"[{\ "product_id\ ":\ "2345\ ",\ "price\ ":{\ "currency\ ":\ "USD\ ",\ "amount\ ":\ "140.00\ ",\ "offset\ ":100},\ "quantity\ ":1}]"} |

(Note: I had to add a space after each backslash in the table above so that the backslashes would show up in the GitHub-generated Markdown table. My actual data has no space between the backslash and the quote character.)

I am running a SQL query in Hive to get the 'currency' field.

Currently I can run

SELECT
    id,
    JSON_EXTRACT(obj, '$.products[0].price.currency')
FROM my_table

This gives the correct output for the first row but returns NULL for the second row:

| id | obj |
| --- | --- |
| 1 | "USD" |
| 2 | NULL |

What is the best way to get the currency field from the second row? Is there a way to clean up the field and remove the backslashes before calling JSON_EXTRACT on it? I could use REPLACE to swap the '\ ' for '', but is that the most efficient method?

1 Answer

Replace \" with " using regexp_replace like this:

regexp_replace(obj,'\\\\"','"') 
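Note that in the sample row 2, removing the backslashes alone still leaves the products array wrapped in an extra pair of double quotes ("[...]"), which is not valid JSON once the inner quotes are unescaped. A minimal sketch of a full query, assuming the escaped rows differ from the normal ones only in those two ways, and reusing the JSON_EXTRACT call and the my_table name from the question (the obj_clean and currency aliases are just illustrative names):

```sql
-- Sketch only. Assumes escaped rows differ from normal rows in two ways:
--   1. inner quotes are written as \"
--   2. the products array is wrapped in an extra pair of double quotes
-- Any string value that legitimately contains "[ or ]" would be altered too.
SELECT
    id,
    JSON_EXTRACT(obj_clean, '$.products[0].price.currency') AS currency
FROM (
    SELECT
        id,
        regexp_replace(
            regexp_replace(
                regexp_replace(obj, '\\\\"', '"'),  -- \" becomes "
                '"\\[', '['),                        -- "[ becomes [
            '\\]"', ']') AS obj_clean                -- ]" becomes ]
    FROM my_table
) t
```

Row 1 should pass through unchanged, since it contains none of the three patterns. If JSON_EXTRACT is not available in your Hive installation, the built-in get_json_object(obj_clean, '$.products[0].price.currency') takes the same two arguments.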