
I am having trouble calling json_extract_path_text(my_field, 'some_key') on a field which contains '[]' as data.

It is a valid JSON string, but this function simply throws an error.

-----------------------------------------------
ERROR:  JSON parsing error
DETAIL:  
error:  JSON parsing error
code:      8001
context:   invalid json object []
query:     0
location:  funcs_json.h:117
-----------------------------------------------

Are there any good workarounds for this issue? I could probably add an AND my_field != '[]' check, but who knows what else would need to be checked?

  • Are you really on Redshift? Is the entire json field nothing but [], i.e. the json is a single empty array? Because if so, an array isn't a json object, empty or otherwise, it's an array. You can't get a field of an array. Commented Jun 6, 2016 at 14:38
  • An empty array [] is a valid JSON string from JSONLint, is it not? Commented Jun 7, 2016 at 7:04
  • Yes, [] is a valid json document but it's not a json object. You can't look up a key in an array, they only have indexes. That said, PostgreSQL (9.5, at least) allows this. So I'm guessing it's a redshift problem. Commented Jun 7, 2016 at 7:52

2 Answers


What helped me was setting null_if_invalid to true.

Your attempt:

json_extract_path_text(my_field, 'some_key')

Try this:

json_extract_path_text(my_field, 'some_key', TRUE)
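In plain terms, the third argument (null_if_invalid) tells Redshift to return NULL instead of raising an error when the string is not a valid JSON object. A rough local sketch of that behavior in Python (extract_path_text is a hypothetical stand-in, not the real Redshift function):

```python
import json

def extract_path_text(j, key, null_if_invalid=False):
    # Rough approximation of Redshift's json_extract_path_text:
    # the input must parse as a JSON *object*, not just valid JSON.
    try:
        parsed = json.loads(j)
        if not isinstance(parsed, dict):
            raise ValueError('invalid json object ' + j)
    except ValueError:
        # With null_if_invalid set, bad input yields NULL instead of an error.
        if null_if_invalid:
            return None
        raise
    val = parsed.get(key)
    return '' if val is None else str(val)

# extract_path_text('[]', 'some_key') raises, as in the question;
# extract_path_text('[]', 'some_key', True) returns None instead.
```

So '[]' no longer kills the query; rows with non-object JSON simply come back as NULL.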



This appears to be the result of a recent change to the json_extract_path_text function, where it fails on arrays.

As Craig points out, technically the error is correct as this is an array and not a json object.

You might be tempted to use json_extract_array_element_text('json string', pos) as in:

json_extract_path_text(json_extract_array_element_text(my_field, 0), 'some_key')

But if your data is a mix of objects and arrays, this will also fail with the equally-technically-correct-yet-really-just-annoying error of

"context:   invalid json array object {"somekey":"somevalue"}"

Of course, the beauty of these failures is that a single bad row will also kill your entire query. One workaround is a UDF, such as the following:

create or replace function f_extract_if_list (j varchar(max))
returns varchar(max)
stable
as $$

    import json

    # NULL or empty input passes through as NULL
    if not j:
        return None

    try:
        parsed_j = json.loads(j)
    except ValueError:
        # Not valid JSON at all
        return ''

    # Already a JSON object: return it unchanged
    if isinstance(parsed_j, dict):
        return j

    # A non-empty JSON array: return its first element
    if isinstance(parsed_j, list) and len(parsed_j) >= 1:
        return json.dumps(parsed_j[0])

    return ''

$$ language plpythonu;

This checks whether the item is an array and, if so, returns the first element of that array. It might need some tweaking depending on your specific use case.
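Since the UDF body is plain Python, you can exercise the same logic locally before registering it in Redshift (this is just the function body wrapped in an ordinary def; plpythonu is Python 2, but this logic runs the same on either version):

```python
import json

def f_extract_if_list(j):
    # Mirrors the UDF body: pass objects through, unwrap arrays,
    # and return '' for anything unparseable.
    if not j:
        return None
    try:
        parsed_j = json.loads(j)
    except ValueError:
        return ''
    if isinstance(parsed_j, dict):
        return j
    if isinstance(parsed_j, list) and len(parsed_j) >= 1:
        return json.dumps(parsed_j[0])
    return ''

# An array of objects is unwrapped to its first element,
# so json_extract_path_text can then look up keys in it.
```

You would then call it as json_extract_path_text(f_extract_if_list(my_field), 'some_key', TRUE).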

More info on UDFs can be found here: http://docs.aws.amazon.com/redshift/latest/dg/user-defined-functions.html

Either way, I've posted something about this in the AWS forums too: https://forums.aws.amazon.com/thread.jspa?messageID=728647&

Hope that helps!

