
I have a Redshift table that contains columns with JSON objects. I am running into JSON parser failures while trying to execute queries on this table that apply specific filters on the JSON object content. While I am able to use json_extract_path_text() in select statements, the same function fails when used in the where clause.

Following is the error I see: Amazon Invalid operation: JSON parsing error;

When I look at the STL_ERROR table for more details, this is what I see in the error details:

errcode: 8001
context: JSON parsing error
error: invalid json object null

Following is an example of the content in one such json column:

{"att1":"att1_val","att2":"att2_val","att3":[{"att3_sub1_1":"att3_sub1_1_val","att3_sub1_2":"att3_sub1_2_val"},{"att3_sub2_1":"att3_sub2_1_val","att3_sub2_2":"att3_sub2_2_val"}],"att4":"att4_val","att5":"att5_val"}

Now when I run the following query, it executes without any issues:

select
json_extract_path_text(col_with_json_obj,'att4') as required_val
from table_with_json_data;

However, when I use json_extract_path_text() in the where clause, it fails with the above error:

select
json_extract_path_text(col_with_json_obj,'att4') as required_val
from table_with_json_data
where json_extract_path_text(col_with_json_obj,'att4') = 'att4_val';

Is there anything that I am using incorrectly or missing here?

P.S: I have another table with similar schema and the same queries run just fine on that. The only difference between the two tables is the way the data is loaded - one uses jsonpaths file in the copy options and the other uses json 'auto'.

2 Answers


This is the error you would receive if table_with_json_data contained even a single row in which the value of col_with_json_obj was the four-character string "null".
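Python's own json module illustrates why such a row breaks json_extract_path_text: the literal null parses as valid JSON, but to None rather than to an object. This is a local sketch approximating Redshift's behavior, not Redshift code, and extract_att4 is a hypothetical helper name:

```python
import json

# Sample value from the question, and the problematic value found in some rows.
good = '{"att1":"att1_val","att4":"att4_val"}'
bad = "null"  # the four-character string stored in the offending rows

def extract_att4(raw):
    """Rough local approximation of json_extract_path_text: the column
    value must parse to a JSON *object* before a path can be extracted."""
    parsed = json.loads(raw)  # "null" parses successfully, but to None
    if not isinstance(parsed, dict):
        raise ValueError("invalid json object " + raw)
    return parsed.get("att4")

print(extract_att4(good))  # att4_val
try:
    extract_att4(bad)
except ValueError as e:
    print(e)               # invalid json object null
```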

To avoid errors like this in general I'd recommend creating a Redshift UDF for validating JSON. The is_json() method described at http://discourse.snowplowanalytics.com/t/more-robust-json-parsing-in-redshift-with-python-udfs/197 has worked well for me:

create or replace function is_json(j varchar(max))
  returns boolean
  stable as $$
    import json
    try:
      json.loads(j)      -- returns False only if parsing fails
    except ValueError:
      return False
    return True
  $$ language plpythonu;

Then you could add an and is_json(col_with_json_obj) condition to your where clause, and this class of errors can be avoided entirely.
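One caveat worth testing locally: json.loads accepts the bare literal null, so is_json('null') returns True and would not filter out the rows described above. If those rows contain exactly the string null, a stricter variant that checks for an object may be needed. The is_json_object name below is my own suggestion, not part of the original answer:

```python
import json

def is_json(j):
    """Mirror of the is_json UDF: True if j parses as any JSON value."""
    try:
        json.loads(j)
    except ValueError:
        return False
    return True

def is_json_object(j):
    """Stricter variant: True only if j parses to a JSON *object*,
    which is what json_extract_path_text requires."""
    try:
        return isinstance(json.loads(j), dict)
    except ValueError:
        return False

print(is_json("null"))         # True  -- bare null is valid JSON
print(is_json_object("null"))  # False -- but it is not an object
```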


2 Comments

Thanks Jason. Yes, I noticed that a few of the rows had the string value "null" and had to explicitly filter them out. However, I still don't understand why it works in the select statement but not in the where clause. To be precise, the conditions in my where clause should already filter out the rows that have the string "null" in this column.
@Hercules I've had the same confusion and frustration. There's a question on the AWS Redshift forum you can follow, but there is no response yet: forums.aws.amazon.com/thread.jspa?threadID=232468

For me, it worked when I used a case when statement first to create a new column, and then filtered on that new column for the observations I wanted to keep.
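One possible reading of this approach, using the table and column names from the question. The guard condition in the case expression is an assumption on my part (the answer does not say what the case tests; a simple prefix check on the object is one option):

```sql
-- Step 1: derive the new column, guarding against non-object rows
-- such as the string 'null'; step 2: filter on the derived column.
with parsed as (
  select
    case when col_with_json_obj like '{%'
         then json_extract_path_text(col_with_json_obj, 'att4')
    end as required_val
  from table_with_json_data
)
select required_val
from parsed
where required_val = 'att4_val';
```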

