39

I've got a big database with analytics data written in JSON.

I want to filter out rows with incorrect data:

  • invalid JSON (some rows has something like that: '{"hello": "world'
  • some attributes is not array so it would take '{"products": [1,2,3]}' and will leave out the '{"products": 1}'

I want to do something like:

select * 
from analytics 
where (is_correct_json(json::json)) 
and (is_array(json::json->>'products'))

How can I achieve that?

3
  • 7
    If the column is defined as json or jsonb you can't store invalid json. Commented May 12, 2015 at 10:02
  • I am still getting that: ERROR: invalid input syntax for type json Detail: The input string ended unexpectedly. Where: JSON data, line 1: [{"product_id":1,"color" Commented May 12, 2015 at 10:03
  • You were right - the column is VARCHAR type but I convert it to JSON in the select. Commented May 12, 2015 at 10:05

2 Answers 2

45

This is another good example why choosing the appropriate data type right from the start helps later ;)

There is no built-in function to check if a given text is valid JSON. You can however write your own:

create or replace function is_valid_json(p_json text)
  returns boolean
as
$$
begin
  return (p_json::json is not null);
exception 
  when others then
     return false;  
end;
$$
language plpgsql
immutable;

Caution: due to the exception handling this is not going to be fast. If you call that on many invalid values this is going to slow down your select massively.

However both '{"products": 1}' and '{"products": [1,2,3]}' are valid JSON documents. The fact that the former is invalid is based on your application logic, not on the JSON syntax.

To verify that you would need a similar function, that traps errors when calling json_array_length()

create or replace function is_valid_json_array(p_json text, p_element text)
  returns boolean
as
$$
begin
  return json_array_length( p_json::json -> p_element) >= 0;
exception 
  when others then
     return false;  
end;
$$
language plpgsql
immutable;
Sign up to request clarification or add additional context in comments.

4 Comments

thanks - this is perfect example, but this is legacy database and converting it now would be a big problem :)
this function needs to become core function
@user528025 Postgres is working on providing standard SQL functions for this. It was planned to integrate this in the upcoming Postgres 15, but for quality reasons this didn't make it into 15. So unfortunately we'll have to wait until Postgres 16 for the standard compliant IS JSON condition.
@a_horse_with_no_name Many, many thanks. Your is_valid_json function just saved me of painful test on a big table.
29

Postgres 16 or newer

Postgres 16 (release date late 2023) finally implements the SQL/JSON standard predicate IS JSON for this purpose exactly. The manual:

This predicate tests whether expression can be parsed as JSON, possibly of a specified type. If SCALAR or ARRAY or OBJECT is specified, the test is whether or not the JSON is of that particular type. If WITH UNIQUE KEYS is specified, then any object in the expression is also tested to see if it has duplicate keys.

To verify the string is a valid JSON literal:

... WHERE string_column IS JSON

To verify the string is a valid JSON array literal:

... WHERE string_column IS JSON ARRAY

(But note that '{"products": [1,2,3]}' is not a JSON array, but a JSON object.)

Postgres 15 or older

This is an improved version of a_horse_with_no_name's function, still using an expensive subtransaction to trap the invalid_text_representation error. But it's supposed to be faster and safer:

CREATE OR REPLACE FUNCTION f_is_json(_txt text)
  RETURNS bool
  LANGUAGE plpgsql IMMUTABLE STRICT AS
$func$
BEGIN
   RETURN _txt::json IS NOT NULL;
EXCEPTION
   WHEN SQLSTATE '22P02' THEN  -- invalid_text_representation
      RETURN false;
END
$func$;

COMMENT ON FUNCTION f_is_json(text) IS 'Test if input text is valid JSON.
Returns true, false, or NULL on NULL input.'

Major differences

  • Mark the function STRICT to get faster results for null input.

  • The first item also makes it functionally different, returning null on null input - which is literally what the STRICT modifier does, RETURNS NULL ON NULL INPUT being alternative syntax for the same. This seems more reasonable, as null is valid for the types json and jsonb (or any data type in Postgres for that matter), and we can test for true / false / null as needed.

  • Sadly, we cannot mark the function PARALLEL SAFE in Postgres 9.6 or later. The EXCEPTION part makes Postgres start a subtransaction when the function is entered, and that's not compatible with parallel execution. Related:

  • This function only traps "invalid text representation" and not any other errors - even if other errors seem hardly possible.

"f_is_json" is just my naming convention to prefix my custom functions with "f_" to clearly distinguish from built-in functions.

Quick test:

SELECT f_is_json('[]')  AS expected_t
     , f_is_json('sdf') AS expected_f
     , f_is_json(null)  AS expected_n;

fiddle

2 Comments

Note: I had to remove the PARALLEL SAFE part to make it work as on my Postgres 12 I was getting "cannot start subtransactions during a parallel operation".
@stoper: Thanks for pointing out! I fixed the code of the old function accordingly.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.