
I am evaluating whether Postgres is suited to parsing a number of quite complicated JSON documents in order to extract and semantically match entities, eventually filling a relational schema with high integrity.

I have found jsonpath very helpful when working with these documents, and I found an article which suggested that Postgres 11 would have support of sorts. However, the docs do not mention this at all.

My question, then, is: will support be forthcoming? Also, is this kind of processing suited to Postgres at all? (Or should I use Lucene or MongoDB for the parsing and matching, then import into Postgres relational tables somehow?)

An example of the data could be:

```
{
    "event_classes": [
        {
            "name": "American Football", 
            "url": "/sportsapi/v2/american-football", 
            "id": 27
        }, 
        {
            "name": "Athletics", 
            "url": "/sportsapi/v2/athletics", 
            "id": 48
        }, 
        {
            "name": "Aussie Rules", 
            "url": "/sportsapi/v2/aussie-rules", 
            "id": 10000062
        }, 
        {
            "name": "Badminton", 
            "url": "/sportsapi/v2/badminton", 
            "id": 10000069
        }, 
        {
            "name": "Baseball", 
            "url": "/sportsapi/v2/baseball", 
            "id": 5000026
        }
    ]
}

```

5 Comments

  • I don't know why you would need jsonpath; can you explain your use case a little? You can already work with JSON using jsonb and make it searchable with GIN. Commented Aug 31, 2018 at 13:26
  • Well, the data itself is well structured but very nested, and I would like to extract entities/objects by position and then run logic against each (PL/pgSQL etc.). This would be the "parsing" phase, but the PL/pgSQL would recursively look against an existing table, effectively doing "semantic matching" before inserting. From a top level, this is the guts of a comparison engine, whereby a number of feeds are to be processed which contain semantically the same entities, but both the format of the data and the textual attributes of the entities vary widely. Can I ask how GIN would be of service here? Commented Aug 31, 2018 at 14:22
  • Again, the use case is a little unclear, but why can you not do that with a stored procedure via a trigger? If you know the structure, extract all your values into variables and work on those. If you need to cross-check multiple nested JSON objects, then create a temp table and insert every instance; then you can run normal SQL queries against the new shape of your data. Commented Aug 31, 2018 at 14:27
  • Have updated the description. Commented Aug 31, 2018 at 14:48
  • What kind of query do you want to run on that data? It would be very easy to turn that into a "relational table" using the current JSON functions, and I don't see the need for jsonpath here, but without the actual queries you need this is impossible to answer. Commented Aug 31, 2018 at 15:26

2 Answers


SQL/JSON support didn't make it in v11.

It is available from PostgreSQL v12 on.

Your use case is a little vague, but I think that PostgreSQL would be well suited for this kind of processing, particularly if the data should end up in a relational schema.
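As a sketch of what that looks like once you are on v12 (using the question's sample shape; `jsonb_path_query` and the `? (...)` filter are the v12 SQL/JSON path functions, so check the v12 docs for the exact syntax):

```
-- PostgreSQL 12+: extract the id of a named event class with a jsonpath expression
SELECT jsonb_path_query(
    '{"event_classes": [
        {"name": "Baseball", "url": "/sportsapi/v2/baseball", "id": 5000026}
    ]}'::jsonb,
    '$.event_classes[*] ? (@.name == "Baseball").id'
);
-- returns 5000026
```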


1 Comment

Yes, I believe PostgreSQL should cover all my needs.

As a general point, SQL/JSON path will be a more terse way to query a jsonb data structure. It compiles down to the traditional methods of querying jsonb, which makes it essentially a parser feature providing a standard syntax.

IMHO, that standard syntax is substantially better and gives room for future optimizations; however, any query on JSON can be done with the PostgreSQL operators you've linked to. It's just not always pretty.

Finding out whether an array contains {"foo":2} is simple:

```
WITH t(jsonb) AS ( VALUES ('[{"foo":2, "qux":42},{"bar":2},{"baz":4}]'::jsonb) )
SELECT *
FROM t
WHERE jsonb @> '[{"foo":2}]';
```

However, it's substantially harder to get the value of qux given the above:

```
WITH t(jsonb) AS ( VALUES ('[{"foo":2, "qux":42},{"bar":2},{"baz":4}]'::jsonb) )
SELECT e->'qux'
FROM t
CROSS JOIN LATERAL jsonb_array_elements(jsonb) AS a(e)
WHERE t.jsonb @> '[{"foo":2}]'
  AND e @> '{"foo":2}';
```

But that's not the end of the world. It's actually a really nice SQL syntax; it's just not JavaScript. With JSON path you'll be able to do something like:

```
SELECT t.json _ '$.[@.foo == 2].qux'
FROM t
WHERE t.json _ '$.[@.foo == 2]';
```

Where _ is some kind of jsonpath operator. As an aside, you can always create an actual JavaScript stored procedure on the server and run it in V8; it's really dirt simple with PL/V8.
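For reference, the jsonpath support that eventually shipped in PostgreSQL 12 provides the `@?` match operator and the `jsonb_path_query` function rather than a single generic operator; a hedged sketch of the equivalent query (filter syntax per the v12 docs):

```
WITH t(jsonb) AS ( VALUES ('[{"foo":2, "qux":42},{"bar":2},{"baz":4}]'::jsonb) )
SELECT jsonb_path_query(t.jsonb, '$[*] ? (@.foo == 2).qux')
FROM t
WHERE t.jsonb @? '$[*] ? (@.foo == 2)';
-- returns 42
```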

2 Comments

Yes, it is the syntax of the standard JSON type that worries me a little, and whether it can address my needs. Will be sure to use jsonb.
A JavaScript stored procedure here might be hugely helpful; will investigate further. Thanks!
