
I'm toying with the idea of structuring my PostgreSQL tables in this way:

Id: uuid
Structure: JSON
Some_FK: uuid

In this case the Structure column is a JSON document containing fields that would otherwise be additional columns on the table. At this point I would basically be using the RDBMS to generate and manage IDs and relationships, while getting the schema flexibility of a document-store. In this usage the documents themselves aren't linked together (which is difficult to manage), the documents are simply an extension of the row, and the rows are linked together.
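
A minimal sketch of that layout in DDL, assuming PostgreSQL 9.3 and the uuid-ossp extension for ID generation (table and column names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";  -- provides uuid_generate_v4()

CREATE TABLE other_thing (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4()
);

CREATE TABLE thing (
    id        uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    structure json NOT NULL,  -- fields that would otherwise be columns
    some_fk   uuid NOT NULL REFERENCES other_thing (id)  -- rows, not documents, are linked
);
```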

Has anyone tried this sort of thing before, or am I crazy to attempt to use the feature this way?

7 Comments
  • I'm not sure this question is a good fit for this site (too open to opinion and discussion rather than concrete answers) but my gut reaction is this: it sounds like you'd basically end up reimplementing a document store like MongoDB, but which happens to use Postgres as the low-level implementation. That might be an interesting project, but I'm not sure what the advantages would be over an existing technology like MongoDB. Commented Nov 16, 2013 at 22:04
  • I have set up a similar system at work, minus the UUIDs. It is currently on PostgreSQL 9.2 and I have indexed the JSON keys using PLV8 (sketched after these comments), but I plan to port it to 9.3 and use the JSON functions built into 9.3 instead. It works well, though the database is only about 900 GB; the production version is expected to be several TB. Commented Nov 16, 2013 at 22:07
  • @IMSoP Point taken about the question. I had the same thought, but figured where else would I get access to so many experts? Our reason for not choosing a purely schemaless database like MongoDB is that our data has a fairly decent structure, but we like the flexibility of the schemaless storage model. Basically, as soon as we realized we needed to maintain relationships and query on them, we knew a document store wouldn't be the best fit for us. Commented Nov 16, 2013 at 22:17
  • @bma Have you found the flexibility of the schemaless structure helpful alongside the relational aspects for maintaining relationships, mainly in terms of 1) ease of change and 2) maintainability? Commented Nov 16, 2013 at 22:19
  • Your statements seem kind of contradictory to me: if your data has a decent structure, why not just design a schema for it? Are you worried about change in the structure over time? Perhaps what you actually need is a middleware or ORM layer that abstracts the queries in a way that makes schema changes easier (which you'll need anyway to handle the JSON data). Or is there some other sense in which you wouldn't "trust" a traditional schema? Commented Nov 16, 2013 at 22:24
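
For reference, the indexing approach bma describes might look roughly like this (a sketch; the docs table, structure column, and json_string function are invented for illustration):

```sql
-- Pre-9.3 style: index a JSON key via an immutable PLV8 function.
CREATE OR REPLACE FUNCTION json_string(data json, key text) RETURNS text AS $$
  var obj = JSON.parse(data);  -- older PLV8 passes json arguments in as text
  return obj[key];
$$ LANGUAGE plv8 IMMUTABLE STRICT;

CREATE INDEX docs_camera_idx ON docs (json_string(structure, 'camera'));

-- 9.3 style: the built-in ->> operator in an expression index instead.
CREATE INDEX docs_camera_idx2 ON docs ((structure ->> 'camera'));
```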

2 Answers


The more I think about it, the less sense this makes to me: you get none of the advantages of a structured database (powerful SQL queries, data integrity constraints, and so on), yet you bear all the cost of a DBMS sitting there largely unused, and you have to write all the tools for manipulating the data yourself.

If there were no schemaless document stores available, this might be a way of prototyping one, but there are, so why build a MongoDB clone on top of Postgres when you could just use MongoDB? Perhaps as an abstract project some kind of hybrid makes sense, but beyond prototyping I'd have thought it would make more sense to fork Postgres and rip out the SQL than to leave all that complexity lying unused.

On a practical level, I'm not sure how you intend foreign keys to work: it sounds like columns which happen to be foreign keys would remain real columns, while every other column would be mashed into the JSON document. That means that to retrieve the data you'd still hand-craft SQL with JOINs, but you'd also need an additional layer to manipulate the fields inside the JSON (e.g. to filter by them). Alternatively, you could hard-code the JSON manipulation into functions in the SQL expression, in which case you might as well just have a normal schema.
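
Concretely, a read against such a layout might look like this (a sketch reusing the hypothetical thing/other_thing tables above, with PostgreSQL 9.3's json operators):

```sql
-- Join on the real FK column, then filter on a field buried in the JSON.
SELECT t.id,
       t.structure ->> 'name' AS name
FROM   thing t
JOIN   other_thing o ON o.id = t.some_fk
WHERE  t.structure ->> 'status' = 'active';  -- no column constraint or type check applies here
```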

If your primary concern with a traditional schema is the cost of changing it once the system is running, perhaps you should be more concerned about the middleware or ORM layer you'd need to isolate the schema from the rest of your application. With a "schemaless" structure, each row can effectively have a different schema (the structure inside the JSON blob), so the application has to cope with every past version of the structure for an item type. And if you have multiple tables with defined foreign keys, the wrapper also has to isolate changes to those, such as tables being created or new relationships being defined, which is basically what you'd need for a fully relational schema anyway.


2 Comments

Thank you, this has been very helpful in challenging our thinking about this.
And your response is a good description of when/why to use each type of system, so +1'd and accepted.

I had a similar dilemma. I was all in on the flexible-schema hype, but I've since become convinced that in too many cases transactions and integrity checking can't be thrown away once an application matures.

In my opinion, when you need transactions and want to store loosely structured hierarchical data without introducing new tables or a separate database, the hybrid relational+JSON approach is the way to go. For example, in my application I use JSON columns for the following (a sketch follows the list):

  • file metadata records with an EXIF or ZIP directory structure dump
  • an audit trail and version dumps of the row (the number of operations saved is small)
  • user error history related to a record
  • statistics and precomputed aggregates
  • generally, any content that isn't needed today but would be hard to recover later, which beats an unstructured log file
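
As an illustration of the first case, a sketch with invented table and key names (and assuming a users table exists): the plain columns keep referential integrity, while the JSON column absorbs the loosely structured dump.

```sql
CREATE TABLE file (
    id       uuid PRIMARY KEY,
    owner_id uuid NOT NULL REFERENCES users (id),  -- FKs and cascades still work
    name     text NOT NULL,
    metadata json  -- raw EXIF / ZIP directory dump; shape varies per file
);

-- JSON fields can be read alongside the relational columns:
SELECT name, metadata -> 'exif' ->> 'Make' AS camera
FROM   file;
```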

Sure, this could all be done in MongoDB or some other "NoSQL" store, but why? This way I get:

  • one place to store data, one protocol, one API, etc.
  • full consistency
  • transactions
  • cascaded deletes without orphaned documents
  • ad hoc analytic queries: with PostgreSQL 9.3 you can query easily with the JSON functions; for example, all users from France where more than 80% of this year's pictures were made with a Pentax camera (a sketch of such a query follows this list)
  • I plan to deploy Postgres-XC soon for scalable master-master replication
  • with the PLV8 extension I can do just about anything I can imagine with my data
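
A sketch of that France/Pentax query, against a hypothetical schema of users (id, country) and pictures (user_id, taken_at, exif json) where exif holds the EXIF dump, using only operators available in 9.3:

```sql
SELECT u.id
FROM   users u
JOIN   pictures p ON p.user_id = u.id
WHERE  u.country = 'France'
AND    p.taken_at >= date_trunc('year', now())  -- pictures made this year
GROUP  BY u.id
HAVING sum(CASE WHEN p.exif ->> 'Make' = 'PENTAX' THEN 1 ELSE 0 END)
       > 0.8 * count(*);                        -- more than 80% Pentax
```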

2 Comments

We ended up going with Postgres for all the reasons you stated above. If we need to do JSON in the future we can, without giving up referential protections entirely. We also use Redis to store certain transitory data that only needs to be loosely tied to entities in our database.
I agree with this answer. After trying to use mongo for a while, we realized that it was impossible to not have the relational goodness of postgres. We do utilize JSON columns in several places, which would be god awful to do in standard tables because of the complexity of the underlying JSON structure. As far as I see it, with the mature support of JSON in postgres, there is not much need for mongo. And a mixed architecture of mongo and postgres is also tricky.
