We have some medium-to-large tables (hundreds of thousands of rows each) that we want to let users effectively "fork", much as in Git, and potentially collaborate on over time. Users should be able to fork a table multiple times, edit each fork independently, and compare various aggregate data between any two forks.
Basically, this data starts out as a set of read-only tables, and we want to give each user a view of a table that contains their edits. The challenge is that we'd also like to periodically query rows in this view (e.g. show only rows where column3 = 'left'), and possibly even join against another table on specific columns (e.g. an INNER JOIN where user_table.column3 = other_table.column10). In other words, we'd like to be able to treat these forks like fully materialized tables for relational operations.
The dumbest solution (which is what we do today) is simply to make a full copy of the table for each fork, but this is expensive, at least in our current incarnation: we are using PostgreSQL, and each copy operation can take 2-20 minutes. We'd like forking to be a real-time operation, something with copy-on-write behavior.
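To make the copy-on-write idea concrete, here is a rough sketch of the kind of overlay we are imagining. It is not something we have built. It uses Python's built-in sqlite3 module only so that it runs standalone (our real data lives in PostgreSQL), and the names base_rows, fork_edits, amount, and the helper functions are all made up for illustration. The base table is never modified; each fork only stores the rows it changed, added, or deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base_rows (id INTEGER PRIMARY KEY, column3 TEXT, amount INTEGER);
CREATE TABLE other_table (column10 TEXT PRIMARY KEY, label TEXT);
CREATE TABLE fork_edits (
    fork_id INTEGER NOT NULL,
    id      INTEGER NOT NULL,
    column3 TEXT,
    amount  INTEGER,
    deleted INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (fork_id, id)
);
""")

# Hypothetical read-only base data plus a small lookup table to join against.
conn.executemany("INSERT INTO base_rows VALUES (?, ?, ?)",
                 [(1, "left", 10), (2, "right", 20), (3, "left", 30)])
conn.executemany("INSERT INTO other_table VALUES (?, ?)",
                 [("left", "L"), ("right", "R")])

# Fork 1 edits row 2, deletes row 3 (tombstone), and adds row 4.
# Fork 2 has no edits at all, so "creating" it copies nothing.
conn.executemany("INSERT INTO fork_edits VALUES (?, ?, ?, ?, ?)",
                 [(1, 2, "left", 20, 0), (1, 3, None, None, 1), (1, 4, "right", 40, 0)])

# A fork's contents: base rows the fork has not touched, plus the fork's
# own live (non-deleted) rows.
FORK_VIEW = """
SELECT b.id, b.column3, b.amount
FROM base_rows b
WHERE NOT EXISTS (
    SELECT 1 FROM fork_edits e WHERE e.fork_id = :fork AND e.id = b.id
)
UNION ALL
SELECT e.id, e.column3, e.amount
FROM fork_edits e
WHERE e.fork_id = :fork AND e.deleted = 0
"""

def left_rows(fork_id):
    """Filter a fork as if it were a plain table."""
    sql = f"SELECT id, column3, amount FROM ({FORK_VIEW}) WHERE column3 = 'left' ORDER BY id"
    return conn.execute(sql, {"fork": fork_id}).fetchall()

def joined_labels(fork_id):
    """INNER JOIN a fork against another table on column3 = column10."""
    sql = (f"SELECT v.id, o.label FROM ({FORK_VIEW}) v "
           "INNER JOIN other_table o ON v.column3 = o.column10 ORDER BY v.id")
    return conn.execute(sql, {"fork": fork_id}).fetchall()

def total_amount(fork_id):
    """An aggregate that can be compared between two forks."""
    sql = f"SELECT SUM(amount) FROM ({FORK_VIEW})"
    return conn.execute(sql, {"fork": fork_id}).fetchone()[0]

print(left_rows(1), left_rows(2))
print(joined_labels(1))
print(total_amount(1), total_amount(2))
```

This works on toy data, but we don't know how the NOT EXISTS / UNION ALL overlay would behave at hundreds of thousands of rows with many forks, or with forks of forks, which is partly why we are asking.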
We do record the changes that users make (i.e. a log of changes), so we can eventually apply them to the "original" table, but it would be nice to have a pattern or, in an ideal world, a library or storage layer that just does this for us.
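For illustration, replaying such a log could look roughly like the sketch below. The Change record and its insert/update/delete operations are a simplified, hypothetical format, not our actual log schema, and the table is modeled as a plain dict keyed by row id.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Change:
    """One hypothetical log entry: an operation on a single row."""
    op: str                        # "insert", "update", or "delete"
    row_id: int
    values: Optional[dict] = None  # column values for insert/update

def apply_changes(rows: Dict[int, dict], log: List[Change]) -> Dict[int, dict]:
    """Return a new table with the log replayed in order; the input is left untouched."""
    result = {row_id: dict(row) for row_id, row in rows.items()}
    for change in log:
        if change.op == "insert":
            result[change.row_id] = dict(change.values or {})
        elif change.op == "update":
            result[change.row_id].update(change.values or {})
        elif change.op == "delete":
            result.pop(change.row_id, None)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return result

original = {1: {"column3": "left"}, 2: {"column3": "right"}}
log = [
    Change("update", 2, {"column3": "left"}),
    Change("delete", 1),
    Change("insert", 3, {"column3": "left"}),
]
forked = apply_changes(original, log)
print(forked)
```

Replaying the whole log on every read is obviously too slow for our table sizes, so this only shows the semantics we want, not a way to get them efficiently.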
We happen to use PostgreSQL and Python today, but I'm open to NoSQL systems here, as I can imagine this resulting in some pretty nasty SQL if it is generalized enough. We're also willing to sacrifice some relational capability in order to achieve the above. Are there known patterns and/or implementations in this space, either in PostgreSQL or in other storage systems? It turns out this is a really hard thing to google for.