
We would like to synchronize data (insert, update) from Oracle (11g) to PostgreSQL (10). Our approach was the following:

  • A trigger on the Oracle table writes nextval from a sequence into a tracking column before each insert and update.
  • PostgreSQL stores the last sequence number processed and fetches the rows from Oracle whose sequence number is greater than lastSequenceNumberFetched.
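Concretely, the Oracle side of this approach looks roughly like the following sketch (demo_table, sync_seq, and sync_id are illustrative names, not from our actual schema):

```sql
CREATE SEQUENCE sync_seq;

CREATE OR REPLACE TRIGGER demo_table_sync_trg
BEFORE INSERT OR UPDATE ON demo_table
FOR EACH ROW
BEGIN
  -- Stamp every inserted/updated row with the next sequence value.
  :NEW.sync_id := sync_seq.NEXTVAL;
END;
/

-- The PostgreSQL-side poller then runs (against Oracle):
-- SELECT * FROM demo_table WHERE sync_id > :lastSequenceNumberFetched;
```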

We now have the following problem:

  • Session 1 in Oracle inserts a row; a sequence number (say 45) is written, but no COMMIT is done in Oracle.
  • Session 2 in Oracle inserts a row; a sequence number is written (say 49, because sequences in Oracle can have gaps) and a COMMIT is done in Oracle.
  • The session in PostgreSQL fetches rows from Oracle with sequenceNumber > 44 (because lastSequenceNumberFetched is 44) and gets the row with sequenceNumber 49, which becomes the new lastSequenceNumberFetched.
  • Session 1 in Oracle commits.
  • The session in PostgreSQL fetches rows from Oracle with sequenceNumber > 49. The problem is that the row with sequenceNumber 45 is never fetched.

Are there any better approaches for our use case avoiding our problem with missing data?

Comments:

  • postgresql.org/about/news/1496 (Commented Jun 20, 2018 at 13:38)
  • Did you consider not replicating at all and using a foreign data wrapper that makes the Oracle table available in Postgres? (Commented Jun 20, 2018 at 13:58)
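A minimal sketch of the foreign data wrapper idea from that comment, using the oracle_fdw extension (the connection string, credentials, and table definition are placeholders):

```sql
CREATE EXTENSION oracle_fdw;

CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
  OPTIONS (dbserver '//oracle-host:1521/ORCL');   -- placeholder connection string

CREATE USER MAPPING FOR postgres SERVER oradb
  OPTIONS (user 'scott', password 'tiger');       -- placeholder credentials

-- Expose the Oracle table directly in Postgres; no replication needed.
CREATE FOREIGN TABLE demo_table (
  id   integer,
  data text
) SERVER oradb OPTIONS (schema 'SCOTT', table 'DEMO_TABLE');
```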

3 Answers

  1. If you don't have delete operations in your tables and the tables are not very big, then I suggest using the Oracle System Change Number (SCN) at the row level, which is returned by the pseudo column ORA_ROWSCN. This is the commit SCN represented as a number. By default the SCN is tracked per data block, but you can enable tracking at the row level (keyword ROWDEPENDENCIES), so you have to recreate your table with this keyword. At the start of the sync procedure you get the current SCN by calling dbms_flashback.get_system_change_number, then scan all tables where ora_rowscn between _last_scn_value_ and _current_scn_value_. The disadvantage is that this pseudo column is not indexed, so you will get full table scans, which is slow for big tables.
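A sketch of this ORA_ROWSCN approach (the table name and bind variables are illustrative):

```sql
-- Recreate the table with row-level SCN tracking.
CREATE TABLE demo_table (
  id   NUMBER PRIMARY KEY,
  data VARCHAR2(100)
) ROWDEPENDENCIES;

-- At the start of each sync run, capture the current SCN:
--   :current_scn := dbms_flashback.get_system_change_number;
-- Then fetch everything committed since the previous run:
SELECT * FROM demo_table
WHERE ora_rowscn > :last_scn
  AND ora_rowscn <= :current_scn;
```

Because ORA_ROWSCN reflects the commit SCN, a row only falls into the scanned range after its transaction commits, which avoids the gap problem from the question.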

  2. If you use delete statements, then you have to track which records were deleted. For this purpose you can use one log table with the following columns: table_name, table_id_value, operation (insert/update/delete). The table is filled by triggers on the base tables. So in your case, when session 1 commits data in the base table, you get a record in the log table to process, and you don't see it until the session commits. So there are no issues with the sequence numbers you described.
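A sketch of the log-table approach for Oracle 11g (demo_table with primary key id, and the sequence/table names, are illustrative assumptions):

```sql
CREATE SEQUENCE sync_log_seq;

CREATE TABLE sync_log (
  log_id         NUMBER PRIMARY KEY,
  table_name     VARCHAR2(128),
  table_id_value NUMBER,
  operation      VARCHAR2(6)   -- 'INSERT' / 'UPDATE' / 'DELETE'
);

CREATE OR REPLACE TRIGGER demo_table_log_trg
AFTER INSERT OR UPDATE OR DELETE ON demo_table
FOR EACH ROW
BEGIN
  -- The log row lives in the same transaction as the base-table change,
  -- so the poller only ever sees it after COMMIT.
  INSERT INTO sync_log (log_id, table_name, table_id_value, operation)
  VALUES (sync_log_seq.NEXTVAL,
          'DEMO_TABLE',
          COALESCE(:NEW.id, :OLD.id),
          CASE WHEN INSERTING THEN 'INSERT'
               WHEN UPDATING  THEN 'UPDATE'
               ELSE 'DELETE' END);
END;
/
```

The PostgreSQL side can then process and delete (or mark) log rows in batches, keyed by table_name and table_id_value.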

Hope that helps.




Is this purely a data project, or do you have some client here? If you have a middle tier, you could use an ORM to abstract some of this and write to both databases. Do you care whether the sequences are the same? It would be possible to collect all the data to synchronize since a particular timestamp (every table would have to have a UTC timestamp), then take a hash of all the data and compare it with what is in Postgres.

It might be useful to have some more of your requirements for the synchronization of data and the reasoning behind this e.g.

Do the keys need to be the same in both environments? Why? Who views the data; is the same consumer looking at both sources? Why not use an ORM to target only one database, and why do you need both Oracle and Postgres?



I have seen a similar setup: an application on Postgres, mostly for reporting and other secondary tasks, while the main app was on Oracle.

Some of the main app tables are cached in Postgres for convenience. But this setup brings in the sync problem.

The compromise solution was a mix of incremental sequence-based sync during the daytime and a full table copy overnight.

Regarding other solutions proposed here:

  • Postgres FDW is slow for complex queries, and it puts extra load on the foreign database, especially when the WHERE clause refers to both local and foreign tables.
    The same query runs much faster if the foreign table is cached in Postgres.

  • Incremental/differential sync using sequence numbers: we tried this, and it works acceptably for small tables, but the nightmare starts with child relations; maybe an ORM can help here.

  • The ideal solution, in my opinion, would be to stream Oracle changes to Postgres, or to an intermediary process that replicates the changes to Postgres.

I have no clue how to do this; as I understand it, it requires Oracle GoldenGate (plus a licence).
