
I have a table with many rows in SQL Server 2017 that was migrated, along with its data, to Postgres 10.5 (my colleagues did it using the Talend tool).

I want to compare if the data is correct after migration. I want to compare the values in a column in SQL Server vs Postgres.

I could try reading the column from both SQL Server and Postgres into NumPy arrays and comparing them.

But neither database is on my local machine. They're hosted on servers that I have to reach over the network, which means data retrieval would take a long time.

Instead, I want to do something like this.

Compute a SHA-256 or MD5 hash over the column values, ordered by the primary key, and compare the resulting hash values from both databases. That way I don't need to retrieve the results from the database to my local machine for comparison.
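To make the idea concrete, here's a local Python sketch of the property I'm after (the values are hypothetical; the real hashes would be computed inside each database):

```python
import hashlib

# Hypothetical column values, already ordered by primary key.
# If both databases produce this exact ordered sequence,
# hashing the concatenation yields identical digests.
rows_sqlserver = ["alpha", "beta", "gamma"]
rows_postgres = ["alpha", "beta", "gamma"]

def column_digest(values, sep="|"):
    # Join with an unambiguous separator, then hash the bytes.
    return hashlib.md5(sep.join(values).encode("utf-8")).hexdigest()

print(column_digest(rows_sqlserver) == column_digest(rows_postgres))  # True
```

The separator matters: both sides must join the values identically, or equal data will still produce different digests.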

The hash function should return the same value on both sides if the columns contain exactly the same values.

I'm not even sure if this is possible, or whether there's a better way to do it.

Can someone please point me in the right direction?

  • @Sukumar Rdjf I think you have to use SQL Delta for comparing the databases. Commented Nov 5, 2019 at 10:23
  • It looks like there is no support to compare SQL Server vs Postgres. Commented Nov 5, 2019 at 10:27
  • Download both tables as CSV files and use windiff to compare them Commented Nov 5, 2019 at 11:31
  • Downloading the tables to my local machine takes ages due to network constraints, VPN restrictions, and SSH tunneling. I don't think that's feasible. Commented Nov 5, 2019 at 11:41

1 Answer


If an FDW isn't going to work out for you, maybe the hash comparison is a good idea. MD5 is probably a good idea, only because you ought to get consistent results from different software.

Obviously, you'll need the columns to be in the same order in the two databases for the hash comparison to work. If the layouts are different, you can create a view in Postgres to match the column order in SQL Server.

Once you've got tables/views to compare, there's a shortcut to the hashing on the Postgres side. Imagine a table named facility:

SELECT MD5(facility::text) FROM facility;

If that's not obvious, here's what's going on there. Postgres has the ability to cast any compound type to text. Like:

select your_table_here::text from your_table_here

The result is like this example:

(2be4026d-be29-aa4a-a536-de1d7124d92d,2200d1da-73e7-419c-9e4c-efe020834e6f,"Powder Blue",Central,f)

Notice the (parens) around the result. You'll need to take that into account when generating the hash on the SQL Server side. This pithy piece of code strips the parens:

SELECT MD5(substring(facility::text, 2, length(facility::text) - 2)) FROM facility;

Alternatively, you can concatenate columns as strings manually, and hash that. Chances are, you'll need to do that, or use a view, if you've got ID or timestamp fields that automatically changed during the import.
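If the goal is one hash for an entire column (as in the question), an ordered aggregate works on the Postgres side. A sketch, assuming a primary key named id and a target column named c1 on the facility table from above:

```sql
-- One MD5 for the whole column, rows ordered by primary key.
-- '|' is an arbitrary separator; pick one that can't appear in the
-- data, and use the same separator and order on the SQL Server side.
SELECT MD5(string_agg(c1::text, '|' ORDER BY id)) FROM facility;
```

Note that string_agg skips NULLs, so wrap the column in coalesce() if NULLs are possible, and do the equivalent on the other side.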

The :: casting operator can also cast a row to another type, if you've got a conversion in place. And where I've listed a table above, you can use a view just as well.

On the SQL Server side, I have no clue. HASHBYTES?
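For what it's worth, a rough HASHBYTES sketch (table and column names are placeholders; this is an assumption, not a tested equivalent). Keep in mind HASHBYTES hashes the bytes it's given, so an nvarchar value hashes differently from the same text as varchar, and the concatenated string must match byte-for-byte what Postgres hashes:

```sql
-- Per-row MD5 in SQL Server. CONVERT style 2 gives hex without the
-- 0x prefix; LOWER() matches Postgres's lowercase md5() output.
SELECT LOWER(CONVERT(VARCHAR(32), HASHBYTES('MD5', CONCAT(id, ',', name)), 2)) AS row_md5
FROM facility
ORDER BY id;
```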


4 Comments

I tried the md5 function on Postgres and it's working fine. The only thing is, I'm not sure how to do the same thing in SQL Server.
Yes, that's what I looked at, but their tutorial covers only a single column. Although that's what I wanted earlier, I guess it would be much better to compare the hash of the entire row rather than just a single column. Can I pass * in place of column c1 in the link?
I'd also recommend that you pull the PK and the row hash for comparison. If you get those down locally, you can load them into a scratch database and find mismatches very easily. PKs are going to work out a lot more clearly than counting on row order. And if one row is missing, the whole idea of using row order is toast. I've done this sort of thing to verify multi-million row syncs, and it worked fine, but only because I had the PKs to join against. Plus, I could then take my 200 problem IDs and find them in the 6M rows to figure out what was going on. Trying to do that by row order... ugh.
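That PK-join comparison is easy to sketch once the (pk, hash) pairs are local. A hypothetical example in Python (the keys and hash values are made up):

```python
# Hypothetical (primary key -> row hash) maps pulled from each database.
sqlserver_hashes = {1: "aaa", 2: "bbb", 3: "ccc"}
postgres_hashes = {1: "aaa", 2: "xxx", 4: "ddd"}

def diff_by_pk(a, b):
    """Compare two pk->hash maps; return missing and mismatched keys."""
    missing_in_b = sorted(a.keys() - b.keys())  # rows lost in migration
    missing_in_a = sorted(b.keys() - a.keys())  # unexpected extra rows
    mismatched = sorted(pk for pk in a.keys() & b.keys() if a[pk] != b[pk])
    return missing_in_b, missing_in_a, mismatched

print(diff_by_pk(sqlserver_hashes, postgres_hashes))  # → ([3], [4], [2])
```

Unlike a whole-table digest, this tells you which rows disagree, not just that something disagrees.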
