PostgreSQL: Export data from SQL Server 2008 R2 to PostgreSQL 9.5

Question

I need to export a table's data from SQL Server to PostgreSQL.

Steps I followed:

Step 1: Export data from SQL Server:

Source: SQL Server Table
Destination: Flat file Destination 
Table Or Query to copy: Query

Query:

SELECT 
    COALESCE(convert(varchar(max),id),'NULL') + '|'
    +COALESCE(convert(varchar(max),Name),'NULL') + '|'
    +COALESCE(convert(varchar(max),EDate,121),'NULL') AS A
FROM tbl_Employee;

File Name: file.txt

Step 2: Copy to PostgreSQL.

Command:

\COPY tbl_employee FROM '$FilePath\file.txt' DELIMITER '|' NULL AS 'NULL' ENCODING 'LATIN1'

I get the following error message:

ERROR:  invalid byte sequence for encoding "UTF8": 0xc1 0x20

Well, as it says, you've got a byte sequence that isn't valid UTF8. I would guess that the original source database is not in UTF8. To minimise transition errors, you'll want to configure Postgres's back end to use whatever encoding the original database had.
– Benny Mackney, Commented Jul 24, 2017 at 6:30
That error message is surprising. Which PostgreSQL version is this? What do you get for SHOW client_encoding; and SHOW server_encoding;?
– Laurenz Albe, Commented Jul 24, 2017 at 7:31
@LaurenzAlbe: I don't think server or client encoding are the key here. It's the file or its encoding.
– Erwin Brandstetter, Commented Jul 27, 2017 at 15:01

Erwin Brandstetter · Accepted Answer · 2017-07-27 15:21:43Z

You tell Postgres that the source is encoded as LATIN1:

\copy ... ENCODING 'LATIN1'

But that's either not the case or the file is damaged. Else we would not see the error message. What is the true encoding of '$FilePath\file.txt'?

The current client_encoding is not relevant for this since, quoting the manual on COPY:

ENCODING

Specifies that the file is encoded in the encoding_name. If this option is omitted, the current client encoding is used.

(\copy is just a wrapper for SQL COPY in psql.)

And your server_encoding is largely irrelevant, too - as long as Postgres can use a built-in conversion and the target encoding contains all characters of the source encoding - which is the case for LATIN1 -> UTF8: iso_8859_1_to_utf8.
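As a rough sanity check outside Postgres, the same conversion can be reproduced in Python. This is only a sketch: it assumes Python's `latin-1` codec behaves like Postgres's LATIN1 for these bytes, and it uses the two bytes from the error message, 0xC1 0x20, as input. Read as LATIN1 they convert to UTF-8 cleanly; read directly as UTF-8 they are rejected.

```python
# Bytes reported in the error message: 0xC1 0x20
raw = b"\xc1\x20"

# Read as LATIN1 (ISO 8859-1), every byte maps to exactly one character,
# so the conversion to UTF-8 succeeds.
as_text = raw.decode("latin-1")
as_utf8 = as_text.encode("utf-8")

# Read directly as UTF-8, 0xC1 is not a legal lead byte, so decoding fails.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_text), as_utf8.hex(), valid_utf8)
```

The error text in the question matches the second case, which suggests the bytes were validated as UTF8 at some point instead of being converted from LATIN1.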

So the remaining source of error is your file, which is almost certainly not valid LATIN1.
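To find out what the file really contains, it can help to look at its raw bytes before loading it. The following is a minimal sketch, not a reliable detector: `sniff_encoding` is a hypothetical helper, the labels it returns are my own, and the sample row is invented. It checks for a byte order mark, then for NUL bytes (typical of UTF-16 text), then for strict UTF-8 validity. It ignores UTF-32 and cannot tell single-byte encodings such as LATIN1 and WIN1252 apart.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Return a rough guess at the encoding of raw file bytes (hypothetical helper)."""
    # A byte order mark at the start is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # NUL bytes in a text file usually mean UTF-16 written without a BOM.
    if b"\x00" in data:
        return "utf-16-no-bom"
    # Strict UTF-8 decoding succeeds for UTF-8 and for plain ASCII.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "single-byte"


# Invented sample row in the pipe-delimited format from the question.
row = "1|Ángel|NULL\n"
for name in ("latin-1", "utf-8", "utf-16", "utf-16-le"):
    print(name, "->", sniff_encoding(row.encode(name)))
```

For a real file, pass the complete contents read in binary mode (`open(path, "rb").read()`). Reading only a prefix can cut a multi-byte sequence in half and produce a false "single-byte" result. Whatever encoding turns out to be correct is the one to name in the ENCODING option of \copy.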

edited Jul 27, 2017 at 15:21

answered Jul 27, 2017 at 4:16


2 Comments

Laurenz Albe Over a year ago

I realize that client_encoding shouldn't come into play here, but I am still clueless. If the file contained the byte sequence C1 20, then it should be converted to 'Á ' without error. So what happens must be that iso8859_1_to_utf8 turns some single byte into C1 20, but I cannot see how that could happen... That function doesn't throw any such error messages.

Erwin Brandstetter Over a year ago

@LaurenzAlbe: Good points. I am not entirely sure how the error is generated exactly. I guess the source file has illegal bytes for LATIN1 (damaged file or wrong encoding), the conversion does its job in good faith and produces illegal UTF8 for the illegal input. To generate a more helpful error message, Postgres might test (more thoroughly) whether the source is legal before the conversion instead of only afterwards, which might be more expensive and is probably why that isn't implemented. Or the OP made a copy/paste mistake in the question. Just speculating here ...
