I need to export the data of a table from SQL Server to PostgreSQL.

Steps I followed:

Step 1: Export data from SQL Server:

Source: SQL Server Table
Destination: Flat file Destination 
Table Or Query to copy: Query 

Query:

SELECT 
    COALESCE(convert(varchar(max),id),'NULL') + '|'
    + COALESCE(convert(varchar(max),Name),'NULL') + '|'
    + COALESCE(convert(varchar(max),EDate,121),'NULL') AS A
FROM tbl_Employee;

File Name: file.txt

Step 2: Copy to PostgreSQL.

Command:

\COPY tbl_employee FROM '$FilePath\file.txt' DELIMITER '|' NULL AS 'NULL' ENCODING 'LATIN1'

I get the following error message:

ERROR:  invalid byte sequence for encoding "UTF8": 0xc1 0x20

4 Comments

  • Well, as it says, you've got a byte sequence that isn't valid UTF8. I would guess that the original source database is not in UTF8. To minimise transition errors, you're gonna want to configure Postgres's back end to use whatever encoding the original database had. Commented Jul 24, 2017 at 6:30
  • That error message is surprising. Which PostgreSQL version is this? What do you get for SHOW client_encoding; and SHOW server_encoding;? Commented Jul 24, 2017 at 7:31
  • Both encodings are 'UTF8'. Commented Jul 24, 2017 at 7:49
  • @LaurenzAlbe: I don't think server or client encoding are the key here. It's the file or its encoding. Commented Jul 27, 2017 at 15:01

1 Answer

You tell Postgres that the source is encoded as LATIN1:

\copy ... ENCODING 'LATIN1'

But either that is not the case or the file is damaged; otherwise we would not see this error message. What is the true encoding of '$FilePath\file.txt'?

The current client_encoding is not relevant for this since, quoting the manual on COPY:

ENCODING

Specifies that the file is encoded in the encoding_name. If this option is omitted, the current client encoding is used.

(\copy is just a wrapper for SQL COPY in psql.)

And your server_encoding is largely irrelevant, too - as long as Postgres can use a built-in conversion and the target encoding contains all characters of the source encoding - which is the case for LATIN1 -> UTF8: iso_8859_1_to_utf8.
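
To illustrate why that conversion is lossless, here is a minimal Python sketch. It uses Python's own "latin-1" codec as a stand-in for the Postgres conversion (an assumption for illustration, not the server's actual code path) and checks that every single byte maps to a character that UTF8 can represent. It also converts the two bytes named in the error message.

```python
# Python's "latin-1" codec stands in for PostgreSQL's LATIN1 -> UTF8 conversion.
# NUL (0x00) is skipped because PostgreSQL text values cannot contain it.
all_bytes = bytes(range(1, 256))

# Every byte decodes to exactly one character under LATIN1 ...
text = all_bytes.decode("latin-1")
# ... and every one of those characters can be encoded as UTF8.
as_utf8 = text.encode("utf-8")
round_trip_ok = as_utf8.decode("utf-8") == text

# The two bytes from the error message, treated as LATIN1 input.
sample = b"\xc1\x20"
converted = sample.decode("latin-1").encode("utf-8")

print(round_trip_ok)  # True
print(converted)      # b'\xc3\x81 '
```

Read as LATIN1, 0xc1 0x20 is 'Á' followed by a space, which becomes the valid UTF8 sequence 0xc3 0x81 0x20.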

So the remaining source of error is your file, which is almost certainly not valid LATIN1.
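
One way to find out what the file really contains is to look at its raw bytes. The following is a rough Python sketch, not a definitive detector: the helper names (first_invalid_utf8, guess_encoding, inspect_file) and the sample row are hypothetical, and the check only distinguishes a byte order mark, valid UTF8, and "something else".

```python
from pathlib import Path
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, two bytes at that offset) for the first invalid
    UTF-8 sequence, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.start + 2]
    return None


def guess_encoding(data: bytes) -> str:
    """Very rough classification based on a byte order mark and UTF-8 validity."""
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "UTF-16 (byte order mark found)"
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 with byte order mark"
    if first_invalid_utf8(data) is None:
        return "valid UTF-8 (possibly plain ASCII)"
    return "not UTF-8; likely a single-byte encoding"


def inspect_file(path: str) -> str:
    """Classify a file on disk; the path is whatever your export produced."""
    return guess_encoding(Path(path).read_bytes())


# Hypothetical row containing the bytes 0xc1 0x20 from the error message.
sample = b"1|\xc1 lvaro|2017-07-24 00:00:00.000\n"
print(guess_encoding(sample))
print(first_invalid_utf8(sample))
```

Calling inspect_file on the exported file gives a first hint. If it reports a UTF-16 byte order mark, for example, the file was not written in a single-byte encoding at all and ENCODING 'LATIN1' cannot be right.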

2 Comments

I realize that client_encoding shouldn't come into play here, but I am still clueless. If the file contained the byte sequence C1 20, then it should be converted to 'Á ' without error. So what happens must be that iso8859_1_to_utf8 turns some single byte into C1 20, but I cannot see how that could happen... That function doesn't throw any such error messages.
@LaurenzAlbe: Good points. I am not entirely sure how the error is generated exactly. I guess the source file has illegal bytes for LATIN1 (damaged file or wrong encoding), the conversion does its job in good faith and produces illegal UTF8 for the illegal input. To generate a more helpful error message, Postgres might test (more thoroughly) whether the source is legal and not wait with the test until after the conversion - which might be more expensive, which is why that isn't implemented. Or the OP made a c/p mistake in the question. Just speculating here ...
