
I have a table whose string data may contain NUL characters (\0) between other characters. The column is defined as VARCHAR, and the insert fails with this error:

org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00

There should be a way to insert a string containing NUL characters into PostgreSQL. This is the sample insert that fails:

private void postGrestest() throws ClassNotFoundException, SQLException
{
    Class.forName("org.postgresql.Driver");

    // IF EXISTS keeps the first run from failing when the table is absent.
    String dropStmt = "DROP TABLE IF EXISTS PUBLIC.TEST";
    String createStmt = "CREATE TABLE PUBLIC.TEST(COL1 VARCHAR(50), COL2 BOOLEAN)";
    String insertStmt = "INSERT INTO PUBLIC.TEST VALUES (?, ?)";
    try (Connection connection = DriverManager.getConnection(
            "jdbc:postgresql://url:5432/objectserver?stringtype=unspecified",
            "username", "password");
            Statement stmt = connection.createStatement();
            PreparedStatement ps = connection.prepareStatement(insertStmt))
    {
        stmt.execute(dropStmt);
        stmt.execute(createStmt);
        Random r = new Random();
        for (int i = 0; i < 100; i++)
        {
            // Replace 's' with NUL (U+0000) to reproduce the failure.
            String str = ("Test" + i).replace('s', '\0');
            logger.info("Inserting " + str);
            // str = str.replace("\0", "");
            ps.setObject(1, str);
            // Sent as text; stringtype=unspecified lets the server infer BOOLEAN.
            String bool = String.valueOf(r.nextBoolean());
            ps.setObject(2, bool);
            ps.executeUpdate();
        }
    }
}

Are there any considerations when dealing with this type of data? It is string data whose source may contain embedded NUL characters. SQL Server handles the same data without problems in an NVARCHAR column.

1 Answer

You can't include a NUL character (U+0000) in a string in PostgreSQL. This is the character with code zero, not the SQL NULL value. From the documentation:

The character with the code zero cannot be in a string constant.

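Java has a slightly modified UTF-8 scheme (the one used by DataOutputStream.writeUTF and JNI, not by String.getBytes) in which U+0000 is encoded as the two bytes 0xC0 0x80. You might send those two bytes in place of the binary NUL. Be aware that 0xC0 0x80 is an overlong form that strict UTF-8 validation rejects, so check that your server accepts it before relying on this; otherwise, strip or replace the NUL characters before inserting.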
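If stripping or replacing the NUL characters is acceptable, a minimal sketch of that step might look like the following. It mirrors the commented-out replace call in the question's loop. The class and method names (NulSanitizer, containsNul, replaceNul) are hypothetical, and using U+FFFD as a visible marker is only one assumed choice; an empty replacement simply drops the character.

```java
final class NulSanitizer {
    private NulSanitizer() {}

    /** Returns true if the value contains at least one U+0000 character. */
    static boolean containsNul(String value) {
        return value != null && value.indexOf('\0') >= 0;
    }

    /**
     * Replaces every U+0000 with the given replacement string.
     * Pass "" to drop the character entirely.
     */
    static String replaceNul(String value, String replacement) {
        if (value == null) {
            return null;
        }
        return value.replace("\0", replacement);
    }

    public static void main(String[] args) {
        // Same construction as the question: "Test0" becomes "Te\0t0".
        String raw = "Test0".replace('s', '\0');

        String stripped = replaceNul(raw, "");
        String marked = replaceNul(raw, "\uFFFD");

        System.out.println(containsNul(raw));      // true
        System.out.println(stripped);              // Tet0
        System.out.println(containsNul(marked));   // false
    }
}
```

In the question's loop, the sanitized value would be what gets passed to ps.setObject(1, ...). Note that this is lossy: once the NUL is removed or replaced, the original value cannot be reconstructed from the column.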


3 Comments

I understand. Can this be replaced on the backend with any other delimiter or a codepoint that will not be used?
Replacing \0 with \uC080 does not help, because \uC080 is the character 삀. The value is written as 'Te삀t0'. I want backend users to be able to query the table without a substitute character showing up in it. Would WIN1252 help in this case?
In UTF-8, U+C080 expands to 0xEC 0x82 0x80. You should not insert that code point; send just the two raw bytes 0xC0 0x80.
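The byte values discussed in these comments can be checked with a short sketch like the one below. It prints the standard UTF-8 bytes of a string and, for comparison, the modified UTF-8 bytes that DataOutputStream.writeUTF produces. The class and method names (Utf8Bytes, utf8Hex, modifiedUtf8Hex, toHex) are hypothetical helpers, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Bytes {
    private Utf8Bytes() {}

    /** Formats bytes as space-separated hex, e.g. "0xEC 0x82 0x80". */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Standard UTF-8, as produced by String.getBytes. */
    static String utf8Hex(String s) {
        return toHex(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Modified UTF-8, as produced by DataOutputStream.writeUTF. */
    static String modifiedUtf8Hex(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        // writeUTF prefixes the data with a two-byte length; skip it.
        return toHex(Arrays.copyOfRange(all, 2, all.length));
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\uC080"));       // 0xEC 0x82 0x80
        System.out.println(utf8Hex("\0"));           // 0x00
        System.out.println(modifiedUtf8Hex("\0"));   // 0xC0 0x80
    }
}
```

This shows why substituting the character \uC080 stores 삀: a Java String holds code points, so "\uC080" is one three-byte character in UTF-8, not the byte pair 0xC0 0x80.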
