
I am trying to load some data from a file of JSON rows into a table.
One field can contain special characters such as \t and \r, and I want to keep them as they are in the new table.

Here is my file:

{"text_sample": "this is a\tsimple test", "number_sample": 4}

Here is what I do:

Drop table if exists temp_json;
Drop table if exists test;
create temporary table temp_json (values text);

copy temp_json from '/path/to/file';

create table test as (select 
        (values->>'text_sample') as text_sample,
        (values->>'number_sample') as number_sample
        from   (
           select replace(values,'\','\\')::json as values
           from   temp_json
       ) a);

I keep getting this error:

ERROR:  invalid input syntax for type json
DETAIL:  Character with value 0x09 must be escaped.
CONTEXT:  JSON data, line 1: ...g] Objection to PDDRP Mediation (was Re: Call for...

How should I escape those characters?
Thanks a lot
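For context, the JSON spec (RFC 8259) requires any raw control character inside a string, including the tab 0x09, to be escaped. Python's json module enforces the same rule, so the failure can be reproduced outside PostgreSQL (a sketch, not part of the original question):

```python
import json

# The file contains an actual tab (0x09) inside the string value; the
# JSON spec forbids raw control characters inside strings.
raw = '{"text_sample": "this is a\tsimple test", "number_sample": 4}'

try:
    json.loads(raw)   # fails with "Invalid control character" -- the
    parsed_ok = True  # same root cause as PostgreSQL's 0x09 error
except json.JSONDecodeError:
    parsed_ok = False

# Escaping the tab first yields a valid document:
doc = json.loads(raw.replace("\t", "\\t"))
```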

  • Post sample data containing the offending rows. Commented May 8, 2017 at 19:37
  • I updated with all the needed details. Commented May 8, 2017 at 19:44
  • copy temp_json from program 'sed -e ''s/\\/\\\\/g'' /path/to/file';? Commented May 8, 2017 at 21:02

3 Answers


As mentioned in Andrew Dunstan's PostgreSQL and Technical blog:

In text mode, COPY will be simply defeated by the presence of a backslash in the JSON. So, for example, any field that contains an embedded double quote mark, or an embedded newline, or anything else that needs escaping according to the JSON spec, will cause failure. And in text mode you have very little control over how it works - you can't, for example, specify a different ESCAPE character. So text mode simply won't work.

so we have to fall back to CSV format mode instead:

copy the_table(jsonfield) 
from '/path/to/jsondata' 
csv quote e'\x01' delimiter e'\x02';

The official sql-copy documentation lists the relevant parameters:

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]

where option can be one of:

    FORMAT format_name
    FREEZE [ boolean ]
    DELIMITER 'delimiter_character'
    NULL 'null_string'
    HEADER [ boolean ]
    QUOTE 'quote_character'
    ESCAPE 'escape_character'
    FORCE_QUOTE { ( column_name [, ...] ) | * }
    FORCE_NOT_NULL ( column_name [, ...] )
    FORCE_NULL ( column_name [, ...] )
    ENCODING 'encoding_name'
  • FORMAT
    • Selects the data format to be read or written: text, csv (Comma Separated Values), or binary. The default is text.
  • QUOTE
    • Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
  • DELIMITER
    • Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. This must be a single one-byte character. This option is not allowed when using binary format.
  • NULL
    • Specifies the string that represents a null value. The default is \N (backslash-N) in text format, and an unquoted empty string in CSV format. You might prefer an empty string even in text format for cases where you don't want to distinguish nulls from empty strings. This option is not allowed when using binary format.
  • HEADER
    • Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.

2 Comments

Hi, I am very new to this field. What are e'\x01' and e'\x02' referring to?
e'\x01' and e'\x02' are PostgreSQL escape string constants; they denote the characters with byte values 1 and 2, respectively. These characters are non-printable and are traditionally used for control purposes, so they don't appear in normal data. I use them for that reason; you can choose other characters you like instead.

Copy the file as csv with a different quoting character and delimiter:

drop table if exists test;
create table test (values jsonb);
\copy test from '/path/to/file.csv' with (format csv, quote '|', delimiter ';');

select values ->> 'text_sample', values ->> 'number_sample'
from test;
          ?column?           | ?column? 
-----------------------------+----------
 this is a       simple test | 4

1 Comment

This isn't a solution; it's a workaround for this specific case and data set. What if the data contains the characters you've suggested using for quote and delimiter?
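The collision the comment describes is easy to demonstrate. In this Python sketch the data happens to contain the ';' chosen as delimiter, so the line no longer comes back in one piece:

```python
import csv
import io

# The data itself contains the ';' chosen as the CSV delimiter...
line = '{"text_sample": "a;b", "number_sample": 4}\n'
reader = csv.reader(io.StringIO(line), quotechar='|', delimiter=';')
row = next(reader)
# ...so the JSON text is split across two columns instead of one.
```

This is why the first answer picks control characters such as 0x01 and 0x02, which essentially never appear in textual data.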

Cast the json value to text instead of extracting the text value from the json. E.g.:

t=# with j as (
        select '{"text_sample": "this is a\tsimple test", "number_sample": 4}'::json v
)
select v->>'text_sample' your, (v->'text_sample')::text better
from j;
            your             |          better
-----------------------------+--------------------------
 this is a       simple test | "this is a\tsimple test"
(1 row)

And to avoid the 0x09 error, try using

replace(values, chr(9), '\t')

since in your example you replace backslash+t, not the actual chr(9).
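For the generic case, one option (sketched here in Python; in SQL you would need one replace() per control character) is to rewrite every raw control character as its \uXXXX JSON escape before parsing:

```python
import json
import re

def escape_control_chars(s: str) -> str:
    # Rewrite every raw control character (0x00-0x1F) as its \uXXXX
    # JSON escape, covering tab, CR, LF and friends in one pass.
    return re.sub(r'[\x00-\x1f]', lambda m: '\\u%04x' % ord(m.group()), s)

raw = '{"text_sample": "this is a\tsimple test", "number_sample": 4}'
doc = json.loads(escape_control_chars(raw))
```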

4 Comments

I get the same error when I try it on the whole table: with j as ( select values::json v from temp_json ) select v->>'text_sample' your, (v->'text_sample')::text better from j;
Try regexp_replace(values, '\t', '\\t', 'g')?.. Your sample did not provoke the error, so I'm not sure what the problem is; you probably have an actual chr(9) in values, not \t?
I don't want to escape the tab explicitly, since any other kind of special character could appear; I'm looking for a generic way to do this.
I think you will have to list them then. I don't know a general character class for the non-printable characters, like 2, 3, 4, 9, 10, 13 in ASCII.
