0

I'm loading some JSON from Postgres 13 into Elasticsearch using Logstash and ran into some errors caused by text not being escaped with reverse solidus. I tracked my problem down to this behavior:

SELECT
  json_build_object(
    'literal_text', 'abc\ndef'::text,
    'literal_text_type', pg_typeof('abc\ndef'::text),
    'text_from_jsonb_array_element', a->>0,
    'jsonb_array_element_type', pg_typeof(a->>0)
  )
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
{
 "literal_text": "abc\\ndef",
 "literal_text_type": "text",
  "text_from_jsonb_array_element": "abc\ndef",
  "jsonb_array_element_type":"text"
}

db-fiddle

json_build_object encodes the literal text as expected (turning \n into \\n); however, it doesn't encode the text retrieved via jsonb_array_element even though both are text.

Why is the text extracted from jsonb_array_element being treated differently (not getting escaped by jsonb_build_object)? I've tried casting, using jsonb_array_elements_text (though my actual use case involves an array of arrays, so I need to split to a set of jsonb), and various escaping/encoding/formatting functions, but haven't found a solution yet.

Is there a trick to cast text pulled from jsonb_array_element so it will get properly encoded by jsonb_build_object?

Thanks for any hints or solutions.

1 Answer 1

1

Those strings look awfully similar, but they're actually different. When you create a string literal like '\n', that's a backslash character followed by an "n" character. So when you put that into json_build_object, it needs to add a backslash to escape the backslash you're giving it.

On the other hand, when you call jsonb_array_elements('["abc\ndef"]'), you're saying that the JSON has precisely a \n encoded in it with no second backslash, and therefore when it's converted to text, that \n is interpreted as a newline character, not two separate characters. You can see this easily by running the following:

SELECT a->>0 FROM jsonb_array_elements('["abc\ndef"]') a;
 ?column?
----------
 abc     +
 def
(1 row)

On encoding that back into a JSON, you get a single backslash again, because it's once again encoding a newline character.

If you want to escape it with an extra backslash, I suggest a simple replace:

SELECT 
  json_build_object(
    'text_from_jsonb_with_replace', replace(a->>0, E'\n', '\n')
  ) 
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
               json_build_object
------------------------------------------------
 {"text_from_jsonb_with_replace" : "abc\\ndef"}
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.