6

I'm currently trying to optimise data loading into Postgres via JDBC. We are using COPY FROM STDIN with FORMAT 'binary'. Building the binary byte arrays is pretty straightforward for strings, longs, UUIDs, etc. However, in one instance we have a JSONB field in the table, and I have no idea how to serialize my JSON objects into the binary jsonb format. Is there a specification for jsonb anywhere?

Note: I have ruled out just sending a UTF-8 serialized JSON string as the binary value.

3 Answers

6

You just need to treat the JSON object as a normal string, but write a single byte with the value 1 in front of it for the version number, which is the only version currently supported. Also make sure the field length you specify is the string's byte length + 1 (for the version byte).

So, basically, if j is your JSON and dos is the output stream:

val utfBytes = j.toString.getBytes("UTF-8")
dos.writeInt(utfBytes.length + 1) // field length: JSON text plus the version byte
dos.writeByte(1)                  // jsonb binary format version
dos.write(utfBytes)               // the JSON itself, as UTF-8 text

This is a comment from the PostgreSQL source code (mirrored on GitHub):

/*
 * jsonb type recv function
 *
 * The type is sent as text in binary mode, so this is almost the same
 * as the input function, but it's prefixed with a version number so we
 * can change the binary format sent in future if necessary. For now,
 * only version 1 is supported.
 */
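
To show where this fits in the rest of the stream, here is a minimal, untested sketch of a complete single-row binary COPY through the JDBC driver's CopyManager. The connection URL and the table my_table (id bigint, doc jsonb) are made up for illustration; the header, tuple and trailer layout follows the binary COPY format described in the PostgreSQL docs.

import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataOutputStream}
import java.sql.DriverManager
import org.postgresql.PGConnection

// Hypothetical connection and table: CREATE TABLE my_table (id bigint, doc jsonb)
val conn = DriverManager.getConnection("jdbc:postgresql:test", "user", "pass")

val buf = new ByteArrayOutputStream()
val dos = new DataOutputStream(buf)

// 19-byte header: signature, flags, header extension length
dos.write("PGCOPY\n".getBytes("US-ASCII"))
dos.write(Array(0xFF.toByte, '\r'.toByte, '\n'.toByte, 0.toByte))
dos.writeInt(0) // flags
dos.writeInt(0) // header extension length

// One tuple: 16-bit field count, then (32-bit length, data) per field
dos.writeShort(2)

dos.writeInt(8)    // bigint is 8 bytes
dos.writeLong(42L)

val json = """{"a": 1}"""
val utfBytes = json.getBytes("UTF-8")
dos.writeInt(utfBytes.length + 1) // jsonb field length includes the version byte
dos.writeByte(1)                  // jsonb binary format version
dos.write(utfBytes)               // the JSON itself, as UTF-8 text

dos.writeShort(-1) // file trailer
dos.flush()

val copyManager = conn.unwrap(classOf[PGConnection]).getCopyAPI
copyManager.copyIn(
  "COPY my_table (id, doc) FROM STDIN (FORMAT binary)",
  new ByteArrayInputStream(buf.toByteArray)
)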

3 Comments

Is this still valid?
@DonnyV. Well, if they have versioned it, Postgres should always continue to accept this format, even if it also accepts newer versions later - it wouldn't make sense to introduce a backwards incompatibility for no reason.
@DonnyV. I've added links, and nothing has changed yet
1

C# version with Npgsql

Npgsql requires FORMAT BINARY, and to write a jsonb field you have to prepend a single byte with the value 1 in front of the bytes of the jsonb data.

Simple version, where writer is the NpgsqlBinaryImporter object

byte versionByte = (byte)1; // jsonb binary format version
var userDataBytes = Encoding.UTF8.GetBytes(userData.UserData);
byte[] finalArray = new byte[userDataBytes.Length + 1];
// version byte first, then the UTF-8 JSON text
System.Buffer.BlockCopy(new byte[] { versionByte }, 0, finalArray, 0, 1);
System.Buffer.BlockCopy(userDataBytes, 0, finalArray, 1, userDataBytes.Length);
writer.Write(finalArray);

or you can use this extension method

/// <summary>
/// Converts string to JSONB bytes array for COPY command
/// </summary>
/// <param name="json"></param>
/// <param name="encoding">If null UTF8 is used</param>
/// <returns>Byte array whose first byte is the value 1 (the only jsonb binary format version currently supported by PostgreSQL); the remaining bytes are the original <paramref name="json"/>.</returns>
public static byte[] ToJsonBBytes(this string json, Encoding encoding = null)
{
    encoding = encoding ?? Encoding.UTF8;

    byte versionByte = (byte)1;
    var userDataBytes = encoding.GetBytes(json);
    byte[] finalArray = new byte[userDataBytes.Length + 1];
    System.Buffer.BlockCopy(new byte[] { versionByte }, 0, finalArray, 0, 1);
    System.Buffer.BlockCopy(userDataBytes, 0, finalArray, 1, userDataBytes.Length);

    return finalArray;
}
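
For completeness, a small usage sketch of the extension method. It assumes an open NpgsqlConnection named conn and a table my_table (id integer, payload jsonb), both made up for illustration; writer is the NpgsqlBinaryImporter returned by BeginBinaryImport.

using (var writer = conn.BeginBinaryImport(
    "COPY my_table (id, payload) FROM STDIN (FORMAT BINARY)"))
{
    writer.StartRow();
    writer.Write(42);                          // integer column
    writer.Write("{\"a\": 1}".ToJsonBBytes()); // jsonb column, version byte prepended by the extension
    writer.Complete();                         // commits the COPY operation
}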

3 Comments

Is this still valid?
Yes, we are using it in a live project. Do you have any issues with it?
I'm just curious about the implementation. Everything I find says that Postgres orders the properties alphabetically and removes duplicate property names when casting a json string to jsonb, but your code isn't doing any of that. Are you using the newest version of Postgres?
0

Refer to https://www.npgsql.org/doc/api/Npgsql.NpgsqlBinaryImporter.html#Npgsql_NpgsqlBinaryImporter_Write__1___0_NpgsqlTypes_NpgsqlDbType_

C#

writer.Write("{}", NpgsqlDbType.Jsonb);

Add the string to the writer along with the Npgsql data type.
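
Put together, a minimal sketch of that approach, assuming a hypothetical connection string and table my_table (id integer, payload jsonb); with NpgsqlDbType.Jsonb the leading jsonb version byte is handled by Npgsql itself, so no manual prefixing is needed.

using Npgsql;
using NpgsqlTypes;

// Hypothetical connection string and table
using var conn = new NpgsqlConnection("Host=localhost;Database=test;Username=user;Password=pass");
conn.Open();

using (var writer = conn.BeginBinaryImport(
    "COPY my_table (id, payload) FROM STDIN (FORMAT BINARY)"))
{
    writer.StartRow();
    writer.Write(42, NpgsqlDbType.Integer);
    writer.Write("{\"a\": 1}", NpgsqlDbType.Jsonb); // Npgsql writes the version byte itself
    writer.Complete();
}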

