13

One of our software-projects uses a PostgreSQL-table with a column 'guid' of type bytea.

This is used with hibernate 3.3.2.GA with PostgreSQL 8.4, which serializes the java UUID type using java object serialization. The result is a value like the following escape format bytea literal:

'\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002‌​J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\3‌​42\004M '

... which we cannot easily use in a query as select or condition to retrieve relevant rows.

Does anyone have a way to read or use the bytea-column in the select- or where-parts of a query (e.g. via psql or pgadmin3), without setting up some hibernate-query?

0

3 Answers 3

11

Update: See edit to question, this answer applies to the commonplace 16-byte serializations of uuid; the question was amended to reflect java serialization.


Interesting problem. I landed up writing a simple C extension to do it efficiently, but it's probably more sensible to use the PL/Python version below.

Because uuid is a fixed sized type and bytea is varlena you can't just create cast ... as implicit to binary-coerce them, because the variable length field header would get in the way.

There's no built-in function for bytea input to return a uuid. It'd be a handy thing to have, but I don't think anyone's done it yet.

Simplest way

Update: There's actually a simple way to do this. bytea in hex form is actually a valid uuid literal once the \x is stripped off, because uuid_in accepts plain undecorated hex without - or {}. So just:

regress=> SET bytea_output = 'hex';
SET
regress=> SELECT CAST( substring(CAST (BYTEA '\x0FCC6350118D11E4A5597DE5338EB025' AS text) from 3) AS uuid);
              substring               
--------------------------------------
 0fcc6350-118d-11e4-a559-7de5338eb025
(1 row)

It involves a couple of string copies and a hex encode/decode cycle, but it'll be tons faster than any of the PL answers I suggested earlier, though slower than C.

Other options

Personally I recommend using PL/Perl or pl/pythonu. I'll follow up with an example.

Assuming your uuid is the hex-format bytea literal:

'\x0FCC6350118D11E4A5597DE5338EB025'

you could turn it into a uuid type with:

PL/Perl

create language plperlu;

create or replace function to_uuid(bytea) returns uuid language plperlu immutable as $$
use Data::UUID;
my $ug = new Data::UUID;
my $uuid = $ug->from_hexstring(substr($_[0],2));
return $ug->to_string($uuid);
$$
SET bytea_output = hex;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

PL/Python

It's probably faster and cleaner in Python because the PL/Python interface passes bytea as raw bytes not as hex strings:

CREATE LANGUAGE plpythonu;

CREATE or replace function to_uuid(uuidbytes bytea) 
RETURNS uuid LANGUAGE plpythonu IMMUTABLE 
AS $$
import uuid
return uuid.UUID(bytes=uuidbytes)
$$;

SELECT to_uuid(BYTEA '\x0FCC6350118D11E4A5597DE5338EB025');

In C, just for kicks. Ugly hack.

You can see the C extension module here.

But really, I mean it about it being ugly. If you want it done properly in C, it's best to actually patch PostgreSQL rather than use an extension.

Sign up to request clarification or add additional context in comments.

2 Comments

Perhaps I should have mentioned more clearly that the contents of the bytea column is the java-serialized version of an UUID-instance, e.g.: select guid from documents limit 1; "\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\342\004M "
@FvHovell Er, yes. That would've been useful to say the least. This is why you should always include sample data.
2

After some trial and error I have created the following function to extract the postgresql-UUID value:

CREATE OR REPLACE FUNCTION bytea2uuid (x bytea) RETURNS uuid as $$ SELECT encode(substring(x, 73, 8) || substring(x, 65, 8), 'hex')::uuid $$ language sql;

This works by extracting the bytes used in the java long-values for leastSigBits and mostSigBits (which are stored in reversed order), than encoding to hex and casting to type 'uuid'.

Used as follows: select bytea2uuid(guid) as guid from documents limit 1;

"75bcc810-e204-4d20-bb92-29f02a72d2b2"

2 Comments

You'd better verify the serialVersionUid field too, make sure you're not decoding something wrong.
You are correct in the generic case, but for my purposes I know for certain that all UUID-serialization was and is performed using java 6, as that is used for all our projects. Thus in my case I do not need to check the serialVersionUid, as that is guaranteed to be the same for all guid-values.
0

This works for me:

ALTER TABLE myTable ALTER COLUMN id TYPE uuid USING CAST(ENCODE(id, 'hex') AS uuid);

1 Comment

That is not the correct encoding asked about: an java.util.UUID serialized value such as: "\254\355\000\005sr\000\016java.util.UUID\274\231\003\367\230m\205/\002\000\002‌​J\000\014leastSigBitsJ\000\013mostSigBitsxp\273\222)\360*r\322\262u\274\310\020\3‌​42\004M "

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.