In MySQL there is a function called UNHEX, which takes a hexadecimal string like '1DB8948899F511E6A18374D02B45FC30' and turns it into the corresponding sequence of bytes, a binary value. I use it for storing UUIDs. The reverse operation is implemented by the function HEX.

I store protein sequences. Each protein sequence is a finite sequence of letters drawn from an alphabet of at most 21 different letters. Instead of storing each sequence as plain text, I would like to store it as a binary field.

I would therefore like to implement a custom function, similar to UNHEX, which replaces each letter with a fixed 5-bit code (5 bits are enough). I will also implement the inverse function.

How can I implement such a function?

I looked into the function COMPRESS, but it produces longer output for sequences of length around 63 or 64, and for sequences shorter than 150 letters its compression factor is below the factor of 1.6 (8 bits per letter reduced to 5) that my custom function would achieve. Sequences shorter than 150 letters are numerous, so I would not gain much by using COMPRESS.

My MySQL version is 14.14 Distrib 5.5.52, for debian-linux-gnu (x86_64), and you can think of a protein sequence as a finite sequence of letters from A to U (the actual letters are not relevant here; I will adapt the code).

What I would like to write is a function which takes a string made of these letters from A to U as its argument and turns it into a sequence of bits. Since 2^4 < 21 <= 2^5, 5 bits per letter are both necessary and sufficient.
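As a minimal sketch of the packing described above, the core routine could look like the following in plain C, before any MySQL wrapper is added. The names `pack5`, `pack5_length` and `ALPHABET` are hypothetical, and so is the convention of assigning codes 1 to 21 to the letters and keeping code 0 free, so that zero padding bits in the last byte can never be read back as a letter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alphabet: the letters A to U, as in the question. */
static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTU";

/* Bytes needed for n letters at 5 bits each: ceil(5n / 8). */
size_t pack5_length(size_t n)
{
    return (5 * n + 7) / 8;
}

/*
 * Pack n letters from seq into out, most significant bit first.
 * Each letter gets a code from 1 to 21; code 0 is reserved for padding.
 * out must have room for pack5_length(n) bytes.
 * Returns the number of bytes written, or (size_t)-1 if a character
 * is not in ALPHABET.
 */
size_t pack5(const char *seq, size_t n, unsigned char *out)
{
    assert(seq != NULL && out != NULL);

    unsigned int acc = 0;   /* bit accumulator */
    int nbits = 0;          /* number of pending bits held in acc */
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        /* strchr would also match the terminating NUL, so reject it first. */
        const char *p = seq[i] ? strchr(ALPHABET, seq[i]) : NULL;
        if (p == NULL)
            return (size_t)-1;

        acc = (acc << 5) | (unsigned int)(p - ALPHABET + 1);
        nbits += 5;

        /* Emit every complete byte that has accumulated. */
        while (nbits >= 8) {
            nbits -= 8;
            out[written++] = (unsigned char)((acc >> nbits) & 0xFF);
        }
        acc &= (1u << nbits) - 1;   /* keep only the pending bits */
    }

    /* Flush the remaining bits, left-aligned and zero-padded. */
    if (nbits > 0)
        out[written++] = (unsigned char)((acc << (8 - nbits)) & 0xFF);

    return written;
}
```

With this layout a 64-letter sequence occupies 40 bytes, and "AB" packs to the two bytes 0x08 0x80.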

I am looking into writing a plugin for MySQL which would define both of these functions. Am I going too far? Is there an easier way? If so, would the functions be more efficient if they were implemented in a plugin?

4
  • 1
    Note that you have described HEX() and UNHEX() backwards. HEX() unpacks 8 bits per byte of binary data into 4 bits per byte of hexadecimal, UNHEX() packs 4 bits per byte of hex-encoded data into 8 bits per byte of binary data. Commented Oct 25, 2016 at 22:55
  • 1
    For your use case, what are the symbols in the 21-symbol alphabet? Is it the letters A through U, or something else? Also, what version of MySQL? Commented Oct 25, 2016 at 22:57
  • 1
    I would suggest having a look at how the base64_encode / decode functions are implemented, as they use 6 bits. Also, how to convert to number base 32? Commented Oct 26, 2016 at 13:37
  • 1
    Thanks Ryan. I have found this link: github.com/y-ken/mysql-udf-base64/blob/master/base64.c and I will look into that. I think it answers my question. Commented Oct 26, 2016 at 13:50

1 Answer


I need to adapt the functions base64encode and base64decode. The source for these functions can be found here:

https://github.com/y-ken/mysql-udf-base64/blob/master/base64.c
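As a rough illustration of what the decoding side of such an adaptation might reduce to, here is a sketch of the 5-bit unpacking loop in plain C. It is not taken from the linked file. The name `unpack5`, the A to U alphabet and the convention that codes 1 to 21 are letters while code 0 marks padding are all assumptions, and the MySQL UDF wrapper (the init, main and deinit functions) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alphabet: the letters A to U, as in the question. */
static const char ALPHABET[] = "ABCDEFGHIJKLMNOPQRSTU";
#define ALPHABET_SIZE 21

/*
 * Unpack 5-bit codes from in (m bytes, most significant bit first).
 * Codes 1..21 map to letters; code 0 marks padding and ends decoding.
 * out must have room for (8 * m) / 5 + 1 chars, including the final NUL.
 * Returns the number of letters written, or (size_t)-1 if a code
 * above 21 is found.
 */
size_t unpack5(const unsigned char *in, size_t m, char *out)
{
    assert(in != NULL && out != NULL);

    unsigned int acc = 0;   /* bit accumulator */
    int nbits = 0;          /* number of pending bits held in acc */
    size_t n = 0;

    for (size_t i = 0; i < m; i++) {
        acc = (acc << 8) | in[i];
        nbits += 8;

        /* Consume every complete 5-bit code that is available. */
        while (nbits >= 5) {
            nbits -= 5;
            unsigned int code = (acc >> nbits) & 0x1F;
            if (code == 0) {            /* padding: end of sequence */
                out[n] = '\0';
                return n;
            }
            if (code > ALPHABET_SIZE)   /* not a valid letter code */
                return (size_t)-1;
            out[n++] = ALPHABET[code - 1];
        }
        acc &= (1u << nbits) - 1;       /* keep only the pending bits */
    }

    out[n] = '\0';
    return n;
}
```

Reserving code 0 matters because the byte length alone does not determine the letter count: 5 bytes can hold either 7 or 8 letters, and the zero padding code is what tells the two cases apart.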
