4

I'm attempting to learn Sqlalchemy and utilize an ORM. One of my columns stores file hashes as binary. In SQL, the select would simply be

SELECT type, column FROM table WHERE hash = UNHEX('somehash')

How do I achieve a select like this (ideally with an insert example, too) using my ORM? I've begun reading about column overrides, but I'm confused/not certain that that's really what I'm after.

eg res = session.query.filter(Model.hash == __something__? )

Thoughts?

3 Answers 3

11

Only for select's and insert's

Well, for select you could use:

>>> from sqlalchemy import func
>>> session = (...)
>>> (...)
>>> engine = create_engine('sqlite:///:memory:', echo=True)
>>> q = session.query(Model.id).filter(Model.some == func.HEX('asd'))
>>> print q.statement.compile(bind=engine)
SELECT model.id
FROM model
WHERE model.some = HEX(?)

For insert:

>>> from sqlalchemy import func
>>> session = (...)
>>> (...)
>>> engine = create_engine('sqlite:///:memory:', echo=True)
>>> m = new Model(hash=func.HEX('asd'))
>>> session.add(m)
>>> session.commit()
INSERT INTO model (hash) VALUES (HEX(%s))

A better approach: Custom column that converts data by using sql functions

But, I think the best for you is a custom column on sqlalchemy using any process_bind_param, process_result_value, bind_expression and column_expression see this example.

Check this code below, it create a custom column that I think fit your needs:

from sqlalchemy.types import VARCHAR
from sqlalchemy import func

class HashColumn(VARCHAR):

    def bind_expression(self, bindvalue):
        # convert the bind's type from String to HEX encoded 
        return func.HEX(bindvalue)

    def column_expression(self, col):
        # convert select value from HEX encoded to String
        return func.UNHEX(col)

You could model your a table like:

from sqlalchemy import Column, types
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Model(Base):
    __tablename__ = "model"
    id = Column(types.Integer, primary_key=True)
    col = Column(HashColumn(20))

    def __repr__(self):
        return "Model(col=%r)" % self.col

Some usage:

>>> (...)
>>> session = create_session(...)
>>> (...)
>>> model = Model(col='Iuri Diniz')
>>> session.add(model)
>>> session.commit()

this issues this query:

INSERT INTO model (col) VALUES (HEX(?)); -- ('Iuri Diniz',)

More usage:

>>> session.query(Model).first()
Model(col='Iuri Diniz')

this issues this query:

SELECT 
    model.id AS model_id, UNHEX(model.col) AS model_col 
FROM model 
LIMIT ? ; -- (1,)

A bit more:

>>> session.query(Model).filter(Model.col == "Iuri Diniz").first()
Model(col='Iuri Diniz')

this issues this query:

SELECT 
    model.id AS model_id, UNHEX(model.col) AS model_col 
FROM model 
WHERE model.col = HEX(?) 
LIMIT ? ; -- ('Iuri Diniz', 1)

Extra: Custom column that converts data by using python types

Maybe you want to use some beautiful custom type and want to convert it between python and the database.

In the following example I convert UUID's between python and the database (the code is based on this link):

import uuid
from sqlalchemy.types import TypeDecorator, VARCHAR

class UUID4(TypeDecorator):
    """Portable UUID implementation

    >>> str(UUID4())
    'VARCHAR(36)'
    """

    impl = VARCHAR(36)

    def process_bind_param(self, value, dialect):
        if value is None:
            return value
        else:
            if not isinstance(value, uuid.UUID):
                return str(uuid.UUID(value))
            else:
                # hexstring
                return str(value)

    def process_result_value(self, value, dialect):
        if value is None:
            return value
        else:
            return uuid.UUID(value)
Sign up to request clarification or add additional context in comments.

6 Comments

You are using varchar instead of binary as a guid field.
@Ricardo, well, it's only an example, just for demonstrate how to convert between python datatypes and db datatypes.
are the func.HEX and func.UNHEX deprecated? Following your code does not work, the same object is returned regardless of the input and the value seems to be hidden within the object. Also, I am not able to find any docs regarding these two functions, whats going on here??
@MatthewTrevor. I understood now. (at)Clocker, according to documentation: "Any name can be given to func. If the function name is unknown to SQLAlchemy, it will be rendered exactly as is."
|
1

I wasn't able to get @iuridiniz's Custom column solution to work because of the following error:

sqlalchemy.exc.StatementError: (builtins.TypeError) encoding without a string argument

For an expression like:

m = Model(col='FFFF')
session.add(m)
session.commit()

I solved it by overriding process_bind_param, which processes the parameter before passing it to bind_expression for interpolation into your query language.

from sqlalchemy.types import VARCHAR
from sqlalchemy import func

class HashColumn(VARCHAR):

    def process_bind_param(self, value, dialect):
        # encode value as a binary
        if value:
            return bytes(value, 'utf-8')

    def bind_expression(self, bindvalue):
        # convert the bind's type from String to HEX encoded
        return func.HEX(bindvalue)

    def column_expression(self, col):
        # convert select value from HEX encoded to String
        return func.UNHEX(col)

And then defining the table is the same:

from sqlalchemy import Column, types
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Model(Base):
    __tablename__ = "model"
    id = Column(types.Integer, primary_key=True)
    col = Column(HashColumn(20))

    def __repr__(self):
        return "Model(col=%r)" % self.col

Comments

1

I really like iuridiniz approach A better approach: Custom column that converts data by using sql functions, but I had some trouble making it work when using BINARY and VARBINARY to store hex strings in MySQL 5.7. I tried different things, but SQLAlchemy kept complaining about the encoding, and/or the use of func.HEX and func.UNHEX in contexts where they couldn't be used. Using python3 and SQLAlchemy 1.2.8, I managed to make it work extending the base class and replacing its processors, so that sqlalchemy does not require a function from the database to bind the data and compute the result, but rather it is done within python, as follows:

import codecs
from sqlalchemy.types import VARBINARY

class VarBinaryHex(VARBINARY):
    """Extend VARBINARY to handle hex strings."""

    impl = VARBINARY

    def bind_processor(self, dialect):
        """Return a processor that decodes hex values."""
        def process(value):
            return codecs.decode(value, 'hex')
        return process

    def result_processor(self, dialect, coltype):
        """Return a processor that encodes hex values."""
        def process(value):
            return codecs.encode(value, 'hex')
        return process

    def adapt(self, impltype):
        """Produce an adapted form of this type, given an impl class."""
        return VarBinaryHex()

The idea is to replace HEX and UNHEX, which require DBMS intervention, with python functions that do just the same, encode and decode an hex string just like HEX and UNHEX do. If you directly connect to the database, you can use HEX and UNHEX, but from SQLAlchemy, codecs.enconde and codecs.decode functions make the work for you.

I bet that, if anybody were interested, writting the appropriate processors, one could even manage the hex values as integers from the python perspective, allowing to store integers that are greater the BIGINT.

Some considerations:

  • BINARY could be used instead of VARBINARY if the length of the hex string is known.
  • Depending on what you are going to do, it might worth to un-/capitalise the string on the constructor of class that is going to use this type of column, so that you work with a consistent capitalization, right at the moment of the object initialization. i.e., 'aa' != 'AA' but 0xaa == 0xAA.
  • As said before, you could consider a processor that converts db binary hex values to prython integer.
  • When using VARBINARY, be careful because 'aa' != '00aa'
  • If you use BINARY, lets say that your column is col = Column(BinaryHex(length=4)), take into account that any value that you provide with less than length bytes will be completed with zeros. I mean, if you do obj.col = 'aabb' and commit it, when you later retrieve this, from the dataase, what you will get is obj.col == 'aabb0000', which is something quite different.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.