
I have a database with binary strings like these:

record no 1: 1111111111111011000100110001100100010000000000000011000000000000
record no 2: 1111111111111111111111100001100000010000000000000011000000000000
record no 3: 1110000011110000111010001110111011110000111100001100000011000000
...

So, I want to find out which records have a binary string similar to this one: 1111111111111011000100110001100100010000000000000011000000001100

As you can see, record number 1 is 98% relevant, record number 2 is 70% relevant, and record number 3 is only 45% relevant.

This is a huge database (200,000 records).

  • Take a look at this SO question: stackoverflow.com/questions/4777070/… Commented Oct 15, 2013 at 4:13
  • @Bjoern Can you help me complete the MySQL query? I already read it, but I still don't know how to write the query. Commented Oct 15, 2013 at 4:15
  • Well, your select query will look something like SELECT HAMMINGDISTANCE(some_parameter) FROM yourtable;, if you adapt the function provided there. The author converts binary strings to big integers for performance, so you should do the same when feeding the function with your parameters. He also uses 32 bytes; you have to take that into consideration with your binary values. Commented Oct 15, 2013 at 4:23

1 Answer

SELECT * FROM MY_TABLE ORDER BY BIT_COUNT(CAST(CONV(record,2,10) as unsigned integer) ^ CAST(b'11...0' as unsigned integer)) LIMIT 1;

The above query will return the most similar record.

You can also SELECT the BIT_COUNT value itself: its minimum, 0, means the record is identical to the input (100%), and its maximum, 64, means every bit differs (record = ~input), or 0%.
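To make the mapping from bit count to percentage concrete, here is a minimal Python sketch of the same XOR-and-count idea, run outside the database. The helper names hamming_distance and similarity_percent are made up for illustration, and the two sample strings are record no 1 and the search string from the question, split into 8-bit groups only for readability.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions where two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    # XOR leaves a 1 wherever the inputs differ; counting the 1s
    # plays the role of MySQL's BIT_COUNT.
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def similarity_percent(a: str, b: str) -> float:
    """Map the distance to a percentage: 0 differing bits -> 100, all differing -> 0."""
    width = len(a)
    return 100.0 * (width - hamming_distance(a, b)) / width


# Record no 1 and the search string from the question (64 bits each).
RECORD_1 = ("11111111" "11111011" "00010011" "00011001"
            "00010000" "00000000" "00110000" "00000000")
QUERY = ("11111111" "11111011" "00010011" "00011001"
         "00010000" "00000000" "00110000" "00001100")

if __name__ == "__main__":
    print(hamming_distance(RECORD_1, QUERY))    # 2
    print(similarity_percent(RECORD_1, QUERY))  # 96.875
```

Record no 1 differs from the search string in 2 of 64 bits, which works out to 96.875%, close to the figure estimated in the question. The same formula, 100 * (64 - bit_count) / 64, could presumably be applied to the BIT_COUNT value selected in the query.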


6 Comments

I just tried your query, but the results are not ordered by relevance. If the field type is BIGINT, I can't store the full string in the database, so I'm using VARCHAR. Here is my query: SELECT * FROM files ORDER BY BIT_COUNT( hash ^1110000011110000111010001110111011110000111100001100000011000000 ) LIMIT 0 , 30
Well, it works only on integers, so you will have to convert the records to BIGINT and your input value to a long.
You can try replacing record with BINARY(record) and selecting your input as b'1100...'. But I'm not sure whether the BINARY conversion will work.
Okay, try this: ORDER BY BIT_COUNT(CAST(CONV(record,2,10) as unsigned integer) ^ CAST(b'11...0' as unsigned integer)); where "record" is your varchar.
Many thanks, it works like a charm now. With a large database (200k records), will this query on a VARCHAR field hurt performance? Or is there a faster way to run it?
