13

i have to do a select query in a posting table where a specific bit of an integer is set. The integer represents a set of categories in a bitmask: E.g.

1 => health
2 => marketing
3 => personal
4 => music
5 => video
6 => design
7 => fashion
8 => ......

Data example:

id | categories | title
1  | 11         | bla bla
2  | 48         | blabla, too

I need a mysql query that selects postings, that are marked with a specific category. Let's say "all video postings" This means i need a result set of postings where the 5th bit of the catgories column is set (e.g. 16,17,48 ....)

SELECT * FROM postings WHERE ....????

Any ideas ?

4
  • 2
    Why not just an additional table in between: categories_postings? That would be a more future proof solution as this seems just a standard multiple categories database? Commented Feb 2, 2012 at 18:52
  • 1
    I agree with Luc, it'll be way easier to maintain an extra table called, let's say, categories_groups, that will have a structure like: id, category_group_name, health, marketing, personal, music... and which will hold either "0"/"1" under each category to mark if this category belongs to this group. This way it'll also be much easier to sum the number of groups that include "health" category. Commented Feb 2, 2012 at 19:15
  • @Luc - both of you are right - the fact is, the data are publishes by an external application where i cannot make any changes. A many-many relationship would be the best solution .... Commented Feb 3, 2012 at 5:44
  • When it works out for you I would even consider the step of parsing and structuring the data into a better data model if possible before working with it. Commented Feb 3, 2012 at 12:09

3 Answers 3

19

You can use bitwise operators like this. For video (bit 5):

WHERE categories & 16 = 16

Substitute the value 16 using the following values for each bit:

1 = 1
2 = 2
3 = 4
4 = 8
5 = 16
6 = 32
7 = 64
8 = 128

This goes from least significant bit to highest, which is opposite of the way most programmers think. They also start at zero.

Sign up to request clarification or add additional context in comments.

2 Comments

The mappings above can be simplified as: bitVal = 2^(i-1), where i is the index value on the left. So, for example, 16 = 2^(5-1).
is there any difference between WHERE categories & 16 = 16 and WHERE categories & 16 > 0? If I am correct, the bitwise operator should return 0 if the bit is not set.
5

How about

SELECT * FROM postings WHERE (categories & 16) > 0; -- 16 is 5th bit over

One issue with this is you probably won't hit an index, so you could run into perf issues if it's a large amount of data.

Certain databases (such as PostgreSQL) let you define an index on an expression like this. I'm not sure if mySQL has this feature. If this is important, you might want to consider breaking these out into separate Boolean columns or a new table.

Comments

1

SQL (not just mySQL) is not suitable for bitwise operations. If you do a bitwise AND you will force a table scan as SQL will not be able to use any index and will have to check each row one at a time.

It would be better if you created a separate "Categories" table and a properly indexed many-to-many PostingCategories table to connect the two.

UPDATE

For people insisting that bitmap fields aren't an issue, it helps to check Joe Celko's BIT of a Problem.  At the bottom of the article is a list of serious problems caused by bitmaps.

Regarding the comment that a blanket statement can't be right, note #10 - it breaks 1NF so yes, bitmap fields are bad:

  1. The data is unreadable. ...
  2. Constraints are a b#### to write....
  3. You are limited to two values per field. That is very restrictive; even the ISO sex code cannot fit into such a column...
  4. There is no temporal element to the bit mask (or to single bit flags). For example, a flag “is_legal_adult_flg” ... A DATE for the birth date (just 3 bytes) would hold complete fact and let us compute what we need to know; it would always be correct, too. ...
  5. You will find out that using the flags will tend to split the status of an entity over multiple tables....
  6. Bit flags invite redundancy. In the system I just mentioned, we had “is_active_flg” and “is_completed_flg” in in the same table. A completed auction is not active and vice verse. It is the same fact in two flags. Human psychology (and the English language) prefers to hear an affirmative wording (remember the old song “Yes, we have no bananas today!” ?). All of these bit flags, and sequence validation are being replaced by two sets of state transition tables, one for bids and one for shipments. For details on state transition constraints. The history of each auction is now in one place and has to follow business rules.
  7. By the time you disassemble a bit mask column, and throw out the fields you did not need performance is not going to be improved over simpler data types. 
  8. Grouping and ordering on the individual fields is a real pain. Try it.
  9. You have to index the whole column, so unless you luck up and have them in the right order, you are stuck with table scans.
  10. Since a bit mask is not in First Normal Form (1NF), you have all the anomalies we wanted to avoid in RDBMS.

I'd also add, what about NULLs? What about missing flags? What if something is neither true or false?

Finally, regarding the compression claim, most databases pack bit fields into bytes and ints internally. The bitmap field doesn't offer any kind of compression in this case. Other databases (eg PostgreSQL) actually have a Boolean type that can be true/false/unknown. It may take 1 byte but that's not a lot of storage and transparent compression is available if a table gets too large.

In fact, if a table gets large the bitmap fields problems become a lot more serious. Saving a few MBs in a GB table is no gain if you are forced to use table scans, or if you lose the ability to group

9 Comments

This is too much of a blanket statement to be accurate. It's true you won't be able to do index scans in bitwise fields, however, they can often come in handy and lead to massive reduction in storage size, or query speedup depending on what you are going for.
If your search is only using the bitmapped field for the "last mile" -- in other words you've narrowed a much larger search down to a few hundred records or so using other fields/indexes -- then there isn't much of a perf issue to be concerned with. For strong static datasets bitmapped fields are, as others have mentioned, a nice form of data compression. (We aren't going to add a new day to the week nor any new hours in a day anytime soon for example...)
@Techmag actually, no. DBAs typically hate bitmap fields because they cause a LOT of problems, performance one of them. They make the data unreadable, prevent the use of constraints and more. Check Joe Celko's BIT of a Problem. What you call compression, especially for days, would be replaced with a simple type or enum value. In fact, you could argue about compression only if you could use all bits on the backing field. Otherwise you'd be wasting, eg 7 bits out of a 16-bit bield
@delrox actually it isn't a blanket statement. Check Celko's article but the very fact that 1NF is broken while the data is unreadable is enough to make them a bad idea. There's no reduction in storage either, unless your database can't handle single-bit fields. Most databases actually pack multiple bits into bytes, ints and longs internally.
@delrox as you'll read in Celko's article and my edits, bitmap fields provide neither compression nor performance benefits. Especially in larger tables.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.