13

I have a column that is a datetime, converted_at.

I plan on making calls that check WHERE converted_at is not null very often. As such, I'm considering adding a boolean field, converted. Is there a significant performance difference between checking whether a field is not null and checking whether a boolean field is false?

Thanks.

3
  • I don't think there's a significant performance difference, but if there is, it's probably in favor of is not null, since testing a boolean still has to find out whether the value is null. Commented Jan 25, 2012 at 20:50
  • @MichaelKrelin-hacker: I would guess that the boolean field would be defined as NOT NULL. Commented Jan 25, 2012 at 21:13
  • @ypercube, I don't think it will be used as anything but an integrity constraint. Commented Jan 25, 2012 at 21:34

3 Answers

7

If something can be answered from a single field, favour that over splitting the same information across two fields. A second field creates more infrastructure, which in your case is avoidable.

As to the nub of the question, I believe most database implementations, MySQL included, keep an internal boolean flag anyway to represent whether a nullable field is NULL.

You can rely on this being done correctly for you.

As to performance, the bigger question is profiling the typical queries you run on your database: whether you have created appropriate indexes, whether you have run ANALYZE TABLE to improve execution plans, and which indexes your queries actually use. That will have a far bigger impact on performance.
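A minimal sketch of that profiling workflow might look like the following. The table name `conversions`, its `id` column, and the index name are hypothetical, chosen only for illustration; only `converted_at` comes from the question.

```sql
-- Hypothetical table: conversions(id, converted_at DATETIME NULL, ...)

-- Index the column that the frequent filter uses.
CREATE INDEX idx_conversions_converted_at ON conversions (converted_at);

-- Refresh the key distribution statistics the optimizer relies on.
ANALYZE TABLE conversions;

-- Inspect the plan: the "key" column shows which index (if any) is chosen,
-- and "rows" shows the optimizer's estimate of rows examined.
EXPLAIN
SELECT id
FROM conversions
WHERE converted_at IS NOT NULL;
```

Whether the optimizer actually picks the index depends on how selective the condition is on your data, so it is worth running EXPLAIN against a realistic copy of the table.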


1 Comment

I agree with the first part of your answer, but it does not answer the question. The rest of your answer is as useless as saying "It depends".
4

Using WHERE converted_at is not null or WHERE converted = TRUE will probably be the same in terms of query performance.

But if you have this additional bit field, used to store whether the converted_at field is NULL or not, you'll have to somehow maintain integrity (via triggers?) whenever a new row is added and every time the column is updated. So this is a de-normalization, and it also means more complicated code. Moreover, you'll have at least one more index on the table (which means slightly slower Insert/Update/Delete operations).
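To give a sense of the extra maintenance involved, here is a rough sketch of what such triggers could look like in MySQL. The table name `conversions` and the trigger names are hypothetical; `converted` and `converted_at` are the columns from the question.

```sql
-- Hypothetical table: conversions(converted_at DATETIME NULL,
--                                 converted BOOLEAN NOT NULL DEFAULT 0, ...)

-- Keep the flag in sync when a row is inserted.
CREATE TRIGGER conversions_before_insert
BEFORE INSERT ON conversions
FOR EACH ROW
SET NEW.converted = (NEW.converted_at IS NOT NULL);

-- Keep the flag in sync when a row is updated.
CREATE TRIGGER conversions_before_update
BEFORE UPDATE ON conversions
FOR EACH ROW
SET NEW.converted = (NEW.converted_at IS NOT NULL);
```

Two triggers plus an additional index, just to mirror information the datetime column already holds, is the kind of overhead the single-column design avoids.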

Therefore, I don't think it's good to add this bit field.

If you can change the column in question from NULL to NOT NULL (possibly by normalizing the table), you may get some performance gain (at the cost/gain of having more tables).

1 Comment

MySQL scans the whole table for negative conditions, e.g. WHERE converted_at is not null, so I don't think your statement that both choices "will probably be the same in matters of query performance" is correct.
2

I had the same question for my own usage. So I decided to put it to the test. So I created all the fields required for the 3 possibilities I imagined:

# option 1
ALTER TABLE mytable ADD deleted_at DATETIME NULL;
ALTER TABLE mytable ADD archived_at DATETIME NULL;

# option 2
ALTER TABLE mytable ADD deleted boolean NOT NULL DEFAULT 0;
ALTER TABLE mytable ADD archived boolean NOT NULL DEFAULT 0;

# option 3
ALTER TABLE mytable ADD invisibility TINYINT(1) UNSIGNED NOT NULL DEFAULT 0
            COMMENT '4 values possible' ;

The last is a bitfield where 1=archived, 2=deleted, 3=deleted + archived.

First difference: you have to create indexes for options 2 and 3.
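For readers unfamiliar with the bitfield approach, the individual flags in option 3 would be read and written with bitwise operators, roughly as sketched below. The row id `42` is an arbitrary illustrative value; `id_mytable` is the key column used in the query later in this answer.

```sql
-- Rows that are visible (neither archived nor deleted).
SELECT * FROM mytable WHERE invisibility = 0;

-- Rows with the "deleted" bit (value 2) set, archived or not.
SELECT * FROM mytable WHERE (invisibility & 2) <> 0;

-- Set the "archived" bit (value 1) without touching the "deleted" bit.
UPDATE mytable SET invisibility = invisibility | 1 WHERE id_mytable = 42;

-- Clear the "deleted" bit (value 2) without touching the "archived" bit.
UPDATE mytable SET invisibility = invisibility & ~2 WHERE id_mytable = 42;
```

Note that a filter such as `(invisibility & 2) <> 0` applies an expression to the column, so a plain index on `invisibility` generally cannot be used for it; only equality tests like `invisibility = 0` are straightforward index lookups.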

CREATE INDEX mytable_deleted_IDX USING BTREE ON mytable (deleted) ;
CREATE INDEX mytable_archived_IDX USING BTREE ON mytable (archived) ;

CREATE INDEX mytable_invisibility_IDX USING BTREE ON mytable (invisibility) ;

Then I tried all of the options using a real-life SQL query, with 13k records in the main table. Here is how it looks:

SELECT *
FROM mytable
LEFT JOIN  table1 ON mytable.id_qcm = table1.id_qcm
LEFT JOIN  table2 ON table2.id_class = mytable.id_class
INNER JOIN  user ON mytable.id_user = user.id_user
where mytable.id_user=1  
and mytable.deleted_at is null  and mytable.archived_at is null
# and deleted=0    
# and invisibility=0  
order BY id_mytable

I ran the query with each of the commented-out filter options in turn.
Tested on MySQL 5.7.21-1 (debian9).

My conclusion:

The "is null" solution (option 1) is a bit faster, or at least performs the same.

The two others ("deleted=0" and "invisibility=0") seem a bit slower on average.

But the nullable fields option has decisive advantages: no index to create, easier to update, easier to query, and less storage space used.

(Additionally, inserts and updates should in principle be faster as well, since MySQL does not need to update indexes, but you would never be able to notice that.)

So you should use the nullable datetime fields option.
