12

I have large DB (many millions of rows) and I'm trying to make the best choices for datatypes for 2 fields. Most everything I have made varchar or INT. However, 2 fields I'm wondering if Enum is the best way.

Field 1 First field is gender, My data is currently either 'Male' or 'Female' or it could be blank. I initially set it up like this:

GENDER VARCHAR(6) NOT NULL

Is this the best way, or would it be better to set it up as:

GENDER ENUM ('Male', 'Female') NOT NULL

And do I need to make it NOT NULL to allow for the blank, or do I need to add the blank, i.e.

GENDER ENUM ('Male', 'Female', '') NOT NULL

Not to mention, I'm considering converting the entire field to just M or F.

Field 2: I have pretty much the same things to consider, except for the state field, which could include 52 values (50 states, DC, plus blank).

I guess the biggest question is - Is all this Enum stuff worth it? My DB has many millions of rows, so everything is a factor, but should I just be using VARCHAR(2) for the states instead of ENUM.

2 Answers 2

11

The rule of thumb I usually apply to such cases is NOT to use MySQL ENUMs. Using them creates maintenance issues, especially around adding/removing/renaming some of the values. In InnoDB, renaming and removing an enum value is heavy on big tables. Adding a value isn't (as long as you don't add it in the middle).

As you probably DO want to keep this column in context, and not to allow any value out of this context, the best way IMHO is to use INT, and connect it as a foreign key to a values table (columns id, value).

You will be able to add and rename values in this table easily, and before you remove a value the FK will enforce handling any existing records in the main table which have this value.

To read the data easily, all you need is a simple JOIN.

Note: since genders are pretty final, you may want to leave it as VARCHAR(1) or use an ENUM like Johan suggests, but who knows? You may want to support transgenders and androgyny in the future. Not kidding.

Sign up to request clarification or add additional context in comments.

12 Comments

normally I'd agree, but for gender 2, maybe 3 absolute maximum 7 values I think it is overkill.
@Johan: 7 is a very small number for my alien space battle game.
Jokes aside, CHAR(1) is a valid approach. It even lets you add a foreign key to a reference table (with 2, 3, 7 rows whatever) and you don't have to mess with the ENUM when you need to add a new gender (or an 'Unknown' value).
One of the things that concerns me is importing CSV files in the future. I don't want to run into issues with blank fields that might interrupt an import.
@ypercube, it's almost bedtime, how am I supposed to get that picture out of my head?
|
7

If you want to have a value for no value entered, use null that's what null is designed for!

If you want to specify something in between male and female, use

ENUM('male','female','other') NULL;

Note that an enum does not store the literal text value in the column.
male is stored as 1, female as 2 and other as 3 etc.
This means that it is much more efficient than varchar.

If you are struggling with null in your selects, note that you can use the ifnull or coalesce functions to replace the null with something more useful.

SELECT IFNULL(gender,'other') as gender FROM people;
-- or the identical statement
SELECT COALESCE(gender,'other') as gender FROM people;

1 Comment

+1 for the proper use of NULL and the relevant functions examples.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.