1

So, imagine a MySQL table with a few simple columns, an auto increment, and a hash (varchar, UNIQUE).

Is it possible to give MySQL a single query that will add a row and generate a unique hash for it, without needing multiple queries?

Currently, the only way I can think of to achieve this is with a while loop, which I worry would become more and more processor intensive the more entries there are in the db.

Here's some pseudo-php, obviously untested, but gets the general idea across:

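// query() and generate_hash() are placeholders; query() is assumed to return false on failure.
while (!query("INSERT INTO `table` (hash) VALUES ('" . generate_hash() . "');")) {
    // The UNIQUE index rejected a duplicate hash; loop and try again with a new one.
}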

In the above example, the hash column would be UNIQUE, so the query would fail whenever the generated hash already exists. The problem is, say there are 500,000 entries in the db and I'm working off of a base36 hash generator with 4 characters. The likelihood of a conflict would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, any more than 5 I would consider unacceptable.
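To put numbers on that worry, here is a small Python sketch of the arithmetic. It assumes each hash is drawn uniformly at random and that retries are independent; the function names are made up for illustration.

```python
def collision_probability(rows, alphabet_size, length):
    """Chance that one uniformly random code matches an existing row."""
    return rows / alphabet_size ** length


def expected_attempts(p):
    """Mean number of INSERT attempts until one succeeds (geometric distribution)."""
    return 1 / (1 - p)


p36 = collision_probability(500_000, 36, 4)  # base36, 4 chars, 500,000 rows
p62 = collision_probability(500_000, 62, 6)  # base62, 6 chars, 500,000 rows

# The first five attempts all collide with probability p ** 5.
print(f"base36 x 4: p={p36:.4f}, mean attempts={expected_attempts(p36):.2f}, "
      f"P(more than 5 attempts)={p36 ** 5:.4f}")
print(f"base62 x 6: p={p62:.2e}, mean attempts={expected_attempts(p62):.6f}")
```

Under those assumptions the cost is per insert, not per existing row: the 4-character base36 case averages roughly 1.4 attempts, and the 6-character base62 space (62^6 = 56,800,235,584 codes) makes a retry rare at 500,000 rows.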

So, can I do this with pure SQL? I would need to generate a base62, 6 char string (like: "j8Du7X", chars a-z, A-Z, and 0-9), and either update the last_insert_id with it, or even better, generate it during the insert.

I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake.

Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too.

Thanks!

2
  • 3
    Why do you want a hash? Can't you use the auto_increment column for the same purposes? Commented Mar 18, 2010 at 22:16
  • I need a base62, 6 char hash to use in a URL string. I COULD translate the auto_inc to base62, but that wouldn't allow me to manually create an entry (say TheLnk - this is a valid base62 hash, but I wouldn't be able to add it if I were working off an auto inc...). Commented Mar 19, 2010 at 1:15

8 Answers

4

This is our approach for a similar project, where we wanted to generate unique coupon codes.

First, we used an AUTO_INCREMENT primary key. This ensures uniqueness and query speed. Then, we created a base24 numbering system, using A, B, C, etc., but without O and I, because someone might mistake them for 0 or 1. Then we converted the auto-increment integer to our base24 number. For example, 0=A, 1=B, 28=BE, 1458965=EKNYF. We used base24 because a long base10 number needs fewer characters when written in base24.

Then we created a separate column in our table, coupon_code. This was not indexed.

We took the base24 number and inserted 3 random filler characters into it: digits, or I and O (which are not used in our base24). For example, EKNYF could turn into 1EKONY6F or EK2NY3F9. This was our coupon code and we inserted it into our coupon_code column. It's unique and random.

So, when the user uses code EK2NY3F9, all we have to do is remove all the unused characters (2, 3 and 9) and we get EKNYF, which we convert to 1458965. We just select the row with primary key 1458965 and then compare its coupon_code column with EK2NY3F9.
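A minimal Python sketch of this scheme, as I read the description above. The alphabet order (A to Z without I and O) is inferred from the examples 0=A, 28=BE and 1458965=EKNYF; the function names and the use of the secrets module are my own assumptions, not the original implementation.

```python
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ"  # 24 letters: A-Z without I and O
FILLER = "0123456789IO"                # characters that never appear in a base24 number


def encode_base24(n):
    """Convert a non-negative integer (the auto-increment id) to base24 letters."""
    if n < 0:
        raise ValueError("id must be non-negative")
    digits = []
    while True:
        n, remainder = divmod(n, 24)
        digits.append(ALPHABET[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))


def decode_base24(letters):
    """Convert base24 letters back to the integer id."""
    n = 0
    for ch in letters:
        n = n * 24 + ALPHABET.index(ch)
    return n


def make_coupon_code(row_id, filler_count=3):
    """Insert random filler characters at random positions in the base24 id."""
    chars = list(encode_base24(row_id))
    for _ in range(filler_count):
        position = secrets.randbelow(len(chars) + 1)
        chars.insert(position, secrets.choice(FILLER))
    return "".join(chars)


def coupon_code_to_id(code):
    """Strip the filler characters and decode what is left."""
    return decode_base24("".join(ch for ch in code if ch in ALPHABET))


print(encode_base24(1458965))         # EKNYF
print(coupon_code_to_id("EK2NY3F9"))  # 1458965
```

The id recovered from a submitted code only locates the row; the stored coupon_code still has to match the submitted string exactly, which is what makes guessing neighbouring ids useless.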

I hope this helps.


3

If your heart is set on using base-36 4 character hashes (hashspace is only 1679616), you could probably pre-generate a table of hashes that aren't already in the other table. Then finding a unique hash would be as simple as moving it from the "unused table" to the "used table" which is O(1).

If your table is conceivably 1/3 full you might want to consider expanding your hashspace since it will probably fill up in your lifetime. Once the space is full you will no longer be able to find unique hashes no matter what algorithm you use.
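Here is a rough Python sketch of the "unused table" / "used table" idea. It uses an in-memory SQLite database purely as a stand-in for MySQL so that it runs anywhere; the table names, column names and helper functions are all hypothetical.

```python
import secrets
import sqlite3
import string

BASE62 = string.digits + string.ascii_letters


def open_demo_db():
    """Create the two tables in an in-memory SQLite database (stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unused_hashes (hash TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE used_hashes (hash TEXT PRIMARY KEY)")
    return conn


def random_code(length=6):
    return "".join(secrets.choice(BASE62) for _ in range(length))


def refill_pool(conn, count):
    """Add `count` new codes to the pool, skipping codes already used or pooled."""
    added = 0
    while added < count:
        code = random_code()
        if conn.execute("SELECT 1 FROM used_hashes WHERE hash = ?", (code,)).fetchone():
            continue
        cur = conn.execute("INSERT OR IGNORE INTO unused_hashes (hash) VALUES (?)", (code,))
        added += cur.rowcount  # 0 when the code was already in the pool
    conn.commit()


def claim_hash(conn):
    """Move one code from the unused table to the used table and return it."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute("SELECT hash FROM unused_hashes LIMIT 1").fetchone()
        if row is None:
            raise LookupError("pool is empty; refill it first")
        conn.execute("DELETE FROM unused_hashes WHERE hash = ?", (row[0],))
        conn.execute("INSERT INTO used_hashes (hash) VALUES (?)", (row[0],))
    return row[0]


conn = open_demo_db()
refill_pool(conn, 100)
print(claim_hash(conn))
```

All the collision checking happens offline in refill_pool, so claiming a code is a fixed amount of work. With several concurrent writers on MySQL, the select-then-delete step would need row locking (for example `SELECT ... FOR UPDATE` inside an InnoDB transaction), which this single-connection sketch does not show.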

2 Comments

Because of the nature of the app, I need it to be specifically base62 6 chars. I figure I will pre-generate the table if need be, but even though mysql is fast, running through 50 billion entries is still not the solution I was looking for :( I don't anticipate reaching that many entries, but I'd rather be safe, as a conflict would be catastrophic.
You could pre-generate ~1 million hashes and pull from those like I have described. Then, if those get low, you can generate more. That way, since the generation of the hashes is done beforehand, you can ensure O(1). But to be honest, you'll probably never have a conflict. With 1 million hashes in use you still only have a 0.0017% chance of a collision per insert, so just write some code to deal with the db error from a collision (if it ever happens).
1

What is this hash a hash of? It seems like you just want a randomly generated unique VARCHAR column? What's wrong with the auto increment?

Anyway, you should just use a bigger hash: find an MD5 function (if you're actually hashing something) or a UUID generator, something with more than 4 characters. And yes, you could use a while loop, but just generate a big enough value that conflicts are incredibly unlikely.

1 Comment

I guess I should specify: I definitely need this to be 6 chars, and assuming there won't be a conflict wouldn't work, as I need a guarantee that there won't be any conflict. While I'd be comfortable using an MD5 without validation, even though 36 to the 6th is HUGE, it's not huge enough to blindly insert...
1

As others have suggested, what's wrong with an autoinc field? If you want an alphanumeric value, you could simply convert the int to an alphanumeric string in base 36. This could be implemented in almost any language.
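As a sketch of that conversion in Python: for base 36, MySQL's built-in `CONV(id, 10, 36)` already does the job, but CONV() stops at base 36, so the base62 the asker wants needs a small helper. The digit order below (0-9, a-z, A-Z) and the function names are my own choices.

```python
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(n, width=0):
    """Encode a non-negative integer in base62, left-padded with '0' to `width`."""
    if n < 0:
        raise ValueError("id must be non-negative")
    digits = []
    while True:
        n, remainder = divmod(n, 62)
        digits.append(BASE62[remainder])
        if n == 0:
            break
    return "".join(reversed(digits)).rjust(width, BASE62[0])


def from_base62(text):
    """Decode a base62 string back to the integer id."""
    n = 0
    for ch in text:
        n = n * 62 + BASE62.index(ch)
    return n


print(to_base62(125, width=6))  # 000021
print(from_base62("TheLnk"))    # the integer id that "TheLnk" would correspond to
```

Six base62 characters cover ids from 0 to 62^6 - 1, so the auto-increment column would have to stay below 56,800,235,584.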

1 Comment

I'd like to use an autoinc field, and this seems like the best option (I'd have to convert to base62), but as a preference, I'd like the numbers to be random. This also eliminates the ability to add any entries that aren't in order, which is something I anticipate wanting to implement. This is probably the most feasible option, but it still leaves something to be desired.
0

Going with zneak's comment, why don't you use an autoincrement column? Save the hash in another (non-unique) field, and concatenate the id to it (dynamically). So you give a user [hash][id]. You can parse it out in pure SQL using the substring functions.

Since you have to have the hash, the user can't look at other records by incrementing the id.
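A small Python sketch of the [hash][id] idea, assuming the random part has a fixed length of 4 characters and the id is appended as plain decimal digits. A dict stands in for the table, and every name here is hypothetical.

```python
HASH_LEN = 4  # assumed fixed length of the random part


def make_token(row_id, stored_hash):
    """Build the public token: the row's stored hash followed by its id."""
    return f"{stored_hash}{row_id}"


def parse_token(token):
    """Split a token into (hash, id); raise ValueError if it is malformed."""
    hash_part, id_part = token[:HASH_LEN], token[HASH_LEN:]
    if len(hash_part) != HASH_LEN or not (id_part.isascii() and id_part.isdigit()):
        raise ValueError("malformed token")
    return hash_part, int(id_part)


def lookup(table, token):
    """Find the row by id, then accept it only if the hash part matches."""
    hash_part, row_id = parse_token(token)
    row = table.get(row_id)
    if row is None or row["hash"] != hash_part:
        return None
    return row


table = {42: {"hash": "aB3x", "title": "example row"}}  # stand-in for the MySQL table
print(lookup(table, make_token(42, "aB3x")))  # the row
print(lookup(table, "aB3x43"))                # None: id 43 does not exist
```

Because the hash is not unique on its own, two rows may share one; uniqueness of the whole token comes from the id.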

1 Comment

As I responded to Toby, autoinc would definitely be the cleanest way of doing this, but the hash isn't there for security as much as it is there to be referenced by. This will be in a URL string, and I would prefer for it to have the option of being human-readable. Unfortunately, locking the db into an auto inc would make that difficult, as I would have to key the hash and add items out of order. That's definitely the closest so far, and I'll do that if I can't figure out anything else.
0

So, just in case someone runs across a similar issue: I'm using a UNIQUE field, and I'll be using a PHP hash function to generate the hashes for each insert. If the insert comes back with an error, I'll try again. Hopefully, because of the low likelihood of conflict, it won't get slow.
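For reference, a bounded version of that retry loop might look like the Python sketch below. An in-memory SQLite table stands in for the MySQL one, and the table name `links`, the helper names and the limit of 5 attempts (taken from the question) are assumptions.

```python
import secrets
import sqlite3
import string

BASE62 = string.digits + string.ascii_letters


def create_links_table(conn):
    """Stand-in for the MySQL table: auto-increment id plus a UNIQUE hash column."""
    conn.execute(
        "CREATE TABLE links (id INTEGER PRIMARY KEY AUTOINCREMENT, hash TEXT NOT NULL UNIQUE)"
    )


def generate_hash(length=6):
    """Random base62 string such as 'j8Du7X'."""
    return "".join(secrets.choice(BASE62) for _ in range(length))


def insert_with_unique_hash(conn, max_attempts=5, make_hash=generate_hash):
    """Insert a row, retrying with a fresh hash when the UNIQUE index rejects it."""
    for _ in range(max_attempts):
        candidate = make_hash()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO links (hash) VALUES (?)", (candidate,))
            return candidate
        except sqlite3.IntegrityError:
            continue  # duplicate hash: try another one
    raise RuntimeError(f"no free hash found after {max_attempts} attempts")


conn = sqlite3.connect(":memory:")
create_links_table(conn)
print(insert_with_unique_hash(conn))
```

The UNIQUE index is what guarantees there is never a conflict; the loop only decides how to react to a rejected insert. One thing to check on the MySQL side: the default collations are case-insensitive, so "abc123" and "ABC123" would count as duplicates unless the column uses a binary or case-sensitive collation.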


0

You could also check the MySQL functions UUID() and UUID_SHORT(). Those functions generate UUIDs that are globally unique by definition. You won't have to double-check if your PHP-generated hash string already exists.

I think in several cases these functions can also fit your project's requirements. :-)


0

If you already have the table filled with some content, you can alter it with the following:

ALTER TABLE `page` ADD COLUMN `hash` char(64) AS (SHA2(`content`, 256)) AFTER `content`

This solution adds a hash column right after the content column and generates the hash for both existing and new records, without any need to change your INSERT statements.

If you add a UNIQUE index to the column (after removing duplicates), your inserts will only succeed if the content is not already in the table. This will prevent duplicates.
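To illustrate why the column is char(64) and why it catches duplicate content, here is a short Python sketch using hashlib. It assumes the content is UTF-8 text; the function name is made up, and the MySQL column would hash whatever bytes the column's character set produces.

```python
import hashlib


def content_hash(content):
    """SHA-256 of the content as 64 lowercase hex characters."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


seen = set()
for page in ["hello world", "another page", "hello world"]:
    digest = content_hash(page)
    status = "duplicate" if digest in seen else "new"
    seen.add(digest)
    print(len(digest), status, digest[:12])
```

Note that this hash is derived from the content, so it identifies duplicate rows rather than providing the short random 6-character code asked for in the question.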
