0

I want to implement a hashing technique in C where all permutations of a string have the same hash key.
E.g. abc and cab should both have the same key.

I have thought of adding the ASCII values and then checking the frequency of each character (this is important, because otherwise abc and aad would have the same key, which we do not want).
But that doesn't seem very efficient.

Is there a better hash function that resolves collisions well and also doesn't result in a sparse hash table?

Which hashing technique does Java use internally for strings, such that collisions are minimized and the operations (insertion, deletion, search) are still fast?

1
  • Here's a similar question that might enlighten you... Commented Jun 24, 2012 at 15:34

4 Answers

12

Why not sort the string's characters before hashing?
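For illustration, here is a minimal sketch of the sort-then-hash idea in C. The function name `sorted_hash` is hypothetical, and 32-bit FNV-1a is just one assumed choice of "standard" hash; any other string hash could be applied to the sorted copy instead.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: orders bytes as unsigned chars. */
static int cmp_byte(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: hashes a sorted copy of s with 32-bit FNV-1a,
   so every permutation of s yields the same value.
   Returns 0 if the temporary copy cannot be allocated. */
uint32_t sorted_hash(const char *s)
{
    size_t n = strlen(s);
    char *tmp = malloc(n + 1);
    if (tmp == NULL)
        return 0;
    memcpy(tmp, s, n + 1);
    qsort(tmp, n, 1, cmp_byte);      /* O(n log n) sort of the copy */

    uint32_t h = 2166136261u;        /* FNV-1a offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)tmp[i];
        h *= 16777619u;              /* FNV-1a prime */
    }
    free(tmp);
    return h;
}
```

With this, `sorted_hash("abc")` and `sorted_hash("cab")` are equal, at the cost of one temporary copy and one sort per call.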


12 Comments

What if the string is several megabytes long?
Any processing is going to be at least O(n) time for this. Sorting should be possible in O(nlogn) time and O(n) space. Depending on the performance requirements, these increases may be too much, but they don't seem crazy to me.
@Tony, David: Actually, a string can be sorted in O(n) time (counting sort).
@OliCharlesworth well, watch me delete my answer after this! Thanks!
@OliCharlesworth, Since it's ASCII, you're right. But instead of doing the full counting sort, just count and hash the counters.
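The "count and hash the counters" suggestion could look roughly like the sketch below. It assumes 8-bit characters and a NUL-terminated string; the name `count_hash` and the use of 64-bit FNV-1a as the mixing step are assumptions for illustration, not something specified in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix one byte into a 64-bit FNV-1a state. */
static uint64_t mix_byte(uint64_t h, unsigned char b)
{
    h ^= b;
    h *= 1099511628211ULL;           /* FNV-1a 64-bit prime */
    return h;
}

/* Hypothetical helper: builds a histogram of the bytes in s in O(n),
   then hashes each (character, count) pair in character order.
   The result does not depend on the order of characters in s. */
uint64_t count_hash(const char *s)
{
    size_t counts[256] = {0};
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        counts[*p]++;

    uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
    for (int c = 0; c < 256; c++) {
        if (counts[c] == 0)
            continue;                        /* skip absent characters */
        h = mix_byte(h, (unsigned char)c);
        uint64_t v = counts[c];
        for (int b = 0; b < 8; b++)          /* mix the count, byte by byte */
            h = mix_byte(h, (unsigned char)(v >> (8 * b)));
    }
    return h;
}
```

This avoids both the sort and the temporary copy: the input is read once, and the extra memory is a fixed 256-entry table regardless of string length.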
4

The obvious technique is to sort the string. You could use the sorted string itself as the lookup key, or you could hash it with any algorithm deemed appropriate. Or you could use a run-length encoded (RLE) representation of the sorted string (so banana, which sorts to aaabnn, would become a3bn2), and optionally hash that.

A lot depends on what you're going to do with the hashes, and how resistant they must be to collisions. A simple CRC (cyclic redundancy checksum) might be adequate, or it might be that cryptographic checksums such as MD5 or SHA-1 are not secure enough for you.
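As a rough sketch of the RLE variant, the key can be built from a character histogram without sorting explicitly. The name `rle_key` and its buffer-based signature are hypothetical. Note one assumption: the input contains no digit characters, since otherwise a key such as a3 would be ambiguous (three a's, or one a and one 3).

```c
#include <stdio.h>

/* Hypothetical helper: writes the run-length encoding of the sorted
   characters of s into out (capacity outsz, including the NUL).
   A count of 1 is omitted, so "banana" becomes "a3bn2".
   Returns 0 on success, -1 if out is too small. */
int rle_key(const char *s, char *out, size_t outsz)
{
    size_t counts[256] = {0};
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        counts[*p]++;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    size_t used = 0;
    for (int c = 1; c < 256; c++) {          /* ascending order = sorted */
        if (counts[c] == 0)
            continue;
        int n;
        if (counts[c] == 1)
            n = snprintf(out + used, outsz - used, "%c", c);
        else
            n = snprintf(out + used, outsz - used, "%c%zu", c, counts[c]);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;                       /* output was truncated */
        used += (size_t)n;
    }
    return 0;
}
```

The resulting key can be compared directly with `strcmp`, or fed to whatever string hash the table already uses.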

1 Comment

+1 for stating that the use of the hash is important, and that this can change the solution.
2

Which hashing technique does Java use internally for strings, such that collisions are minimized and the operations (insertion, deletion, search) are still fast?

The basic "trick" Java uses for speed is caching the hash value: it is stored as a member variable of the String, so it is only computed once. But this only works in Java because strings are immutable; a cached hash would go stale if the contents could change.
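To make the caching idea concrete in C, here is a small sketch. The `istr` struct and `istr_hash` function are hypothetical names, and the caller is assumed never to modify `data` after the struct is created. The hash itself follows the polynomial documented for Java's `String.hashCode()`, s[0]*31^(n-1) + ... + s[n-1], computed here over bytes rather than UTF-16 code units, so it matches Java only for ASCII input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical immutable-string wrapper with a cached hash. */
typedef struct {
    const char *data;    /* assumed never modified after creation */
    uint32_t    hash;    /* cached hash, meaningful only if hash_valid */
    bool        hash_valid;
} istr;

/* Computes h = 31*h + c over the bytes on the first call,
   then returns the cached value on every later call. */
uint32_t istr_hash(istr *s)
{
    if (!s->hash_valid) {
        uint32_t h = 0;
        for (const unsigned char *p = (const unsigned char *)s->data; *p; p++)
            h = 31u * h + *p;        /* wraps modulo 2^32, like Java's int */
        s->hash = h;
        s->hash_valid = true;
    }
    return s->hash;
}
```

Keep in mind that this polynomial depends on character order, so abc and cab get different values. For the permutation-invariant key asked about in the question, it would have to be applied to a sorted copy of the string.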

Comments

1

The main rule about hashing is "Don't invent your own hashing algorithm. Ever." You could just sort the characters in the string and apply a standard hashing strategy.

Also read that if you are interested in hashing.

1 Comment

From a security standpoint that is absolutely true, but you won't find any secure hash algorithm which produces the same hash for permutations. And that's what algogeek is looking for!
