How construct hash function for a user defined type?

Question

For example, in the following struct:

1) editLine is a pointer to a data line which has CLRF,
2) nDisplayLine is the display line index of this editLine,
3) start is the offset in the display line,
4) len is the length of the text;

struct CacheKey {
    const CEditLine* editLine;
    int32 nDisplayLine;
    int32 start;
    int32 len;
    friend bool operator==(const CacheKey& item1, const CacheKey& item2) {
        return (item1.start == item2.start && item1.len == item2.len && item1.nDisplayLine == item2.nDisplayLine &&
            item1.editLine == item2.editLine);
    }
    CacheKey() {
        editLine = NULL;
        nDisplayLine = 0;
        start = 0;
        len = 0;
    }
    CacheKey(const CEditLine* editLine, int32 dispLine, int32 start, int32 len) :
        editLine(editLine), nDisplayLine(dispLine), start(start), len(len)
    {
    }

    int hash() {
        return (int)((unsigned char*)editLine - 0x10000) + nDisplayLine * nDisplayLine + start * 2 - len * 1000;  
    }
};

Now I need to put it into a std::unordered_map<int, CacheItem> cacheMap_

The problem is how to design the hash function for this structure, is there any guidelines?

How could i make sure the hash function is collision-free?

“How could i make sure the hash function is collision-free?” – You don’t. A hash function isn’t supposed to be collision-free. — Konrad Rudolph
– Konrad Rudolph, Commented Jul 1, 2013 at 11:46

Jeremy Cochoy · Accepted Answer · 2013-07-01 12:02:38Z

To create a hash function, you can use std::hash, which is defined for integers. Then, you can combine them "as the boost guys does" (because doing a good hash is something non trivial) as explained here : http://en.cppreference.com/w/cpp/utility/hash.

Here is a hash_combine method :

inline void hash_combine(std::size_t& seed, std::size_t v)
{
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

So the "guideline" is more or less what's is shown on cppreference.

You CAN'T be sure your hash function is colision free. Colision free means that you do not loose data (or you restrict yourself to a small set of possibilities for your class). If any int32 value is allowed for each fields, a collision free hash is a monstrously big index, and it won't fit in a small table. Let unordered_map take care of collisions, and combine std::hash hash as explained above.

In you case, it will look something like

std::size_t hash() const
{
    std::size_t h1 = std::hash<CEditLine*>()(editLine);
    //Your int32 type is probably a typedef of a hashable type. Otherwise,
    // you'll have to static_cast<> it to a type supported by std::hash.
    std::size_t h2 = std::hash<int32>()(nDisplayLine);
    std::size_t h3 = std::hash<int32>()(start);
    std::size_t h4 = std::hash<int32>()(len);
    std::size_t hash = 0;
    hash_combine(hash, h1);
    hash_combine(hash, h2);
    hash_combine(hash, h3);
    hash_combine(hash, h4);
    return hash;
}

Then, you can specialize the std::hash operator for your class.

namespace std
{
    template<>
    struct hash<CacheKey>
    {
    public:
        std::size_t operator()(CacheKey const& s) const 
        {
            return s.hash();
         }
    };
}

This is a good guideline but please add a short summary to your answer to prevent link rot.

Collectives™ on Stack Overflow

How construct hash function for a user defined type?

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related