1

For example, in the following struct:

1) editLine is a pointer to a data line which has CLRF,
2) nDisplayLine is the display line index of this editLine,
3) start is the offset in the display line,
4) len is the length of the text;

struct CacheKey {
    const CEditLine* editLine;
    int32 nDisplayLine;
    int32 start;
    int32 len;
    friend bool operator==(const CacheKey& item1, const CacheKey& item2) {
        return (item1.start == item2.start && item1.len == item2.len && item1.nDisplayLine == item2.nDisplayLine &&
            item1.editLine == item2.editLine);
    }
    CacheKey() {
        editLine = NULL;
        nDisplayLine = 0;
        start = 0;
        len = 0;
    }
    CacheKey(const CEditLine* editLine, int32 dispLine, int32 start, int32 len) :
        editLine(editLine), nDisplayLine(dispLine), start(start), len(len)
    {
    }

    int hash() {
        return (int)((unsigned char*)editLine - 0x10000) + nDisplayLine * nDisplayLine + start * 2 - len * 1000;  
    }
};

Now I need to put it into a std::unordered_map<int, CacheItem> cacheMap_

The problem is how to design the hash function for this structure, is there any guidelines?

How could i make sure the hash function is collision-free?

1
  • “How could i make sure the hash function is collision-free?” – You don’t. A hash function isn’t supposed to be collision-free. Commented Jul 1, 2013 at 11:46

1 Answer 1

2

To create a hash function, you can use std::hash, which is defined for integers. Then, you can combine them "as the boost guys does" (because doing a good hash is something non trivial) as explained here : http://en.cppreference.com/w/cpp/utility/hash.

Here is a hash_combine method :

inline void hash_combine(std::size_t& seed, std::size_t v)
{
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

So the "guideline" is more or less what's is shown on cppreference.

You CAN'T be sure your hash function is colision free. Colision free means that you do not loose data (or you restrict yourself to a small set of possibilities for your class). If any int32 value is allowed for each fields, a collision free hash is a monstrously big index, and it won't fit in a small table. Let unordered_map take care of collisions, and combine std::hash hash as explained above.

In you case, it will look something like

std::size_t hash() const
{
    std::size_t h1 = std::hash<CEditLine*>()(editLine);
    //Your int32 type is probably a typedef of a hashable type. Otherwise,
    // you'll have to static_cast<> it to a type supported by std::hash.
    std::size_t h2 = std::hash<int32>()(nDisplayLine);
    std::size_t h3 = std::hash<int32>()(start);
    std::size_t h4 = std::hash<int32>()(len);
    std::size_t hash = 0;
    hash_combine(hash, h1);
    hash_combine(hash, h2);
    hash_combine(hash, h3);
    hash_combine(hash, h4);
    return hash;
}

Then, you can specialize the std::hash operator for your class.

namespace std
{
    template<>
    struct hash<CacheKey>
    {
    public:
        std::size_t operator()(CacheKey const& s) const 
        {
            return s.hash();
         }
    };
}
Sign up to request clarification or add additional context in comments.

1 Comment

This is a good guideline but please add a short summary to your answer to prevent link rot.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.