0

I am new to C++, I would appreciate if anyone and help me to validate the below function or help me improve it.

void RecursiveKnn(std::set<PointId> &refSet, const PointId &id, const KD3Index &kid, int const lvl)
{
    if (refSet.find(id) == refSet.end())
        refSet.insert(id);

    if (lvl < m_knn)
    {
        auto ids = kid.neighbors(id, 7);
        for (auto x : ids)
        {
            if (x == id)
                continue;
            RecursiveKnn(refSet, x, kid, lvl+1);
        }
    }
}

I have written a recursive function to run and generate a set of hierarchical objects. basically, start with one object, get next/nearby objects and so on for the next level. Along with it, I want to avoid duplicates as well. The levels are limited to 3 - 4 and do not expect to go any further.

This function is called millions of time and is taking forever to run. I would really appreciate if anyone can suggest any improvement. On top of my head I am sure std::set is not the correct data structure to use but, I don't know what to use.

EDIT: The reason, I find the function to have the performance problem is that. At first I have a single function with 3 nested for loop. which worked within reasonable time. When I changed it to a recursive function, The process did not complete for more than an hour.

13
  • 6
    What did you find when you profiled your application? Why is std::set not the correct structure? We have no idea what are your purposes and requirements. Commented Aug 7, 2019 at 15:10
  • At a guess, your performance issue comes not from the recursion as such but from the call to kid.neighbors(). Commented Aug 7, 2019 at 15:10
  • Well, based on the above std::unordered_set would be better because std::set's orderedness doesnt seem to matter. Commented Aug 7, 2019 at 15:11
  • 1
    What optimization level did you build with, and how did you profile? Commented Aug 7, 2019 at 15:13
  • 1
    Can item A be neighbors to item B and item C also neighbors to item B? I think your data structure doesnt have cycles from children to parents i.e. it a DAG but that doesnt mean it is impossible to not explore the same node multiple times. Your test (x == id) just tests that you do not re=explore a parent but what about re-exploring a child you already explored when exploring another parent? Commented Aug 7, 2019 at 15:29

2 Answers 2

4

Without more information my guess would be that multiple PointIds can be neighbors with the same PointIds, even though there is a parent/child relation that holds.

In other words, this code is performing a depth-first search on a directed acyclic graph but is not checking for revisits the way a typical depth-first search does. This means you will be exploring the same nodes many times which is extremely inefficient.

Change

if (refSet.find(id) == refSet.end())
    refSet.insert(id);

to

if (refSet.find(id) != refSet.end())
    return;
refSet.insert(id);

Also you should use an std::unordered_set instead of an std::set but that is a lesser concern if the above is true.

Sign up to request clarification or add additional context in comments.

4 Comments

Your concern is very well possible. Also, is if(refSet.insert(id).second){ ..recursive. } is same as you suggestion and is it better or worse for perfomance.
I dont think that is the same because I think insert will return the inserted item whether it is there or not ... you would want .insert to return something that evaluates to false in the case where it is already there ... if i understamd your question. But regardless if your code is really exploring the same nodes multiple times that behavior would dwarf individual mico-optimizations anyway as if it is rexploring the the same nodes multiple times its formal time complexity is worse than it needs to be.
You may be able to take advantage of std::set::insert's return value to simplify this further.
@jwezorek insert returns iterator to the item (inserted or not) and flag if insertion did happen. So this way is more efficient - not doing 2 lookups in the worst case.
1

This is wasteful.

if (refSet.find(id) == refSet.end())
    refSet.insert(id);

There's no need to check if something is a member of a set before inserting it, just insert it anyway.

refSet.insert(id);

That said this is not an order of magnitude improvement.

This might also help

const auto& ids = kid.neighbors(id, 7);

Depends on what neighbors returns but it looks like you're copying some collection or other.

4 Comments

Second one hardly will help if neighbors returns by value, and it is very likely so
@Slava Yes well the OP can at least look at that area. They might need to adjust their API.
Not sure how you can adjust here, looks like that method calculates result so unlikely it can return const reference, anyway due to move semantics I doubt that will help significantly or at all.
So if I just insert it will not create duplicates?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.