0

You are given an unweighted tree with N nodes and each node starts with value 0. There will be Q queries. The 1st type of query will change the value at a node to a new value. The 2nd type of query asks for the sum of values of nodes on the simple path between two given nodes.

The value of any node is always between 0 and 10^9 and N, Q are between 1 and 10^5. Offline query answering is allowed.

I tried with storing all node values in array and running dfs to find path sum, but it exceeds 1s time limit.

#include <vector>
#include <cstring>
#include <iostream>
using namespace std;
const int MAX_INPUT = 1e5 + 5;
vector<int> graph[MAX_INPUT];
int values[MAX_INPUT];
bool seen[MAX_INPUT];
long long getSum(int from, int to) {
    if (seen[from]) {
        return -1;
    }
    seen[from] = true;
    if (from == to) {
        return values[from];
    }
    for (int n : graph[from]) {
        int nsum = getSum(n, to);
        if (nsum >= 0) {
            return values[from] + nsum;
        }
    }
    return -1;
}
int main() {
    ios::sync_with_stdio(0); cin.tie(0);
    int N;
    cin >> N;
    for (int i = 0; i < N - 1; i++) {
        int a, b;
        cin >> a >> b;
        graph[a].push_back(b);
        graph[b].push_back(a);
    }
    int Q;
    cin >> Q;
    for (int q = 0; q < Q; q++) {
        int type;
        cin >> type;
        if (type == 1) {
            int node, newvalue;
            cin >> node >> newvalue;
            values[node] = newvalue;
        } else {
            int node1, node2;
            cin >> node1 >> node2;
            memset(seen, 0, N + 1);
            long long pathsum = getSum(node1, node2);
            cout << pathsum << '\n';
        }
    }
}

The input format is:

  • first line is N (number of nodes)
  • next N - 1 lines contain two node numbers that gives an edge in the tree (node numbers are integers between 1 and N)
  • next line is Q (number of queries)
  • next Q lines each have 3 integers t, a, b
  • if t is 1, it is type 1 query to set value at node a to b
  • if t is 2, it is type 2 query to ask sum of values of nodes on the simple path between a and b
  • you never get invalid input

An example input is

5
1 2
1 5
2 3
2 4
5
1 2 10
2 3 4
1 1 5
2 2 5
2 4 3

and output

10
15
10

What is a more efficient way to solve these queries?

5
  • Correct again. To save confusion I suggest "set value of node a to be b" Commented Aug 20 at 19:40
  • 3
    @Isabella Is this online somewhere where we can test potential solutions? Also, why "looking for an answer from a reputable source"? If I came up with a solution myself, you wouldn't want it? Or am I a reputable source? Commented Aug 21 at 15:34
  • @KellyBundy oops mb I think i used the default bounty options. I saw the problem some time ago, don't have the link anymore Commented Aug 22 at 14:53
  • 1
    This looks like competitive programing task. Please provide link where this competition happens, so we can see more details (like more examples of test cases) and try own solutions before providing an answer. Commented Aug 28 at 13:19
  • 1
    @MarekR In the comment right above yours, they said "I saw the problem some time ago, don't have the link anymore". Commented Aug 28 at 16:56

4 Answers 4

6
+50

Root the tree arbitrarily and perform a DFS traversal. During the traversal, record the times when we start and finish processing a node (using an integer variable incremented on each recursive call). All the ancestors of a node have a lower start time as they are encountered first in the DFS. Furthermore, the start times of the nodes in any subtree form a consecutive range of integers, from the start time of the subtree root to its finish time.

We can build a binary indexed tree (which enables logarithmic point updates and prefix sum queries) using these start times as positions. The binary indexed tree initially contains all 0s and we will maintain the fact that the prefix sum from position 1 to the start time of any node is equal to the sum of values along the path from the root to that node. When a node's value is updated, only the nodes in its subtree are affected as only the paths from the root to those nodes would pass through the updated node. As such, we can increment the value at the start time of the updated node in the binary indexed tree by the difference between the new value and the previous value, and decrement by the same difference at the position after the finish time of the node. Due to the consecutive start times of subtree nodes, this ensures that no nodes outside the updated subtree are changed, in much the same way as with a difference array.

For any two tree nodes, the path between them can be expressed as the path from one node to the lowest common ancestor (LCA) and the path from the LCA to the other node. If we now add the sum of values from the root to each of the two nodes (by querying the binary indexed tree at the start times of the nodes), the path from the root to the LCA is counted twice (but the ancestors of the LCA are not on the path between the two nodes). To rectify this, we can subtract out that path sum twice and then manually add the value at the LCA back to arrive at the sum of values on the path between the two nodes.

In order to quickly compute the LCA, we can use binary lifting (although there are more efficient ways to do this, this method is simple to implement). During the DFS, we will additionally keep track of all the ancestors of each node with the difference in depth being a power of two. This information can be used to quickly jump to the LCA in O(log N) steps.

The time complexity is O((N + Q)log(N)).

#include <iostream>
#include <vector>
#include <bit>
#include <algorithm>
const unsigned MAX_INPUT = 1e5 + 5;
std::vector<int> graph[MAX_INPUT];
int values[MAX_INPUT], ancestor[MAX_INPUT][std::bit_width(MAX_INPUT)], depth[MAX_INPUT],
    start[MAX_INPUT], finish[MAX_INPUT], N, Q, lg, visitTime;
long long bit[MAX_INPUT];
void dfs(int node, int parent) {
    depth[node] = depth[parent] + 1;
    ancestor[node][0] = parent;
    for (int i = 1; i <= lg; ++i) ancestor[node][i] = ancestor[ancestor[node][i - 1]][i - 1];
    start[node] = ++visitTime;
    for (int next : graph[node])
        if (next != parent)
            dfs(next, node);
    finish[node] = visitTime;
}
int LCA(int node1, int node2) {
    if (depth[node1] < depth[node2]) std::swap(node1, node2);
    for (int i = lg; i >= 0; --i)
        if (depth[ancestor[node1][i]] >= depth[node2])
            node1 = ancestor[node1][i];
    if (node1 == node2) return node1;
    for (int i = lg; i >= 0; --i)
        if (ancestor[node1][i] != ancestor[node2][i])
            node1 = ancestor[node1][i], node2 = ancestor[node2][i];
    return ancestor[node1][0];
}
void update(int i, int delta) {
    for (; i <= N; i += i & -i) bit[i] += delta;
}
long long query(int i) {
    long long res{};
    for (; i; i &= i - 1) res += bit[i];
    return res;
}
int main() {
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);
    std::cin >> N;
    lg = std::bit_width(unsigned(N)) - 1;
    for (int i = 1, a, b; i < N; ++i) {
        std::cin >> a >> b;
        graph[a].push_back(b);
        graph[b].push_back(a);
    }
    dfs(1, 1);
    std::cin >> Q;
    for (int i = 0, type; i < Q; ++i) {
        std::cin >> type;
        if (type == 1) {
            int node, newValue;
            std::cin >> node >> newValue;
            update(start[node], newValue - values[node]);
            update(finish[node] + 1, values[node] - newValue);
            values[node] = newValue;
        } else {
            int node1, node2;
            std::cin >> node1 >> node2;
            int lca = LCA(node1, node2);
            std::cout << query(start[node1]) + query(start[node2]) - 2 * query(start[lca]) + values[lca] << '\n';
        }
    }
}
Sign up to request clarification or add additional context in comments.

8 Comments

There are other more general approaches that can be applied as well, but they aren't necessary with only point updates.
This answer looks correct. However I really struggled to understand it (just a "me" problem ?). Suggestion: split the answer into smaller parts. Part 1: how from the arbitrarily rooted tree, you create/update an array of N values, such that getting the sum from any node to the root is equivalent to solving the prefix sum problem in the array. Part 2: use it to get the sum between any 2 nodes (current paragraph 3). Part 3: binary indexed tree for prefix sums in log N for query&update
Thanks for the suggestion! I've updated my answer; let me know if it's now easier to understand.
I think it's much better now, thanks !
@VincentSaulue-Laborde Or you could post another answer with that yourself. Your plan sounds good, similar to what I had considered myself. I think an example image with a tree with node values and start/stop times and the resulting array would help a lot.
this is great, ty!!
|
1

You have to get rid of the search:

  1. Find the center of the tree. This will become the root
  2. Use BFS to record the parent of every vertex and the distance of every vertex from the root.

Now, when you need to walk the path between any two vertices, you only need to follow the parent links -- first from the deeper one until they're the same depth, and then simultaneously until the paths meet at their closest common ancestor.

The worst case time is still O(NQ), but it will be a lot faster and will probably suffice.

5 Comments

ty, parent links do make sense to avoid blind searching. but my original version was also O(NQ) I think?
To me it seems that it will be faster than O(NQ) in practice, unless the tree is close to a long line (one child per node). So, the obvious question is whether that limiting case can be optimized or not? (Or it doesn't occur.)
@MattTimmermans I'm curious why you think yours "will probably suffice". Do you think 10^10 steps of your kind can be done in one second, or do you think the author of this problem didn't create such worst case test cases?
10^10 steps wouldn't fit into a second, but 10^9 probably would, so the one worst case that fits into the constraints would pretty much have to be tested specifically to exclude this solution. Could happen, but probably not. Also, if the puzzle authors wanted to reliably exclude quadratic solutions, the constraints on N and Q would probably be 10^6. Quadratic solutions often work with problems that have 10^5-sized constraints.
From my experience as competitor and organizer, I'm quite certain they test such a worst case. And using 10^6 might be too large for the faster solutions. Needs to be large enough but not too large. Problem setters of this difficulty level probably know what they're doing and get that right.
-1

I have simplified the parameter list for the recursive depth first search method and moved all the variables out of the global namespace.

The code looks like this:

#include <string>
#include <fstream>
#include <sstream>
#include <iostream>
#include <vector>
#include <algorithm>

#include "cRunWatch.h"


const int MAX_INPUT = 1e5 + 5;

class cSolver
{
public:

    cSolver();

    void readInput( const std::string& fname );

private:

    // A vector of vectors, storing the one hop connected nodes from each node
    std::vector<std::vector<int>> myGraph;

    std::vector<long long> myNodeValues;

    std::vector<bool> mySeen;

    int myDestination;

    void getPathSum(int from, int to);

    // recursive depth first search
    long long dfs(int from);
};

cSolver::cSolver()
{
    myGraph.resize(MAX_INPUT); // room for all possible nodes
    myNodeValues.resize(MAX_INPUT,0);
}
void cSolver::readInput(const std::string& fname)
{
    auto ifs = std::ifstream(fname);
    if (!ifs.is_open())
        throw std::runtime_error("Cannot open " + fname);
    std::string line;
    while (std::getline(ifs, line))
    {
        std::stringstream sst(line);
        std::vector<std::string> vtokens;
        std::string a;
        while (getline(sst, a, ' '))
            vtokens.push_back(a);
        switch (vtokens.size())
        {
        case 1:
            // this is a count of nodes or queries
            // it can be ignorded
            // ( the computer can count just fine, thanks )
            break;

        case 2:
            // this is an edge addition
            myGraph[atoi(vtokens[0].c_str())].push_back(atoi(vtokens[1].c_str()));
            myGraph[atoi(vtokens[1].c_str())].push_back(atoi(vtokens[0].c_str()));
            break;

        case 3:
            // this is a query
            if (vtokens[0] == "1")
                myNodeValues[atoi(vtokens[1].c_str())] = atoi(vtokens[2].c_str());
            else 
                getPathSum(atoi(vtokens[1].c_str()), atoi(vtokens[2].c_str()));
            break;

        default:
            // this is an error
            throw std::runtime_error(
                "Bad input");
        }
    }
}

void cSolver::getPathSum(int from, int to)
{
    raven::set::cRunWatch aWatcher("getPathSum");
    mySeen.clear();
    mySeen.resize(MAX_INPUT, false);
    myDestination = to;

    std::cout 
        << from << " to " << to
        << " value "<< dfs(from) << "\n";
}
long long cSolver::dfs(int from)
{
    if (mySeen[from])
    {
        return -1;
    }
    mySeen[from] = true;
    if (from == myDestination)
    {
        return myNodeValues[from];
    }
    // loop over neighbours
    for (int n : myGraph[from])
    {
        long long nsum = dfs(n);
        if (nsum >= 0)
        {
            return myNodeValues[from] + nsum;
        }
    }
    return -1;
}

main(int argc, char* argv[])
{
    raven::set::cRunWatch::Start();
    raven::set::cRunWatch aWatcher("main");

    if( argc != 2 )
        throw std::runtime_error("Bad command line");
    cSolver S;
    S.readInput(argv[1]);

 raven::set::cRunWatch::Report();

    return 0;
}

With your sample input, the output looks like this:

3 to 4 value 10
2 to 5 value 15
4 to 3 value 10
raven::set::cRunWatch code timing profile
Calls           Mean (secs)     Total           Scope
       3        0.000814267     0.0024428       getPathSum

Note that answering the three path queries needs less than 3 milliseconds.

Complete project at https://codeberg.org/JamesBremner/soq79739064

6 Comments

ty for the answer, you're right the DFS can be done better. unfortunately, that's only a constant factor and the complexity is still O(NQ) which isn't fast enough
" isn't fast enough" OK. What is your performance requirement. Please provide a non-trivial example input and the maximum execution time acceptable.
worst case time complexity is the important thing, a solution should work on any possible input within constraints. O(NQ) can go to bigger than 10^10 operations
The important thing is your performance requirement. You have mentioned 1 second. OK, please provide a non trivial example input that you need to run in < 1sec. ( The example you have provided, with three queries, runs in 3 milliseconds )
O(N) on each query is pretty obviously too slow tho. theres already an answer with O(log(N)) per query anyway
|
-1

You have an unordered tree. That means finding a node will take too long if you store it as a tree, so it's better to store the values in an array. However, you can't just throw away the tree structure, because you need it to find the path between two nodes.
You can do that by keeping a second array with the parent of each node, so that you can walk up the tree towards the root, to find the common ancestor of two nodes.
And you also need a way to keep track of which nodes you've visited when you walk up the tree, in order to know when the two paths meet, and you can do that by giving each query its own number and marking visited nodes with that number in a third array.

So, declare three arrays of size 10^5 that can hold 32-bit unsigned integers:

value[], parent[], query[]

and initialize them to contain all zeros.

Then, for each edge [a,b] in the input, set:

parents[b] = a;

This array stores the structure of the tree and will allow you to walk up towards the root.

For each query of type 1 (set node value) where node is a and value is b, set:

value[a] = b;

This array stores the values of the nodes.

For each query of type 2 (get path value) where the nodes are a and b, repeatedly do:

total += value[a];
query[a] = q;
a = parent[a];
total += value[b];
query[b] = q;
b = parent[b];

where q is the number of the query; this array keeps track of which nodes have already been visited for this query.

If you reach a node with no parent, that means you've reached the root node, and thereafter you continue climbing up only for the other node.

If you reach a node whose query number is already set to the current query number, that means you've reached the shared ancestor of both node a and b. Thereafter you continue climbing only for this node, and subtract the values you encounter (because they've been added unnecessarily while climbing from the other node), until you reach the other node.

Building the tree is O(N), a query of type 1 is O(1), and a query of type 2 is O(logN), so the total is O(Q.logN) or better, depending on how many queries are of type 1 and 2.

2 Comments

Care to explain the downvote? Do you think the answer is not correct? Is it too similar to another answer? Is it not explained well enough? Do you think the complexity is higher than O(Q.logN)? Or are you just downvoting all competing answer for the bounty?
(I didn't downvote) Your answer seems to have several mistakes. Mistake 1: your algorithm for type 2 query isn't O(log N) worst case, but at least O(N). Proof: your algorithm is at least O(h) (h being the height of the tree), attained when asking for the sum between the deepest leaf and the root. The input tree has 0 guarantee to be a balanced (like a self-balancing binary search tree), so at worst h=N. Mistake 2: your algorithm to build a tree seems wrong: try 3, 1 2, 3 2. Mistake 3: if type 2 was O(log N), the total would have been O(N+Q.log(N))

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.