sorted (or unsorted) binary tree traversal python

Question

Trying to traverse an unsorted BT to find the closest value to a given input value. The problem is, I can't seem to traverse the whole tree, I only get stuck traversing one side. How do I iterate through both branches? I have seen some examples in Java, but I have trouble understanding. It also seems like my method below is overly verbose?

Input = 5 Find closest value in the tree

**UPDATE: I have changed the code per the suggestion below, and am able to iterate through each branch, I think. But still not returning what I need.

#     3
# 0     4
#   2      8



class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

class Tree:
    def __init__(self):
        self.root = None
def getRoot(self):
    return self.root

def add(self, val):
    if(self.root == None):
        self.root = Node(val)
    else:
        self._add(val, self.root)

def _add(self, val, node):
    if(val < node.v):
        if(node.l != None):
            self._add(val, node.l)
        else:
            node.l = Node(val)
    else:
        if(node.r != None):
            self._add(val, node.r)
        else:
            node.r = Node(val)

def closest(self,val):
    if self.root!=None:
        return self._closest(val,self.root)
    else:
        return
def _closest(self,val,node,close=None,dist=None):
    if node.v!=None:
        if val==node.v:
            print 'Value Exists in Tree'
            return val,node.v,0
        else:
            #print close
            if dist is None:
                dist = abs(val-node.v)
                if node.l:
                    self._closest(val,node.l,node.v,dist)
                if node.r:
                    self._closest(val,node.r,node.v,dist)
            else:
                if abs(val-node.v)<dist:
                    dist = abs(val-node.v)
                    close = node.v
                    print close,dist
                if node.l:
                    self._closest(val,node.l,close,dist)
                if node.r:
                    self._closest(val,node.r,close,dist)
                if node.r is None and node.l is None:
                    return val,close,dist


tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.closest(5)

If you look at the left, then simply return, then you never get to look at the right. — Kenny Ostrom
– Kenny Ostrom, Commented Sep 2, 2017 at 18:07

T. Claverie · Accepted Answer · 2017-09-04 15:28:20Z

The problem(s)

There are some problems with the code you wrote, which lie in false assumptions, though I can't know if you made them purposely.

In Python, the last statement executed is not returned (unlike Ruby for example). In your function _closest, the code will almost always enter the branch if dist is None right on the first call. The only cases where it doesn't is if the root of the Tree is None or is the exact value you are looking for. Depending of the branch it enters, the last statement executed inside this loop will be self._closest(val, node.l, node.v, dist), self._closest(val, node.r, node.v, dist) or dist = abs(val - node.v). As nothing is never returned, if the first call to _closest goes through that branch, then None will be returned by your function (no matter what happens in nested _closest calls). So, you do not have to return the result only when reaching a leaf or when the value is a perfect match, you have to return the result all the way to the top, at each step.
Variable scoping is off : you never update your value of the shortest distance (or rather, not the way you intend to). In essence, what you are doing is this:
```
def f1(dist):
    f2(dist)
    return dist

def f2(dist):
    dist = 1

print f1(10)
```

This code prints 10. Changing the value of dist inside either f1 or f2 does affect only the value of dist inside the function. So, when changing this value inside f2 as above does not affect the value of dist inside f1. In both f1 and f2, the parameter dist has a local scope. That's what you are doing in _closest. When you find a Node that is closer to the value you are looking for, you change the value of dist and close locally, in the current call of the function. But changing them in one call of the function does not propagate the modifications to other instances of dist and close of above calls.

Fixing the code

So, there are two fixes to do that are intertwined. First, you need to update the value of dist and close with the results of sub-calls to _closest. In order to do that, you need an instance of _closest that will always return something.

Apparently, the return format you chose is 'value searched', 'closest value found', 'distance between values' so we'll do with that for the moment.

Always returning something is easy : just add a return statement at the end of each branch. As for the update of dist and close, it is straight-forward and the following pattern will be recurrent in the final function:

if node.l is not None:
    _, close_l, dist_l = self._closest(val, node.l, node.v, dist)
    if dist_l < dist:
        dist = dist_l
        close = close_l

In the end, the function is the following:

def closest(self, val):
    if self.root is not None:
        return self._closest(val, self.root)
    else:
        return val, None, None

def _closest(self, val, node, close=None, dist=None):
    if node.v is not None:
        if val == node.v:
            print 'Value Exists in Tree'
            return val, node.v, 0
        else:
            if dist is None:
                dist = abs(val - node.v)
                close = node.v # Close must be initialized just like dist here
                if node.l is not None:
                    _, close_l, dist_l = self._closest(val, node.l, node.v, dist)
                    if dist_l < dist:
                        dist = dist_l
                        close = close_l
                if node.r is not None:
                    _, close_r, dist_r = self._closest(val, node.r, node.v, dist)
                    if dist_r < dist:
                        dist = dist_r
                        close = close_r

                return val, close, dist
            else:
                if abs(val - node.v) < dist:
                    dist = abs(val - node.v)
                    close = node.v
                if node.l is not None:
                    _, close_l, dist_l = self._closest(val, node.l, close, dist)
                    if dist_l < dist:
                        dist = dist_l
                        close = close_l
                if node.r is not None:
                    _, close_r, dist_r = self._closest(val, node.r, close, dist)
                    if dist_r < dist:
                        dist = dist_r
                        close = close_r

                return val, close, dist
    else:
        return val, close, dist

This code works and produces the right result. However, it is overly complex and hardly understandable. In fact, we can make it better, we'll see how right away.

First things first, we there are four return statements, which all do the same thing : returning val, close, dist. Rather than returning at the end of each branch, we can simply return at the end of the function, it will have the same effect. We can change:

def _closest(self, val, node, close=None, dist=None):
    if node.v is not None:
        if val == node.v:
            return val, node.v, 0
        else:
            if dist is None:
                # ...
                return val, close, dist
            else:
                # ...
                return val, close, dist
    else:
        return val, close, dist

To:

def _closest(self, val, node, close=None, dist=None):
    if node.v is not None:
        if val == node.v:
            close, dist = node.v, 0
        else:
            if dist is None:
                # ...
            else:
                # ...

    return val, close, dist

Second thing, we can cut one of the big branches in order to remove code duplication. At the moment, here is what we have:

if dist is None:
    dist = abs(val - node.v)
    close = node.v

    # Compute _closest for right and left node, update dist and close accordingly

else:
    if abs(val - node.v) < dist:
        dist = abs(val - node.v)
        close = node.v

    # Compute _closest for right and left node, update dist and close accordingly

This can be trivially rewrote as:

if dist is None or abs(val - node.v) < dist:
    dist = abs(val - node.v)
    close = node.v

# Compute _closest for right and left node, update dist and close accordingly

For the moment we have:

def closest(self, val):
    if self.root is not None:
        return self._closest(val, self.root)
    else:
        return val, None, None

def _closest(self, val, node, close=None, dist=None):
    if node.v is not None:
        if val == node.v:
            close, dist = node.v, 0
        else:
            if dist is None or abs(val - node.v) < dist:
                dist = abs(val - node.v)
                close = node.v

            if node.l is not None:
                _, close_l, dist_l = self._closest(val, node.l, close, dist)
                if dist_l < dist:
                    dist = dist_l
                    close = close_l
            if node.r is not None:
                _, close_r, dist_r = self._closest(val, node.r, close, dist)
                if dist_r < dist:
                    dist = dist_r
                    close = close_r

    return val, close, dist

So, this is much better than the first version : shorter, easier to read, easier to understand, less duplicate code. This is a good version already. Can we do better ? Yes, if we make slight changes to the way we traverse the tree.

For the moment, the bulk of the algorithm is:

If the current node is closer than the closest so far, update the closest;
If the right child exists and is closer than the closest so far, update the closest;
If the left child exists and is closer than the closest so far, update the closest.

What appears is that every node except the root will be studied twice : the first time it will be studied as the child of the current node, the next it will be studied as the current node. (Note that even though they are studied twice, the function properly traverses each node at most once). So, the modification to make is to study a node only if it is the current node, without caring about children. Unfortunately, this is only possible if we change the return value of the _closest function. Instead of returning a tuple of three numbers, we will share a dictionary across calls : each iteration will update the dictionary if the studied node is closer than the other. As the state of the computation is kept in a shared state (the dictionary), there is no need pass the current closest point and distance through return statements. Here is the code:

def closest(self, val):
    best = {"close": None, "dist": None}
    if self.root is not None:
        self._closest(val, self.root, best)

    return val, best["close"], best["dist"]

def _closest(self, val, node, best):
    if node is not None and node.v is not None:
        if best["dist"] is None or abs(val - node.v) < best["dist"]:
            best["dist"] = abs(val - node.v)
            best["close"] = node.v

        self._closest(val, node.l, best)
        self._closest(val, node.r, best)

This method is similar to what would be done in a language like C: Because returning several values is not an option, a container structure is used to wrap the several results to return.

Discussion

What I meant by best for this problem was the shortest solution and the easiest one to read and understand. I believe this is the case of the last one (though it can be just a personal personal preference). However shared states are not everyone's taste.

Random thoughts

Finally, a few points about the code you presented first:

You checked nullity of a Node in three different ways in your code : if dist is None, if node.l, if node.v != None. It is best to have coherence across your code and to use only one way of doing. The idiomatic way to do is the check is None (and its opposite is not None)
You use Python2, which is well-advanced in its End of Life. There is adherence to Python2 in certain communities (mainly because of an ecosystem which is built on top of it (e.g. infosec)). However, if you have the choice between Python 2 and Python 3 for a new project, you should really go with Python 3.

The end of this answer, hope this helped. Do not hesitate to ask question for clarification if something was unlcear or if I made a mistake along the way.

So in your last, most brief code, how does it know if left and/or right child nodes exist? [self._closest(val, node.l, best) self._closest(val, node.r, best)]. Seems like you access the l and r before checking to see if they exist?
@chattrat423 They exist because in the constructor of Node, they are set to None by default. So, node.r and node.l always exist, but can have a null value (None). However, the first check of the function _closest is to check if it has not been called with a None node (if node is not None and ...). If that's the case, we have reached an inexistant node so no modification is done to the closest node and the function just returns without doint anything.

Collectives™ on Stack Overflow

sorted (or unsorted) binary tree traversal python

1 Answer 1

The problem(s)

Fixing the code

Discussion

Random thoughts

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

The problem(s)

Fixing the code

Discussion

Random thoughts

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related