The problem(s)
There are some problems with the code you wrote, which lie in false assumptions, though I can't know if you made them purposely.
- In Python, the last statement executed is not returned (unlike Ruby for example). In your function
_closest, the code will almost always enter the branch if dist is None right on the first call. The only cases where it doesn't is if the root of the Tree is None or is the exact value you are looking for. Depending of the branch it enters, the last statement executed inside this loop will be self._closest(val, node.l, node.v, dist), self._closest(val, node.r, node.v, dist) or dist = abs(val - node.v). As nothing is never returned, if the first call to _closest goes through that branch, then None will be returned by your function (no matter what happens in nested _closest calls). So, you do not have to return the result only when reaching a leaf or when the value is a perfect match, you have to return the result all the way to the top, at each step.
Variable scoping is off : you never update your value of the shortest distance (or rather, not the way you intend to). In essence, what you are doing is this:
def f1(dist):
f2(dist)
return dist
def f2(dist):
dist = 1
print f1(10)
This code prints 10. Changing the value of dist inside either f1 or f2 does affect only the value of dist inside the function. So, when changing this value inside f2 as above does not affect the value of dist inside f1. In both f1 and f2, the parameter dist has a local scope. That's what you are doing in _closest. When you find a Node that is closer to the value you are looking for, you change the value of dist and close locally, in the current call of the function. But changing them in one call of the function does not propagate the modifications to other instances of dist and close of above calls.
Fixing the code
So, there are two fixes to do that are intertwined. First, you need to update the value of dist and close with the results of sub-calls to _closest. In order to do that, you need an instance of _closest that will always return something.
Apparently, the return format you chose is 'value searched', 'closest value found', 'distance between values' so we'll do with that for the moment.
Always returning something is easy : just add a return statement at the end of each branch. As for the update of dist and close, it is straight-forward and the following pattern will be recurrent in the final function:
if node.l is not None:
_, close_l, dist_l = self._closest(val, node.l, node.v, dist)
if dist_l < dist:
dist = dist_l
close = close_l
In the end, the function is the following:
def closest(self, val):
if self.root is not None:
return self._closest(val, self.root)
else:
return val, None, None
def _closest(self, val, node, close=None, dist=None):
if node.v is not None:
if val == node.v:
print 'Value Exists in Tree'
return val, node.v, 0
else:
if dist is None:
dist = abs(val - node.v)
close = node.v # Close must be initialized just like dist here
if node.l is not None:
_, close_l, dist_l = self._closest(val, node.l, node.v, dist)
if dist_l < dist:
dist = dist_l
close = close_l
if node.r is not None:
_, close_r, dist_r = self._closest(val, node.r, node.v, dist)
if dist_r < dist:
dist = dist_r
close = close_r
return val, close, dist
else:
if abs(val - node.v) < dist:
dist = abs(val - node.v)
close = node.v
if node.l is not None:
_, close_l, dist_l = self._closest(val, node.l, close, dist)
if dist_l < dist:
dist = dist_l
close = close_l
if node.r is not None:
_, close_r, dist_r = self._closest(val, node.r, close, dist)
if dist_r < dist:
dist = dist_r
close = close_r
return val, close, dist
else:
return val, close, dist
This code works and produces the right result. However, it is overly complex and hardly understandable. In fact, we can make it better, we'll see how right away.
First things first, we there are four return statements, which all do the same thing : returning val, close, dist. Rather than returning at the end of each branch, we can simply return at the end of the function, it will have the same effect. We can change:
def _closest(self, val, node, close=None, dist=None):
if node.v is not None:
if val == node.v:
return val, node.v, 0
else:
if dist is None:
# ...
return val, close, dist
else:
# ...
return val, close, dist
else:
return val, close, dist
To:
def _closest(self, val, node, close=None, dist=None):
if node.v is not None:
if val == node.v:
close, dist = node.v, 0
else:
if dist is None:
# ...
else:
# ...
return val, close, dist
Second thing, we can cut one of the big branches in order to remove code duplication. At the moment, here is what we have:
if dist is None:
dist = abs(val - node.v)
close = node.v
# Compute _closest for right and left node, update dist and close accordingly
else:
if abs(val - node.v) < dist:
dist = abs(val - node.v)
close = node.v
# Compute _closest for right and left node, update dist and close accordingly
This can be trivially rewrote as:
if dist is None or abs(val - node.v) < dist:
dist = abs(val - node.v)
close = node.v
# Compute _closest for right and left node, update dist and close accordingly
For the moment we have:
def closest(self, val):
if self.root is not None:
return self._closest(val, self.root)
else:
return val, None, None
def _closest(self, val, node, close=None, dist=None):
if node.v is not None:
if val == node.v:
close, dist = node.v, 0
else:
if dist is None or abs(val - node.v) < dist:
dist = abs(val - node.v)
close = node.v
if node.l is not None:
_, close_l, dist_l = self._closest(val, node.l, close, dist)
if dist_l < dist:
dist = dist_l
close = close_l
if node.r is not None:
_, close_r, dist_r = self._closest(val, node.r, close, dist)
if dist_r < dist:
dist = dist_r
close = close_r
return val, close, dist
So, this is much better than the first version : shorter, easier to read, easier to understand, less duplicate code. This is a good version already. Can we do better ? Yes, if we make slight changes to the way we traverse the tree.
For the moment, the bulk of the algorithm is:
- If the current node is closer than the closest so far, update the closest;
- If the right child exists and is closer than the closest so far, update the closest;
- If the left child exists and is closer than the closest so far, update the closest.
What appears is that every node except the root will be studied twice : the first time it will be studied as the child of the current node, the next it will be studied as the current node. (Note that even though they are studied twice, the function properly traverses each node at most once). So, the modification to make is to study a node only if it is the current node, without caring about children. Unfortunately, this is only possible if we change the return value of the _closest function. Instead of returning a tuple of three numbers, we will share a dictionary across calls : each iteration will update the dictionary if the studied node is closer than the other. As the state of the computation is kept in a shared state (the dictionary), there is no need pass the current closest point and distance through return statements. Here is the code:
def closest(self, val):
best = {"close": None, "dist": None}
if self.root is not None:
self._closest(val, self.root, best)
return val, best["close"], best["dist"]
def _closest(self, val, node, best):
if node is not None and node.v is not None:
if best["dist"] is None or abs(val - node.v) < best["dist"]:
best["dist"] = abs(val - node.v)
best["close"] = node.v
self._closest(val, node.l, best)
self._closest(val, node.r, best)
This method is similar to what would be done in a language like C: Because returning several values is not an option, a container structure is used to wrap the several results to return.
Discussion
What I meant by best for this problem was the shortest solution and the easiest one to read and understand. I believe this is the case of the last one (though it can be just a personal personal preference). However shared states are not everyone's taste.
Random thoughts
Finally, a few points about the code you presented first:
- You checked nullity of a Node in three different ways in your code :
if dist is None, if node.l, if node.v != None. It is best to have coherence across your code and to use only one way of doing. The idiomatic way to do is the check is None (and its opposite is not None)
- You use Python2, which is well-advanced in its End of Life. There is adherence to Python2 in certain communities (mainly because of an ecosystem which is built on top of it (e.g. infosec)). However, if you have the choice between Python 2 and Python 3 for a new project, you should really go with Python 3.
The end of this answer, hope this helped. Do not hesitate to ask question for clarification if something was unlcear or if I made a mistake along the way.