6
\$\begingroup\$

I never remember without looking how to write RB or AVL balancing. but I wanted to make sure I would be able to still write a reasonable dynamically balanced binary tree. If it is asked of me in interviews.

So here is my attempt at it.

#include <cassert>
#include <iostream>
#include <vector>

template<typename K, typename V>
class BalancedBinaryTree {
  struct Node {
    K key;
    V val;
    int depth;
    Node* left;
    Node* right;
  };
  
  Node* root = nullptr;

  Node* try_lookup(Node* n, K key) {
    if (!n || n->key == key)
      return n;
    if (key > n->key)
      return try_lookup(n->right, key);
    return try_lookup(n->left, key);
  }

  struct Depth2 {
    int left;
    int right;
    int max() { return std::max(left, right); }
  };

  int depth1(Node* n) {
    if (!n)
      return 0;
    return n->depth;
  }

  Depth2 depth2(Node* n) {
    if (!n)
      return {0, 0};
    return {1 + depth1(n->left), 1 + depth1(n->right)};
  }

 /*
    A
   / \
  B   C
     / \
    D   E
 to
      C
     / \
    A   E
   / \
  B   D
*/
  void rotate_left(Node **at) {
    Node* n = *at;
    Node *new_root = n->left;
    n->left = n->left->right;
    new_root->right = n;
    n->depth = depth2(n).max();
    new_root->depth = depth2(new_root).max();
    *at = new_root;
  }

  // The other side
  void rotate_right(Node **at) {
    Node *n = *at;
    Node *new_root = n->right;
    n->right = n->right->left;
    new_root->left = n;
    n->depth = depth2(n).max();
    new_root->depth = depth2(new_root).max();
    *at = new_root;
  }

  void local_rebalance(Node** at) {
    Node* n = *at;
    Depth2 left = depth2(n->left);
    Depth2 right = depth2(n->right);

    int diff = left.max() - right.max();
    if (diff >= -1 && diff <= 1) {
      Depth2 res = {1 + left.max(), 1 + right.max()};
      n->depth = res.max();
      return;
    }

    if (left.max() > right.max())
      return rotate_left(at);
    else
      return rotate_right(at);
  }

  template <typename FuncT> void update_impl(Node **at, K key, FuncT func) {
    Node *n = *at;

    if (!n || n->key == key)
      return func(at);

    if (key > n->key)
      update_impl(&n->right, key, func);
    else
      update_impl(&n->left, key, func);
    local_rebalance(at);
  }

  void release_from_tree(Node **at) {
    Node *n = *at;

    auto remove_and_replace = [&](Node* by) {
      delete n;
      *at = by;
      return;
    };

    if (!n->right)
      return remove_and_replace(n->left);
    if (!n->left)
      return remove_and_replace(n->right);
    rotate_left(at);
    Node *new_root = *at;
    release_from_tree(&new_root->left);
    local_rebalance(at);
  }

public:
  bool insert(K key, V val) {
    bool res = false;
    update_impl(&root, key, [&](Node **at) {
      assert(at);
      Node *n = *at;
      if (n) {
        res = false;
        n->val = std::move(val);
        return;
      }
      res = true;
      *at = new Node{key, std::move(val), 1, nullptr, nullptr};
    });
    return res;
  }

  bool remove(K key) {
    bool res = false;
    update_impl(&root, key, [&](Node **at) {
      if (!(*at))
        return;
      res = true;
      release_from_tree(at);
    });
    return res;
  }

  V lookup(K key) {
    Node* at = try_lookup(root, key);
    assert(at);
    return at->val;
  }

  static void printImpl(Node* n, int indent, std::vector<bool>& has_edge) {
    if (!n)
      return;
    auto addIndent = [&]{
      for (int i = 0; i < indent; i++) {
        if (i < (indent - 1)) {
          if (!has_edge[i])
            std::cout << "  ";
          else
            std::cout << "| ";
        } else {
          if (!has_edge[i])
            std::cout << ",-";
          else
            std::cout << "`-";
          has_edge[i] = !has_edge[i];
        }
      }
    };
    has_edge.resize(indent + 1, 0);
    has_edge[indent] = 0;
    printImpl(n->left, indent + 1, has_edge);
    addIndent(); std::cout << n->depth << " [" << n->key << "] = " << n->val << std::endl;
    has_edge[indent] = 1;
    printImpl(n->right, indent + 1, has_edge);
  }

  void print() {
    std::vector<bool> has_edge;
    printImpl(root, 0, has_edge);
    std::cout << std::endl;
  }

  int verify_impl(Node* n) {
    if (!n)
      return 0;
    int l = verify_impl(n->left);
    int r = verify_impl(n->right);
    int d = std::max(l, r) + 1;
    assert(n->depth == d);
    return d;
  }

  void verify_depth() {
    verify_impl(root);
  }
};


int main() {
  BalancedBinaryTree<int, int> tree;
  int size = 20;
  for (int i = 0; i < size; i++) {
    tree.insert(i, i);
    tree.print();
    tree.verify_depth();
    for (int k = 0; k <= i; k++)
      assert(tree.lookup(k) == k);
  }
  for (int i = 0; i < size; i++) {
    tree.remove(i);
    tree.print();
    tree.verify_depth();
    for (int k = i + 1; k < size; k++)
      assert(tree.lookup(k) == k);
  }
}

It took me about 1h to write the code, and then I cleanup the code to make it presentable. Would that be acceptable dynamically balanced binary tree even if its its not a RB or AVL tree ?

\$\endgroup\$
2
  • 2
    \$\begingroup\$ RB or AVL aren't the most simple balanced binary search trees. \$\endgroup\$ Commented Nov 12 at 20:58
  • 2
    \$\begingroup\$ Please do not edit the question, especially the code, after an answer has been posted. Changing the question may cause answer invalidation. Everyone needs to be able to see what the reviewer was referring to. What to do after the question has been answered. \$\endgroup\$ Commented Nov 13 at 0:35

2 Answers 2

8
\$\begingroup\$

Some notes.

Indentation

I would strongly suggest four spaces rather than two for each level of indentation. With such minimal indentation, your code is a bit hard to follow.

You may get value out of an automatic formatting tool.

Pointers and memory management

There's nothing wrong with using raw pointers (vs. smart pointers) to implement your data structures. But if you're going to do that, you need to read up on the rule of 3/5/0 regarding copying, moving, and destructors. Not following this will make your code break in some really exciting frustrating ways down the road, especially as you try to use your code with other containers in the standard library.

Comments

There's bad ways to use comments. You've avoided those by using no comments at all, but in doing so you've avoided all of the good things you can use comments for.

printImpl

  static void printImpl(Node* n, int indent, std::vector<bool>& has_edge) {
    if (!n)
      return;
    auto addIndent = [&]{
      for (int i = 0; i < indent; i++) {
        if (i < (indent - 1)) {
          if (!has_edge[i])
            std::cout << "  ";
          else
            std::cout << "| ";
        } else {
          if (!has_edge[i])
            std::cout << ",-";
          else
            std::cout << "`-";
          has_edge[i] = !has_edge[i];
        }
      }
    };
    has_edge.resize(indent + 1, 0);
    has_edge[indent] = 0;
    printImpl(n->left, indent + 1, has_edge);
    addIndent(); std::cout << n->depth << " [" << n->key << "] = " << n->val << std::endl;
    has_edge[indent] = 1;
    printImpl(n->right, indent + 1, has_edge);
  }
  • Your else branch that contains another conditional and an unconditional statement doesn't have to be nested this way.
  • You should not combine lines as you've done with addIndent();
  • You should use a consistent brace style.
  • Don't be afraid of using ternary expressions where they're appropriate.
  • Know your operator precedence. i < (indent - 1) doesn't need the parentheses since - has higher precedence than <.
  static void printImpl(Node* n, int indent, std::vector<bool>& has_edge) {
    if (!n) {
      return;
    }
    auto addIndent = [&]{
      for (int i = 0; i < indent; i++) {
        if (i < indent - 1) {
          std::cout << (has_edge[i] ? "| " : "  ");
        } 
        else if (!has_edge[i]) {
          std::cout << ",-";
          has_edge[i] = !has_edge[i];
        }
        else {
          std::cout << "`-";
          has_edge[i] = !has_edge[i];
        }
      }
    };
    has_edge.resize(indent + 1, 0);
    has_edge[indent] = 0;
    printImpl(n->left, indent + 1, has_edge);
    addIndent(); 
    std::cout << n->depth << " [" << n->key << "] = " << n->val << std::endl;
    has_edge[indent] = 1;
    printImpl(n->right, indent + 1, has_edge);
  }

We might even use the ternary expression again.

  static void printImpl(Node* n, int indent, std::vector<bool>& has_edge) {
    if (!n) {
      return;
    }
    auto addIndent = [&]{
      for (int i = 0; i < indent; i++) {
        if (i < indent - 1) {
          std::cout << (has_edge[i] ? "| " : "  ");
        } 
        else {
          std::cout << (has_edge[i] ? "`=" : ",-");
          has_edge[i] = !has_edge[i];
        }
      }
    };
    has_edge.resize(indent + 1, 0);
    has_edge[indent] = 0;
    printImpl(n->left, indent + 1, has_edge);
    addIndent(); 
    std::cout << n->depth << " [" << n->key << "] = " << n->val << std::endl;
    has_edge[indent] = 1;
    printImpl(n->right, indent + 1, has_edge);
  }

We could question whether that check for whether i is less than indent - 1 should even occur in the loop. There's only one time during this loop when that is false, so the else branch only occurs once. It may be worthwhile to unroll this loop.

      if (indent <= 0) {
        return;
      }

      int i;

      for (i = 0; i < indent - 1; i++) {
        std::cout << (has_edge[i] ? "| " : "  ");
      } 
     
      std::cout << (has_edge[i] ? "`=" : ",-");
      has_edge[i] = !has_edge[i];

As for whether printImpl should even exist... Probably not. Rather than hard-coding std::cout as an output, we should override << for trees and output streams.

Nomenclature

I have always seen the word "height" used with regards to binary search trees. "Depth" is an odd term to see for the same thing.

Comparators

You haven't offered any way to provide a custom comparator function to your class, opting instead for <. Instead you can provide the comparator as a template argument with std::less as a default.

In fact, you've implemented your binary search tree as essentially a map, but this isn't necessary. If we allow for custom comparators, then a map can be implemented as a binary search tree that stores key/value pairs but only compares on the key.

Misleading BST in comments

Curiously, you illustrate rotating left with an example that can be a bit misleading, given that B is typically seen as greater than A and D is typically seen as greater than C. At first glance I thought the resulting BST was invalid.

 /*
    A
   / \
  B   C
     / \
    D   E
 to
      C
     / \
    A   E
   / \
  B   D
*/

The first version almost reads more like a heap than a bvinary search tree.

Rotations and rebalancing

To test the balance of your tree, I implemented a simple print_rows member function.

  void print_rows() const {
    std::vector<Node *> row;
    row.push_back(root);

    while (true) {
      for (const auto &p : row) {
        if (!p) {
          std::cout << " x";
        }
        else {
          std::cout << " " << p->val;
        }
      }
      std::cout << '\n';

      bool all_null = true;
      std::size_t sz = row.size();
      std::vector<Node *> next_row;
      for (std::size_t i = sz / 2; i < sz; i++) {
        if (!row[i]) {
          row.push_back(nullptr);
          row.push_back(nullptr);
        }
        else {
          all_null = false;
          row.push_back(row[i]->left);
          row.push_back(row[i]->right);
        }
      }

      if (all_null) break;
    }
  }

Now, your testing involves inserting linear numbers into a tree. This is a start, but I tested with a more random sample:

int main() {
  BalancedBinaryTree<int, int> tree;
  std::vector<int> v {5, 1, 3, 9, 7, 6, 0, 13, 17};
  for (const auto x : v) {
    tree.insert(x, x);
    tree.print_rows();
    std::cout << '\n';
  }
}

To which I saw as output:

 5
 5 x x

 5
 5 1 x
 5 1 x x x x x

 1
 1 x 5
 1 x 5 x x 3 x
 1 x 5 x x 3 x x x x x x x x x

 5
 5 1 9
 5 1 9 x 3 x x
 5 1 9 x 3 x x x x x x x x x x

 5
 5 1 9
 5 1 9 x 3 7 x
 5 1 9 x 3 7 x x x x x x x x x

 5
 5 1 7
 5 1 7 x 3 6 9
 5 1 7 x 3 6 9 x x x x x x x x

 5
 5 1 7
 5 1 7 0 3 6 9
 5 1 7 0 3 6 9 x x x x x x x x

 5
 5 1 7
 5 1 7 0 3 6 9
 5 1 7 0 3 6 9 x x x x x x x 13
 5 1 7 0 3 6 9 x x x x x x x 13 x x x x x x x x x x x x x x x x

 5
 5 1 7
 5 1 7 0 3 6 13
 5 1 7 0 3 6 13 x x x x x x 9 17
 5 1 7 0 3 6 13 x x x x x x 9 17 x x x x x x x x x x x x x x x x

Now most of this reflects height balanced binary trees. However, if we look at the following snippet, after inserting 1, 5, and 3...

 1
 1 x 5
 1 x 5 x x 3 x
 1 x 5 x x 3 x x x x x x x x x

This binary tree would look something like:

    1
     \
      5
     /
    3

This is distinctly not height balanced.

I believe this issue originates with your rotation logic, so I ran another test, inserting 7, then 9, then 8. The output I saw:

 7
 7 x x

 7
 7 x 9
 7 x 9 x x x x

 9
 9 7 x
 9 7 x x 8 x x
 9 7 x x 8 x x x x x x x x x x

This binary tree would look something like:

    9
   /
  7   
   \
    8

In your rotations, you're using the root of one of your subtrees as the new root. In the previous case, this would install 7 as the new root value, and then 8 and 9 have to be on the right side.

Instead in the previous case you want the largest value on the left side, which is 8, to be the new root. The new left branch is the old left branch, with 8 removed, and the new right branch is the old right branch with the old root (9) inserted.

This approach gives us:

    8
   / \
  7   9

Testing

You didn't see this in your testing because you were inserting sorted numbers.

  0
   \
    1
     \
      2

When balanced using your logic this uses 1 as the new root, which is correct since it is incidentally the smallest value on the right branch.

  1
 / \
0   2
     \
      3
       \
        4

Now the subtree needs to be balanced, and again, this is very easy and incidentally works.

  1
 / \
0   3
   / \
  2   4
       \
        5

This needs to be balanced. Using your logic:

       3
      / \
     2   4
    /     \
   1       5
  /         
 0           

And the left branch balances correctly with your approach.

       3
      / \
     1   4
    / \   \
   0   2   5 

This pattern of incidental success continues. But it's important to test diverse datasets to separate a correct algorithm and implementation from one that only works under specific circumstances.

Imagine if your local_rebalance function had looped performing rotations until balance was achieved, and you fed it 7, 9, and 8.

   7
    \
     9
    /
   8

Rotates to:

   9
  /
 7
  \
   8

Which would just rotate right back to:

   7
    \
     9
    /
   8

You'd have an infinite loop.

\$\endgroup\$
3
  • 2
    \$\begingroup\$ I am surprised the comments focus so much on style or lack of polish. instead of the core logic. about the print, its here to make debugging easier, it wouldn't be here if it was a polished library. \$\endgroup\$ Commented Nov 12 at 22:39
  • 2
    \$\begingroup\$ The commentary on pointers and comparators go straight to functionality rather than style. \$\endgroup\$ Commented Nov 12 at 23:03
  • 4
    \$\begingroup\$ @Ralender: I always review style first. The reason is simple: hard-to-read code is harder to review, so the first step for the reviewee is to get their code to a point where it's easy for me (as a reviewer) to read. In C++, I will then review safety/soundness second. The reason is again simple: unsound code doesn't work, unsafe code is one step away from being unsound, so rather than figuring out whether every use of unsafe code is sound, and every suggestion I may make will be sound, I'll first ensure the technical blocks are safe & sound. Then, as a third step, I'll review the functionality. \$\endgroup\$ Commented Nov 13 at 10:13
3
\$\begingroup\$

Implement the interface of std::map

To make your code be a drop-in replacement for standard library containers, try to implement the same interface for your BalancedBinaryTree as std::map has. It also makes it easier to use for C++ programmers that are already used to the standard library, and even better, many algorithms from the standard library and even pure language features will then be able to work directly on your BalancedBinaryTree.

For example, if you implement iterators, then you could print the contents of your tree with:

BalancedBinaryTree<…> tree = {…};

for (auto& [key, value]: tree) {
    std::cout << key << ": " << value << "\n";
}

It's not trivial to do this correctly, but the standard containers take care of many details that you did not. For example, what if K and/or V is not a copyable type? In fact, std::map allows you to add nodes that are neither copyable nor movable. What if you try to insert two values with the same key?

Even if you don't plan to implement all this, it might be instructive to just go over all the member functions (and member types) of std::map and see what exactly they do.

Avoid using manual memory management

Chris mentioned:

There's nothing wrong with using raw pointers […]

But actually, there is something wrong with that. It's very easy to get memory leaks when using raw pointers. Your BalancedBinaryTree has one. Consider this piece of code:

{
    BalancedBinaryTree<int, double> tree;
    tree.insert(42, 3.1415);
}

There is no destructor that cleans up the nodes, so the node for key 42 will never be freed in the above example.

Using std::unique_ptr instead of raw pointers would have solved this issue without you needing to write a destructor.

\$\endgroup\$
1
  • \$\begingroup\$ Nice catch of the memory leak. +1. \$\endgroup\$ Commented 2 days ago

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.