0

Given the following CSV, which represents a generalization hierarchy (think: zip-code anonymization e.g. in the second step the zip code 46072 becomes 460** ):

A1, A*, *
A2, A*, *
A3, A*, *
B1, B*, *
B2, B*, *
B3, B*, *
B4, B*, *

I create an array of arrays by parsing it first.

I would now like to turn this into a tree representation:

                *

        /                \

      A*                 B*

   /  |   \         /   |   |   \

 A1   A2   A3     B1   B2   B3  B4

As you can see it is a tree with each node having an arbitrary amount of children.

I have the following classes:

Table, TableRow, TableCell as well as Tree and Node. Obviously a table has multiple rows, which in turn have multiple cells. A tree has a root node and a node has various operations such as addChild(node), getParent(), getChildren() etc.

I'm trying to figure out, how to iterate over my table, in order to span a tree as illustrated above. So far I've only driven myself into confusion...

Help is much appreciated!

2
  • Is your CSV always going to be just 3 columns? And is the right column always going to be '*'? If not, please show a usage case where this is not so and what you would expect in the root node in such a case. Commented Oct 18, 2014 at 21:35
  • The amount of colums can vary. In the case of a zip code for instance it is 46072,4607*,460**,46***,4****,* - Also the right column is to my knowledge always a symbol or term representing anything. Mind you though, that the way the hierarchy works is specified by the CSV. I.e. it can also be: caucasian, white, human for instance, with human being the right most entry in the rows. Commented Oct 18, 2014 at 21:42

1 Answer 1

2

OK. So, I'm basing my answer on these assumptions:

  1. The rightmost column of the matrix always has the same value throughout.
  2. All rows have exactly the same number of columns.
  3. Same values in the same column are in consecutive rows. That is, there will not be an "A*" row followed by a "B*" and then again by "A*". The two "A*" have to be consecutive.
  4. On the leftmost column, all values are unique.

I did not know exactly what are your classes Table, Tree and Node can or cannot do. So I worked with a base 2d array (which you also said is what you have after parsing), and used just a rudimentary Node as my tree structure.

The idea is to work recursively from the head matrix. Recursion and tree go together well...

The tree is defined as having the value in the rightmost column as its root value, and its children are created the same way by eliminating the rightmost column, and cutting the matrix into pieces such that those pieces' rightmost column has an identical value. The matrix

A1, A*, *
A2, A*, *
A3, A*, *
B1, B*, *
B2, B*, *
B3, B*, *
B4, B*, *

Is split into a value "*" and the two sub-matrices:

A1, A*
A2, A*
A3, A*

B1, B*
B2, B*
B3, B*
B4, B*

The same is done for the A* matrix, and its sub-matrices are single-cell matrices, A1, A2 and A3, at which point the recursion ends.

So assuming you created a class that represents a hierarchy builder, and you have a 2D array called data in it, you'll have a nice public method without parameters, that calls a "dirty" private method that has parameters representing the boundaries of the matrix for the current sub-matrix.

public Node<String> createTree() {
    return this.createTree(0,data.length-1,data[0].length-1);
}

The arguments it passes to the private method are the top row, bottom row, leftmost column and rightmost column. Only since in this case a sub-matrix always starts from column 0, we don't need to pass the leftmost column as parameter.

And this is your private createTree method:

private Node<String> createTree( int firstRow, int lastRow, int lastCol) {

    // Recursion end. If we are at the leftmost column, just return a simple
    // node with the value at the current row and column.
    if ( lastCol == 0 ) {
        return new Node<String>( data[firstRow][0] );
    }

    // Create a node with the value of the top-right cell in our range.
    Node<String> result = new Node<String>(data[firstRow][lastCol]);

    // The next column from the right will have the values for the child nodes.
    // Split it into ranges (start row -> end row) and recursively build
    // the tree over the sub-matrix that goes column 0 -> lastCol-1 over each
    // range of rows.

    int childFirstRow = firstRow;
    String childVal = data[firstRow][lastCol-1];

    for( int candidateRow = firstRow; candidateRow <= lastRow; candidateRow ++ ) {
        // If the next value in the column is different from what we had so far, it's
        // the end of a row range, build the child tree, and mark this row as
        // the beginning of the next range.
        if ( ! data[candidateRow][lastCol-1].equals(childVal) ) {
            result.addChild(createTree( childFirstRow, candidateRow - 1, lastCol - 1));
            childFirstRow = candidateRow;
            childVal = data[childFirstRow][lastCol-1];
        }
        // In the special case of the last row, it's always the end of a range.
        if ( candidateRow == lastRow ) {
            result.addChild(createTree(childFirstRow,lastRow,lastCol - 1));
        }
    }

    return result;
}
Sign up to request clarification or add additional context in comments.

1 Comment

Brilliant, just what I needed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.