4

I am new to Tree structures. I am trying to design an algorithm which takes as input a predetermined tree. It should then seed the root of the tree with an object of my own design (I don't think it's relevant to the design of the algorithm, but the object is an ArrayList), and uses a "change operator" to modify the object along each branch of the tree. The number of passes of the change operator is dependent upon the branch length. The object must be evaluated at each node of the tree, and this node value is then used to evaluate the values of its child-nodes. Therefore, I cannot, simply evaluate the value at each tip using the total length from root to tip, as the change operator I am using is stochastic, rather than deterministic. Therefore the value of each child node is dependent upon the value of its parent node.

I have attempted to design this myself. I started by creating an array of times at which a single branch would split into its children.

int[] times = new int[branchnumber];
    times[0] = 1;
    times[1] = 5;

Then, I created a method which took the times at which branching should occur (which I interpreted as a branch length), the ArrayList object, the current time, and the total number of branches.

public static void Brancher(int[] times, List<double[][]> sequences, int t){
    boolean checker = false;
    for (int i = 0; i < times.length; i++) {
        if (times[i] == t) {
            checker = true;
        }
    }
    if (checker == true) {
        double[][] seq = sequences.get(sequences.size() - 1);
        sequences.add(seq);
    }
}

The Brancher method is implemented as below:

for (int j = 0; j <= loopnumber; j++) {
        MathsOperators.Brancher(times, sequences, t);

        for (int i = 0; i < sequences.size(); i++) {
            double[][] sequenceholder = sequences.get(i);
            MathsOperators.PrintOutput(sequenceholder, t);
            ComplexInput.evolvesequence(sequenceholder, frequencies, transitionrate, transversionrate, random);
            sequences.set(i, sequenceholder);
        }

        t = t + dt;
    }

So, I hold the current state of all objects in the tree as Arrays held in the sequences ArrayList. These objects are then acted upon by the evolve method, and the updated objects replace the originals in the ArrayList. When a branch is to be added, the Brancher method takes the last Array object in the ArrayList, copies it, and adds the copy to the list. This has in effect simulated the splitting of the last object into two objects. The copy is then updated and evolved in the next iteration of the simulation loop.

This method of doing things, while not pretty, results in trees that look like this: http://content.science20.com/files/Tree3A.jpg . This is a relatively uncomplicated structure.

However, it is impossible to create a tree of this shape: http://content.science20.com/files/Tree4.jpg . This tree has multiple splits to the far right. I don't know the exact terminology to describe the differences between the trees here, but its reasonably obvious.

I think I am confusing myself here. Any advice about how to think about this problem would be greatly appreciated.

(If it helps with context, the input object is a genetic sequence (ACGT). The idea is to evolve the sequence along each branch of the tree, with each tip representing the descendant of the original ancestor (input) sequence)

EDIT

The input object is an ArrayList, containing multiple (n x 2) Arrays of doubles. The first column of each array contains n integers from the set {1,2,3,4}, with frequencies I control. However, the order of these characters in the column is random. Each integer represents one of the characters A,C,G,T in a DNA sequence. The second column contains a double representing the rate of evolution for each DNA site, so one rate entry for each integer entry. The initial Array is generated using a function I call DNASEQ, it's very long and doesn't make a difference to this question.

To start, I generate one of these Arrays, and add it to an ArrayList called sequences. Using the simulation loop shown above, I then apply the evolvesequence method to each Array in the ArrayList.

When Brancher detects that the time, which is updated after every loop in increments of 1, is equal to one of the times of branching specified, then it takes the last Array in the ArrayList and adds a copy of the array to the Arraylist. If the simulation stopped there, the Array would be identical to the one it was copied from. However, now that it is in the ArrayList, it is subject to the evolvesequence method. Since evolvesequence is a stochastic/probabilistic method, then for a given input, no two outputs are guaranteed to be identical. So, after a few iterations of the simulation loop, the copied Array will be very different from the original.

The original sequence is displayed, and then copied by Brancher. Now I have two sequences of identical origin, evolving independently. This is the same as a branch of a tree.

12
  • I don't really get what you're trying to do here, but wouldn't a real tree data structure make more sense, see e.g. here: stackoverflow.com/questions/3522454/java-tree-data-structure? I don't really understand you code, e.g., incrementing branchcount seems to have no effect...? Perhaps you might want to provide more code? Commented Jul 30, 2015 at 16:33
  • Maybe I didn't explain this properly. The tree is supposed to serve as a guide or pathway, down which the "evolution operator" runs. It takes the value of the object at the root node, evolves it down both branches, and evaluates the modified object at the ends of the branches. Then, it should take the new values for each node, and follow whatever branches are specified below those nodes, and so on until it reaches the tips of the tree. My branching algorithm can only result in a tree of this shape, content.science20.com/files/Tree3A.jpg, as opposed to a more complex shape, with subtrees Commented Jul 30, 2015 at 16:56
  • An example of what I mean by more complex: content.science20.com/files/Tree4.jpg. Sorry, I don't know the exact terminology of what I'm trying to express. Commented Jul 30, 2015 at 17:01
  • I've tried to expand upon my original post, and added another code snippet. Does this help with your understanding of my aim? Commented Jul 30, 2015 at 17:17
  • Can you perhaps give a small example of an input and the desired output? Doesn't have to include all data of the output, i think, perhaps you can simplify the output to be a simple number or string? And/or perhaps you can map your terminology to the Tree3A.jpg you provided, i.e., what are times, branch, tip, or node in this image? How is sequences populated initially? And: I would have to try it, but I'm pretty certain that by getting an array from the list and adding it again, you have the exact same object twice in your list, and not a copy of it - is this what you really want? Commented Jul 31, 2015 at 6:54

1 Answer 1

1

With an isolated view on your specific task, it may be beneficial to implement a very special algorithm for your very special tree.

However, as you can see in the comments, this makes it way harder for outsiders to understand your code, yet alone help you. Therefore, I suggest you use this structure: Java tree data-structure?.

The benefits are that this is a well-known structure and everybody with some knowledge about data structures should be able to help you. The structure is generic enough that it should be able to reflect all variants of trees you're trying to build.

Moreover, it seems that you're constructing the tree and at the same time you populate it with data, is this correct? This occurs to me just now. It doesn't look like these two steps are dependent on each other, so you could first create your tree and then traverse it in a second loop in order to populate it with data? If you can separate these two steps, your code will become much clearer and easier to understand.

If you just want a binary tree (i.e., a tree having at most two children at each node), you could even simplify the structure by not having an ArrayList<Node<T>> as children, but a leftChild and a rightChild.

Afterwards, you set the data of the root node to your initial double[][], then you traverse the tree in the order appropriate to your task and modify each node's data. From each node, you can access its children and its parent, i.e., from every node in the tree, you can access the complete information stored in the tree, which should provide you with all necessary information to update a node's data. Of course, you'd have to visualize the tree and know exactly on which node you are right now, what information you need and how you can access it (i.e., how to navigate to the information you need).

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.