0

I am trying to implement decision tree/decision rules from scratch in python for my understanding. In particular I am trying to implement takelast algorithm (opposite of takefirst,which states that for each branch take the attributes in the order in which they appear in the training set). I am using lens24 dataset which has 24 instances and the attributes are age, spect, astig, tear and label respectively, where label contains the class labels. In order to identify the decision rules i have grouped the instances first in terms of tear, then astig, then spect and finally age. The returned results are then saved in a list of lists where each individual list represents the values of each attribute for forming a particular branch of tree.

newfile="lenses.data"
df = pd.read_table(newfile, delimiter=r"\s+", header=None)
df.columns =["id",  "age", "spect", "astig", "tear", "label"    ]
gf = df.groupby([ "tear",  "astig", 'spect', 'age'], as_index=False)

    for name,group in gf:
        print(group)
        rule = group.values.tolist()[0]
        ruleset.append(rule)
    print("ruleset")
    print(ruleset)

Here is the output of the ruleset

[[1, 1, 1, 1, 3], [2, 1, 1, 1, 3], [3, 1, 1, 1, 3], [1, 2, 1, 1, 3], [2, 2, 1, 1, 3], [3, 2, 1, 1, 3], [1, 1, 2, 1, 3], [2, 1, 2, 1, 3], [3, 1, 2, 1, 3], [1, 2, 2, 1, 3], [2, 2, 2, 1, 3], [3, 2, 2, 1, 3], [1, 1, 1, 2, 2], [2, 1, 1, 2, 2], [3, 1, 1, 2, 3], [1, 2, 1, 2, 2], [2, 2, 1, 2, 2], [3, 2, 1, 2, 2], [1, 1, 2, 2, 1], [2, 1, 2, 2, 1], [3, 1, 2, 2, 1], [1, 2, 2, 2, 1], [2, 2, 2, 2, 3], [3, 2, 2, 2, 3]]

Now the next step is to prune the ruleset so that some redundant branches are merged. For example, [1,1,1,1,3] and [2,1,1,1,3] only differ in the value of age (where the list represents [age, spect, astig, tear, label]) so these two rules can be merged such that the resultant rule will be (x, 1,1,1,3) where x shows the absence of an attribute. I want to recursively prune the ruleset such that in the first step all the rules which only differ for values of age are combined. In the next step those that differ only for spect are combined and so on. For the first step, I have used the following code:

    for i in range(len(ruleset)-1):
        if ruleset[i][1] == ruleset[i+1][1] and ruleset[i][2] == ruleset[i+1][2] and ruleset[i][3] == ruleset[i+1][3] and ruleset[i][4] == ruleset[i+1][4]:

            print("combining: ", ruleset[i], " and ", ruleset[i+1])
            ruleset[i][0] = ruleset[i+1][0]= -1

    print("after pruning ruleset")
    print(ruleset)
    minruleset = []
    for elem in ruleset:
        if elem not in minruleset:
            minruleset.append(elem)
    print("ruleset: ", len(ruleset))
    print("minruleset: ", len(minruleset))
    print(minruleset)
    print("------------------------")
    ruleset = minruleset

The original ruleset which contained 24 rules is now minimized to 10 rules which are as follows, where -1 shows that age doesnot matter so we will not consider it further.

[[-1, 1, 1, 1, 3], [-1, 2, 1, 1, 3], [-1, 1, 2, 1, 3], [-1, 2, 2, 1, 3], [-1, 1, 1, 2, 2], [3, 1, 1, 2, 3], [-1, 2, 1, 2, 2], [-1, 1, 2, 2, 1], [1, 2, 2, 2, 1], [-1, 2, 2, 2, 3]]

Now I want to do this pruning step recursively in a way that in the first iteration I have excluded age in the comparison (index 0) in the next step i want to exclude age and spect (index 0, 1) and then excluded age, spect, astig (index 0, 1, 2) and so on. At the end of each pruning iteration, the stopping criteria of pruning will be the length of reduced ruleset formed at the end of pruning versus the ruleset at the start of the particular iteration of pruning. What I am unable to do is to specify the range of index (0-4) in the if statement where the comparison is made for the values of age, spect, astig, tear and label for each instance with the next instance and repeat this process recursively. Any help would be highly appreciated.

1 Answer 1

1

I have found the answer myself. Posting it here to help anyone with a similar issue in the future. The if statement needs to be updated as follows

if all (ruleset[i][j] == ruleset[i+1][j] for j in range(col,features)):

this if statement will be nested inside a while loop which breaks when the stopping criteria is met. After each iteration the col variable will be incremented whereas the features value is fixed e.g. 5 in my case.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.