2

I want to compute the Zhang-Shasha tree-edit distance between 2 trees (zss library). However, my trees are in the form of networkx graphs (they actually represent DOM html trees). The example in the zss documentation shows how to create a tree by hand:

from zss import *
A = (
    Node("f")
        .addkid(Node("a")
            .addkid(Node("h"))
            .addkid(Node("c")
                .addkid(Node("l"))))
        .addkid(Node("e"))
    )
zss.simple_distance(A, A) # [0.0]

Which would be the same tree as:

import networkx as nx
G=nx.DiGraph()
G.add_edges_from([('f', 'a'), ('a', 'h'), ('a', 'c'), ('c', 'l'), ('f', 'e')])

so I would like to convert tree objects of networkx class into a zss Node object, then compute the edit distance between 2 trees.

Thanks

(and do not hesitate to tell me if you think this is a XY problem)

1
  • Seems to me like depth first traversal. Commented Jan 18, 2019 at 13:36

1 Answer 1

2

Using dfs_tree can definitely help:

import zss
import networkx as nx

G=nx.DiGraph()
G.add_edges_from([('f', 'a'), ('a', 'h'), ('a', 'c'), ('c', 'l'), ('f', 'e')])
T = nx.dfs_tree(G, source='f')
nodes_dict = {}
for edge in T.edges():
    if edge[0] not in nodes_dict:
        nodes_dict[edge[0]] = zss.Node(edge[0])
    if edge[1] not in nodes_dict:
        nodes_dict[edge[1]] = zss.Node(edge[1])
    nodes_dict[edge[0]].addkid(nodes_dict[edge[1]])

print(zss.simple_distance(nodes_dict['f'], nodes_dict['f'])) # 0.0

In case we don't know which node is G's root node, but know we have a valid tree, we can get the source node by calling:

source = [n for (n, d) in G.in_degree() if d == 0][0]
T = nx.dfs_tree(G, source=source)

Since the root is the only node with no incoming nodes, that should work.

Sign up to request clarification or add additional context in comments.

2 Comments

thanks, you supply the source argument as the top node, i guess? How would I automatize this in a function, i mean could I initialize it as T = nx.dfs_tree(G, source=list(G.nodes)[0]), right?
That would work, as well as simply omitting the source argument and calling nx.dfs_tree(G). However, it assumes G has a specific node ordering, which is true in the above example, but depends on the way we build it. Added an edit to handle this step in the answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.