0

I have a simple adjacency list representation of a graph like this

{
    1: [2, 3, 4],
    2: [5],
    3: [6, 9],
    4: [3],
    5: [3],
    6: [7, 8],
    7: [],
    8: [],
    9: []
}

which looks like this

enter image description here

But each node also has a "type ancestry" which tells us what "type" of node it is, and this is completely different from the hierarchy present in the adjacency list above. For example, it would tell us that "node 6 is of type 'BA', and all type 'BA' nodes are type 'B'".

For example, consider this ancestry:

{
    1: [], # no ancestry
    2: ['AA', 'A'], # Read this as "2 is a type AA node, and all type AA nodes are type A nodes"
    3: ['B'], # 3 is directly under type B
    4: [],
    5: ['AA', 'A'],
    6: ['BA', 'B'],
    7: ['BA', 'B'],
    8: ['BA', 'B'],
    9: ['BB', 'B']
}

which when visualized would look like this

enter image description here

However, instead of connecting nodes directly as per the adjacency list, we must use their "representative types" when available, where the representative types of the nodes would be the nodes just below the lowest common ancestor of their type ancestries. When visualized with this adjustment, it would look like this

enter image description here

So what I want to produce programmatically is a "hierarchical/nested" adjacency list for such a visualization, which would look like below. The main idea is to introduce a subtree for each key in the adjacency list (along with the edges field), which would in turn contain its own adjacency list and so forth (recursively).

{1: {'edges': ['A', 'B', 4], 'subgraphs': {}},
 4: {'edges': ['B'], 'subgraphs': {}},
 'A': {'edges': ['B'],
       'subgraphs': {'AA': {'edges': [],
                            'subgraphs': {2: {'edges': [5], 'subgraphs': {}},
                                          5: {'edges': [], 'subgraphs': {}}}}}},
 'B': {'edges': [],
       'subgraphs': {3: {'edges': ['BA', 'BB'], 'subgraphs': {}},
                     'BA': {'edges': [],
                            'subgraphs': {6: {'edges': [7, 8], 'subgraphs': {}},
                                          7: {'edges': [], 'subgraphs': {}},
                                          8: {'edges': [], 'subgraphs': {}}}},
                     'BB': {'edges': [],
                            'subgraphs': {9: {'edges': [], 'subgraphs': {}}}}}}}

What is an elegant way of transforming the original adjacency list + the separate "ancestry" map to produce such a data structure?

8
  • 2
    Show us the code you've written so far, even if it does not yet work correctly, which consumes the adjacency list + ancestry map. Commented Apr 20 at 22:37
  • I don't know "elegant way" - you should start with "any way" Commented Apr 21 at 0:07
  • What would you do if in your ancestry input you have 5: ['B'] instead of 5: ['AA','A']? Which process has created that input? I would redesign the source that produced that data. Do you have a more primitive source for it, that does not have duplication like here? Commented Apr 21 at 7:03
  • Secondly, are you sure about your desired output? It doesn't look handy to me. If for node 4 you find that edges has BA, then what's next? You'd just have to search the whole data structure to find that BA definition... I don't see any efficiency in that. Commented Apr 21 at 7:13
  • @trincot if it is 5: ['B'] instead of 5: ['AA', 'A'] then node 5 would exist in the box for 'B' and the edge would be drawn from box 'A' to 'B'. There would be no direct edge from 2 to 5. Commented Apr 21 at 10:43

2 Answers 2

1

You could take this approach to build the target structure:

  • Create a node (a dict) for each distinct node that can be found in ancestry list, so including the ancestry groups. A node looks like this:

    { 'edges': [], 'subgraphs': {} }
    

    These objects are collected in a dict, keyed by their name (e.g. key could be 1 or 'AA', ...)

  • Populate the 'edges' attribute based on the adjacency input list. Make sure to use the ancestry reference instead when the edge transitions to another ancestry group

  • Populate the 'subgraphs' attribute based on the ancestry input list. While doing that, keep track of the nodes that are not a child of another node: these should be retained for the root node in the data structure.

Here is an implementation:

def transform(adjacency, ancestry):
    # Create the basic object (dict) for each node:
    nodes = { 
        subgraph: { 'edges': [], 'subgraphs': {} } 
            for node, subgraphs in ancestry.items()
            for subgraph in (node, *subgraphs)
    }
    # populate the "edges" attributes of the basic nodes
    for node, children in adjacency.items():
        nodes[node]['edges'] = [
            # mention the ancestry reference if the child belongs 
            #   to a different ancestry path
            child if ancestry[child] == ancestry[node] else ancestry[child][0]
            for child in children
        ]
    # keep track of the nodes that are to stay at the root level
    root = dict(nodes)
    # populate the "subgraphs" attributes
    for node, subgraphs in ancestry.items():
        for child, parent in zip((node, *subgraphs), subgraphs):
            nodes[parent]['subgraphs'][child] = nodes[child]
            root.pop(child, None)  # this child is not part of the root object

    return root

Run it as follows for your example input:

# demo run
adjacency = {
  1: [2, 3, 4],
  2: [5],
  3: [6, 9],
  4: [6],
  5: [],
  6: [7, 8],
  7: [],
  8: [],
  9: []
}

ancestry = {
  1: [],
  2: ['AA', 'A'],
  3: ['B'],
  4: [],
  5: ['AA', 'A'],
  6: ['BA', 'B'],
  7: ['BA', 'B'],
  8: ['BA', 'B'],
  9: ['BB', 'B']
}

result = transform(adjacency, ancestry)

The result will be:

{
    1: {
        'edges': ['AA', 'B', 4], 
        'subgraphs': {}
    }, 
    'A': {
        'edges': [], 
        'subgraphs': {
            'AA': {
                'edges': [], 
                'subgraphs': {
                    2: {
                        'edges': [5], 
                        'subgraphs': {}
                    }, 
                    5: {
                        'edges': [], 
                        'subgraphs': {}
                    }
                }
            }
        }
    }, 
    'B': {
        'edges': [], 
        'subgraphs': {
            3: {
                'edges': ['BA', 'BB'], 
                'subgraphs': {}
            }, 
            'BA': {
                'edges': [], 
                'subgraphs': {
                    6: {
                        'edges': [7, 8],
                        'subgraphs': {}
                    }, 
                    7: {
                        'edges': [], 
                        'subgraphs': {}
                    }, 
                    8: {
                        'edges': [],
                        'subgraphs': {}
                    }
                }
            },
            'BB': {
                'edges': [], 
                'subgraphs': {
                    9: {
                        'edges': [], 
                        'subgraphs': {}
                    }
                }
            }
        }
    },
    4: {
        'edges': ['BA'], 
        'subgraphs': {}
    }
}
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you, the approach makes a lot of sense. I was trying to approach this problem using DFS, but your approach of scanning through edges is more elegant. The only thing is that your logic of adding edges to the source node is not always correct, because sometimes the "representative" graph node can be a hierarchy node (but that's not your fault, because it was probably not formalized from the example in the question). Your answer helped me formalize that this would be based on the lowest common ancestor. I've proposed an edit to your answer and also updated my question with this example.
Instead of heavily editing the code I posted, I suggest you post an another answer yourself. As you said, this edit is based on something that was not specified in the question at the time I wrote it (I had tried to get more specs from you in earlier comments). If the edit would be minor, I would accept it, but now it wouldn't look as my code anymore, so I prefer to not let it go through (even though that will cost me your acceptance mark). I hope you understand.
no worries, I understand. Thanks for the answer nevertheless!
1

Basing this off of @trincot's answer (who wanted me to write my own answer instead of editing theirs as the change is non-trivial) with a critical change on how edges are made.

  • Create a node (a dict) for each distinct node that can be found in ancestry list, so including the ancestry groups. A node looks like this:

    { 'edges': [], 'subgraphs': {} }
    

    These objects are collected in a dict, keyed by their name (e.g. key could be 1 or 'AA', ...)

  • Populate the 'edges' attribute based on the adjacency input list. Make sure to use the ancestry reference instead when the edge transitions to another ancestry group. Here is where the LCA approach comes in.

  • Populate the 'subgraphs' attribute based on the ancestry input list. While doing that, keep track of the nodes that are not a child of another node: these should be retained for the root node in the data structure.

def transform(adjacency, ancestry):
    # utility function to get the element in the list before the target, if one is present
    def get_element_before(lst, target):
        try:
            index = lst.index(target)
            return lst[index - 1] if index - 1 >= 0 else None
        except ValueError:
            return None

    # Finds the lowest common ancestor between 2 paths, assuming that the paths
    # are ordered from bottom to top
    def find_lca(path1, path2):
        lca = None
        for a, b in zip(path1[::-1],  path2[::-1]):
            if a == b:
                lca = a
            else:
                break
        return lca

    # Given two nodes, it adds a link between the correct pair of "representative" nodes
    # in the newly constructed nested graph

    def add_link(node1, node2, nodes):
        ancestry1, ancestry2 = ancestry[node1], ancestry[node2]
        # Special cases when the LCA cannot be found
        if not ancestry1 and not ancestry2:
            nodes[node1]['edges'].append(node2)
        elif not ancestry1:
            nodes[node1]['edges'].append(ancestry2[-1])
        elif not ancestry2:
            nodes[ancestry1[-1]]['edges'].append(node2)
        else:
            # When LCA is likely to be present
            lca = find_lca(ancestry1, ancestry2)
            if not lca:
                # This can happen if the 2 nodes have completely disjoint hierarchy paths
                nodes[ancestry1[-1]]['edges'].append(ancestry2[-1])
                return
    
            # The node just below the LCA in each node serves as the "representative" node of that node in the newly built graph
            representative_node1 = get_element_before(ancestry1, lca)
            representative_node2 = get_element_before(ancestry2, lca)
            # If the two nodes are in the same subtree at the same level, they
            # will act as their own representative nodes
            representative_node1 = node1 if representative_node1 is None else representative_node1
            representative_node2 = node2 if representative_node2 is None else representative_node2
            nodes[representative_node1]['edges'].append(representative_node2)


    # Create the basic object (dict) for each node:
    nodes = { 
        subgraph: { 'edges': [], 'subgraphs': {} } 
            for node, subgraphs in ancestry.items()
            for subgraph in (node, *subgraphs)
    }
    # populate the "edges" attributes between basic nodes (or their "representative" nodes)
    for node, children in adjacency.items():
        for child in children:
            add_link(node, child, nodes)
    
    # keep track of the nodes that are to stay at the root level
    root = dict(nodes)
    # populate the "subgraphs" attributes
    for node, ancestors in ancestry.items():
        for child, parent in zip((node, *ancestors), ancestors):
            nodes[parent]['subgraphs'][child] = nodes[child]
            root.pop(child, None)

    return root

Testing it

adj_list = {
    1: [2, 3, 4],
    2: [5],
    3: [6, 9],
    4: [3],
    5: [3],
    6: [7, 8],
    7: [],
    8: [],
    9: []
}

ancestry = {
    1: [],
    2: ['AA', 'A'],
    3: ['B'],
    4: [],
    5: ['AA', 'A'],
    6: ['BA', 'B'],
    7: ['BA', 'B'],
    8: ['BA', 'B'],
    9: ['BB', 'B']
}

pprint.pprint(transform(adj_list, ancestry))

produces

{1: {'edges': ['A', 'B', 4], 'subgraphs': {}},
 4: {'edges': ['B'], 'subgraphs': {}},
 'A': {'edges': ['B'],
       'subgraphs': {'AA': {'edges': [],
                            'subgraphs': {2: {'edges': [5], 'subgraphs': {}},
                                          5: {'edges': [], 'subgraphs': {}}}}}},
 'B': {'edges': [],
       'subgraphs': {3: {'edges': ['BA', 'BB'], 'subgraphs': {}},
                     'BA': {'edges': [],
                            'subgraphs': {6: {'edges': [7, 8], 'subgraphs': {}},
                                          7: {'edges': [], 'subgraphs': {}},
                                          8: {'edges': [], 'subgraphs': {}}}},
                     'BB': {'edges': [],
                            'subgraphs': {9: {'edges': [], 'subgraphs': {}}}}}}}

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.