1

I am looking for a smart algorithm or pythonic approach to create clusters from pairs of data.

The input data is structured like this:

[
 (productA,ProductB),
 (productB,ProductC),
 (productC,ProductD),
 (productA,ProductD),
 (productD,ProductB),
 (productC,ProductA),

 (productE,ProductF),
 (productF,ProductG),
 (productG,ProductH),
 (productG,ProductE),
]

and it should be clustered to:

[
(productA,productB,productC,productD),
(productE,productF,productG,productH)
]

How can this be achieved? (The order of the products within the two clusters does not matter)

Any ideas are greatly appreciated!

1
  • A cluster is the transitive closure over all products linked via some (directional) pair? Is the relation symmetric? Commented Jan 26, 2016 at 19:10

2 Answers 2

3

Using networkx, you could build a graph and find the connected components:

import networkx as nx

data = [
 ('productA','productB'),
 ('productB','productC'),
 ('productC','productD'),
 ('productA','productD'),
 ('productD','productB'),
 ('productC','productA'),
 ('productE','productF'),
 ('productF','productG'),
 ('productG','productH'),
 ('productG','productE'),
]

G = nx.Graph()
G.add_edges_from(data)
for connected_component in nx.connected_components(G):
    print(connected_component)

yields

['productG', 'productF', 'productE', 'productH']
['productD', 'productC', 'productB', 'productA']
Sign up to request clarification or add additional context in comments.

1 Comment

will try this instantly. Two things to note: 1. I love SO... always a solution to any problem, 2. I love python...
0

What you are looking for is: Quick Union algorithm.

1 Comment

QU requires the relation to be transitive, reflexive and symmetric. The examples don't show if the relation is symmetric.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.