I have the following list of lists that contains 5 entries:

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]

I'd like to 'cluster' the above list roughly as follows:

1. Sort `my_lol` in ascending order by the value in each inner list.
2. Pick the lowest entry in `my_lol` as the key of the first cluster.
3. Calculate the difference between the current entry's value and the previous entry's value.
4. If the difference is less than the threshold, add the current entry as a member of the current cluster; otherwise, make the current entry the key of the next cluster.
5. Repeat until every entry has been processed.

At the end of the day I'd like to get the following dictionary of lists:

dol = {'x':['x','a','k'], 'p':['p','b']}

Essentially, that dictionary of lists holds two clusters, each keyed by its lowest entry.

I tried this but got stuck at step 3. What's the right way to do it?

import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
for ids, val in my_lol_sorted:
    if tmp_ids != "-":
        diff = abs(tmp_val - val)

        if diff < thres:
            print(tmp_ids)
            dol[tmp_ids].append(tmp_ids)

    tmp_ids = ids
    tmp_val = val

print(json.dumps(dol, indent=4))

2 Answers

import operator
import json
from collections import defaultdict

my_lol = [['a', 1.01], ['x',1.00],['k',1.02],['p',3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

thres = 0.1
tmp_val = 0
tmp_ids = "-"

dol = defaultdict(list)
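for ids, val in my_lol_sorted:
    if tmp_ids == "-":
        # First entry: it becomes the key of the first cluster.
        tmp_ids = ids
    else:
        diff = abs(tmp_val - val)
        # A gap larger than the threshold starts a new cluster keyed by this entry.
        if diff > thres:
            tmp_ids = ids
    # Every entry, including the key itself, is appended to the current cluster.
    dol[tmp_ids].append(ids)
    tmp_val = val

print(json.dumps(dol, indent=4))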
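
To see why the loop above ends up with two clusters, it can help to look at the gap between each sorted entry and its predecessor. This is only a minimal sketch reusing the question's data and threshold; the names `gaps` and `breaks` are illustrative and not part of the original code.

```python
import operator

my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))
thres = 0.1

# Pair each entry with its predecessor and record the gap between their values.
gaps = [(cur[0], cur[1] - prev[1])
        for prev, cur in zip(my_lol_sorted, my_lol_sorted[1:])]

# Entries whose gap exceeds the threshold are where a new cluster starts.
breaks = [key for key, gap in gaps if gap > thres]
print(breaks)  # ['p']
```

Only 'p' sits more than `thres` above its predecessor, so the first cluster starts at 'x' (the lowest entry) and the second starts at 'p'.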

Try this:

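import operator
from collections import defaultdict

# Reuses my_lol from the question, sorted ascending by value.
my_lol_sorted = sorted(my_lol, key=operator.itemgetter(1))

dol = defaultdict(list)
if my_lol_sorted:
    thres = 0.1
    # The lowest entry is the key of the first cluster.
    tmp_ids, tmp_val = my_lol_sorted[0]

    for ids, val in my_lol_sorted:
        diff = abs(tmp_val - val)
        # A gap larger than the threshold starts a new cluster keyed by this entry.
        if diff > thres:
            tmp_ids = ids
        dol[tmp_ids].append(ids)
        tmp_val = val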
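
If the clustering is needed in more than one place, the same logic could be wrapped in a function. The following is a sketch under the question's assumptions (each value is compared with the immediately preceding one, as in step 3); the function name `cluster_by_gap` and its parameters are hypothetical, not something from the original posts.

```python
import operator


def cluster_by_gap(pairs, thres):
    """Group keys of (key, value) pairs whose neighbouring sorted values
    differ by at most `thres`. Each cluster is keyed by its lowest entry."""
    clusters = {}
    cluster_key = None
    prev_val = None
    for key, val in sorted(pairs, key=operator.itemgetter(1)):
        # Start a new cluster on the first entry or when the gap is too large.
        if prev_val is None or val - prev_val > thres:
            cluster_key = key
            clusters[cluster_key] = []
        clusters[cluster_key].append(key)
        prev_val = val
    return clusters


my_lol = [['a', 1.01], ['x', 1.00], ['k', 1.02], ['p', 3.00], ['b', 3.09]]
print(cluster_by_gap(my_lol, 0.1))  # {'x': ['x', 'a', 'k'], 'p': ['p', 'b']}
```

Because each value is compared with its predecessor rather than with the cluster key, a long run of closely spaced values stays in one cluster even when its first and last members are more than `thres` apart. If that is not wanted, comparing against the value of the cluster key instead would be one alternative.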
