Unexpected output in for loop - Python

Question

I have this list:

t=[['universitario de deportes'],['lancaster'],['universitario de'],['juan aurich'],['muni'],['juan']]

I want to reorder the list according to the Jaccard distance. After reordering t, the expected output is:

[['universitario de deportes'],['universitario de'],['lancaster'],['juan aurich'],['juan'],['muni']]

The code for the Jaccard distance works fine, but the rest of the code doesn't give the expected output. The code is below:

def jack(a,b):
    x=a.split()
    y=b.split()
    k=float(len(set(x)&set(y)))/float(len((set(x) | set(y))))
    return k
t=[['universitario de deportes'],['lancaster'],['universitario de'],['juan aurich'],['muni'],['juan']]

import copy as cp


b=cp.deepcopy(t)

c=[]

while (len(b)>0):
    c.append(b[0][0])
    d=b[0][0]
    del b[0]
    for m in range (0 , len(b)+1):
        if m > len(b):
            break
            if jack(d,b[m][0])>0.3:
                c.append(b[m][0])
                del b[m]

Unfortunately, the output is just the original list, unchanged:

print c
['universitario de deportes', 'lancaster', 'universitario de', 'juan aurich', 'muni', 'juan']

EDIT:

I tried to correct my code. It still doesn't work, but I got a little closer to the expected output:

t=[['universitario de deportes'],['lancaster'],['universitario de'],['juan aurich'],['muni'],['juan']]

import copy as cp


b=cp.deepcopy(t)

c=[]

while (len(b)>0):
    c.append(b[0][0])
    d=b[0][0]
    del b[0]
    for m in range(0,len(b)-1):
        if jack(d,b[m][0])>0.3:
            c.append(b[m][0])
            del b[m]

The "close" output is:

['universitario de deportes', 'universitario de', 'lancaster', 'juan aurich', 'muni', 'juan']

Second edit:

Finally, I came up with a solution that runs quite fast. I'm going to use the code to order 60 thousand names. The code is below:

t=['universitario de deportes','lancaster','lancaste','juan aurich','lancaster','juan','universitario','juan franco']

import copy as cp


b=cp.deepcopy(t)

c=[]

while (len(b)>0):
    c.append(b[0])
    e=b[0]
    del b[0]
    for val in b:
        if jack(e,val)>0.3:
            c.append(val)
            b.remove(val)

print c
['universitario de deportes', 'universitario', 'lancaster', 'lancaster', 'lancaste', 'juan aurich', 'juan', 'juan franco']
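
A note on the loop above: calling b.remove(val) while iterating over b makes the for loop skip the element that follows each removal. The sketch below avoids that by building new lists instead of mutating the one being iterated. The names jaccard and group_close_names and the threshold parameter are invented for this example, and jaccard simply restates jack with a guard for empty input.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings (same idea as jack)."""
    x, y = set(a.split()), set(b.split())
    union = x | y
    # Assumption: two empty strings count as similarity 0.0 instead of raising.
    return float(len(x & y)) / len(union) if union else 0.0


def group_close_names(names, threshold=0.3):
    """Place each name next to the remaining names that are similar to it."""
    remaining = list(names)  # shallow copy; the input list is left untouched
    ordered = []
    while remaining:
        current = remaining.pop(0)
        ordered.append(current)
        # Split the rest into close and not-close without mutating mid-iteration.
        close = [n for n in remaining if jaccard(current, n) > threshold]
        remaining = [n for n in remaining if jaccard(current, n) <= threshold]
        ordered.extend(close)
    return ordered


t = ['universitario de deportes', 'lancaster', 'lancaste', 'juan aurich',
     'lancaster', 'juan', 'universitario', 'juan franco']
result = group_close_names(t)
print(result)
```

For this particular input the result matches the list printed above, but on other inputs the two versions can differ, because the original loop may skip a candidate after each removal.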

Why does t contain single-item lists? Running jack on your values, only two entries have non-zero values, so the sorting won't do much.
– jonrsharpe, Commented Apr 6, 2014 at 16:55
According to t, there are two pairs with a Jaccard index larger than 0.3 that should be together in the output, but they aren't.
– CreamStat, Commented Apr 6, 2014 at 16:59
"I got a little closer to the expected output" is extremely unhelpful. Please provide inputs and expected and actual outputs. It would be useful if you tried to describe in words what the sorting algorithm should do, too. Also, review your variable names - they are currently pretty bad.
– jonrsharpe, Commented Apr 6, 2014 at 18:42
range(0,len(b)-1): should be range(len(b)) - range goes up to but doesn't include the stop parameter. Better yet, adopt the enumerate my answer suggests.
– jonrsharpe, Commented Apr 6, 2014 at 18:57
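
To illustrate the last comment with a small sketch (the list b here is only sample data): range(len(b)) already stops one short of len(b), so subtracting 1 from the stop value skips the final element.

```python
b = ['lancaster', 'universitario de', 'juan']

# The stop value is excluded, so this never reaches the last index (2).
short_indices = list(range(0, len(b) - 1))

# This covers every valid index of b.
all_indices = list(range(len(b)))

# enumerate yields (index, value) pairs without any manual index arithmetic.
pairs = list(enumerate(b))

print(short_indices)
print(all_indices)
print(pairs)
```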

jonrsharpe · Accepted Answer · 2014-04-06 18:32:46Z

1

Firstly, I'm not sure why you've got everything in single-item lists, so I suggest flattening it out first:

t = [l[0] for l in t]

This gets rid of the extra zero indices everywhere, and means you only need shallow copies (as strings are immutable).
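
As a minimal sketch of that step, assuming every inner list holds exactly one string as in the question (the names flat and working are just for illustration):

```python
# Nested input as given in the question: each name is wrapped in its own list.
t = [['universitario de deportes'], ['lancaster'], ['universitario de'],
     ['juan aurich'], ['muni'], ['juan']]

# Take the single string out of each inner list.
flat = [inner[0] for inner in t]

# Strings are immutable, so a shallow copy is enough to work on safely.
working = list(flat)
working.remove('muni')  # changes only the copy

print(flat)     # still contains 'muni'
print(working)  # 'muni' has been removed
```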

Secondly, the last three lines of your code never run:

if m > len(b):
    break  # nothing after this will happen
    if jack(d,b[m][0])>0.3:
        c.append(b[m][0])
        del b[m]
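
A small illustration of the same effect (the function names and sample data are made up for this example): anything indented under the if after break is dead code, and because range(len(values) + 1) never produces a value greater than len(values), the if body is never even entered.

```python
def scan_with_dead_code(values):
    hits = []
    for m in range(len(values) + 1):
        if m > len(values):  # never true: m stops at len(values)
            break
            hits.append(values[m])  # unreachable: placed after break
    return hits


def scan_fixed(values):
    hits = []
    for m in range(len(values)):  # range already stops before len(values)
        hits.append(values[m])    # at loop-body level, so it runs every pass
    return hits


print(scan_with_dead_code(['a', 'b']))
print(scan_fixed(['a', 'b']))
```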

I think what you want is:

out = []  # this will be the sorted list
for index, val1 in enumerate(t):  # work through each item in the original list
    if val1 not in out:  # if we haven't already put this item in the new list
        out.append(val1)  # put this item in the new list
    for val2 in t[index+1:]:  # search the rest of the list
        if val2 not in out:  # if we haven't already put this item in the new list
            if jack(val1, val2) > 0.3:  # and the new item is close to the current item
                out.append(val2)  # add the new item too

This gives me

out == ['universitario de deportes', 'universitario de', 
      'lancaster', 'juan aurich', 'juan', 'muni']
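
For reference, a self-contained sketch that combines the question's similarity function with the loop above. The names jaccard and order_by_similarity and the threshold parameter are my own choices for the example, not part of the original code, and the empty-string guard is an added assumption.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings (same idea as jack)."""
    x, y = set(a.split()), set(b.split())
    union = x | y
    # Assumption: two empty strings count as similarity 0.0 instead of raising.
    return float(len(x & y)) / len(union) if union else 0.0


def order_by_similarity(names, threshold=0.3):
    """Keep the original order, but pull similar later names forward."""
    out = []
    for index, val1 in enumerate(names):
        if val1 not in out:
            out.append(val1)
        for val2 in names[index + 1:]:
            if val2 not in out and jaccard(val1, val2) > threshold:
                out.append(val2)
    return out


t = [['universitario de deportes'], ['lancaster'], ['universitario de'],
     ['juan aurich'], ['muni'], ['juan']]
flat = [item[0] for item in t]
ordered = order_by_similarity(flat)
print(ordered)
```

Note that the `not in out` checks also drop repeated names, so an input containing the same string twice keeps only one copy.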

I would generally recommend using better variable names than a, b, c, etc.

edited Apr 6, 2014 at 18:32

answered Apr 6, 2014 at 17:09

jonrsharpe


7 Comments

CreamStat Over a year ago

Your code doesn't work for the case: t=["cala","cala lima","uni","ali","uni le","ali po", "tr", "wq","tr uni"]

jonrsharpe Over a year ago

Edited - is that better? It would be helpful if you provided the answer you were expecting, rather than just "doesn't work".

CreamStat Over a year ago

["cala","cala lima","ali","ali po","uni","uni le","tr uni","tr","wq"]

CreamStat Over a year ago

Check my edit, I got a little closer to the expected output, maybe you can correct me.

CreamStat Over a year ago

Your code is nice, it's very close to the expected output, but I don't want duplicates, so I choose the first time the Jaccard similarity is larger than 0.3.
