Processing a list of lists in Python

Question

So I have a very big list of lists that looks like:

big_list = [[17465, [22, 33, 1, 7, 83, 54, 84, -5], '123-432-3'], [13254, [42, 64, 4, -5, 75, -2, 1, 6], '1423-1762-4'], [...........................................................................................................], [17264, [22, 75, 54, 2, 87, 12, 23, 86], '14234-453-1']]

I need to iterate over the entire list of lists, and when it detects two or more strings that are the same (element [2] of each inner list, e.g. '123-432-3'), it should amalgamate the list of ints (element [1]) relating to that string with the list of ints relating to the last same value string detected.

Can you clarify what you want the resulting list to look like? – Burhan Khalid, Jul 31, 2012 at 10:28
Is each list composed of three elements, [int, list, string]? Is it important to preserve any kind of ordering? – phant0m, Jul 31, 2012 at 10:30
"relating to that string with the list of ints relating to the last same value string detected"??? I don't understand. – jamylak, Jul 31, 2012 at 10:45

jamylak · Accepted Answer · 2012-07-31 13:05:39Z

1

This is my solution if you are looking for string matches anywhere in big_list:

>>> from collections import OrderedDict
>>> big_list = [[17465, [1, 2, 3], '123-432-3'], [13254, [4, 5, 6], '1423-1762-4'], [17264, [7, 8, 9], '14234-453-1'], [12354, [10, 11, 12], '14234-453-1'], [12358, [13, 14], '14234-453-1'], [99213, [15], '123-999-3'], [27461, [16, 17, 18], '123-432-3']]
>>> def amalgamate(seq):
        d = OrderedDict()
        for num, ints, text in seq:
            d.setdefault(text, [num, [], text])[1].extend(ints)
        return list(d.values())

>>> amalgamate(big_list)
[[17465, [1, 2, 3, 16, 17, 18], '123-432-3'], [13254, [4, 5, 6], '1423-1762-4'], [17264, [7, 8, 9, 10, 11, 12, 13, 14], '14234-453-1'], [99213, [15], '123-999-3']]
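On Python 3.7 and later, a plain dict preserves insertion order as a language guarantee, so OrderedDict is not strictly needed for this. A minimal sketch of the same idea under that assumption (the name `amalgamate_dict` is hypothetical); it also avoids building a throwaway default list on every iteration, which `setdefault` does:

```python
def amalgamate_dict(seq):
    """Merge inner lists [num, ints, text] that share the same text.

    Assumes Python 3.7+, where dict keeps insertion order, so the result
    follows the order of first occurrence. The num of the first
    occurrence is kept; later nums are discarded.
    """
    merged = {}
    for num, ints, text in seq:
        if text not in merged:
            # First time this string is seen: start a fresh ints list,
            # so the input lists are never modified
            merged[text] = [num, [], text]
        merged[text][1].extend(ints)
    return list(merged.values())


big_list = [[17465, [1, 2, 3], '123-432-3'], [13254, [4, 5, 6], '1423-1762-4'],
            [17264, [7, 8, 9], '14234-453-1'], [12354, [10, 11, 12], '14234-453-1'],
            [12358, [13, 14], '14234-453-1'], [99213, [15], '123-999-3'],
            [27461, [16, 17, 18], '123-432-3']]
print(amalgamate_dict(big_list))
# [[17465, [1, 2, 3, 16, 17, 18], '123-432-3'], [13254, [4, 5, 6], '1423-1762-4'],
#  [17264, [7, 8, 9, 10, 11, 12, 13, 14], '14234-453-1'], [99213, [15], '123-999-3']]
```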

edited Jul 31, 2012 at 13:05

answered Jul 31, 2012 at 11:12

jamylak



9 Comments

user1532369 Over a year ago

Only looking for string matches, then amalgamating the list of ints relating to each string match, so that there will be just one string value with a very large related list of ints.

user1532369 Over a year ago

This looks like it does exactly what I want it to, thank you very much!

jamylak Over a year ago

@user1532369 Ah yes, that's what I meant, so you should probably accept this answer, since my other one finds only consecutive matches.

user1532369 Over a year ago

I am getting the error: cannot import OrderedDict. Is this not supported by Python 2.5?

jamylak Over a year ago

@user1532369 Yes, it's new in 2.7; however, you could find a recipe that gives you the same thing in 2.5 by searching 'python 2.5 ordereddict recipe'.
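For interpreters without OrderedDict (such as the Python 2.5 mentioned above), one option that avoids a recipe is to keep a plain dict for lookups plus a separate list recording the order in which each string was first seen. A sketch under that assumption, written to run on both Python 2.5+ and Python 3; the name `amalgamate_ordered` is hypothetical:

```python
def amalgamate_ordered(seq):
    """Merge inner lists [num, ints, text] that share the same text,
    keeping first-occurrence order without collections.OrderedDict."""
    merged = {}   # text -> [num, ints, text]
    order = []    # texts in the order they were first seen
    for num, ints, text in seq:
        if text not in merged:
            merged[text] = [num, [], text]
            order.append(text)
        merged[text][1].extend(ints)
    # Rebuild the result in first-seen order
    return [merged[text] for text in order]


data = [[1, [1], 'a'], [2, [2], 'b'], [3, [3], 'a']]
print(amalgamate_ordered(data))  # [[1, [1, 3], 'a'], [2, [2], 'b']]
```

The extra `order` list costs one entry per distinct string and does the job an ordered mapping would otherwise do.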


Inbar Rose · Accepted Answer · 2012-07-31 11:04:12Z

1

I think this solves your problem.

big_list = [[17465, [22, 33, 1, 7, 83, 54, 84, -5], '123-432-3'],
            [13254, [42, 64, 4, -5, 75, -2, 1, 6], '1423-1762-4'],
            [17264, [22, 75, 54, 2, 87, 12, 23, 86], '14234-453-1']]


# Add an inner list whose string duplicates an existing one
big_list.append([22222, [10, 12, 13], '14234-453-1'])
# Now iterate over big_list: when '14234-453-1' is found in 2 inner lists,
# the values [10, 12, 13] are added to the first instance and the second is removed.

print("Before:")
for l in big_list:
    print(l)

seen_list = {}  # maps each string to the index of its first occurrence
del_list = []   # indices of duplicate inner lists to delete afterwards
for inner in range(len(big_list)):
    text = big_list[inner][2]
    if text in seen_list:
        # Add this inner list's ints to the first inner list with the same string
        big_list[seen_list[text]][1].extend(big_list[inner][1])
        del_list.append(inner)
    else:
        seen_list[text] = inner

# Delete from the end so the remaining indices stay valid
for i in reversed(del_list):
    del big_list[i]

print("after:")

for l in big_list:
    print(l)

result:

>>> 
Before:
[17465, [22, 33, 1, 7, 83, 54, 84, -5], '123-432-3']
[13254, [42, 64, 4, -5, 75, -2, 1, 6], '1423-1762-4']
[17264, [22, 75, 54, 2, 87, 12, 23, 86], '14234-453-1']
[22222, [10, 12, 13], '14234-453-1']
after:
[17465, [22, 33, 1, 7, 83, 54, 84, -5], '123-432-3']
[13254, [42, 64, 4, -5, 75, -2, 1, 6], '1423-1762-4']
[17264, [22, 75, 54, 2, 87, 12, 23, 86, 10, 12, 13], '14234-453-1']

edited Jul 31, 2012 at 11:04

answered Jul 31, 2012 at 10:53

Inbar Rose


3 Comments

user1532369 Over a year ago

This is EXACTLY what I needed, thank you very much!! However, when I run it on mine, absolutely nothing happens; hopefully I will figure out why. Will this work if the string appears more than twice?

Inbar Rose Over a year ago

From what I understand, your "big_list" contains smaller lists with this structure: [INT, LIST[INT, INT, INT, ..., INT], STR]. My code goes through these smaller lists and builds a dictionary of each STR. Each time a STR is found that is already in the dictionary, that whole inner list is marked for deletion, and its list of INTs is added to the list of INTs from the first time that STR was found. I was actually wondering about the first INT in each inner list: the ones in the duplicate lists are lost. Does that matter?

user1532369 Over a year ago

The duplicates do not matter for the int; it is essentially a verification number for the string, so I only need it once, if that makes any sense?
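Because every later occurrence of a string is looked up against the first occurrence recorded in the dictionary, this approach handles a string that appears any number of times, not just twice. A sketch wrapping that logic in a function (the name `merge_in_place` is hypothetical), which rebuilds the list with slice assignment instead of deleting indices one at a time; like the original, it modifies big_list and the first inner list for each string in place:

```python
def merge_in_place(big_list):
    """Merge inner lists [num, ints, text] that share text, in place.

    The first inner list for each text keeps its num and receives the
    ints of every later inner list with the same text; those later
    inner lists are dropped.
    """
    first = {}  # text -> first inner list seen with that text
    kept = []   # inner lists to keep, in original order
    for inner in big_list:
        text = inner[2]
        if text in first:
            first[text][1].extend(inner[1])
        else:
            first[text] = inner
            kept.append(inner)
    # Replace the contents so every reference to big_list sees the change
    big_list[:] = kept


big_list = [[1, [10], 'x'], [2, [20], 'y'], [3, [30], 'x'], [4, [40], 'x']]
merge_in_place(big_list)
print(big_list)  # [[1, [10, 30, 40], 'x'], [2, [20], 'y']]
```

A single slice assignment at the end avoids repeated `del`, each of which shifts every element after the deleted index.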

jamylak · Accepted Answer · 2012-07-31 11:03:10Z

In its current form, the question is unclear, but this might be what you are looking for (this works for consecutive matches):

>>> from itertools import groupby
>>> from operator import itemgetter
>>> big_list = [[17465, [1, 2, 3], '123-432-3'], [13254, [4, 5, 6], '1423-1762-4'], [17264, [7, 8, 9], '14234-453-1'], [12354, [10, 11, 12], '14234-453-1'], [12358, [13, 14], '14234-453-1'], [99213, [1], '123-999-3']]
>>> def amalgamate(seq):
        for _, g in groupby(seq, itemgetter(2)):
            num, ints, text = next(g)
            ints = list(ints)  # copy so the input lists are not modified
            for sublist in g:
                ints.extend(sublist[1])
            yield [num, ints, text]


>>> list(amalgamate(big_list))
[[17465, [1, 2, 3], '123-432-3'], [13254, [4, 5, 6], '1423-1762-4'], [17264, [7, 8, 9, 10, 11, 12, 13, 14], '14234-453-1'], [99213, [1], '123-999-3']]
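`itertools.groupby` only groups runs of consecutive items with equal keys. If matching strings can be scattered through the list, one option is to sort by the string first; Python's sort is stable, so within each group the original relative order is kept and the first occurrence (and its num) stays first. The trade-off is that the output is ordered by string rather than by first occurrence. A sketch (the name `amalgamate_sorted` is hypothetical):

```python
from itertools import groupby
from operator import itemgetter


def amalgamate_sorted(seq):
    """Merge inner lists [num, ints, text] sharing text, anywhere in seq.

    Sorting by text makes equal strings consecutive so groupby can merge
    them; the result is ordered by text, not by first occurrence.
    """
    key = itemgetter(2)
    result = []
    for text, group in groupby(sorted(seq, key=key), key):
        group = list(group)
        num = group[0][0]  # num of the first occurrence (the sort is stable)
        ints = []
        for sublist in group:
            ints.extend(sublist[1])
        result.append([num, ints, text])
    return result


big_list = [[17465, [1, 2, 3], '123-432-3'], [13254, [4, 5, 6], '1423-1762-4'],
            [99213, [15], '123-999-3'], [27461, [16, 17, 18], '123-432-3']]
print(amalgamate_sorted(big_list))
# [[17465, [1, 2, 3, 16, 17, 18], '123-432-3'], [99213, [15], '123-999-3'],
#  [13254, [4, 5, 6], '1423-1762-4']]
```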
