Python - Groups lists by one element

Question

I'm trying to process data from a file input. Each line contains three values separated by whitespace. I want to add them to a list, grouped by the second value. For example, given the input:

qwe rty 12
asd fgh 34
zxc rty 96

and I want it to be stored in a variable like this:

variable =
[[[qwe, rty, 12], [zxc, rty, 96]],
[[asd, fgh, 34]]]

This is so that I can access it like this:

variable[0] # this should be [[qwe, rty, 12], [zxc, rty, 96]]
variable[1] # this should be [[asd, fgh, 34]]

This is what I'm trying:

f = open('input_1.txt')
values = [] #to keep track of which values have occurred before
data = []
for line in f:
    ldata = lineprocess(line) #this transforms the raw data to [qwe, rty, 12] etc.
    if ldata[1] in values:
        data[values.index(ldata[1])].append(ldata)
    elif ldata[1] not in values :
        values.append(ldata[1])
        data.append(ldata)

This, however, returns a list like this:

[['qwe', 'rty', 12, ['zxc', 'rty', 96]],
 ['asd', 'fgh', 34]]

What should I do to get

[[['qwe', 'rty', 12], ['zxc', 'rty', 96]],
 [['asd', 'fgh', 34]]]

instead?

That is a very strange data structure. What exactly are you trying to build? Have you considered using a dictionary instead to make it easier to reference this data?
– idjaw, Commented Nov 2, 2016 at 22:38
The main thing I need is to group the data. From the input file I get a list of 3 values per line; let's call this a triple. I want to group the triples so that those with the same middle value sit under the same index. Then I can tell one process to handle all the triples under index 0, the next process to handle all the triples under index 1, and so on.
– lte__, Commented Nov 2, 2016 at 22:43

Eli Korvigo · Accepted Answer · 2016-11-02 22:57:13Z

2

If you don't want dictionaries, you can use itertools.groupby:

from itertools import groupby
from operator import itemgetter

with open(...) as lines:
    parsed_lines = map(lineprocess, lines) # I'm using your `lineprocess`
    second_item = itemgetter(1)
    # groupby only merges adjacent items, so sort by the same key first
    groups = groupby(sorted(parsed_lines, key=second_item), second_item)
    result = [list(group) for _key, group in groups]

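This has O(n log n) performance because of the sort, which is better than your O(n^2) (every "in values" test and values.index call scans the whole list). Note that the groups come out ordered by key rather than by first appearance in the file. Still, a dictionary-based solution would be O(n).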
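
The dictionary-based approach mentioned above could look like the following minimal sketch. It assumes Python 3.7+, where plain dicts preserve insertion order, so the groups appear in the order their key is first seen. `lineprocess` here is a hypothetical stand-in for the asker's function (its real implementation is not shown in the question), and `group_by_second` is a name chosen purely for illustration.

```python
def lineprocess(line):
    # Hypothetical stand-in for the asker's parser:
    # 'qwe rty 12' -> ['qwe', 'rty', 12]
    first, second, third = line.split()
    return [first, second, int(third)]


def group_by_second(lines):
    """Group parsed rows by their second element in a single pass."""
    groups = {}  # key -> list of rows; insertion-ordered in Python 3.7+
    for line in lines:
        row = lineprocess(line)
        # setdefault creates the empty group the first time a key is seen
        groups.setdefault(row[1], []).append(row)
    return list(groups.values())


sample = ["qwe rty 12", "asd fgh 34", "zxc rty 96"]
result = group_by_second(sample)
print(result)
# [[['qwe', 'rty', 12], ['zxc', 'rty', 96]], [['asd', 'fgh', 34]]]
```

Each dictionary lookup is constant time on average, which is where the O(n) total comes from. Converting to a list at the end keeps the index-based access (`result[0]`, `result[1]`) the question asks for.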

edited Nov 2, 2016 at 22:57

answered Nov 2, 2016 at 22:51


TheoretiCAL · Accepted Answer · 2016-11-02 22:57:07Z

0

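Each element of data should itself be a list of rows, not a single row. When you see a new value, start its group with data.append([ldata]) instead of data.append(ldata):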

f = open('input_1.txt')
values = [] #to keep track of which values have occurred before
data = []
for line in f:
    ldata = lineprocess(line) #this transforms the raw data to [qwe, rty, 12] etc.
    if ldata[1] in values:
        data[values.index(ldata[1])].append(ldata)
    else:
        values.append(ldata[1])
        data.append([ldata])

Consider:

a = [1,2,3]
b = [4,5,6]
a.append(b)
print(a) # [1, 2, 3, [4, 5, 6]]
c = [[1,2,3]]
c.append(b)
print(c) # [[1, 2, 3], [4, 5, 6]]
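
To check the corrected loop against the sample input from the question, here is a self-contained sketch. It reads from an in-memory `io.StringIO` instead of `input_1.txt`, and `lineprocess` is a hypothetical stand-in for the asker's parser; the wrapper name `group_rows` is also just for illustration.

```python
import io


def lineprocess(line):
    # Hypothetical parser: 'qwe rty 12' -> ['qwe', 'rty', 12]
    a, b, c = line.split()
    return [a, b, int(c)]


def group_rows(f):
    values = []  # second-column values seen so far, in first-seen order
    data = []    # data[i] holds every row whose second value is values[i]
    for line in f:
        ldata = lineprocess(line)
        if ldata[1] in values:
            data[values.index(ldata[1])].append(ldata)
        else:
            values.append(ldata[1])
            data.append([ldata])  # wrap the row so the group is a list of rows
    return data


f = io.StringIO("qwe rty 12\nasd fgh 34\nzxc rty 96\n")
variable = group_rows(f)
print(variable[0])  # [['qwe', 'rty', 12], ['zxc', 'rty', 96]]
print(variable[1])  # [['asd', 'fgh', 34]]
```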

edited Nov 2, 2016 at 22:57

answered Nov 2, 2016 at 22:41


1 Comment

lte__ Over a year ago

Thanks! The problem was not with the values variable, but with data, but I see your point! :) I needed data[games.index(ldata[1])] = ([ldata]) ;)
