0

I'm trying to process data from an input file. Each line contains 3 values separated by whitespace. I want to add them to a list, grouped by the second value. For example, given the input:

qwe rty 12
asd fgh 34
zxc rty 96

and I want it to be stored in a variable like this:

variable =
[[[qwe, rty, 12], [zxc, rty, 96]],
[[asd, fgh, 34]]]

This is so that I can access it like this:

variable[0] # this should be [[qwe, rty, 12], [zxc, rty, 96]]
variable[1] # this should be [[asd, fgh, 34]]

This is what I have tried:

f = open('input_1.txt')
values = [] # to keep track of which values have occurred before
data = []
for line in f:
    ldata = lineprocess(line) #this transforms the raw data to [qwe, rty, 12] etc.
    if ldata[1] in values:
        data[values.index(ldata[1])].append(ldata)
    elif ldata[1] not in values :
        values.append(ldata[1])
        data.append(ldata)

This, however, returns a list like this:

[['qwe', 'rty', 12, ['zxc', 'rty', 96]],
 ['asd', 'fgh', 34]]

What should I do to get

[[['qwe', 'rty', 12], ['zxc', 'rty', 96]],
 [['asd', 'fgh', 34]]] 

instead?

2
  • That is a very strange data structure. What exactly are you trying to build? Have you considered using a dictionary instead to make it easier to reference this data? Commented Nov 2, 2016 at 22:38
  • The main thing I need is to group the data. So from the input file I have a list of 3 values, let's call this a triple. I want to group it so that I have the triples with the same middle value under the same index, so then I can just say to another process: Okay, you'll have to process all the triples under the index 0, and the next process should use all the triples under the index 1 etc. Commented Nov 2, 2016 at 22:43

2 Answers

2

If you don't want dictionaries, you can use itertools.groupby:

from itertools import groupby
from operator import itemgetter

with open(...) as lines:
    parsed_lines = map(lineprocess, lines)  # uses your `lineprocess`
    second_item = itemgetter(1)
    # groupby only merges adjacent items, so sort by the same key first
    groups = groupby(sorted(parsed_lines, key=second_item), second_item)
    result = [list(group) for _key, group in groups]

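This runs in O(n log n) time because of the sort, which is better than the O(n^2) of your loop (each "in values" check and values.index call scans the whole list). Note that the groups come out in sorted key order, not in order of first appearance. Still, a dictionary-based solution would be O(n).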
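For comparison, here is a minimal sketch of the dictionary-based grouping mentioned above. It assumes Python 3.7+, where dicts keep insertion order, so the groups stay in order of first appearance. The lineprocess shown here is a hypothetical stand-in for the one in the question (it splits on whitespace and converts the third field to int), and group_by_second is a name made up for this example.

```python
def lineprocess(line):
    # Hypothetical stand-in for the question's lineprocess:
    # split on whitespace and convert the third field to an int.
    first, second, third = line.split()
    return [first, second, int(third)]


def group_by_second(lines):
    # Map each second value to the list of triples that share it.
    # Dicts preserve insertion order in Python 3.7+, so groups appear
    # in the order their key was first seen.
    groups = {}
    for line in lines:
        ldata = lineprocess(line)
        groups.setdefault(ldata[1], []).append(ldata)
    return list(groups.values())


sample = ["qwe rty 12", "asd fgh 34", "zxc rty 96"]
result = group_by_second(sample)
print(result)
```

Each line is handled with one dictionary lookup and one append, so the whole pass is linear in the number of lines. To read from a file instead of the sample list, pass the open file object as lines.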

0

Each element of data should itself be a list of triples, not a single triple. When you see a new key, append [ldata] rather than ldata:

f = open('input_1.txt')
values = [] # to keep track of which values have occurred before
data = []
for line in f:
    ldata = lineprocess(line) #this transforms the raw data to [qwe, rty, 12] etc.
    if ldata[1] in values:
        data[values.index(ldata[1])].append(ldata)
    else:
        values.append(ldata[1])
        data.append([ldata])

Consider:

a = [1, 2, 3]
b = [4, 5, 6]
a.append(b)   # appends b as a single nested element
print(a)  # [1, 2, 3, [4, 5, 6]]
c = [[1, 2, 3]]
c.append(b)   # c is a list of lists, so b becomes a sibling
print(c)  # [[1, 2, 3], [4, 5, 6]]

1 Comment

Thanks! The problem was not with the values variable, but with data, but I see your point! :) I needed data[games.index(ldata[1])] = ([ldata]) ;)
