I have a very large file sorted on a field. I'd like to read this data and group together the lines that contain the same value in that field. For example, I have a file with two fields:

12    fish
50    fish
1     turtle
11    dog
34    dog
12    dog

I'm looking for a solution that uses an iterator or a generator. It's not possible for me to read all the data into memory, only one group (inner list) at a time. I tried to use groupby, but couldn't figure out how to group lines based on the value of a field.

How can I produce lists like this:

[[12, fish], [50, fish]]
[[1, turtle]]
[[11, dog], [34, dog], [12, dog]]

1 Answer

from itertools import groupby
from operator import itemgetter

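with open('somefile') as fin:
    # Generator expression: lines are split lazily, one at a time
    lines = (line.split() for line in fin)
    # groupby collects consecutive rows that share the same second field
    for key, items in groupby(lines, itemgetter(1)):
        print(list(items))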

[['12', 'fish'], ['50', 'fish']]
[['1', 'turtle']]
[['11', 'dog'], ['34', 'dog'], ['12', 'dog']]
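
The output above keeps the first field as a string, while the question asks for numbers. A minimal sketch of one way to wrap the same groupby idea in a generator that converts the first field to int, assuming every non-blank line has exactly two whitespace-separated fields. The name iter_groups and the sample list are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter


def iter_groups(lines):
    """Yield one list of [int, str] rows per run of equal second fields.

    Hypothetical helper: assumes each non-blank line holds exactly two
    whitespace-separated fields, the first of which is an integer.
    """
    # Split lazily and skip blank lines; nothing is read ahead of the
    # group currently being built.
    rows = (line.split() for line in lines if line.strip())
    for _, group in groupby(rows, key=itemgetter(1)):
        yield [[int(number), name] for number, name in group]


# Stand-in for an open file object; any iterable of lines works.
sample = ["12 fish\n", "50 fish\n", "1 turtle\n",
          "11 dog\n", "34 dog\n", "12 dog\n"]

for group in iter_groups(sample):
    print(group)
```

Passing an open file instead of sample should keep only one group in memory at a time. Note that groupby only merges adjacent lines, so this relies on the file already being sorted (or at least grouped) on that field.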