1

I am looking for a way in Python to read multiple lines from a file, 10 lines at a time. I have already looked into readlines(sizehint) and tried passing 10, but it doesn't read only 10 lines; it actually reads to the end of the file (I tried this on a small file). Each line is 11 bytes long, and each read should fetch 10 lines. If fewer than 10 lines remain, it should return only those lines. My actual file contains more than 150K lines.

Any idea how I can achieve this?

2
  • You can write the method yourself. Commented Oct 8, 2012 at 23:46
  • readlines reads the whole file by default. The optional argument is an approximate size hint in bytes, not a limit on the number of lines. Commented Oct 8, 2012 at 23:48
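
To illustrate the comment above, here is a minimal sketch of how the hint behaves. It assumes Python 3 and uses an in-memory io.StringIO (the names sample, got and rest are made up for the demo). In Python 2, which the question dates from, the hint could additionally be rounded up to an internal buffer size, which would explain a small file being read in full.

```python
import io

# 25 lines of 11 characters each (10 visible characters plus the newline),
# mirroring the line length described in the question.
sample = io.StringIO("".join("line %05d\n" % i for i in range(25)))

got = sample.readlines(10)   # 10 is a size hint in characters, not a line count
rest = sample.readlines()    # no hint: everything that is left

print(len(got), len(rest))
```

The hint only tells readlines roughly how much data to collect before stopping, so it cannot be used to request an exact number of lines.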

4 Answers

9

You're looking for itertools.islice():

from itertools import islice

with open('data.txt') as f:
    lines = []
    while True:
        batch = list(islice(f, 10))  # islice returns an iterator, so convert it to a list here
        if batch:
            # do something with the current batch of <= 10 lines here
            lines.append(batch)      # e.g. store it
        else:
            break                    # an empty batch means end of file
    print(lines)
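
The same idea can be wrapped in a generator so the caller gets an ordinary for loop. This is only a sketch: the name read_batches is hypothetical, and an io.StringIO stands in for the real file so the example is self-contained (Python 3).

```python
import io
from itertools import islice

def read_batches(lines, size=10):
    """Yield lists of up to `size` items from the iterable `lines`."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:      # nothing left to read
            return
        yield batch

# 25 sample lines stand in for the file
demo = io.StringIO("".join("line %05d\n" % i for i in range(25)))
sizes = [len(batch) for batch in read_batches(demo)]
print(sizes)  # [10, 10, 5]
```

With a real file this would be used as `for batch in read_batches(f):` inside the `with open(...)` block; the last batch is simply shorter when fewer than 10 lines remain.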

5 Comments

The problem with using islice is that the islice object can only be read once. So if the OP wants to check previous lines based on conditions on the current line, he would have to cache them manually. It might be better to use a list (depending on the use case).
@inspectorG4dget: But the same problem exists for your answer as well. You are reading 10 lines and returning them. It is still up to the caller to cache previous batches of lines.
@jdi: not what I meant. If I read lines 1-5 and on line 6, wanted to look at line 3, I can't do that with islice without another cache. I was referring to lines in the same batch, not lines in previous batches
He is converting it to a list right here in the code. All it requires is assigning it to a variable. I think the print is just for simplicity's sake.... lines = list(lines)
I only see one edit. It originally was being converted to a list, just printed..not saved. Anyways. +1 good answer.
3

This should do it

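def read10Lines(fp):
    """Return up to 10 lines from fp; fewer if the end of the file is reached."""
    answer = []
    for _ in range(10):
        line = fp.readline()
        if not line:  # readline() returns '' at end of file
            break
        answer.append(line)
    return answer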

Or, the list comprehension:

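# the `if line` filter drops the empty strings readline() returns at end of file
ten_lines = [line for line in (fp.readline() for _ in range(10)) if line]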

In both cases, fp = open('path/to/file')
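
One detail worth spelling out: readline() returns an empty string at end of file instead of raising, so a fixed-count loop has to stop at the first empty string. A possible way to express that, sketched here for Python 3 with a hypothetical helper name next_lines and an in-memory io.StringIO in place of the file, is the two-argument form of iter combined with islice.

```python
import io
from itertools import islice

def next_lines(fp, n=10):
    """Return up to n lines from fp, stopping early at end of file."""
    # iter(callable, sentinel) calls fp.readline() until it returns ""
    return list(islice(iter(fp.readline, ""), n))

fp = io.StringIO("".join("line %05d\n" % i for i in range(12)))  # 12 sample lines
first = next_lines(fp)
second = next_lines(fp)
third = next_lines(fp)
print(len(first), len(second), len(third))  # 10 2 0
```

An empty list from next_lines signals that the file is exhausted.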

1

Another solution, which replaces the infinite loop with a more familiar for loop, relies on itertools.izip_longest and a small trick with iterators. The trick is that zip(*[iter(iterable)]*n) breaks an iterable into chunks of size n: all n arguments are the same iterator, so each output tuple consumes n consecutive items. Since a file is already a generator-like iterator (as opposed to being sequence-like), we can write:

from itertools import izip_longest  # Python 2; named zip_longest in Python 3

with open('data.txt') as f:
    for ten_lines in izip_longest(*[f]*10, fillvalue=None):
        if ten_lines[-1] is None:
            # filter(None, ...) removes the `None` padding at the end
            ten_lines = filter(None, ten_lines)
        process(ten_lines)  # process() stands for your own handling code
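
For readers on Python 3, where the function is called zip_longest, the grouping trick might look like the sketch below. The helper name chunks and the private _PAD sentinel are assumptions made for this example; a dedicated sentinel object is used instead of None so that legitimate falsy items are not dropped.

```python
from itertools import zip_longest

_PAD = object()  # unique padding marker that cannot collide with real data

def chunks(iterable, n=10):
    """Yield lists of up to n consecutive items from iterable."""
    it = iter(iterable)
    # The same iterator is passed n times, so each tuple takes n consecutive items.
    for group in zip_longest(*[it] * n, fillvalue=_PAD):
        yield [item for item in group if item is not _PAD]

result = list(chunks(range(7), 3))
print(result)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Passing an open file as the iterable gives batches of lines in the same way.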

0
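from itertools import groupby, count

with open("data.txt") as f:
    # The default argument c=count() is created once, so the key is
    # 0 for the first 10 lines, 1 for the next 10, and so on.
    groups = groupby(f, key=lambda x, c=count(): next(c) // 10)
    for k, v in groups:
        bunch_of_lines = list(v)
        print(bunch_of_lines)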
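
A variation on the same groupby idea, sketched for Python 3, uses enumerate to supply the running index instead of a counter hidden in a default argument. The function name grouped is hypothetical, and a short string stands in for the file.

```python
from itertools import groupby

def grouped(iterable, n=10):
    """Yield lists of up to n consecutive items, grouped by index // n."""
    pairs = enumerate(iterable)  # (index, item) pairs
    for _, group in groupby(pairs, key=lambda pair: pair[0] // n):
        yield [item for _, item in group]

out = list(grouped("abcdefg", 3))
print(out)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

Each group has to be consumed before moving to the next one, because groupby shares the underlying iterator across groups.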
