
I have a huge text file (1 GB), where each "line" is separated by ##.
For example:

## sentence 1 ## sentence 2
## sentence 3

I'm trying to print the file one ##-separated segment at a time.

I tried the following code, but the read() call crashes because of the size of the file.

import re

dataFile = open('post.txt', 'r')
p = re.compile('##(.+)')

# read() loads the entire 1 GB file into memory at once
iterator = p.finditer(dataFile.read())
for match in iterator:
    print(match.group())

dataFile.close()

Any ideas?

  • Post the expected output and a small sample input. Commented Aug 12, 2013 at 0:34
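Separately from the memory problem, the question's pattern may not separate segments the way the asker intends. Because . also matches # and .+ is greedy, ##(.+) runs to the end of the line instead of stopping at the next ##. The sketch below shows this on the small sample input from the question; it keeps the sample in memory, so it only illustrates the matching, not a fix for the 1 GB case.

```python
import re

sample = "## sentence 1 ## sentence 2\n## sentence 3\n"

# The question's pattern: '.' also matches '#', and '.+' is greedy,
# so the first match runs past the second '##' to the end of the line.
greedy = [m.group(1) for m in re.finditer('##(.+)', sample)]
print(greedy)  # [' sentence 1 ## sentence 2', ' sentence 3']

# Splitting on the delimiter gives one piece per '##'-separated segment
# (the first piece is empty because the text starts with '##').
pieces = re.split('##', sample)
print(pieces)  # ['', ' sentence 1 ', ' sentence 2\n', ' sentence 3\n']
```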

2 Answers


This reads the file in chunks of chunksize characters (the file is opened in text mode), so memory use is bounded by the chunk size plus the longest ##-separated segment rather than by the size of the whole file:

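import re

def open_delimited(filename, delimiter, *args, **kwargs):
    """
    Yield the pieces of `filename` separated by `delimiter`, reading
    chunksize characters at a time. `delimiter` is a regular expression;
    wrap it in re.escape() if it contains special characters.
    http://stackoverflow.com/a/17508761/190597
    """
    with open(filename, *args, **kwargs) as infile:
        chunksize = 10000
        remainder = ''
        # iter(callable, sentinel) keeps calling read() until it returns '' (EOF)
        for chunk in iter(lambda: infile.read(chunksize), ''):
            pieces = re.split(delimiter, remainder + chunk)
            # every piece except the last is followed by a delimiter, so it is complete
            for piece in pieces[:-1]:
                yield piece
            # the last piece may continue in the next chunk, so carry it over
            remainder = pieces[-1]
        if remainder:
            yield remainder

filename = 'post.txt'
for chunk in open_delimited(filename, '##', 'r'):
    print(chunk)
    print('-' * 80)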
    print('-'*80)
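To see why carrying remainder over is enough, here is a minimal sketch of the same loop rewritten to take any text stream (split_stream and its chunksize parameter are hypothetical names, not part of the answer above). Running it on the question's sample with a tiny chunk size forces a ## to be split across two chunks, and the output still matches splitting the whole string at once.

```python
import io
import re


def split_stream(stream, delimiter, chunksize=10000):
    """Yield the delimiter-separated pieces of a text stream, one chunk at a time."""
    remainder = ''
    # read(chunksize) returns '' at end of stream, which stops iter()
    for chunk in iter(lambda: stream.read(chunksize), ''):
        pieces = re.split(delimiter, remainder + chunk)
        yield from pieces[:-1]   # each of these is followed by a delimiter, so it is complete
        remainder = pieces[-1]   # may continue in the next chunk
    if remainder:
        yield remainder


sample = "## sentence 1 ## sentence 2\n## sentence 3\n"
# chunksize=3 splits the second '##' between the chunks "1 #" and "# s"
pieces = list(split_stream(io.StringIO(sample), '##', chunksize=3))
print(pieces)  # ['', ' sentence 1 ', ' sentence 2\n', ' sentence 3\n']
```

The leading empty string is the text before the first ##; skip it (for example with if piece.strip()) if you only want non-empty segments.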

2 Comments

A bit of overkill, since I don't think his regex ever spans line boundaries, but still a useful tool to have in one's kit.
Reading large files line-by-line is too slow. You'll do better by processing the file in chunks.

You can use itertools.islice to read the file a fixed number of lines at a time:

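from itertools import islice

buffer = 1000  # number of lines to read per batch

# avoid naming variables `file` or `slice`, which shadow built-ins
with open('file.txt', 'r') as f:
    while True:
        # islice pulls at most `buffer` lines from the file iterator
        to_process = list(islice(f, buffer))
        if not to_process:
            break  # end of file
        # process the to_process list here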

buffer is the number of lines to read at a time; set it to whatever int suits your memory budget.
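The islice loop above batches lines but does not split on ## by itself. Because the two-character delimiter cannot straddle a newline, a line-based reader can split each line on ## and carry the unfinished segment over to the next line. This is a minimal sketch under that assumption; split_lines_on is a hypothetical helper, and a 1 GB file with few or no newlines would still be read as a few huge lines, in which case the chunked approach above is the safer choice.

```python
import io


def split_lines_on(lines, delimiter='##'):
    """Yield delimiter-separated segments from an iterable of text lines.

    Assumes the delimiter contains no newline, so it never spans two lines.
    """
    current = ''
    for line in lines:
        parts = line.split(delimiter)
        current += parts[0]      # text before the first delimiter continues the open segment
        for part in parts[1:]:
            yield current        # each delimiter closes the open segment
            current = part
    if current:
        yield current


sample = "## sentence 1 ## sentence 2\n## sentence 3\n"
# A file object works the same way as this StringIO: it is iterated line by line
records = list(split_lines_on(io.StringIO(sample)))
print(records)  # ['', ' sentence 1 ', ' sentence 2\n', ' sentence 3\n']
```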
