
Input: a large sequence file, around 12 GB, with ~ as the delimiter. I want to break it with a newline after every 10th occurrence of the delimiter.

I tried this:

with open ("file.txt") as f:
    for line in f:
        x = line.count("~")
        y = line.split("~")
        s = ['Ç'.join(x) for x in [y[i:i + 10] for i in xrange(0, len(y), 10)]]
with open ("output.txt","w") as outfile:
    outfile.write("~\n".join(s))

I get a memory error at line.split('~').

I also tried y = [line.split('~') for line in f], but it fails with the same error. How can I handle this?

1 Answer

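for line in f:

reads the file one line at a time, but if the file contains no newline characters, the whole file counts as a single line and is loaded into RAM.

You can also use the xreadlines iterator (Python 2 only, deprecated in favor of iterating over the file object directly) to load the file line by line:

for line in f.xreadlines():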

6 Comments

After changing to for line in f.xreadlines(): I still get a memory error on this line: y = line.split("~").
Check the size of a line. If your file doesn't have proper end-of-line characters, the whole file will be read as a single line. In that case you have to read the file as binary using a buffer. See the example in the following question: stackoverflow.com/questions/1035340/…
I tried binary mode as well, but I get the same memory error.
outfile.write("~\n".join(s)) will also load everything into memory. Try this instead: for item in s: outfile.write(item+'\n')
diveintopython.net/file_handling/file_objects.html could be useful for understanding file operations.
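
The two suggestions in the comments above (read the file in binary through a fixed-size buffer, and write the output piece by piece instead of joining everything first) can be combined so that neither the input nor the output is ever held in memory as a whole. The following is only a minimal sketch, not the method from the linked question: the function name break_every_n, its parameters, and the 1 MiB chunk size are made up for illustration, and it assumes a single-byte delimiter so that a delimiter can never be split across two chunks.

```python
import io


def break_every_n(src, dst, delim=b"~", n=10, chunk_size=1 << 20):
    """Copy src to dst, writing a newline after every n-th delimiter.

    src and dst are binary file objects. Only chunk_size bytes are held
    in memory at a time. Assumes delim is a single byte.
    """
    count = 0  # delimiters seen since the last inserted newline
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # end of file
            break
        start = 0
        while True:
            pos = chunk.find(delim, start)
            if pos == -1:
                # No more delimiters in this chunk: copy the rest as-is.
                dst.write(chunk[start:])
                break
            end = pos + len(delim)
            dst.write(chunk[start:end])  # data up to and including delim
            count += 1
            if count == n:
                dst.write(b"\n")
                count = 0
            start = end


# Small in-memory demonstration with a tiny chunk size. For the real file,
# pass open("file.txt", "rb") as src and open("output.txt", "wb") as dst.
demo_in = io.BytesIO(b"a~b~c~d~e")
demo_out = io.BytesIO()
break_every_n(demo_in, demo_out, n=2, chunk_size=4)
print(demo_out.getvalue())
```

The delimiter counter is kept across chunks, so the result does not depend on where the chunk boundaries fall. The many small writes go through the file object's own buffer, so they should not translate into one system call each.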