Using a while loop for conditional read of text file

Question

Keeping it simple (omitting scale and parallelism), I'm trying to read a text file. In that text file, some entries run over more than one line (other software has character entry limits). An example is below:

#Iterating through the file
with open(fileName, 'r') as file:
     #Examining each line
     for line in file:
         #If the first three characters meet a condition
         if line[:3] == "aa ":
             #If the last character is not a condition
             if line.rstrip()[-1:] != "'":
                   #Then this entry effectively runs onto *at least* the next line
                   #Store the current line in a buffer for reuse
                   temp = line

                   #Here is my issue, I don't want to use a 'for line in file' again, as that would require me to write multiple "for" & "if" loops to consider the possibility of entries running over several lines
                   [Pseudocode]
                   while line.rstrip()[-1:] in file != "'":
                           #Concatenate the entries to date
                           temp = temp + line

                   #entry has completed
                   list.append(temp)

              else
                   #Is a single line entry
                   list.append(line)

But it's obviously not liking the while loop. I've had a look around and not come across anything. Does anyone have any ideas? Thanks.

This process would be a little simpler if you can read the whole file into RAM as a list of lines. Or is it too big to do that? But anyway, inside your main loop you can get the next line by doing line = next(file).
– PM 2Ring, Commented Oct 4, 2017 at 19:57
Yeah, the next() command is useful, but I don't believe you can iterate with it over the next i lines, if you know what I mean. Some of the files would be too big; my basic concept is to break them up and fire off multiprocessing, but I want to make sure I don't lose multi-line entries when doing so. Edit: Ah, maybe it will. Three of you have now suggested it, so I guess it could do the trick. Thanks!
– Amiga500, Commented Oct 4, 2017 at 20:07

user94559 · Accepted Answer · 2017-10-04 20:02:09Z

2

This should work. I constructed my own sample input:

# Content of input.txt:
# This is a regular entry.
# aa 'This is an entry that
# continues on the next line
# and the one after that.'
# This is another regular entry.

entries = []
partial_entry = None  # We use this when we find an entry spanning multiple lines

with open('input.txt', 'r') as file:
    for line in file:
        # If this is a continuation of a previous entry
        if partial_entry is not None:
            partial_entry += line

            # If the entry is now complete
            if partial_entry.rstrip()[-1] == "'":
                entries.append(partial_entry)
                partial_entry = None
        else:
            # If this is an entry that will continue
            if line.startswith("aa ") and line.rstrip()[-1] != "'":
                partial_entry = line
            else:
                entries.append(line)

# If partial_entry is non-None here, we have some entry that never terminated
assert partial_entry is None

print(entries)

# Output:
# ['This is a regular entry.\n', "aa 'This is an entry that\ncontinues on the next line\nand the one after that.'\n", 'This is another regular entry.\n']

EDIT

Based on PM 2Ring's suggestion above, here's a solution using next(file). It produces the same output as before for the same input.

entries = []

with open('input.txt', 'r') as file:
    for line in file:
        if line.startswith("aa "):
            while not line.rstrip().endswith("'"):
                line += next(file)
        entries.append(line)

print(entries)
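One caveat with the next(file) version: if the file ends before an entry's closing quote, next(file) raises StopIteration out of the loop. The sketch below is one possible way to guard against that, assuming you want an explicit error for unterminated entries. The function name read_entries and the choice of ValueError are illustrative, not part of the original answer, and io.StringIO stands in for the real file.

```python
import io

def read_entries(lines):
    """Collect entries, joining multi-line "aa " entries (hypothetical helper)."""
    lines = iter(lines)  # the for loop and next() must share one iterator
    entries = []
    for line in lines:
        if line.startswith("aa "):
            while not line.rstrip().endswith("'"):
                # With a default, next() returns None at end of input
                # instead of raising StopIteration.
                continuation = next(lines, None)
                if continuation is None:
                    raise ValueError("unterminated entry: " + repr(line))
                line += continuation
        entries.append(line)
    return entries

# Same sample input as above, supplied from memory instead of input.txt
sample = io.StringIO(
    "This is a regular entry.\n"
    "aa 'This is an entry that\n"
    "continues on the next line\n"
    "and the one after that.'\n"
    "This is another regular entry.\n"
)
print(read_entries(sample))
```

An open file object can be passed in place of the StringIO, since both are iterators over lines.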

edited Oct 4, 2017 at 20:02

answered Oct 4, 2017 at 19:55

user94559


4 Comments

user94559 Over a year ago

Per the comments on the other solution, I think perhaps lines that don't start with "aa " are to be ignored. If so, the second solution should have the entries.append call indented, and the first solution requires even more changes.

Amiga500 Over a year ago

Just to let you all know, going with += next() also includes the \n string. This was removed after exiting the while loop using line = line.replace("\n","")

user94559 Over a year ago

If you don't want the newlines, just rstrip() each line as you go.

user94559 Over a year ago

line = line.rstrip() at the top of the loop and line += next(file).rstrip(). Then you can drop the rstrip in the while condition too.
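Put together, the rstrip() suggestion in the comments above might look like the sketch below. The function name read_entries_stripped is made up for illustration, and the prefix check is done before stripping (an addition to the comment's recipe) so that a line consisting only of "aa " is still recognized. It does not handle an entry that never terminates.

```python
def read_entries_stripped(lines):
    """Join multi-line "aa " entries, dropping trailing whitespace and newlines."""
    lines = iter(lines)  # the for loop and next() must share one iterator
    entries = []
    for line in lines:
        # Test the prefix first: rstrip() would turn "aa \n" into "aa"
        is_entry = line.startswith("aa ")
        line = line.rstrip()
        if is_entry:
            # Lines are already stripped, so no rstrip() is needed in the condition
            while not line.endswith("'"):
                line += next(lines).rstrip()
        entries.append(line)
    return entries
```

Note that rstrip() with no argument also removes trailing spaces, so the pieces of a multi-line entry are joined with no separator between them. If the trailing spaces matter, rstrip("\n") keeps them, at the cost of needing the rstrip() back in the while condition.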

Guillaume · Accepted Answer · 2017-10-04 20:07:19Z

1

Use next() on an iterator to get only the next element, without disturbing the for loop:

#Iterating through the file
with open(fileName, 'r') as file:
     #Examining each line
     for line in file:
         #If the first three characters meet a condition
         if line[:3] == "aa ":
             while not line.rstrip().endswith("'"):
                 line += next(file)

             #entry has completed
             list.append(line)

edited Oct 4, 2017 at 20:07

answered Oct 4, 2017 at 20:03

Guillaume


4 Comments

user94559 Over a year ago

Ah, looks like we came to this solution at the same time. :-) Minor issue: that list.append(line) shouldn't be indented inside the if.

Guillaume Over a year ago

indeed :) but based on code from OP, I'd say the indent is good

user94559 Over a year ago

Oh, sorry, I guess you're right. Perhaps lines that don't start with "aa " are to be ignored?

Amiga500 Over a year ago

Just to let you all know, going with += next() also includes the \n string. This was removed after exiting the while loop using line = line.replace("\n","") (in case someone else finds this in a search on down the line)

J. Beattie · Accepted Answer · 2017-10-04 20:11:34Z

1

When you read a line that is continued onto the next line, just stash the partial result in a variable and let the loop go to the next line and concatenate the lines. For example:

#Iterating through the file
result = []
with open(filename, 'r') as file:
     buffer = ''
     #Examining each line
     for line in file:
         #If an entry is already in progress, or the first three characters meet a condition
         if buffer or line[:3] == "aa ":
             buffer += line
             #If the last character indicates that the entry is NOT to be continued
             if line.rstrip()[-1:] == "'":
                 result.append(buffer)
                 buffer = ''
     if buffer:
         # Might want to warn that the last entry expected a continuation but no subsequent line was found
         result.append(buffer)
print(result)

Note that if the file is very large, it might be better to use the yield statement to produce the entries one at a time rather than accumulating them all in a list.
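A minimal sketch of that generator idea follows. The name iter_entries is hypothetical, and the sketch assumes that continuation lines do not themselves start with "aa " and that lines outside "aa " entries are skipped.

```python
def iter_entries(lines):
    """Yield each "aa " entry as one string, joining continuation lines."""
    buffer = ''
    for line in lines:
        # Accumulate if an entry is in progress or a new one starts here
        if buffer or line[:3] == "aa ":
            buffer += line
            # A trailing quote marks the end of the entry
            if line.rstrip()[-1:] == "'":
                yield buffer
                buffer = ''
    if buffer:
        # The input ended in the middle of an entry; yield what was collected
        yield buffer
```

It could be consumed with something like `for entry in iter_entries(file):` inside the `with open(...)` block, so only one entry is held in memory at a time.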

answered Oct 4, 2017 at 20:11

J. Beattie


1 Comment

Amiga500 Over a year ago

Thanks for the answer, I ended up going with +=next() as it required far less rework.
