0

Keeping it simple (omitting scale and parallelism), I'm trying to read a text file. In that text file, some entries run over more than one line (other software has character entry limits). My current attempt is below:

#Iterating through the file
with open(fileName, 'r') as file:
     #Examining each line
     for line in file:
         #If the first three characters meet a condition
         if line[:3] == "aa ":
             #If the last character is not a condition
             if line.rstrip()[-1:] != "'":
                   #Then this entry effectively runs onto *at least* the next line
                   #Store the current line in a buffer for reuse
                   temp = line

                   #Here is my issue, I don't want to use a 'for line in file' again, as that would require me to write multiple "for" & "if" loops to consider the possibility of entries running over several lines
                   [Pseudocode]
                   while line.rstrip()[-1:] in file != "'":
                           #Concatenate the entries to date
                           temp = temp + line

                   #entry has completed
                   list.append(temp)

              else
                   #Is a single line entry
                   list.append(line)

But it's obviously not liking the while loop. I've had a look around and haven't come across anything. Does anyone have any ideas? Thanks.

3
  • please post a snippet of the text file? Commented Oct 4, 2017 at 19:49
  • This process would be a little simpler if you can read the whole file into RAM as a list of lines. Or is it too big to do that? But anyway, inside your main loop you can get the next line by doing line = next(file). Commented Oct 4, 2017 at 19:57
  • Yeah, the next() command is useful, but don't believe you can iterate with it over i "next" lines if you know what I mean. Some of the files would be too big, my basic concept is to break them up and fire off multiprocessing, but want to make sure I don't lose multi-line entries when doing so. edit: Ah, maybe it will - three of you now have suggested it so I guess it could do the trick. Thanks! Commented Oct 4, 2017 at 20:07

3 Answers

2

This should work. I constructed my own sample input:

# Content of input.txt:
# This is a regular entry.
# aa 'This is an entry that
# continues on the next line
# and the one after that.'
# This is another regular entry.

entries = []
partial_entry = None  # We use this when we find an entry spanning multiple lines

with open('input.txt', 'r') as file:
    for line in file:
        # If this is a continuation of a previous entry
        if partial_entry is not None:
            partial_entry += line

            # If the entry is now complete
            if partial_entry.rstrip()[-1] == "'":
                entries.append(partial_entry)
                partial_entry = None
        else:
            # If this is an entry that will continue
            if line.startswith("aa ") and line.rstrip()[-1] != "'":
                partial_entry = line
            else:
                entries.append(line)

# If partial_entry is non-None here, we have some entry that never terminated
assert partial_entry is None

print(entries)

# Output:
# ['This is a regular entry.\n', "aa 'This is an entry that\ncontinues on the next line\nand the one after that.'\n", 'This is another regular entry.\n']

EDIT

Based on PM2Ring's suggestion above, here's a solution using next(file). (Same input and output as before.)

entries = []

with open('input.txt', 'r') as file:
    for line in file:
        if line.startswith("aa "):
            while not line.rstrip().endswith("'"):
                line += next(file)
        entries.append(line)

print(entries)
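
One caveat with the next(file) version: if the file ends before a closing quote is found, next(file) raises StopIteration. Below is a minimal sketch of one way to guard against that, using the optional default argument of next(). The helper name read_entries is hypothetical (not from the answer above), raising ValueError is just one possible choice, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import io

def read_entries(lines):
    """Collect entries, joining "aa " entries that span several lines.

    `lines` must be an iterator of lines, e.g. an open file object.
    (read_entries is a hypothetical helper name, not from the answer.)
    """
    entries = []
    for line in lines:
        if line.startswith("aa "):
            while not line.rstrip().endswith("'"):
                # next() with a default returns None at end of input
                # instead of raising StopIteration.
                more = next(lines, None)
                if more is None:
                    raise ValueError("unterminated entry: %r" % line)
                line += more
        entries.append(line)
    return entries

# Same sample content as input.txt above, held in memory.
sample = io.StringIO(
    "This is a regular entry.\n"
    "aa 'This is an entry that\n"
    "continues on the next line\n"
    "and the one after that.'\n"
    "This is another regular entry.\n"
)
entries = read_entries(sample)
print(entries)
```

With a real file you would call read_entries(file) inside the with block. Whether to raise, warn, or keep the partial entry when the input runs out depends on your data.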

4 Comments

Per the comments on the other solution, I think perhaps lines that don't start with "aa " are to be ignored. If so, the second solution should have the entries.append call indented, and the first solution requires even more changes.
Just to let you all know, going with += next() also includes the \n string. This was removed after exiting the while loop using line = line.replace("\n","")
If you don't want the newlines, just rstrip() each line as you go.
line = line.rstrip() at the top of the loop and line += next(file).rstrip(). Then you can drop the rstrip in the while condition too.
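
The rstrip() suggestion from the comments above could look roughly like this. It is only a sketch: read_entries_stripped is a made-up name, io.StringIO stands in for the real file, and the "aa " check is done on the raw line (an assumption on my part) so that stripping trailing whitespace cannot change which lines count as entries.

```python
import io

def read_entries_stripped(lines):
    """Join multi-line "aa " entries, dropping trailing whitespace/newlines.

    `lines` must be an iterator of lines, e.g. an open file object.
    (read_entries_stripped is a hypothetical name used for illustration.)
    """
    entries = []
    for raw in lines:
        line = raw.rstrip()
        # Test the prefix on the raw line, before trailing spaces are removed
        if raw.startswith("aa "):
            # Already stripped, so no rstrip() needed in the condition
            while not line.endswith("'"):
                line += next(lines).rstrip()
        entries.append(line)
    return entries

sample = io.StringIO(
    "This is a regular entry.\n"
    "aa 'This is an entry that\n"
    "continues on the next line\n"
    "and the one after that.'\n"
    "This is another regular entry.\n"
)
stripped_entries = read_entries_stripped(sample)
print(stripped_entries)
```

Note that the pieces are joined with nothing in between, so "that" and "continues" end up adjacent in the result. If the line breaks stood in for spaces, concatenate with " " instead.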
1

Use next() on an iterator to get only the next element, without disturbing the for loop:

#Iterating through the file
with open(fileName, 'r') as file:
     #Examining each line
     for line in file:
         #If the first three characters meet a condition
         if line[:3] == "aa ":
             while not line.rstrip().endswith("'"):
                 line += next(file)

             #entry has completed
             list.append(line)

4 Comments

Ah, looks like we came to this solution at the same time. :-) Minor issue: that list.append(line) shouldn't be indented inside the if.
indeed :) but based on code from OP, I'd say the indent is good
Oh, sorry, I guess you're right. Perhaps lines that don't start with "aa " are to be ignored?
Just to let you all know, going with += next() also includes the \n string. This was removed after exiting the while loop using line = line.replace("\n","") (in case someone else finds this in a search on down the line)
1

When you read a line that is continued onto the next line, just stash the partial result in a variable and let the loop go to the next line and concatenate the lines. For example:

#Iterating through the file
result = []
with open(filename, 'r') as file:
     buffer = ''
     #Examining each line
     for line in file:
         #If an entry is already in progress, or the first three characters meet a condition
         if buffer or line[:3] == "aa ":
             buffer += line
             #If the last character indicates that the line is NOT to be continued
             if line.rstrip()[-1:] == "'":
                 result.append(buffer)
                 buffer = ''
     if buffer:
         # Might want to warn that the last line expected a continuation but no subsequent line was found
         result.append(buffer)
print(result)

Note that it might be better if the file is very large to use the yield statement to produce the lines of the result rather than accumulating them in a list.
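
A rough sketch of that generator idea, in case it helps. The function name iter_entries is hypothetical, io.StringIO stands in for the real file, and it assumes that continuation lines (which do not start with "aa ") should be appended to the entry that is currently open, while other lines are skipped.

```python
import io

def iter_entries(lines):
    """Yield complete "aa " entries one at a time instead of building a list.

    `lines` is any iterable of lines, e.g. an open file object.
    (iter_entries is a hypothetical name used for illustration.)
    """
    buffer = ''
    for line in lines:
        # Keep collecting if an entry is in progress or a new one starts
        if buffer or line[:3] == "aa ":
            buffer += line
            # A trailing quote marks the end of the entry
            if line.rstrip()[-1:] == "'":
                yield buffer
                buffer = ''
    if buffer:
        # Input ended while an entry was still open; hand back what we have
        yield buffer

sample = io.StringIO(
    "This is a regular entry.\n"
    "aa 'This is an entry that\n"
    "continues on the next line\n"
    "and the one after that.'\n"
    "This is another regular entry.\n"
)
for entry in iter_entries(sample):
    print(repr(entry))
```

Because entries are produced lazily, only the entry currently being assembled is held in memory, which matters for the very large files mentioned in the question.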

1 Comment

Thanks for the answer, I ended up going with +=next() as it required far less rework.
