I have a text file in the following format:
1. AUTHOR1
(blank line, with a carriage return)
Citation1
2. AUTHOR2
(blank line, with a carriage return)
Citation2
(...)
That is, in this file, some lines begin with an integer, followed by a dot, a space, and text giving an author's name; each of these lines is followed by a blank line (which includes a carriage return), and then by a line of text beginning with an alphabetic character (an article or book citation).
What I want is to read this file into a Python list, joining each author's name with its citation, so that the list looks like this:
['AUTHOR1 Citation1', 'AUTHOR2 Citation2', '...']
It looks like a simple programming problem, but I could not figure out a solution to it. What I attempted was as follows:
articles = []
with open("sample.txt", "rb") as infile:
    while True:
        text = infile.readline()
        if not text: break
        authors = ""
        citation = ""
        if text == '\n': continue
        if text[0].isdigit():
            authors = text.strip('\n')
        else:
            citation = text.strip('\n')
        articles.append(authors+' '+citation)
but the articles list gets authors and citations stored as separate elements!
Thanks in advance for any help in solving this vexing problem... :-(
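One likely cause is that the loop resets authors and citation on every pass and appends once per non-blank line, so an author and its citation can never end up in the same element. Below is a minimal sketch of one possible fix, not a definitive solution: it remembers the most recent author line until the next citation arrives. The function name pair_authors_with_citations and the sample list are made up for illustration, and the sketch assumes Python 3, text-mode input, and that every citation starts with a letter as described above.

```python
import re

def pair_authors_with_citations(lines):
    """Join each numbered author line with the citation line that follows it."""
    articles = []
    author = None  # most recent author line still waiting for its citation
    for raw in lines:
        line = raw.strip()  # removes '\n' and '\r\n' alike
        if not line:
            continue  # skip the blank separator lines
        match = re.match(r"\d+\.\s*(.*)", line)
        if match:
            # Author line: keep the name, drop the leading "1. " numbering
            author = match.group(1)
        elif author is not None:
            # Citation line: pair it with the pending author
            articles.append(author + " " + line)
            author = None
    return articles

# Hypothetical sample mirroring the file layout in the question
sample = ["1. AUTHOR1\n", "\r\n", "Citation1\n",
          "2. AUTHOR2\n", "\r\n", "Citation2\n"]
print(pair_authors_with_citations(sample))
```

Since a file object opened in text mode iterates over its lines, the same function should work when called as pair_authors_with_citations(infile) inside a with open("sample.txt") as infile: block. Stripping each line before testing it also covers the case where the blank lines contain '\r\n' rather than a bare '\n', which a comparison against '\n' alone would miss.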