I have a text file that looks like this:
P1 : Some data
P2 : blabla
P4 : whatever
F1 : something
F2 : something else
G6 : This entry continues
G6 : down here
Each record is followed by an empty line and then a new record with the same layout as above (about 100k records in total). I need to produce a text file in which every line contains the P2, P4 and G6 entries of one record, separated by tabs.
This is what I have so far:
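output = open('out.txt', 'w')
output.write("P1\tG6\n")
# Flags that track whether each field has been seen for the current record
P1_ = False
G6_ = False
with open("data.txt", 'r') as data:
    for line in data:
        if line.startswith('P1 :'):
            P1 = line[4:10]
            P1_ = True
        elif line.startswith('G6'):
            # Split on the first colon; lstrip('G6 :') would strip a set of
            # characters, not the prefix, and could eat the start of the value
            G6 = line.split(':', 1)[1].lstrip()
            G6_ = True
        else:
            continue
        if P1_ and G6_:
            # G6 still ends with the newline read from the file
            output.write(P1 + "\t" + G6)
            P1_ = False
            G6_ = False
output.close()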
The problem I encounter is that some records do not have all the entries I need, and some have the G6 entry spread over several lines. Any ideas on how to do this?
EDIT: After reading all of your answers I realised my question was a bit vague. I do need the records which do not have all entries.
A dict will be a good idea, as @Kursion said.
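A minimal sketch of that dict idea, not a finished solution. It assumes that records are separated by blank lines, that every data line has the form `KEY : value`, that repeated keys (such as a G6 entry continued on the next line) should be joined with a space, and that a missing entry becomes an empty column. The names `parse_records`, `to_row` and `convert` are made up for illustration, and the default field list follows the P2, P4, G6 description above.

```python
def parse_records(lines):
    """Yield one dict per record; a blank line ends the current record."""
    record = {}
    for line in lines:
        if not line.strip():
            # Blank line: emit the record collected so far, if any.
            if record:
                yield record
                record = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line; skipped in this sketch
        key, value = key.strip(), value.strip()
        if key in record:
            # Repeated key (e.g. G6 spread over several lines): append.
            record[key] += " " + value
        else:
            record[key] = value
    if record:
        # The last record may not be followed by a blank line.
        yield record


def to_row(record, fields=("P2", "P4", "G6")):
    """Build one tab-separated row; missing entries become empty strings."""
    return "\t".join(record.get(field, "") for field in fields)


def convert(in_path, out_path, fields=("P2", "P4", "G6")):
    """Read records from in_path and write one row per record to out_path."""
    with open(in_path) as data, open(out_path, "w") as output:
        output.write("\t".join(fields) + "\n")
        for record in parse_records(data):
            output.write(to_row(record, fields) + "\n")
```

Calling `convert("data.txt", "out.txt")` would then process the file one line at a time, so the 100k records never need to be held in memory at once. Records that lack an entry are still written, just with an empty column in that position.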