I am trying to find all instances of the string "PB" and the digits that follow it, but when I call
number_all = re.findall(r'\bPB\b([0-9])\d+', ' '.join(number_list))
the ([0-9])\d+ part doesn't match anything: my output file, sequence.txt, stays empty. If I use just \bPB\b, it outputs "PB" but no numbers.
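One way to see what is happening is to run both patterns on a small piece of the input. After "PB" the text contains " (" rather than a digit, so ([0-9])\d+ can never match there. Also, because that pattern has one capturing group, re.findall would return only the group (a single digit) even if it did match. The second pattern below is an assumed fix, not taken from the question: it expects exactly one space and then a parenthesised list of comma-separated numbers.

```python
import re

# Two of the input lines, joined with spaces the same way the question's code does
text = ' '.join(["WB (19, 21, 24, 46, 60)", "PB (18, 21, 10, 5, 5)"])

# Original pattern: requires a digit right after "PB", but the input has " (" there
original = re.findall(r'\bPB\b([0-9])\d+', text)

# Assumed fix: allow " (" and capture everything up to the closing parenthesis
fixed = re.findall(r'\bPB \(([\d, ]+)\)', text)

print(original)  # []
print(fixed)     # ['18, 21, 10, 5, 5']
```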
My input file, raw-sequence.txt, looks like this:
WB (19, 21, 24, 46, 60)
WB (12, 11, 9, 23, 49)
PB (18, 21, 10, 5, 5)
WB (2, 14, 2, 29, 67)
WB (1, 8, 1, 16, 52)
PB (2, 11, 8, 3, 4)
How can I output the following lines to sequence.txt?
PB (18, 21, 10, 5, 5)
PB (2, 11, 8, 3, 4)
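For the sample input above, a minimal sketch (assuming every wanted line starts with "PB", a single space and a parenthesised list of comma-separated numbers) is to match whole lines rather than fragments of one joined string. With re.MULTILINE, ^ and $ anchor at each line, and since the pattern has no capturing group, findall returns each complete matching line:

```python
import re

raw = """WB (19, 21, 24, 46, 60)
WB (12, 11, 9, 23, 49)
PB (18, 21, 10, 5, 5)
WB (2, 14, 2, 29, 67)
WB (1, 8, 1, 16, 52)
PB (2, 11, 8, 3, 4)"""

# No capturing group, so findall returns the whole matched line
pb_lines = re.findall(r'^PB \([\d, ]*\)$', raw, re.MULTILINE)

for line in pb_lines:
    print(line)
```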
Here is my current code:
import re

sequence_raw_buffer = open('c:\\sequence.txt', 'a')
with open('c:\\raw-sequence.txt') as f:
    number_list = f.read().splitlines()
    number_all = re.findall(r'\bPB\b([0-9])\d+', ' '.join(number_list))
    unique = list(set(number_all))  # set() removes duplicates but loses the input order
    for i in unique:
        sequence_raw_buffer.write(i + '\n')
print("done")
# the with statement has already closed f, so only the output file needs closing
sequence_raw_buffer.close()
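Putting it together, the script could look roughly like the sketch below (Python 3). extract_pb_lines is a hypothetical helper name, and the same assumption about the line format applies. It checks the file line by line instead of joining everything into one string, and it uses dict.fromkeys instead of set so that duplicates are dropped without losing the original order (dicts keep insertion order from Python 3.7). The demo runs on a temporary copy of the sample input; for the real files you would call extract_pb_lines('c:\\raw-sequence.txt', 'c:\\sequence.txt').

```python
import os
import re
import tempfile


def extract_pb_lines(src_path, dst_path):
    """Append each distinct line starting with "PB" from src_path to dst_path.

    Hypothetical helper; returns the lines that were written.
    """
    pattern = re.compile(r'^PB \([\d, ]*\)$')
    matches = []
    with open(src_path) as infile:
        for line in infile:
            line = line.strip()  # drop the newline and any stray spaces
            if pattern.match(line):
                matches.append(line)
    # dict.fromkeys removes duplicates but, unlike set(), keeps the input order
    unique = list(dict.fromkeys(matches))
    # 'a' appends, as in the original script; use 'w' to overwrite instead
    with open(dst_path, 'a') as outfile:
        for line in unique:
            outfile.write(line + '\n')
    return unique


# Demo on a temporary copy of the sample input
sample = ("WB (19, 21, 24, 46, 60)\nWB (12, 11, 9, 23, 49)\nPB (18, 21, 10, 5, 5)\n"
          "WB (2, 14, 2, 29, 67)\nWB (1, 8, 1, 16, 52)\nPB (2, 11, 8, 3, 4)\n")
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'raw-sequence.txt')
    dst = os.path.join(tmp, 'sequence.txt')
    with open(src, 'w') as fh:
        fh.write(sample)
    written = extract_pb_lines(src, dst)
    with open(dst) as fh:
        output = fh.read()

print(output, end='')
```

Reading line by line also means a "PB" inside some other part of a line cannot produce a partial match, because the pattern is anchored to the whole line.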
re module documentation.