I am new to Python and I am trying to extract data from a large, unsorted text file. I would like to know whether it is possible to extract all the data on each line where the word "stop_codon" occurs anywhere in the text document. This is what I have so far:
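import re

# [^U]+ matches one or more characters that are not "U", newlines included
regex = re.compile("stop_codon([^U]+)")
# Read the whole file into a single string
contigdata = open("contigs.txt").read()
for match in regex.finditer(contigdata):
    rules = match.group(0).splitlines()
    for rule in rules:
        # Skip empty lines and comment lines
        if rule and not rule.startswith("#"):
            print(rule)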
This is the output that the script produces. I would prefer it to be all on one line:
contig00002 A
stop_codon 2076 2078 . + 0 transcript_id "g2.t1"; gene_id "g2";
Any help would be greatly appreciated!
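For reference, here is a minimal sketch of the line-by-line approach the question describes. It assumes each record sits on a single line of the file, so a plain substring test per line is enough and no regex is needed. The helper name lines_containing and the sample lines are hypothetical, made up for illustration and loosely modeled on the output shown above. It is one possible approach, not the only one.

```python
def lines_containing(lines, keyword="stop_codon"):
    """Yield each non-comment line that contains keyword, minus its trailing newline."""
    for line in lines:
        line = line.rstrip("\n")
        if keyword in line and not line.startswith("#"):
            yield line


# Hypothetical in-memory sample standing in for contigs.txt (not real data).
sample = [
    "# a comment line that mentions stop_codon\n",
    'contig00002\tSOURCE\tstop_codon\t2076\t2078\t.\t+\t0\ttranscript_id "g2.t1"; gene_id "g2";\n',
    "contig00002\tSOURCE\tother_feature\t100\t200\n",
]

matches = list(lines_containing(sample))
for line in matches:
    print(line)

# With the real file, iterate over the file object instead of reading it all at once:
# with open("contigs.txt") as handle:
#     for line in lines_containing(handle):
#         print(line)
```

Iterating over the file object reads one line at a time, so a match can never run past the end of a line, and a large file does not have to fit in memory.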