Embedded for loop with regex

Question

def find_string(header,file_1,counter):
  ab = re.compile(str(header))
  for line in file_1:
    if re.search(ab,line) !=None:
       print line
  counter+=1
  return counter

file_1 = open("text_file_with_headers.txt",'r')
header_array = []
header_array.append("header1")
header_array.append("header2")
# ...

counter = 0
for header in header_array:
  counter = find_string(header,file_1,counter)

Every time I run this it searches for only one of the headers and I cannot figure out why.

eyquem · Accepted Answer · 2011-04-23 23:45:49Z

Because once the loop for line in file_1: has finished for the first header, the file's pointer is at the end of the file, so the loops for the remaining headers have nothing left to read. You must move the pointer back to the beginning of the file, which is done with the file method seek(). Add seek(0,0) after each call, like this:

counter = 0 
for header in header_array:
    counter = find_string(header,file_1,counter)
    file_1.seek(0,0)  # rewind so the next header is searched from the start
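
To make the pointer behaviour visible, here is a minimal sketch. It assumes a throwaway temporary file with made-up contents (not the asker's text_file_with_headers.txt), and the names first_pass, second_pass and third_pass are only for illustration. It uses print() so that it also runs on Python 3.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header1 alpha\nplain line\nheader2 beta\n")
    path = tmp.name

with open(path) as file_1:
    first_pass = [line for line in file_1]   # consumes the whole file
    second_pass = [line for line in file_1]  # pointer is at the end: nothing left
    file_1.seek(0)                           # rewind to the beginning
    third_pass = [line for line in file_1]   # all lines are available again

os.remove(path)

print(len(first_pass), len(second_pass), len(third_pass))  # 3 0 3
```

The empty second pass is what happens to every header after the first one in the question's loop.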


EDIT

1) ab is a compiled regex, so you can write ab.search(line) instead of re.search(ab,line)

2) bool(None) is False, so you can write if ab.search(line): with no need for != None

3)

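def find_string(header,file_1,counter):
    # With re.MULTILINE, ^ and $ match at the start and end of every line
    lwh = re.compile('^.*?'+header+'.*$',re.MULTILINE)
    lines_with_header = lwh.findall(file_1.read())
    # Matches do not include the newline, so join them with '\n'
    print('\n'.join(lines_with_header))
    return counter + 1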

and even

def find_string(header,file_1,counter):
    lwh = re.compile('^.*?'+header+'.*$',re.MULTILINE)
    # finditer yields match objects lazily instead of building a list first
    print('\n'.join(matline.group() for matline in lwh.finditer(file_1.read())))
    return counter + 1

4)

def find_string(header,file_1):
    lwh = re.compile('^.*?'+header+'.*$',re.MULTILINE)
    lines_with_header = lwh.findall(file_1.read())
    print('\n'.join(lines_with_header))

file_1 = open("text_file_with_headers.txt",'r')
header_list = ["header1","header2"]  # ...

# enumerate() supplies the counter, so find_string no longer has to return it
for counter,header in enumerate(header_list):
    find_string(header,file_1)
    file_1.seek(0,0)

counter += 1 # because counter began at 0

5) You run through file_1 as many times as there are headers in header_list.

You should run through it only once, recording each line that contains one of the headers in a list. Each of these lists would be a value of a dictionary whose keys are the headers. It would be faster.

6) What you call an array is a list in Python, hence the name header_list above
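
Point 5 can be sketched roughly as follows. This is only one possible reading of it: the function name collect_lines_by_header and the in-memory sample text are assumptions made for the example, and the headers are treated as regex patterns, as in the original find_string.

```python
import io
import re

def collect_lines_by_header(headers, lines):
    """Return a dict mapping each header to the lines that contain it."""
    # Compile each header once, outside the loop over the lines.
    compiled = {header: re.compile(header) for header in headers}
    found = {header: [] for header in headers}
    for line in lines:  # single pass over the file
        for header, regex in compiled.items():
            if regex.search(line):  # a match object is truthy, None is falsy
                found[header].append(line.rstrip("\n"))
    return found

# Hypothetical in-memory stand-in for text_file_with_headers.txt
sample = io.StringIO("header1 alpha\nplain line\nheader2 beta\nheader1 gamma\n")
result = collect_lines_by_header(["header1", "header2"], sample)

print(result["header1"])  # ['header1 alpha', 'header1 gamma']
print(result["header2"])  # ['header2 beta']
```

Because the file is read only once, no seek() call is needed, and an open file object can be passed in place of sample.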

Rachel Shallit · Accepted Answer · 2011-04-23 23:20:56Z


The file object keeps track of your position in the file, and after you've gone through the outer loop once, you're at the end of the file and there are no more lines to read.

If I were you, I would reverse the order in which your loops are nested: I would iterate through the file line by line, and for each line, iterate through the list of strings you want to find. That way, I would only have to read each line from the file once.
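
A minimal sketch of that reversed nesting, assuming the headers are plain strings so that a substring test is enough (use a compiled regex instead if they are patterns). The generator name find_strings and the sample text are made up for illustration.

```python
import io

def find_strings(headers, lines):
    """Yield (header, line) for every header found in each line."""
    for line in lines:          # outer loop: each line is read exactly once
        for header in headers:  # inner loop: every string to look for
            if header in line:  # plain substring test, an assumption of this sketch
                yield header, line.rstrip("\n")

# Hypothetical in-memory stand-in for the file
sample = io.StringIO("header1 alpha\nplain line\nheader2 beta\n")
matches = list(find_strings(["header1", "header2"], sample))

for header, line in matches:
    print(header + ": " + line)
```

Since results are yielded as the file is read, this variant streams matches in file order rather than grouping them by header.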
