I'm parsing a binary file with regexes to extract data, and I've hit a regex problem I can't track down.

This is the code I'm having issues with:

        z = 0
        for char in string:
            self.response.out.write('|%s' % char.encode('hex'))
            z+=1
            if z > 20:
                self.response.out.write('<br>')
                break

        title = []
        string = re.sub('^\x72.([^\x7A]+)', lambda match: append_match(match, title), string, 1)
        print_info('Title', title)

def append_match(match, collection, replace = ''):
    collection.append(match.group(1))
    return replace

This is the content of the first 20 chars in string when this runs:

|72|0a|50|79|72|65|20|54|72|6f|6c|6c|7a|19|54|72|6f|6c|6c|62|6c

It returns nothing unless I remove the ^, in which case it returns "Troll" (without the quotes), which is 54726F6C6C. As I read the pattern, it should return everything up to the \x7a.

What's going on here?

  • Your input string doesn't start with a \x72 character; it starts with a pipe. Edit: never mind, I think I misinterpreted your input example. Commented Mar 21, 2013 at 18:29
  • Yeah sorry. Was making it easier to tell each distinct char. Commented Mar 21, 2013 at 18:35

1 Answer

The problem is that the dot does not match \x0A (newline) by default, and the second byte of your string is \x0A. Try adding the DOTALL flag to your pattern, for example:

        re.sub('(?s)^\x72.([^\x7A]+)....
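
To make the difference concrete, here is a minimal sketch that rebuilds the 21 bytes from the question's hex dump and runs the pattern with and without the flag. It assumes Python 3 and bytes patterns (the question's code is Python 2), and `extract_title` is a hypothetical helper that only wraps the question's `append_match` idea.

```python
import re

# The 21 bytes shown in the question's hex dump: b"r\nPyre Trollz\x19Trollbl"
data = bytes.fromhex("720a5079726520" "54726f6c6c7a19" "54726f6c6c626c")


def extract_title(pattern, data):
    """Hypothetical helper: cut the first match out of data and collect group 1."""
    title = []

    def append_match(match):
        title.append(match.group(1))
        return b""

    rest = re.sub(pattern, append_match, data, count=1)
    return title, rest


# Without DOTALL the dot cannot consume the \x0a at index 1, so the anchored
# pattern fails at the only position it is allowed to try.
plain_title, plain_rest = extract_title(rb"^\x72.([^\x7A]+)", data)

# With the inline (?s) flag the dot matches the newline and the group runs
# up to, but not including, the first \x7a.
dotall_title, dotall_rest = extract_title(rb"(?s)^\x72.([^\x7A]+)", data)

# Dropping the ^ lets the engine skip ahead to the next \x72 (the "r" in
# "Pyre"), which is why a shorter title appeared in the question.
loose_title, loose_rest = extract_title(rb"\x72.([^\x7A]+)", data)

print(plain_title, dotall_title, loose_title)
```

On this sample the unanchored capture also carries a leading space (\x20), which is easy to miss when the result is written into HTML.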

2 Comments

You're my hero. Adding the dotall worked. What is the (?s) you added though?
This is the same dotall flag, but added inline to the expression. I usually prefer this syntax as it makes expressions clear and self-contained.
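
For comparison, a short sketch of the two spellings side by side, again assuming Python 3 and a shortened sample input made up for illustration. One thing to watch with `re.sub`: its fourth positional argument is `count`, so the flag should be passed by keyword.

```python
import re

# Shortened sample, not the full data from the question.
sample = b"r\nPyre Trollz"

# Inline flag: travels with the pattern itself.
inline = re.compile(rb"(?s)^\x72.([^\x7A]+)")

# Explicit flag: same behaviour, supplied as a separate argument.
explicit = re.compile(rb"^\x72.([^\x7A]+)", re.DOTALL)

inline_title = inline.match(sample).group(1)
explicit_title = explicit.match(sample).group(1)

# Passing re.DOTALL positionally here would be read as count, not flags.
remainder = re.sub(rb"^\x72.([^\x7A]+)", b"", sample, count=1, flags=re.DOTALL)

print(inline_title, explicit_title, remainder)
```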
