1

I have a file that contains random junk ASCII characters.

However, somewhere in the file there is also a message written in English.

Like this:

...˜ÃÕ=òaãNÜ ß§#üxwáã MESSAGE HIDDEN IN HERE ŸÎ=N‰çÈ^XvU…”vN˜...

I'm trying to write a Python regex that looks for a pattern beginning with 6 letters or spaces and ending with 6 letters or spaces.

That way, as long as the message is at least 12 letters or spaces long, the regex should output it.

This is what I've come up with, but it doesn't seem to be working.

regex = re.compile('''
([A-Z ]){6,}                                        
([A-Z ]){6,}              
''', re.I | re.X )
1
  • Did any of the comments help you? Did you run into any new problems? Commented Dec 15, 2013 at 16:43

3 Answers

4

Your Regex:

([A-Z ]){6,}                                        
([A-Z ]){6,}

This doesn't work as pasted because, read literally (that is, without the re.X flag), it expects quite a lot of whitespace between the two groups:

[Image: regular expression visualization]

Is this what you were looking for?

import re

reg = re.compile("[A-Z ]{6,}[A-Z ]{6,}")
string = "...˜ÃÕ=òaãNÜ ß§#üxwáã MESSAGE HIDDEN IN HERE ŸÎ=N‰çÈ^XvU…”vN˜..."

print(reg.findall(string))

Output:

[' MESSAGE HIDDEN IN HERE ']
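
A side note, offered as a sketch rather than a definitive diagnosis: with re.X the layout whitespace in the question's pattern is ignored (except inside the character classes), so the two groups sit back to back. In that case findall returns the captured groups instead of the whole match, and a repeated group such as ([A-Z ]){6,} keeps only the last character it matched. Making the groups non-capturing, or dropping the parentheses, gives the full text. The variable names below are illustrative, the sample text is a simplified ASCII stand-in for the junk data, and the code assumes Python 3.

```python
import re

# Simplified stand-in for the junk data (assumption: plain ASCII junk).
text = "...#=%1@ MESSAGE HIDDEN IN HERE $=9^&..."

# The question's pattern: under re.X the layout whitespace is ignored,
# but the capturing groups change what findall returns.
capturing = re.compile(r'''
    ([A-Z ]){6,}
    ([A-Z ]){6,}
''', re.I | re.X)

# Same pattern with non-capturing groups: findall returns whole matches.
non_capturing = re.compile(r'''
    (?:[A-Z ]){6,}
    (?:[A-Z ]){6,}
''', re.I | re.X)

print(capturing.findall(text))      # tuples holding one character per group
print(non_capturing.findall(text))  # the full matched text
```

If only the whole match is wanted, match.group(0) from finditer is another option that sidesteps the group behaviour entirely.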

2

Try the following regular expression. Using your example I only needed to check one group:

import re
pattern_obj = re.compile('[a-zA-Z ]{6,}', re.I)
extracted_patterns = pattern_obj.findall(ur'your_string')
print extracted_patterns

Judging from your Stack Overflow tag, I assume you are using Python 2. In that case, make sure the string you read in is a unicode object.

Output:

[u' MESSAGE HIDDEN IN HERE ']
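
To make the point about decoding concrete, here is a minimal sketch of decoding raw file contents before matching. The helper name extract_messages, the latin-1 encoding, and the sample bytes are all assumptions made for illustration; the real file may need a different encoding.

```python
import re

# Letters or spaces, at least 6 in a row (the pattern from this answer).
pattern = re.compile(r'[a-zA-Z ]{6,}')

def extract_messages(raw_bytes, encoding='latin-1'):
    """Decode raw bytes to text, then return every run the pattern matches.

    latin-1 is an assumption: it maps every byte to a character, so
    decoding never fails, but the real file may use another encoding.
    """
    text = raw_bytes.decode(encoding)
    return pattern.findall(text)

# Stand-in for open(path, 'rb').read(); the bytes are made up.
sample = b'\x98\xc3\xd5=\xf2 MESSAGE HIDDEN IN HERE \x9f\xce=N\x89'
print(extract_messages(sample))
```

Decoding first keeps the pattern and the input the same type (text), which avoids mixing byte strings and unicode strings in one match.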

General recommendation: finding a good regular expression can be difficult. The little-known re.DEBUG flag is very useful here, because it prints the parsed form of the pattern when it is compiled.

pattern_obj = re.compile('[a-zA-Z ]{6,}', re.DEBUG)
max_repeat 6 4294967295
  in
    range (97, 122)
    range (65, 90)
    literal 32

0
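import re

# Bytes pattern, because the file is read in binary mode below.
# Note that .+ is greedy and does not cross line breaks.
word = re.compile(br'[a-zA-Z\s]{6,}.+[a-zA-Z\s]{6,}')

filename = 'your_file'  # placeholder: path to the file to scan
with open(filename, 'rb') as f:
    filein = f.read()
print(re.findall(word, filein))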
