How can I create a regular expression in Python?

Question

I'm trying to create regular expressions to filter certain text from a text file. What I want to filter has this format:

word_*_word.word

So, for example, I would like the Python code to return every match. Sample results would be:

program1_0.0-1_log.build
program2_0.1-3_log.build

How can I do this?

Thanks a lot for your help

ThomasH · Accepted Answer · 2009-12-12 23:32:56Z

3

Try something like this:

r'[a-zA-Z0-9]+_[^_]+_[a-zA-Z0-9]+\.[a-zA-Z0-9]+'
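A minimal sketch of how this pattern might be applied, assuming the text is already in a string (`sample_text` is a made-up input, not from the question):

```python
import re

# Pattern from the answer above; sample_text is a hypothetical input for illustration.
pattern = re.compile(r'[a-zA-Z0-9]+_[^_]+_[a-zA-Z0-9]+\.[a-zA-Z0-9]+')
sample_text = "see program1_0.0-1_log.build and program2_0.1-3_log.build here"

# findall returns every non-overlapping match as a list of strings
matches = pattern.findall(sample_text)
for name in matches:
    print(name)
```

Because `[^_]+` cannot cross an underscore, two names on the same line are reported separately; a greedy `.*` in the middle could instead span from the first name to the last.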


Alex Martelli · Accepted Answer · 2009-12-12 23:33:42Z

3

Looks like you want to use a pattern such as r'\w+_.*_\w+\.\w+', assuming that the * you have does stand for "zero or more totally arbitrary characters" (if not, then the .* part in the middle needs to be changed accordingly). Once you have the right pattern (depending on exactly what you mean by that * ;-), you can re.compile it to get a regular expression object and call that object's .findall method, with your overall string as the argument, to get a list of all non-overlapping substrings matching the pattern. There are also alternatives, such as .finditer, if you want to get one match object at a time while looping over them.
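As a rough sketch of those steps, assuming the file contents have been read into a string (`text` below is invented sample data):

```python
import re

# Assumed sample input; in practice this would be the contents of the text file.
text = (
    "built program1_0.0-1_log.build ok\n"
    "built program2_0.1-3_log.build ok\n"
)

regex = re.compile(r'\w+_.*_\w+\.\w+')

# findall returns a list of all non-overlapping matching substrings
all_matches = regex.findall(text)
print(all_matches)

# finditer yields match objects one at a time
for m in regex.finditer(text):
    print(m.start(), m.group())
```

Note that `.` does not match a newline by default, so each match here stays within one line.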


sth · Accepted Answer · 2009-12-12 23:32:35Z

1

Python's regular expression module is called re. You need to import it and use the provided functions:

import re
# the dot before the extension is escaped so it matches a literal "."
if re.match(r'\w+_.*_\w+\.\w+', "some_text_abc.x"):
    print("yeah.")

It is useful to prefix the regular expression string with r, making it a raw string, so that backslashes are kept literally instead of being processed as escape sequences. Otherwise the Python interpreter treats backslashes specially, and every backslash that is part of the regular expression needs to be escaped (doubled).
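A small illustration of the difference, using `\b` as the example escape (the strings here are made up for the demonstration):

```python
import re

# In a normal string, '\b' is the single backspace character.
# In a raw string, it stays as backslash + 'b', which re reads as a word boundary.
plain = '\bbuild\b'
raw = r'\bbuild\b'

print(len(plain), len(raw))  # the raw string is two characters longer

raw_hits = re.findall(raw, "log.build done")      # word-boundary match
plain_hits = re.findall(plain, "log.build done")  # looks for literal backspaces

print(raw_hits, plain_hits)

# Doubling the backslash in a normal string gives the same pattern as the raw form.
same = ('\\w+' == r'\w+')
print(same)
```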


3 Comments

Alex Martelli Over a year ago

This matches "any number of underscores" which seems peculiar (and doesn't satisfy the OP's example).

Alex Martelli Over a year ago

@sth, tx -- also, re.match only matches at the start of the string (as if the pattern started with an implied ^, in a sense) so it probably won't get "every match" in the file as the OP asks for.
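To make the re.match point concrete, here is a short sketch on a made-up input line, comparing it with re.search and findall:

```python
import re

pattern = re.compile(r'\w+_.*_\w+\.\w+')
line = "built program1_0.0-1_log.build"  # hypothetical line from the file

at_start = pattern.match(line)     # only tries to match at position 0
anywhere = pattern.search(line)    # first match anywhere in the string
everything = pattern.findall(line) # every non-overlapping match

print(at_start)
print(anywhere.group())
print(everything)
```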

sth Over a year ago

Well, my main point was more to point to the re module, its docs and its basic usage. It seemed to me that this is the basic problem the OP wants to solve first, before caring about the exact regular expression and what exactly should be matched. (I basically just focused on the "in Python" part, not the "what regular expression" part.)

nacmartin · Accepted Answer · 2009-12-12 23:30:09Z

0

Try with ^\w+_.*_\w+\.\w+$


1 Comment

Alex Martelli Over a year ago

You won't get "every match", as the OP desires, by anchoring the pattern so that it only matches an entire line (if you remembered to specify re.MULTILINE -- otherwise, only the entire file, and only if it had no newlines within if you didn't specify re.DOTALL;-).
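A sketch of how the anchored pattern behaves under the different flags, assuming a two-line input made up for the example:

```python
import re

# Hypothetical file contents: one name per line.
text = "program1_0.0-1_log.build\nprogram2_0.1-3_log.build\n"
anchored = r'^\w+_.*_\w+\.\w+$'

# No flags: ^ and $ refer to the whole string, and . cannot cross the newline.
no_flags = re.findall(anchored, text)

# MULTILINE: ^ and $ also match at line boundaries, so whole lines can match.
per_line = re.findall(anchored, text, re.MULTILINE)

# DOTALL without MULTILINE: .* may cross newlines, so the whole file is one match.
whole_file = re.findall(anchored, text, re.DOTALL)

print(no_flags)
print(per_line)
print(whole_file)
```

Even with re.MULTILINE, a line that has other text before or after the name would not match, which is why the anchors get in the way of finding "every match".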

ghostdog74 · Accepted Answer · 2009-12-13 06:15:28Z

0

I don't understand why you would need a regex here. If the strings you want end with ".build", you can do this, for example:

s = "blah blah program1_0.0-1_log.build blah blah"
for item in s.split():
    if item.endswith(".build"):
        print(item)

And that's it. If you want to do further checking, then:

for item in s.split():
    if item.endswith(".build"):
        # use a separate name so s is not rebound inside the loop
        parts = item.split("_")
        if len(parts) != 3:
            print("unexpected number of _")


2 Comments

Adam Ryan Over a year ago

You might prefer to use a regex to find the matches in one line of code, as opposed to your multi-line loops.

ghostdog74 Over a year ago

I seldom use regex with Python unless absolutely necessary. Using Python's built-in string methods is faster as well, IMO.
