Can somebody tell me how to write a regular expression that matches the following two strings?

c_source_files
cpp_source_files

I would like to analyze a text file which contains text segments beginning with the mentioned strings.

It could be expressed approximately as follows:

import re
# text is assumed to hold the contents of the file
for result in re.findall('c(.*?)pp_source_files', text, re.S):
    pass  # do something...

Thx in advance!

2 Answers

You can use this regex:

# 'c' optionally followed by 'pp', then followed by '_source_files'
r'c(pp)?_source_files'  

If you need these strings to be separate words (so that things like notc_source_files don't match), then you can use word boundary 'matchers':

# \b matches a word boundary
r'\bc(pp)?_source_files\b'  
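
As a rough sketch of how the two patterns above behave, the snippet below runs both against a made-up sample string (the sample text and variable names are assumptions, not from the question). One detail worth knowing: re.findall returns the contents of a capturing group when the pattern has one, so the sketch uses a non-capturing group (?:pp) and finditer to get the full matches.

```python
import re

# Made-up sample text for illustration only.
sample = "c_source_files notc_source_files cpp_source_files cp_source_files"

# Same patterns as above, with (?:...) so the group does not capture.
plain = re.compile(r'c(?:pp)?_source_files')
bounded = re.compile(r'\bc(?:pp)?_source_files\b')

# group(0) is the whole matched text.
plain_hits = [m.group(0) for m in plain.finditer(sample)]
bounded_hits = [m.group(0) for m in bounded.finditer(sample)]

print(plain_hits)    # also picks up the tail of "notc_source_files"
print(bounded_hits)  # only the two standalone names
```

Without the word boundaries, the tail of notc_source_files is matched as well; with them, only c_source_files and cpp_source_files are found.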

2 Comments

Why the downvote? This is even simpler than my RegEx.
Please check the edit. I have added the word boundaries.
import re
data = """
c_source_files
wutdafuc_source_files
cpp_source_files
pcpp_source_files
cp_source_files
"""
# Print only the names that are exactly c_source_files or cpp_source_files
print(re.findall(r'\b(?:c|cpp)_source_files\b', data))

2 Comments

They will get matched. OP has not specified if that is desirable or not.
Nice one, most people forget about boundaries.
