Say, I have the following code:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'
    
for string in strings_of_text:
    # If the string is data#
    if (re.search(expression_to_use, string)):
        strings_to_keep.append(string)
print(strings_to_keep)

I only want to keep strings that match the pattern "data" followed by a number. In this case, I would only want to keep 'data0', 'data23', 'data2', and 'data55'.

How can I do this? I am thinking I will need to import re but I'm not sure how to use it.

I have read this: Python Regular Expression looking for two digits only

But when I try to modify my regular expression using this expression

^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)

It does not work. This is where I am stuck. I am new to regular expressions, so thank you in advance to everyone who posts.

EDIT:

Here is the outcome I am trying to get:

print(strings_to_keep)
>>> ['data0', 'data23', 'data2', 'data55']
  • A desired output would be helpful. Commented May 20, 2019 at 15:21

2 Answers

1

Your pattern has four alternatives, but none of them takes the word data into account.
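
To make this concrete, here is a small sketch of what the pattern from the question accepts. The list `standalone_examples` is hypothetical and is only there to show the kind of input the pattern was written for: a two-digit number that stands alone, bounded by whitespace or by the start or end of the string.

```python
import re

# The pattern from the question: exactly two digits, bounded by
# whitespace or by the start/end of the string.
two_digit_pattern = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'

strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
# Hypothetical inputs, not from the question, that the pattern does accept.
standalone_examples = ['23', 'a 23 b', 'total 55', '42 items']

matched_from_question = [s for s in strings_of_text if re.search(two_digit_pattern, s)]
matched_standalone = [s for s in standalone_examples if re.search(two_digit_pattern, s)]

print(matched_from_question)  # []
print(matched_standalone)     # ['23', 'a 23 b', 'total 55', '42 items']
```

Since every string in the question has its digits glued to the word data, none of the four alternatives can match.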

You could use re.match instead, which anchors the match at the beginning of the string, together with the pattern data\d+$ to match data followed by one or more digits up to the end of the string:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'data\d+$'

for string in strings_of_text:
    # If the string is data#
    if (re.match(expression_to_use, string)):
        strings_to_keep.append(string)

print(strings_to_keep)
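
As a rough illustration of why re.match is suggested over re.search here, the sketch below runs both on a hypothetical string, `'mydata5'`, that is not part of the question's data. re.search scans for the pattern anywhere in the string, while re.match only tries at position 0.

```python
import re

pattern = r'data\d+$'
# Hypothetical input: "data" plus a digit appears, but not at the start.
candidate = 'mydata5'

found_by_search = re.search(pattern, candidate) is not None
found_by_match = re.match(pattern, candidate) is not None

print(found_by_search, found_by_match)  # True False
```

With re.search, the same effect would need an explicit ^ at the start of the pattern.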


You could also filter the existing collection instead of building a separate one, for example with filter:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
expression_to_use = r'data\d+$'

strings_of_text = list(filter(lambda x: re.match(expression_to_use, x), strings_of_text))
print(strings_of_text)

Result

['data0', 'data23', 'data2', 'data55']
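
On whether the + and the $ are needed: for the list in the question the result happens to be the same without the $, but the two pieces do change what is accepted in general. The sketch below uses a hypothetical list, `samples`, and a hypothetical helper, `kept`, to compare the variants. It also shows re.fullmatch, which is one alternative to writing the $ by hand.

```python
import re

# Hypothetical inputs chosen to separate the pattern variants.
samples = ['data2', 'data23', 'data23abc', 'data']

def kept(pattern):
    """Return the samples that re.match accepts for the given pattern."""
    return [s for s in samples if re.match(pattern, s)]

with_plus_and_dollar = kept(r'data\d+$')  # one or more digits, then end of string
without_dollar = kept(r'data\d+')         # trailing text after the digits is allowed
without_plus = kept(r'data\d$')           # exactly one digit, then end of string
with_fullmatch = [s for s in samples if re.fullmatch(r'data\d+', s)]

print(with_plus_and_dollar)  # ['data2', 'data23']
print(without_dollar)        # ['data2', 'data23', 'data23abc']
print(without_plus)          # ['data2']
print(with_fullmatch)        # ['data2', 'data23']
```

One difference to be aware of: $ also matches just before a trailing newline, so `'data2\n'` passes re.match with data\d+$ but not re.fullmatch with data\d+.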



2 Comments

You are welcome. Perhaps you can use filter or a list comprehension to keep only a single collection and prevent having to use a separate collection.
Nice solution. Is +$ really necessary?
0

You can use re.compile when you apply the same pattern repeatedly; the compiled pattern object is reused directly, which saves a small amount of per-call overhead.

import re

strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']

engine = re.compile(r'data\d+$')
strings_to_keep = [s for s in strings_of_text if engine.match(s)]
print(strings_to_keep) # ['data0', 'data23', 'data2', 'data55']
