RegEx for matching specific pattern in a Python list

Question

Say, I have the following code:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'
    
for string in strings_of_text:
    # If the string is data#
    if (re.search(expression_to_use, string)):
        strings_to_keep.append(string)
print(strings_to_keep)

I only want to keep strings that match the pattern "data" followed by some number. In this case, that means keeping only 'data0', 'data23', 'data2', and 'data55'.

How can I do this? I think I need the re module, but I'm not sure how to use it.

I have read this: Python Regular Expression looking for two digits only

But when I try to modify my regular expression using this expression

^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)

It does not work, and this is where I am stuck. I am new to regular expressions, so thank you in advance to everyone who posts.

EDIT:

Here is the outcome I am trying to get:

print(strings_to_keep)
>>> ['data0', 'data23', 'data2', 'data55']

A desired output would be helpful.

– Error - Syntactical Remorse, Commented May 20, 2019 at 15:21

The fourth bird · Accepted Answer · 2019-05-20 15:37:46Z

1

Your pattern uses four alternatives, but none of them takes the word data into account.
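
As a rough sketch of why the original pattern keeps nothing: every alternative requires exactly two digits that are either the whole string or bounded by whitespace, and none of the sample strings has that shape. The `probes` list below is made up for illustration and is not part of the question.

```python
import re

original = r'^\d{2}$|(?<=\s)\d{2}(?=\s)|(?<=\s)\d{2}$|^\d{2}(?=\s)'
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']

# None of the sample strings match: the digits are preceded by "data",
# not by the start of the string or by whitespace.
kept = [s for s in strings_of_text if re.search(original, s)]
print(kept)

# Hypothetical inputs that the original pattern does accept.
probes = ['23', 'a 23 b', 'x 55', '10 y']
accepted = [s for s in probes if re.search(original, s)]
print(accepted)
```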

You could use re.match instead of re.search to anchor the match at the beginning of the string, and use data\d+$ to match data followed by one or more digits up to the end of the string:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
strings_to_keep = []
expression_to_use = r'data\d+$'

for string in strings_of_text:
    # If the string is data#
    if (re.match(expression_to_use, string)):
        strings_to_keep.append(string)

print(strings_to_keep)
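
To see what the `+` and the `$` each contribute, the sketch below compares data\d+$ with two looser variants. The inputs 'data', 'data7x', and 'mydata7' are hypothetical and chosen only to separate the three patterns.

```python
import re

# Hypothetical inputs chosen to separate the three patterns.
samples = ['data7', 'data42', 'data', 'data7x', 'mydata7']

def kept(pattern):
    # re.match only anchors at the start of the string.
    return [s for s in samples if re.match(pattern, s)]

with_plus_and_dollar = kept(r'data\d+$')  # digits must run to the end
without_dollar = kept(r'data\d+')         # trailing characters are allowed
without_plus = kept(r'data\d$')           # exactly one digit is allowed

print(with_plus_and_dollar)
print(without_dollar)
print(without_plus)
```

Under these assumptions, dropping `$` lets 'data7x' through, and dropping `+` rejects 'data42', so both are needed for the behaviour the question asks for.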

Instead of building a second list, you could filter the existing one, for example with filter:

import re
strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']
expression_to_use = r'data\d+$'

# Keep only the strings that match, reusing the same variable name
strings_of_text = list(filter(lambda x: re.match(expression_to_use, x), strings_of_text))
print(strings_of_text)

Result

['data0', 'data23', 'data2', 'data55']

edited May 20, 2019 at 15:37

answered May 20, 2019 at 15:23

2 Comments

The fourth bird Over a year ago

You are welcome. Perhaps you can use filter or a list comprehension to keep only a single collection and prevent having to use a separate collection.

Harikrishnan Balachandran Over a year ago

Nice solution. Is +$ really necessary?

Error - Syntactical Remorse · Accepted Answer · 2019-05-20 15:27:45Z

0

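If you apply the same pattern repeatedly, compile it once with re.compile; the compiled pattern skips the per-call cache lookup that the module-level functions perform, so it has less overhead.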

import re

strings_of_text = ['data0', 'data23', 'data2', 'data55', 'data_mismatch', 'green']

engine = re.compile(r'data\d+$')
strings_to_keep = [s for s in strings_of_text if engine.match(s)]
print(strings_to_keep) # ['data0', 'data23', 'data2', 'data55']
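
One caveat, shown here as a sketch and not as part of the original answer: `$` also matches just before a trailing newline, so a string such as 'data5\n' passes `match` with data\d+$. If that matters, `fullmatch` on a compiled pattern without the `$` is stricter. The names `loose` and `strict` and the newline-terminated input are assumptions for illustration.

```python
import re

loose = re.compile(r'data\d+$')   # the pattern from the answer
strict = re.compile(r'data\d+')   # used with fullmatch below

# 'data5\n' is a hypothetical input, e.g. a line read from a file.
candidates = ['data0', 'data23', 'data5\n', 'data_mismatch']

# match() with $ tolerates a single trailing newline.
loose_kept = [s for s in candidates if loose.match(s)]

# fullmatch() requires the pattern to cover the entire string.
strict_kept = [s for s in candidates if strict.fullmatch(s)]

print(loose_kept)
print(strict_kept)
```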

answered May 20, 2019 at 15:27
