I have a list that looks like this:

list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']

and I just want the dates. I have a regex that looks like this:

r'\b(\d+/\d+/\d{4})\b'

but I don't really know how to apply it to a list. Or maybe it can be done some other way.

Any help would be really appreciated.

3 Answers

6

Very simple. Just use re.match:

>>> import re
>>> mylist = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
>>> dates = [x for x in mylist if re.match(r'\b(\d+/\d+/\d{4})\b', x)]
>>> dates
['1/4/2015', '1/4/2015', '1/4/2015']

re.match only matches at the start of the string, so it's what you want for this case. Also, I wouldn't name a list "list" -- because that's the name of the built-in list class, you could hurt yourself later if you try to do list(some_iterable). Best not to get in that habit.
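
As a rough illustration of that point, the sketch below compares re.match with re.search on the same pattern. The sample strings are made up for this example and are not from the question.

```python
import re

pattern = r'\b(\d+/\d+/\d{4})\b'

# Hypothetical sample strings, chosen only to show the difference.
samples = ['1/4/2015', 'due 1/4/2015', '1/4/2015 (late)']

# re.match succeeds only if the pattern matches at position 0.
starts_with_date = [s for s in samples if re.match(pattern, s)]

# re.search succeeds if the pattern matches anywhere in the string.
contains_date = [s for s in samples if re.search(pattern, s)]

print(starts_with_date)  # ['1/4/2015', '1/4/2015 (late)']
print(contains_date)     # ['1/4/2015', 'due 1/4/2015', '1/4/2015 (late)']
```

Note that re.match still accepts '1/4/2015 (late)', since it only checks the beginning of the string.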

Finally, your regex will match a string that starts with a date. If you want to ensure that the entire string is your date, you could modify it slightly to r'(\d{1,2}/\d{1,2}/\d{4})$' -- used with re.match, this will ensure that the month and day are each 1 or 2 digits and the year is exactly 4 digits.
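
One possible way to express the whole-string check is re.fullmatch (available since Python 3.4), which makes the trailing $ unnecessary. This is only a sketch: the name strict and the candidate strings are assumptions for illustration.

```python
import re

# Same shape as the stricter pattern, without the trailing '$'.
strict = re.compile(r'\d{1,2}/\d{1,2}/\d{4}')

# Hypothetical inputs for illustration.
candidates = ['1/4/2015', '1/4/2015 (late)', '123/4/2015', '1/4/15']

# fullmatch succeeds only when the entire string matches the pattern.
dates = [s for s in candidates if strict.fullmatch(s)]
print(dates)  # ['1/4/2015']
```

This checks only the shape of the string. Something like '13/45/2015' would still pass, so a real calendar check would need datetime.strptime or similar.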

3

If the list is long, compiling the pattern first can give better performance:

import re

# list is a built-in name in Python (not a keyword), so when used as a variable
# name, append an underscore, following PEP 8 (https://www.python.org/dev/peps/pep-0008/)
# quote: single_trailing_underscore_ : used by convention to avoid conflicts
# with Python keyword, e.g.
list_ = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']

date_pattern = re.compile(r'\b(\d+/\d+/\d{4})\b')

print(list(filter(date_pattern.match, list_)))
# equivalent to
# print([i for i in list_ if date_pattern.match(i)])
# produces ['1/4/2015', '1/4/2015', '1/4/2015']
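
How much compiling helps is worth measuring, since the re module keeps a cache of recently compiled patterns and the saving is mostly the per-call lookup. A minimal timing sketch, assuming a long list built by repeating the question's data (the function names are hypothetical):

```python
import re
import timeit

pattern = r'\b(\d+/\d+/\d{4})\b'
compiled = re.compile(pattern)

# Hypothetical long list: the question's data repeated 1000 times.
items = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015'] * 1000

def with_module_function():
    # Looks the pattern up in re's internal cache on every call.
    return [s for s in items if re.match(pattern, s)]

def with_compiled_pattern():
    # Reuses the already compiled pattern object directly.
    return [s for s in items if compiled.match(s)]

# Both approaches select the same strings.
assert with_module_function() == with_compiled_pattern()

# Timings depend on the machine and Python version, so none are quoted here.
print(timeit.timeit(with_module_function, number=20))
print(timeit.timeit(with_compiled_pattern, number=20))
```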

1

You can achieve this by using re.match().

Note: list is the name of a built-in type in Python (not a reserved keyword). You should avoid using it as a variable name.

import re
str_list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']

# Using list(str_list) to iterate over the copy of 'str_list'
# to remove unmatched strings from the original list
for s in list(str_list):
    if not re.match(r'\b(\d+/\d+/\d{4})\b', s):
        str_list.remove(s)

Or you can use a list comprehension if you also want to keep the original list:

import re
str_list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
new_list = [s for s in str_list if re.match(r'\b(\d+/\d+/\d{4})\b', s)]
