
I have a list of files in a directory:

gp_dump_0_10_20171112003450 <==
gp_dump_0_11_20171112003450 <==
gp_dump_0_12_20171112003450 <==
gp_dump_0_13_20171112003450 <==
gp_dump_0_14_20171112003450 <==
gp_dump_1_1_20171112003450 <==
gp_dump_1_1_20171112003450_post_data
gp_dump_20171112003450_ao_state_file
gp_dump_20171112003450_co_state_file
gp_dump_20171112003450_last_operation
gp_dump_20171112003450.rpt

I want to fetch only the marked (<==) files from the directory. Below is the Python code I have written, which is not working as expected:

import os
import re

dump_key = 20171112003450
backup_files = os.listdir('/home/jadhavy/backup/')
segment_file_regex = "gp_dump_\d+?_\d+?_%s$" %dump_key
for file in backup_files:
        if file == re.finditer(segment_file_regex,file,re.S):
                print(file)

EDIT: I changed the regex to anchor at the end of the string, and now I'm not getting any results when I run this.

1 Answer


Two main things:

  1. To check whether a string matches a pattern, the function you want is re.match, not re.finditer. re.match returns a match object if the pattern matches at the beginning of the string, or None if there is no match. Comparing file to the result of re.finditer with == is always False, because re.finditer returns an iterator of match objects, not a string.
  2. Without the $ at the end, the regex would also match gp_dump_1_1_20171112003450_post_data, because re.match only requires the match to start at the beginning of the string. The $ metacharacter means end-of-string, so keeping it at the end of the pattern rules out names with trailing characters.
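A quick check of both points, using two names from the listing in the question instead of reading the directory (the hard-coded names and the anchored/unanchored variable names are only for illustration):

```python
import re

dump_key = 20171112003450
unanchored = r"gp_dump_\d+?_\d+?_%s" % dump_key
anchored = unanchored + "$"

name = "gp_dump_0_10_20171112003450"

# re.finditer returns an iterator of match objects, never a string,
# so comparing a filename to it with == is always False.
print(name == re.finditer(anchored, name))  # False

# re.match returns a match object on success and None otherwise.
print(re.match(anchored, name) is not None)  # True

# With $ at the end, a name with trailing characters is rejected...
print(re.match(anchored, "gp_dump_1_1_20171112003450_post_data"))  # None

# ...but without $ the same name matches, because re.match only
# anchors the pattern at the start of the string.
print(re.match(unanchored, "gp_dump_1_1_20171112003450_post_data") is not None)  # True
```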

Here is your code with the above adjustments:

import os
import re

dump_key = 20171112003450
backup_files = os.listdir('/home/jadhavy/backup/')
segment_file_regex = "gp_dump_\d+?_\d+?_%s$" %dump_key
for file in backup_files:
        if re.match(segment_file_regex,file,re.S):
                print(file)

Three other tips:

  1. You shouldn't need the re.S flag here: it only changes the . metacharacter so that it also matches newlines, and this pattern contains no dot.
  2. Raw strings are usually a good idea when writing regexes, because regexes tend to contain lots of backslashes and a normal string literal may interpret some of them as escape sequences. For example, r'\n' is the two characters '\\n' (a backslash followed by n), not '\n' (a newline), and '\b' is a backspace character while r'\b' is the regex word boundary.
  3. When inserting a string into a regex, you can use re.escape to escape metacharacters. For example, r'abc%sghi' % re.escape('[def]') becomes r'abc\[def\]ghi' instead of r'abc[def]ghi', where [def] would be read as a character class matching a single d, e, or f.
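Putting the three tips together, a minimal sketch might look like the following. It assumes the filter works on a plain list of names so it can be tried without the backup directory; find_segment_files and listing are hypothetical names, and listing is just the directory contents from the question.

```python
import re


def find_segment_files(names, dump_key):
    """Return the names shaped like gp_dump_<digits>_<digits>_<dump_key>."""
    # Raw string: the backslash in \d reaches the regex engine unchanged.
    # re.escape: any metacharacters in the key are matched literally.
    # No re.S: the pattern has no "." for DOTALL to affect.
    pattern = re.compile(r"gp_dump_\d+?_\d+?_%s$" % re.escape(str(dump_key)))
    return [name for name in names if pattern.match(name)]


# The directory listing from the question, hard-coded for illustration.
listing = [
    "gp_dump_0_10_20171112003450",
    "gp_dump_0_11_20171112003450",
    "gp_dump_0_12_20171112003450",
    "gp_dump_0_13_20171112003450",
    "gp_dump_0_14_20171112003450",
    "gp_dump_1_1_20171112003450",
    "gp_dump_1_1_20171112003450_post_data",
    "gp_dump_20171112003450_ao_state_file",
    "gp_dump_20171112003450_co_state_file",
    "gp_dump_20171112003450_last_operation",
    "gp_dump_20171112003450.rpt",
]

for name in find_segment_files(listing, 20171112003450):
    print(name)  # prints only the six marked files
```

To run it against the real directory, pass os.listdir('/home/jadhavy/backup/') as names. On Python 3.4+, pattern.fullmatch(name) anchors both ends, so it could replace pattern.match plus the trailing $.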

2 Comments

You don't need the non-greedy ? in \d+?_, because \d can't match the underscore, so a greedy \d+ stops there anyway.
That reminds me, you can also use \d* instead of \d+? if the intent is to match 0 or more digits.
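A small check of both comments, reading the first one as being about the non-greedy ? (the sample names here are made up for illustration):

```python
import re

key = "20171112003450"
name = "gp_dump_0_10_" + key

# \d cannot match "_", so greedy \d+ stops at each underscore just like \d+?.
lazy = re.match(r"gp_dump_(\d+?)_(\d+?)_%s$" % key, name)
greedy = re.match(r"gp_dump_(\d+)_(\d+)_%s$" % key, name)
print(lazy.groups(), greedy.groups())  # ('0', '10') ('0', '10')

# \d* also accepts an empty run of digits, which \d+ rejects.
empty = "gp_dump___" + key
print(re.match(r"gp_dump_\d*_\d*_%s$" % key, empty) is not None)  # True
print(re.match(r"gp_dump_\d+_\d+_%s$" % key, empty) is not None)  # False
```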
