
I have the following code, which stores the names of all the CSV files in a specific folder in a list:

import pandas as pd
import re
import os

files = os.listdir('.')
filename=[filename for filename in files if filename.endswith('.csv')]

However, my folder contains two types of CSV files: one type ends with, for example, _20.cvs (or _18.csv, _01.csv), and the other ends with _Raw.csv.

I only want the first type stored in my list. I know a regular expression can probably help with that, so I did some searching and came up with the code below, but it doesn't seem to work. Can anyone offer some advice?

filename = [re.search(r'^\d{2}.csv', filename).group(0) for filename in files]
  • BTW, do you have _20.cvs or _20.csv? Commented Nov 22, 2018 at 8:10

4 Answers

7

You need to remove ^ (as it matches the start of string location), add $ at the end of the pattern (to make sure the match is at the end of the string) and escape the dot (else, . matches any char but a line break char).
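A quick way to see each of the three problems is to try the patterns on a few sample names. The names below are made up for illustration.

```python
import re

# ^ anchors the match to the start of the string, so a name that
# does not begin with two digits can never match.
print(re.search(r'^\d{2}.csv', 'wertf_18.csv'))             # None

# Without $, the match may stop before the end of the name.
print(bool(re.search(r'_\d{2}\.csv', 'wertf_18.csv.bak')))  # True

# An unescaped . matches any character, not just a literal dot.
print(bool(re.search(r'_\d{2}.csv$', 'wertf_18xcsv')))      # True

# The corrected pattern accepts the real CSV name and rejects both impostors.
fixed = r'_\d{2}\.csv$'
for name in ['wertf_18.csv', 'wertf_18.csv.bak', 'wertf_18xcsv']:
    print(name, bool(re.search(fixed, name)))
# wertf_18.csv True
# wertf_18.csv.bak False
# wertf_18xcsv False
```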

Note that you must check whether there is a match before calling .group(); filtering in the comprehension's if clause takes care of that:

result = [f for f in files if re.search(r'_\d{2}\.csv$', f)] 
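The question's one-liner breaks because `re.search` returns `None` when nothing matches, and `None` has no `.group()` method. The sketch below shows that failure and two safe alternatives; the sample names are illustrative, and the assignment-expression (`:=`) variant assumes Python 3.8 or later.

```python
import re

files = ["wertf_18.csv", "ith_Raw.csv"]
pattern = re.compile(r'_\d{2}\.csv$')

# Calling .group() on every result crashes on the first non-matching name.
try:
    [pattern.search(f).group(0) for f in files]
except AttributeError as exc:
    print("AttributeError:", exc)

# Safe: filter in the if clause and keep the whole file names.
names = [f for f in files if pattern.search(f)]

# Safe: keep only the matched text, calling .group() only on real matches
# (the := operator needs Python 3.8+).
parts = [m.group(0) for f in files if (m := pattern.search(f))]

print(names)  # ['wertf_18.csv']
print(parts)  # ['_18.csv']
```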

Details

  • _ - an underscore
  • \d{2} - 2 digits
  • \. - a literal dot
  • csv - csv text
  • $ - end of string.
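The same token-by-token breakdown can live inside the pattern itself via the `re.VERBOSE` flag, which ignores unescaped whitespace and treats `#` as the start of a comment. This is an optional restyling of the answer's pattern, not something it requires.

```python
import re

# Same pattern as above, annotated token by token via re.VERBOSE
# (unescaped whitespace is ignored and # starts a comment).
CSV_SUFFIX = re.compile(r"""
    _       # an underscore
    \d{2}   # exactly two digits
    \.      # a literal dot
    csv     # the letters csv
    $       # end of string
""", re.VERBOSE)

files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
result = [f for f in files if CSV_SUFFIX.search(f)]
print(result)  # ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
```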

See the regex demo.

Python demo:

import re
files = ["gfrt_32_20.csv", "wertf_18.csv", "12_01.csv", "ith_Raw.csv"]
result = [f for f in files if re.search(r'_\d{2}\.csv$', f)] 
print(result)
# => ['gfrt_32_20.csv', 'wertf_18.csv', '12_01.csv']
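To run the same filter against a real folder, as the question does with `os.listdir('.')`, the sketch below builds a throwaway directory with `tempfile` so it can run anywhere; the file names are invented for the demo. `sorted` is there only because `os.listdir` returns entries in arbitrary order.

```python
import os
import re
import tempfile

pattern = re.compile(r'_\d{2}\.csv$')

with tempfile.TemporaryDirectory() as folder:
    # Create empty sample files: three numbered CSVs, one _Raw CSV, one non-CSV.
    for name in ["Data_100000_11_22.csv", "wertf_18.csv", "12_01.csv",
                 "ith_Raw.csv", "notes_20.txt"]:
        open(os.path.join(folder, name), "w").close()

    # os.listdir order is arbitrary, so sort for a stable result.
    result = sorted(f for f in os.listdir(folder) if pattern.search(f))

print(result)  # ['12_01.csv', 'Data_100000_11_22.csv', 'wertf_18.csv']
```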

1 Comment

@Borisu There is no need to add details about the difference between re.match and re.search to my answer, as the OP's problem is not related to it. Here is a good thread on that.
3

re.match would not work because it only matches at the beginning of the string. Use re.search instead. Everything else in the re.match solution is fine.
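The difference is easy to check: `re.match` only tries the pattern at position 0, while `re.search` scans the whole string. If you do want `re.match` (or `re.fullmatch`), the pattern has to consume the leading part of the name, for example with `.*`. The file name here is the one given in the comments.

```python
import re

name = "Mydate_2018_11_22.csv"

# re.match anchors at position 0, where the name starts with 'M', not '_'.
print(re.match(r'_\d{2}\.csv$', name))                 # None
# re.search scans forward until it finds the suffix.
print(re.search(r'_\d{2}\.csv$', name).group(0))       # _22.csv

# re.match / re.fullmatch work once the pattern covers the prefix explicitly.
print(bool(re.match(r'.*_\d{2}\.csv$', name)))         # True
print(bool(re.fullmatch(r'.*_\d{2}\.csv', name)))      # True
```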

import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.search(r'(_\d+.csv)', f)]
print(filenames)
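Two details separate this pattern from the one in the accepted answer: `\d+` accepts any number of digits rather than exactly two, and the unescaped `.` plus the missing `$` let some non-CSV names through. Whether `\d+` is acceptable depends on your naming scheme; the sample names below are hypothetical.

```python
import re

loose = r'(_\d+.csv)'      # pattern used in this answer
strict = r'_\d{2}\.csv$'   # pattern from the accepted answer

names = ["wertf_18.csv", "log_2018.csv", "wertf_18.csv.bak", "wertf_18-csv"]
for name in names:
    print(name, bool(re.search(loose, name)), bool(re.search(strict, name)))
# wertf_18.csv True True
# log_2018.csv True False      (\d+ allows four digits)
# wertf_18.csv.bak True False  (no $ anchor)
# wertf_18-csv True False      (unescaped . matches '-')
```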


1

Try using the re.match method:

import os
import re
files = os.listdir('.')
filenames = [f for f in files if re.match(r'(_\d+.csv)', f)]
print(filenames)

7 Comments

It gives me an empty list. The file names look like Mydate_2018_11_22.csv, but we only care about the 22.csv part, right? In this case, I think match and search behave the same?
@Frank Could you provide examples with the full names of the files you need to collect? Yes, they are the same. In this case, re.search and re.match can replace each other.
[f for f in files if re.search(r'(_\d+.csv)', f)] works; [f for f in files if re.match(r'(_\d+.csv)', f)] doesn't
Data_100000_11_22.csv
@Frank For "Data_100000_11_22.csv" I came up with this regex: r'^Data_(\d+)_(\d+)_(\d+).csv$'
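When the full naming scheme is known, a pattern that spells it out also lets you pull the numeric parts out with capture groups. The sketch below uses the regex from that comment with the dot escaped; treating the three groups as separate fields is an assumption about what the numbers mean.

```python
import re

# Regex from the comment, with the dot escaped so it only matches a literal '.'.
FULL_NAME = re.compile(r'^Data_(\d+)_(\d+)_(\d+)\.csv$')

m = FULL_NAME.match("Data_100000_11_22.csv")
if m:  # check for a match before using .groups()
    print(m.groups())  # ('100000', '11', '22')

print(FULL_NAME.match("Data_100000_11_Raw.csv"))  # None
```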
1

You should put the regex operation in the if clause so as to filter out those you don't want.

You should also escape the . in the regex, since an unescaped dot has a special meaning (it matches any character except a line terminator).

[filename for filename in files if re.search(r'\d{2}\.csv$', filename)]

If you only want the matched part, you can take a simple slice:

[filename[-6:] for filename in files if re.search(r'\d{2}\.csv$', filename)]
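The slice works because `\d{2}\.csv$` always matches exactly six characters at the end of the name (two digits plus `.csv`). If the pattern ever changes length, for example to `\d+`, reading the text from the match object is more robust. The sample names are illustrative, and `:=` needs Python 3.8 or later.

```python
import re

files = ["Mydate_2018_11_22.csv", "wertf_18.csv", "ith_Raw.csv"]
pattern = re.compile(r'\d{2}\.csv$')

# Fixed-width slice: correct only because the match is always 6 characters.
by_slice = [f[-6:] for f in files if pattern.search(f)]
# Match object: returns exactly what the regex matched, whatever its length.
by_match = [m.group(0) for f in files if (m := pattern.search(f))]

print(by_slice)  # ['22.csv', '18.csv']
print(by_match)  # ['22.csv', '18.csv']
```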

