16

I want to be able to loop through a list of files that match a particular pattern. I can get unix to list these files using ls and egrep with a regular expression, but I cannot find a way to turn this into an iterative process. I suspect that using ls is not the answer. Any help would be gratefully received.

My current ls command looks as follows:

ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat'

I would expect the above to match:

  • MYFILE160418.dat
  • myFILE170312.DAT
  • MyFiLe160416.DaT

but not:

  • MYOTHERFILE150202.DAT
  • Myfile.dat
  • myfile.csv

Thanks,

Paul.

5
  • Hi @paul-frith, you should start with a while or a for. Use a counter as in the example -> tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html Commented Apr 20, 2016 at 12:40
  • In many cases, you do not need an explicit loop, since you can pass the arguments to another program via pipe, possibly using xargs. Commented Apr 20, 2016 at 12:44
  • Brilliant thank you - I hadn't realised you could use ls in a for loop. Commented Apr 20, 2016 at 12:51
  • Try: ls | egrep -i 'MYFILE\d{6}\.dat' Commented Apr 20, 2016 at 12:53
  • Saleem, whilst neater, I believe this would give looser matching criteria: your method would allow a month greater than 12 and a day greater than 31. Commented Apr 20, 2016 at 13:01

4 Answers

10

You can use (GNU) find with the regex search option instead of parsing ls.

find . -regextype "egrep" \
       -iregex '.*/MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat' \
       -exec [[whatever you want to do]] {} \;

Where [[whatever you want to do]] is the command you want to perform on the names of the files.

From the man page

-regextype type
          Changes  the regular expression syntax understood by -regex and -iregex tests 
          which occur later on the command line.  Currently-implemented types are 
          emacs (this is the default), posix-awk, posix-basic, posix-egrep and 
          posix-extended.

  -regex pattern
          File name matches regular expression pattern.  This is a match on the whole 
          path, not a search.  For example, to match a file named `./fubar3', you can 
          use the regular expression
          `.*bar.' or `.*b.*3', but not `f.*r3'.  The regular expressions understood by 
          find are by default Emacs Regular Expressions, but this can be changed with 
          the -regextype option.

  -iregex pattern
          Like -regex, but the match is case insensitive.
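
If the per-file work is more than a single command, the find results can also drive a shell loop. This is a minimal sketch, assuming GNU find, GNU sort and bash. The process_file function, the loop_matching_files wrapper and the -maxdepth 1 restriction (stay in the given directory) are illustrative additions, not part of the answer above:

```shell
#!/usr/bin/env bash
# Hypothetical per-file action; replace the body with the real work.
process_file() {
  printf 'item: %s\n' "$1"
}

# Loop over the regex-matched files in the directory given as $1 (default: .).
# -maxdepth 1 keeps find out of subdirectories; -print0 together with
# read -d '' keeps names containing spaces or newlines in one piece.
loop_matching_files() {
  local dir=${1:-.} file
  while IFS= read -r -d '' file; do
    process_file "$file"
  done < <(find "$dir" -maxdepth 1 -type f -regextype posix-egrep \
             -iregex '.*/MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat' \
             -print0 | sort -z)
}
```

Call it as loop_matching_files /path/to/dir, or with no argument for the current directory.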

1 Comment

Interesting - funnily enough "find" was what I looked to in the first place, but I couldn't get my regex to work. -regextype "egrep" is what I needed!
9

Based on the link Andy K provided I have used the following to loop based on my matching criteria:

for i in $(ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat'); do
  echo "item: $i"
done

2 Comments

I've looked at this and it seems that parsing ls is a bad idea because UNIX allows almost any character in a file name, including newlines. However, given that I am regex matching, surely that problem is mitigated in this instance. Are there other reasons not to parse ls output?
Do not use ls output for anything. ls is a tool for interactively looking at directory metadata. Any attempts at parsing ls output with code are broken. Globs are much simpler AND correct: for file in *.txt. Read "Parsing ls".
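
To make the concern in these comments concrete, here is a small sketch comparing the two approaches. Because the regex is not anchored, it also matches a name with a space in front of MYFILE, and the ls-based loop then splits that name in two. The sketch assumes bash; the scratch directory and the file name are made up for the demonstration, and grep -E is used as the equivalent of egrep:

```shell
#!/usr/bin/env bash
shopt -s nocasematch   # make [[ =~ ]] case-insensitive, like grep -i

# Work in a scratch directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 'old MYFILE160418.dat'   # hypothetical name containing a space

pattern='MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat'

# Approach 1: word-splitting the output of ls | grep breaks the name in two.
ls_count=0
for f in $(ls | grep -Ei "$pattern"); do
  ls_count=$((ls_count + 1))
done

# Approach 2: a glob hands each name to the loop as a single word.
glob_count=0
for f in *; do
  [[ $f =~ $pattern ]] && glob_count=$((glob_count + 1))
done

echo "ls loop saw $ls_count items, glob loop saw $glob_count"
```

With the single file above, the ls loop sees 2 items and the glob loop sees 1.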
0

I wanted to list the sda* and sdb* devices in /dev whose names end in a single digit, and I found that ls works by itself in this case, because the shell expands the pattern before ls runs:

> ls /dev/sd[ab][0-9]
/dev/sda1  /dev/sda2  /dev/sda3  /dev/sda4  /dev/sda5  /dev/sda6  /dev/sdb1

The limitation is that this is a shell glob, not a regular expression, so + is not a quantifier. For example, trying to match several digits at the end fails:

> ls /dev/sd[ab][0-9]+
ls: cannot access /dev/sd[ab][0-9]+: No such file or directory

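In that case you do need ls | egrep ... or find, as others mentioned. Your regex has no +, but it does use (...|...) alternation, which a plain glob does not support either (bash rejects the unquoted parentheses as a syntax error). In bash, the extglob option provides @(...) for alternation and nocaseglob makes the match case-insensitive, so this should work for you:

shopt -s extglob nocaseglob
ls MYFILE[0-9][0-9]@(0[1-9]|1[0-2])@(0[1-9]|[12][0-9]|3[01]).dat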
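
For the "several digits" case, bash's extended globs offer a glob-level counterpart to the regex +. This is a hedged sketch, assuming bash with the extglob option. The scratch directory and the sdX-style file names are stand-ins for the /dev entries so the example is safe to run anywhere:

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob   # enable +(...) patterns; unmatched globs expand to nothing

# Stand-in for /dev: a scratch directory with made-up device names.
scratch=$(mktemp -d)
touch "$scratch"/sda1 "$scratch"/sda12 "$scratch"/sdb3 "$scratch"/sdc1 "$scratch"/sda

# +([0-9]) means "one or more digits", the glob analogue of the regex [0-9]+.
matches=( "$scratch"/sd[ab]+([0-9]) )

for dev in "${matches[@]}"; do
  echo "${dev##*/}"   # print the name without the directory part
done
```

Here sda1, sda12 and sdb3 match, while sdc1 (wrong letter) and sda (no digit) do not. Collecting the matches in an array also avoids handing a literal, unexpanded pattern to ls when nothing matches.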

0

Paul Frith's answer is best. However, here is a slightly more readable solution.

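shopt -s nocasematch  # make =~ case-insensitive, like egrep -i
# Keep the regex in a variable; ^ and $ anchor it to the whole file name.
re='^MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat$'
for file in *; do
  if [[ $file =~ $re ]]; then
    echo "$file"
  fi
done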

Or you can use:

ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat' | while IFS= read -r i; do
  echo "$i"
done
