I will be receiving data files in a Linux directory.

I need to validate that the file names match the pattern "NNN-YYYYMMDD-NNNNNNNNN.pdf", where:

  • NNN stands for a three-digit number (each N is a digit 0-9).
  • "YYYYMMDD" stands for a valid date: YYYY is the year, MM is the month (01 to 12) and DD is the day of the month (01 to 28, 29, 30 or 31, depending on the month and year).
  • NNNNNNNNN is a nine-digit number (only 0-9 allowed).
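The day-of-month rule above is the part a plain regex cannot express, because the upper bound depends on the month and, for February, on leap years. A minimal Python sketch of that rule, assuming the Gregorian leap-year rules as implemented by calendar.isleap; the helper names days_in_month and is_valid_yyyymmdd are hypothetical, and any year from 0000 to 9999 is accepted:

```python
import calendar


def days_in_month(year: int, month: int) -> int:
    """Number of days in the given month (Gregorian rules, leap years included)."""
    if month == 2:
        return 29 if calendar.isleap(year) else 28
    if month in (4, 6, 9, 11):
        return 30
    return 31


def is_valid_yyyymmdd(s: str) -> bool:
    """True if s is exactly 8 ASCII digits that form a real calendar date."""
    if len(s) != 8 or not all(c in "0123456789" for c in s):
        return False
    year, month, day = int(s[:4]), int(s[4:6]), int(s[6:])
    if not 1 <= month <= 12:
        return False
    return 1 <= day <= days_in_month(year, month)


for candidate in ["20170730", "20170230", "20160229", "20170229", "20171301"]:
    print(candidate, is_valid_yyyymmdd(candidate))
```

Only 20170730 and 20160229 pass; 20170229 fails because 2017 is not a leap year.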

Which utility (sed, awk, etc.) should I use to validate the file names, and how?

  • Are you looking for a manually started script that validates a string? Commented Jul 7, 2017 at 19:44
  • You can do this using grep Commented Jul 7, 2017 at 19:45
  • Please show me how. Commented Jul 7, 2017 at 19:47
  • Can you give a sample file name that you are expecting? Please add it to your question. Commented Jul 7, 2017 at 19:49
  • Does the year have to be the current year? Or current or prior? Or any from 0000 to 9999? Commented Jul 7, 2017 at 19:52

4 Answers

This tests every file in the current directory, using bash's [[ operator, against the pattern:

  • start of string ^
  • 3 digits
  • -
  • 8 digits
  • -
  • 9 digits
  • a literal .pdf
  • end of string $
  • that the middle 8 digits evaluate to a valid date according to GNU date

You can adjust the assumptions above easily enough.

for f in *
do
  # Pattern: NNN-YYYYMMDD-NNNNNNNNN.pdf (\. matches a literal dot).
  # BASH_REMATCH[2] holds the YYYYMMDD part, which GNU date must accept.
  [[ $f =~ ^([0-9]{3})-([0-9]{8})-([0-9]{9})\.pdf$ ]] &&
  date -d "${BASH_REMATCH[2]}" &>/dev/null &&
  echo Valid: "$f"
done
  • Nice one, as long as year, month and day really always have the correct number of digits. Maybe still remove the accidental .pdf at the end :-) Commented Jul 7, 2017 at 20:25
  • Thanks a lot. I changed it to: for f in "123-20170730-123456789.pdf"; do [[ $f =~ ^([0-9]{3})-([0-9]{8})-([0-9]{9})\.pdf$ ]] && date -d "${BASH_REMATCH[2]}" &>/dev/null; if [ $? -eq 0 ]; then echo "Good File"; else echo "Bad File"; fi; done Commented Jul 7, 2017 at 22:07
  • I have two questions. Question 1: What does the "~" in the expression mean? Question 2: Why does the following not work: for f in "123-20170730-123456789.pdf"; do echo $f | grep '^([0-9]{3}-([0-9]{8})-([0-9]{9}).pdf$' &> /dev/null; if [ $? -eq 0 ]; then echo "Good File"; else echo "Bad File"; fi; done Commented Jul 7, 2017 at 22:11
  • @AlluSingh (1): =~ means: match a regex … (2) Because you need to use extended regex syntax (grep -E) and because you have not closed the first (. This does work: echo "123-20170730-123456789.pdf" | grep -E '^([0-9]{3})-([0-9]{8})-([0-9]{9})\.pdf$' Commented Jul 7, 2017 at 22:26
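For comparison, the same two-step check over a directory (exact pattern match, then date check) can be sketched in Python. This is not the answerer's code: PATTERN, is_valid_name and valid_files are hypothetical names, and datetime.strptime stands in for GNU date -d, so the two may disagree on edge cases such as very old years.

```python
import os
import re
from datetime import datetime

# NNN-YYYYMMDD-NNNNNNNNN.pdf; group 1 captures the date for the second step
PATTERN = re.compile(r"[0-9]{3}-([0-9]{8})-[0-9]{9}\.pdf")


def is_valid_name(name: str) -> bool:
    """Step 1: exact pattern match; step 2: the 8 digits must be a real date."""
    match = PATTERN.fullmatch(name)  # fullmatch behaves like ^...$
    if not match:
        return False
    try:
        datetime.strptime(match.group(1), "%Y%m%d")  # ValueError for e.g. 20170230
    except ValueError:
        return False
    return True


def valid_files(directory: str) -> list:
    """Sorted names of the regular files in directory that pass both steps."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and is_valid_name(name)
    )


if __name__ == "__main__":
    for name in valid_files("."):
        print("Valid:", name)
```

Run it from the directory that receives the files; re.fullmatch needs Python 3.4 or later.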

Something like this:

TOCHECK=( "01-20170228-12345678" "012-20170230-012345678" "123-20170730-012345678" )

for CHECK in "${TOCHECK[@]}"; do
    # split the name on "-" into an array of parts
    IFS=- read -r -a PARTS <<< "$CHECK"
    echo -e "\nchecking \"$CHECK\""

    # anchor the patterns so that e.g. "0123" is not accepted as 3 digits
    if echo "${PARTS[0]}" | grep -q '^[0-9]\{3\}$'; then
        echo first part ok
    fi

    if echo "${PARTS[2]}" | grep -q '^[0-9]\{9\}$'; then
        echo last part ok
    fi

    # require 8 digits first: GNU date also accepts "" or "2017" (read as a time)
    if [[ ${PARTS[1]} =~ ^[0-9]{8}$ ]] && date --date="${PARTS[1]}" &>/dev/null; then
        echo date OK
    fi
done

(Just a conceptual idea, to be adapted as needed; for example, it does not check the .pdf suffix.)
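The split-on-dash idea can also report which part failed instead of a single yes/no. A minimal Python sketch under the same assumptions as the bash loop (no .pdf suffix, sample strings taken from TOCHECK); check_parts is a hypothetical name, and datetime.strptime replaces GNU date for the date part:

```python
from datetime import datetime

ASCII_DIGITS = set("0123456789")


def check_parts(name: str) -> dict:
    """Split name on '-' and check each part separately, like the bash loop."""
    result = {"first": False, "date": False, "last": False}
    parts = name.split("-")
    if len(parts) != 3:
        return result  # wrong number of '-'-separated parts
    first, date_part, last = parts
    # explicit 0-9 check: str.isdigit() would also accept non-ASCII digits
    result["first"] = len(first) == 3 and set(first) <= ASCII_DIGITS
    result["last"] = len(last) == 9 and set(last) <= ASCII_DIGITS
    if len(date_part) == 8 and set(date_part) <= ASCII_DIGITS:
        try:
            datetime.strptime(date_part, "%Y%m%d")  # rejects e.g. 20170230
            result["date"] = True
        except ValueError:
            pass
    return result


for name in ["01-20170228-12345678", "012-20170230-012345678", "123-20170730-012345678"]:
    print(name, check_parts(name))
```

Reporting per part makes it easier to tell the sender what is wrong with a rejected file name.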

Regexps alone are not enough: validation takes two steps, regexp matching and date validation. Here's a Python implementation:

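from __future__ import print_function
import sys
import re
import datetime

def validate(filename):
    # NNN-YYYYMMDD-NNNNNNNNN.pdf; re.match anchors only the start, \Z anchors the end
    match = re.match(r"[0-9]{3}-([0-9]{8})-[0-9]{9}\.pdf\Z", filename)
    if not match:
        return False
    datestr = match.group(1)
    try:
        # raises ValueError for impossible dates such as 2017-02-30
        datetime.date(int(datestr[:4]), int(datestr[4:6]), int(datestr[6:8]))
    except ValueError:
        return False
    else:
        return True

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python validate.py FILE", file=sys.stderr)
        sys.exit(2)
    if validate(sys.argv[1]):
        print(":-)")
        sys.exit(0)
    else:
        print(":-(")
        sys.exit(1)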

Usage: python validate.py FILE

One can probably use grep and date to do the same.

A basic solution using grep. It doesn't validate the date itself; it merely checks that it's numeric.

if ls|grep -vE '^[0-9]{3}-[0-9]{8}-[0-9]{9}\.pdf$'; then
    echo some bogus files found
else
    echo all good
fi
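The ^ and $ anchors and the escaped dot are what keep near-misses out of this check; because ls prints one name per line when piped, grep applies the anchors to each name. A small Python comparison (the sample names are made up for illustration) shows what an unanchored pattern with an unescaped dot would let through:

```python
import re

STRICT = re.compile(r"^[0-9]{3}-[0-9]{8}-[0-9]{9}\.pdf$")
# Same pattern without anchors and with an unescaped dot, for comparison
LOOSE = re.compile(r"[0-9]{3}-[0-9]{8}-[0-9]{9}.pdf")

names = [
    "123-20170730-123456789.pdf",      # well-formed
    "x123-20170730-123456789.pdf",     # junk before the pattern
    "123-20170730-123456789.pdf.bak",  # junk after the pattern
    "123-20170730-123456789Xpdf",      # '.' matched an arbitrary character
]

for name in names:
    print(f"{name:32} strict={bool(STRICT.search(name))} loose={bool(LOOSE.search(name))}")
```

Only the first name passes the strict pattern, while the loose one accepts all four.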