I'm pretty new to bash scripting and regexp and have a question. I want to check to see if my variable $name starts with a-d, e-h, i-l etc and do some stuff accordingly. If the string starts with "the." or "The." it should check the first letter after the period.

My problem is that if $name is "the.anchor", both the a-d0-9 test and the q-t test are true. Do you guys have any idea what's wrong?

if [[ $name =~ ^([tT]he\.)?[a-dA-D0-9]+ ]]; then
    do some stuff
fi

if [[ $name =~ ^([tT]he\.)?[e-hE-H]+ ]]; then
    do some stuff
fi

if [[ $name =~ ^([tT]he\.)?[i-lI-L]+ ]]; then
    do some stuff
fi

if [[ $name =~ ^([tT]he\.)?[m-pM-P]+ ]]; then
    do some stuff
fi

if [[ $name =~ ^([tT]he\.)?[q-tQ-T]+ ]]; then
    do some stuff
fi

if [[ $name =~ ^([tT]he\.)?[u-wU-W]+ ]]; then
    do some stuff
fi

if [[ $name =~ ^([tT]he\.)?[x-zX-Z]+ ]]; then
    do some stuff
fi

Thanks in advance!

3 Answers

Your first part is optional:

([tT]he\.)?

So the.anchor matches the pattern ^([tT]he\.)?[a-dA-D0-9]+ because the. matches ^([tT]he\.)? and the a matches [a-dA-D0-9]+. It also matches ^([tT]he\.)?[q-tQ-T]+ because ^([tT]he\.)? is optional and the leading t matches [q-tQ-T]+. Note that the second pattern does not consume the whole input; in fact it only grabs the first character.

You can verify this by having bash echo the match:

echo "${BASH_REMATCH[0]}"

This should print the.a in the first case (the n in anchor is outside a-d, so the match stops there) and t in the second.
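To make this concrete, here is a minimal sketch that prints what each of the two patterns actually matched for the.anchor. The helper name show_match is made up for illustration; BASH_REMATCH is the array bash fills in after a successful =~ test.

```bash
#!/usr/bin/env bash
# show_match is a hypothetical helper, not part of the original question.
# It prints the text matched by the regex, or "(no match)".
show_match() {
    local re=$1 str=$2
    if [[ $str =~ $re ]]; then
        # Element 0 of BASH_REMATCH holds the whole matched portion.
        printf '%s\n' "${BASH_REMATCH[0]}"
    else
        printf '%s\n' "(no match)"
    fi
}

name="the.anchor"
show_match '^([tT]he\.)?[a-dA-D0-9]+' "$name"   # prefix is used, prints: the.a
show_match '^([tT]he\.)?[q-tQ-T]+' "$name"      # prefix is skipped, prints: t
```

The second line of output is the giveaway: the optional group matched nothing, and the t of "the" itself satisfied the character class.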

You do not have an end anchor on the pattern so only part of the input needs to be matched. If you made the second pattern ^([tT]he\.)?[q-tQ-T]+$ then it would not match.
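A quick way to see what the end anchor changes is to run the anchored pattern against a few names. This is only a sketch; the helper name check_anchored and the sample names are assumptions for illustration.

```bash
#!/usr/bin/env bash
# The q-t pattern with an end anchor added.
anchored_re='^([tT]he\.)?[q-tQ-T]+$'

# check_anchored is a hypothetical helper: prints "match" or "no match".
check_anchored() {
    if [[ $1 =~ $anchored_re ]]; then
        echo "match"
    else
        echo "no match"
    fi
}

for name in the.anchor the.tally The.sst tsr; do
    printf '%s: %s\n' "$name" "$(check_anchored "$name")"
done
```

With the anchor, every character after the optional prefix has to be in q-t, so the.anchor is rejected, but so is a name like the.tally. The anchor therefore only helps if the whole remainder is expected to stay inside the range.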

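Alternatively, in a regex engine that supports possessive quantifiers (PCRE or Java, for example), you could make the first part possessive: ^([tT]he\.)?+. This means that once the engine has matched the first expression, it will not give it back. ^([tT]he\.)?+ grabs the the. and does not release it when [q-tQ-T]+ fails, so the match fails. Note that bash's =~ uses POSIX extended regular expressions, which do not define possessive quantifiers, so this will not work inside [[ ]].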
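Since bash's =~ uses POSIX extended regular expressions, which have no possessive quantifiers, one way to get the same effect in bash is to strip the prefix before testing. The following is a sketch under that assumption, not the only way to do it; the function name first_letter_bucket and the bucket labels it prints are made up for illustration.

```bash
#!/usr/bin/env bash
# first_letter_bucket is a hypothetical helper: it prints which range the
# first character falls into, ignoring a leading "the." or "The.".
first_letter_bucket() {
    local name=$1
    # ${name#pattern} removes the shortest match of the glob pattern from the
    # start of the string; if the prefix is absent, the value is unchanged.
    local rest=${name#[tT]he.}

    if   [[ $rest =~ ^[a-dA-D0-9] ]]; then echo "a-d"
    elif [[ $rest =~ ^[e-hE-H] ]];    then echo "e-h"
    elif [[ $rest =~ ^[i-lI-L] ]];    then echo "i-l"
    elif [[ $rest =~ ^[m-pM-P] ]];    then echo "m-p"
    elif [[ $rest =~ ^[q-tQ-T] ]];    then echo "q-t"
    elif [[ $rest =~ ^[u-wU-W] ]];    then echo "u-w"
    elif [[ $rest =~ ^[x-zX-Z] ]];    then echo "x-z"
    else                                   echo "other"
    fi
}

first_letter_bucket "the.anchor"   # prints: a-d
first_letter_bucket "the.tally"    # prints: q-t
first_letter_bucket "theory"       # prints: q-t (no "the." prefix to strip)
```

Because the prefix is gone before any regex runs, there is nothing left for the pattern to fall back on, and only one branch can fire.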

3 Comments

Thanks for answering. If I change the q-t to ^([tT]he\.)?[q-tQ-T]+$ it won't match if name is "the.tally"
@JonnyQuest no, it will not. In an engine that supports possessive quantifiers, you could also make the first part possessive so that if it is matched it will not be backtracked.
Found an "answer" to my question. Will accept my own answer in 2 days when I can. Thanks again!

I figured out a way to fix my problem by using elif statements and putting the q-t part as the last one, so it is only tested after all the other ranges have failed.
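A sketch of what that chain might look like, using the patterns from the question. The function wrapper bucket_elif and the echoed labels are assumptions added so the example can be run; in the real script each branch would do its own work instead.

```bash
#!/usr/bin/env bash
# bucket_elif is a hypothetical wrapper around the elif chain.
bucket_elif() {
    local name=$1
    if   [[ $name =~ ^([tT]he\.)?[a-dA-D0-9]+ ]]; then echo "a-d"
    elif [[ $name =~ ^([tT]he\.)?[e-hE-H]+ ]];    then echo "e-h"
    elif [[ $name =~ ^([tT]he\.)?[i-lI-L]+ ]];    then echo "i-l"
    elif [[ $name =~ ^([tT]he\.)?[m-pM-P]+ ]];    then echo "m-p"
    elif [[ $name =~ ^([tT]he\.)?[u-wU-W]+ ]];    then echo "u-w"
    elif [[ $name =~ ^([tT]he\.)?[x-zX-Z]+ ]];    then echo "x-z"
    # q-t goes last: it is the only range that can also match the bare
    # "t" or "T" of the prefix itself.
    elif [[ $name =~ ^([tT]he\.)?[q-tQ-T]+ ]];    then echo "q-t"
    else                                               echo "other"
    fi
}

bucket_elif "the.anchor"   # prints: a-d
bucket_elif "the.tally"    # prints: q-t
bucket_elif "the.zebra"    # prints: x-z
```

This works because elif stops at the first branch that succeeds, so the q-t fallback on the prefix's own t is never reached for names that belong to another range.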

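I think the + can be removed: it matches the preceding item one or more times, and is only needed if you want to match more than one of those letters, whereas here only the first letter matters. If every name you test carries the prefix, the ? can be removed as well. Note that without the ? the pattern no longer matches names that lack "the." or "The.", so those would need a test of their own.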

You can do it like this:

if [[ $name =~ ^[tT]he\.[a-dA-D0-9] ]]; then
    do some stuff
fi

The condition will only return true if the first character after ^[tT]he\. is [a-dA-D0-9].

However, I tend to think case is a cleaner solution than if statements when matching lists of characters against variables.

case $name in
    [tT]he\.[a-dA-D0-9]*)
        do some stuff
        ;;
esac
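Extending the case idea to all the ranges from the question could look like the sketch below. It also accepts names without the prefix, which the single pattern above does not; the function name bucket_case and the echoed labels are assumptions for illustration. As with the elif approach, the q-t branch is placed last because its unprefixed pattern would otherwise match any name beginning with "the.".

```bash
#!/usr/bin/env bash
# bucket_case is a hypothetical helper: prints the range of the first letter.
# Each branch lists the prefixed pattern and the unprefixed pattern,
# separated by |. In a case pattern the dot is literal, so no escape is needed.
bucket_case() {
    case $1 in
        [tT]he.[a-dA-D0-9]*|[a-dA-D0-9]*) echo "a-d" ;;
        [tT]he.[e-hE-H]*|[e-hE-H]*)       echo "e-h" ;;
        [tT]he.[i-lI-L]*|[i-lI-L]*)       echo "i-l" ;;
        [tT]he.[m-pM-P]*|[m-pM-P]*)       echo "m-p" ;;
        [tT]he.[u-wU-W]*|[u-wU-W]*)       echo "u-w" ;;
        [tT]he.[x-zX-Z]*|[x-zX-Z]*)       echo "x-z" ;;
        # Last on purpose: [q-tQ-T]* also matches the "t" of "the." itself.
        [tT]he.[q-tQ-T]*|[q-tQ-T]*)       echo "q-t" ;;
        *)                                echo "other" ;;
    esac
}

bucket_case "the.anchor"   # prints: a-d
bucket_case "Tango"        # prints: q-t
bucket_case "The.Zebra"    # prints: x-z
```

case stops at the first matching branch, so only one action runs per name. Range expressions such as [a-d] can be locale-dependent, so it may be worth testing with the locale the script will actually run under.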
