I have been trying to extract part of a string in bash on a Mac.

Pattern of input string:

  • Some random word followed by a /. This part is optional.
  • A keyword (def, foo, or bar) followed by a hyphen (-) and then a number of 2 to 6 digits.
  • The number is followed by another hyphen and a few hyphen-separated words.

Sample inputs and outputs:

abc/def-1234-random-words // def-1234
bla/foo-12-random-words // foo-12
bar-12345-random-words // bar-12345

So I tried the following commands to extract it, but for some reason they return the entire string.

extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-[^-]*\).*/\1/g'`
// and
extractedValue=`getInputString | sed -e 's/.*\(\(def\|bar\|foo\)-\d{2,6}\).*/\1/g'`

I also tried to make it case-insensitive using the I flag, but it threw an error:

: bad flag in substitute command: 'I'


Following are the references I tried:

  • sed doesn't support \d for digits; you can use [0-9] instead. Commented Oct 6, 2021 at 15:19
  • @Barmar I noticed some weird behaviour around \d, hence I moved to [^-]*. That matched, but it always returned the entire string. I'll read more about it. Commented Oct 6, 2021 at 15:22

3 Answers

You can use the -E option to enable extended regular expressions; then you don't have to escape (, ), and |.

echo abc/def-1234-random-words  | sed -E -e 's/.*((def|bar|foo)-[^-]*).*/\1/g'
def-1234
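
A note on the original symptom: sed prints a line unchanged when the substitution pattern does not match, which is consistent with the whole string coming back. `\|` is a GNU extension to basic regular expressions, so other sed implementations may not treat it as alternation. The `[^-]*` above also accepts anything up to the next hyphen, not only digits. A minimal sketch that enforces the 2 to 6 digit rule, assuming the trailing hyphen is always present as in the samples; `extract_key` is a hypothetical helper name, not something from the question:

```shell
#!/usr/bin/env bash
# extract_key is a hypothetical helper name used for illustration.
# -E enables extended regular expressions on both BSD (macOS) and GNU sed.
extract_key() {
  printf '%s\n' "$1" | sed -E 's/.*((def|bar|foo)-[0-9]{2,6})-.*/\1/'
}

extract_key 'abc/def-1234-random-words'   # def-1234
extract_key 'bla/foo-12-random-words'     # foo-12
extract_key 'bar-12345-random-words'      # bar-12345
```

Input that does not match the pattern is still printed as-is; using `sed -n` with a trailing `p` flag on the substitution would suppress such lines instead.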

1 Comment

This, along with gsed for the case-insensitivity flag /I, solved my issue. Thanks a ton!

This GNU sed command should work with the ignore-case flag:

sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file

def-1234
foo-12
bar-12345

This sed matches:

  • (.*/){0,1}: Optionally match a string up to a / at the start
  • (: Start capture group #2
    • (def|foo|bar): Match def or foo or bar
    • -: Match a -
    • [0-9]{2,6}: Match 2 to 6 digits
  • ): End capture group #2
  • -.*: Match a - followed by anything up to the end of the line
  • The replacement is the value captured in group #2
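
Where GNU sed is not available, one way to get case-insensitive matching without the I flag is to spell each keyword letter as a bracket expression. This is a sketch of that workaround, not the method used above; `extract_key_ci` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Case-insensitive match without the GNU-only I flag: each keyword letter
# is written as a bracket expression, which BSD and GNU sed both accept.
extract_key_ci() {   # hypothetical helper name
  printf '%s\n' "$1" |
    sed -E 's~^(.*/)?(([dD][eE][fF]|[fF][oO][oO]|[bB][aA][rR])-[0-9]{2,6})-.*~\2~'
}

extract_key_ci 'abc/DEF-1234-random-words'   # DEF-1234
extract_key_ci 'bla/Foo-12-random-words'     # Foo-12
```

The pattern gets long quickly, so this only seems reasonable for a short, fixed keyword list like the one in the question.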

Or you may use this awk:

awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file

def-1234
foo-12
bar-12345

Awk explanation:

  • -v IGNORECASE=1: Enable case-insensitive matching (a gawk extension; other awk implementations treat it as an ordinary variable)
  • -F /: Use / as field separator
  • match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): Match the regex ^(def|foo|bar)-[0-9]{2,6}- against $NF, the last field when splitting on / (this ignores any text before the /)
  • If the match succeeds, use substr to print the text from position 1 to RLENGTH-1 (since the match extends through the - after the digits)
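
For an awk that is not gawk, a possible variant is to match against a lowercased copy of the field with `tolower()` and cut the result from the original field. The sketch below is an assumption-laden alternative, not the command above: `extract_key_awk` is a hypothetical helper name, and the digits are spelled out because some awk implementations do not support interval expressions such as {2,6}.

```shell
#!/usr/bin/env bash
# tolower() is part of POSIX awk, so this does not rely on IGNORECASE.
# [0-9][0-9] followed by four optional digits stands in for [0-9]{2,6}.
extract_key_awk() {   # hypothetical helper name
  printf '%s\n' "$1" |
    awk -F / 'match(tolower($NF), /^(def|foo|bar)-[0-9][0-9][0-9]?[0-9]?[0-9]?[0-9]?-/) {
      # RLENGTH includes the trailing hyphen, so drop one character.
      print substr($NF, 1, RLENGTH - 1)
    }'
}

extract_key_awk 'bla/Foo-12-random-words'   # Foo-12
```

Because the substring is taken from the original `$NF`, the output keeps the input's letter case.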

4 Comments

Could you please also add an explanation? What does $NF mean, and is this case-sensitive?
I am going to add one. Meanwhile, check the sed command, which does ignore-case matching.
The weird thing is, the sed approach still throws this error: : bad flag in substitute command: 'I'. Is it environment-specific? I'm using zsh in the Mac terminal.
Yes, as I mentioned, that requires GNU sed. sed on Mac is BSD sed, which doesn't support /I. I am also on a Mac but have GNU sed installed using Homebrew.

Use grep with the --only-matching option (shorthand -o).

grep --only-matching --extended-regexp '(foo|bar|def)-[0-9]{2,6}' <<EOF
abc/def-1234-random-words
bla/foo-12-random-words
bar-12345-random-words
EOF
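
For the case-insensitive matching asked about in the question, grep accepts `-i` on both BSD (macOS) and GNU implementations. A minimal sketch, where `extract_key_grep` is a hypothetical helper name and `head -n 1` reflects an assumption that only the first match per input is wanted:

```shell
#!/usr/bin/env bash
# -E: extended regex, -o: print only the matched text, -i: ignore case.
extract_key_grep() {   # hypothetical helper name
  printf '%s\n' "$1" | grep -E -o -i '(foo|bar|def)-[0-9]{2,6}' | head -n 1
}

extract_key_grep 'abc/DEF-1234-random-words'   # DEF-1234
```

The pattern is unanchored, so it would also match inside a longer word such as `xfoo-12`; unlike the sed commands, input with no match produces no output at all.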
