79

I am working in a shell, and I have this string: 12 BBQ ,45 rofl, 89 lol

Using the regexp: \d+ (?=rofl), I want 45 as a result.

Is it correct to use regex to extract data from a string? The best I have managed is to highlight the value in some online regex editors. Most of the time, my attempts remove the value from my string.

I am investigating expr, but all I get is syntax errors.

How can I manage to extract 45 in a shell script?

  • IMHO, using regex is completely acceptable for this purpose. Commented Jul 23, 2010 at 16:52
  • What tool do you use, what shell do you use, what's the exact command line you used and what's the error you got? Commented Jul 23, 2010 at 17:00
  • A comprehensive answer from Unix.SE: unix.stackexchange.com/questions/193223/… Commented Nov 3, 2019 at 22:26

7 Answers

102

You can do this with GNU grep's perl mode:

echo "12 BBQ ,45 rofl, 89 lol" | grep -P '\d+(?= rofl)' -o
echo "12 BBQ ,45 rofl, 89 lol" | grep --perl-regexp '\d+(?= rofl)' --only-matching

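-P and --perl-regexp enable Perl-compatible regular expressions, which support lookaheads such as (?=...): the text inside the lookahead must follow the match but is not part of it. -o and --only-matching print only the matching text.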
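Where -P is unavailable (for example, the BSD grep shipped with macOS), a sketch of a lookahead-free alternative runs grep twice with extended regular expressions: the first pass keeps "<digits> rofl" and the second keeps only the digits. It assumes your grep supports -o (GNU, BSD and BusyBox grep do, although -o is not part of POSIX); the function name rofl_number_grep is purely illustrative.

```shell
# Hypothetical helper name; assumes a grep that supports -o (not POSIX).
rofl_number_grep() {
  # Pass 1 keeps only "<digits> rofl"; pass 2 keeps only the digits.
  printf '%s\n' "$1" | grep -Eo '[0-9]+ rofl' | grep -Eo '[0-9]+'
}

rofl_number_grep "12 BBQ ,45 rofl, 89 lol"   # prints 45
```

If the input contains no "<number> rofl", nothing is printed and grep exits with status 1, which a script can test.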


8 Comments

Is it possible to avoid Perl-style regexes? The -P option was removed from grep in OS X as of Mountain Lion.
A possible workaround on OS X is to install GNU grep via Homebrew: heystephenwood.com/2013/09/install-gnu-grep-on-mac-osx.html.
I can retrieve the port number of docker containers :D with docker port c62c1c7b9efb | grep -P '(\d+)$' -o
It seems that in BusyBox it's -E instead of -P.
There is a suggested edit that with BSD grep (e.g., on a Mac), it would be -E instead of -P, but I believe that won't work, since -P in GNU Grep is for Perl mode, while -E in (most?) greps is extended mode, which is quite different. I just tested a different version's -E option and it output nothing instead of the expected 45.
37

Yes, regex can certainly be used to extract part of a string. Unfortunately, different flavours of *nix and different tools use slightly different regex variants.

This sed command should work on most flavours (tested on OS X and Red Hat):

echo '12 BBQ ,45 rofl, 89 lol' | sed  's/^.*,\([0-9][0-9]*\).*$/\1/g'
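The question mentions trying expr. Its ":" operator also takes a POSIX basic regular expression, anchored at the start of the string, and prints what the first \( \) group captured, so the same capture-group idea carries over. A sketch keyed on " rofl" rather than on the comma follows; the variable names s and n are illustrative, and the pattern assumes at least one non-digit character precedes the number.

```shell
s='12 BBQ ,45 rofl, 89 lol'
# expr STRING : REGEX anchors REGEX at the start of STRING and prints
# the text captured by the first \( \) group.
# [^0-9] stops the greedy .* from swallowing the leading digit of 45.
n=$(expr "$s" : '.*[^0-9]\([0-9][0-9]*\) rofl')
printf '%s\n' "$n"   # prints 45
```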

4 Comments

You don't need the anchors if your regex begins and ends with .*
Also +1. Your answer is POSIX-compliant where the accepted answer is not, as the accepted answer uses the nonstandard -P grep option
I'm using this on macOS where zsh is the default by wrapping the command in /bin/sh -c "command". Works nicely!
@HaroldFischer you need it to gobble up the non matched part of the string. It only replaces what the regex captures. So you need the .* to match the start and end of the string
9

It seems that you are asking multiple things. To answer them:

  • Yes, it is OK to extract data from a string using regular expressions; that's what they're there for
  • You mention errors: which ones, and which shell tool are you using?
  • You can extract the numbers by catching them in capturing parentheses:

    .*[^0-9]([0-9]+) rofl.*
    

    and using the first capture group ($1 in Perl, \1 in sed) to get the number out (.* matches the rest of the line before and after it)

Using sed as an example, this replaces each matching line in a file with only the captured number:

sed -E 's/.*[^0-9]([0-9]+) rofl.*/\1/' inputFileName > outputFileName

or:

echo "12 BBQ ,45 rofl, 89 lol" | sed -E 's/.*[^0-9]([0-9]+) rofl.*/\1/'
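To use the result inside a script, you can capture it with command substitution. A sketch in portable POSIX basic regular expressions, where groups are written \( \) and referenced as \1 rather than $1; with -n and the p flag, sed prints only when the substitution succeeded, so input without a "<number> rofl" produces no output at all. The function name rofl_number_sed is hypothetical.

```shell
# Hypothetical helper; POSIX BRE, so \( \) and \1 instead of ( ) and $1.
rofl_number_sed() {
  # -n suppresses automatic printing; the p flag prints only lines where
  # the substitution matched. [^0-9] keeps .* from eating the first digit.
  printf '%s\n' "$1" | sed -n 's/.*[^0-9]\([0-9][0-9]*\) rofl.*/\1/p'
}

num=$(rofl_number_sed "12 BBQ ,45 rofl, 89 lol")
echo "$num"   # prints 45
```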

10 Comments

You don't need either of the .* in your example. You only need those on edges if your regex is anchored. Unanchored, it will already match anywhere inside the string.
The OP asked to get only the number out, not to do a successful match. By adding .*, it's a simple way to match everything and replace by what's in the matching parentheses. Without them, the rest of the string remains intact, which is not what was asked (iiuc). Or did I miss something perhaps?
Whoops, I missed that you were using sed for this. Carry on.
What implementation of sed are you using? $1 is craziness.
@Abel I guess "craziness" is a bit much. I mean to say it is not remotely portable (as far as POSIX is concerned). In fact, your answer does not work on the seds that come with Ubuntu 14.04, Ubuntu 18.04 and FreeBSD 11 (both because of the non-portable \d and that fact that $1 is treated literally in this context)
6

Using ripgrep's replace option, it is possible to change the output to a capture group:

rg --only-matching --replace '$1' '(\d+) rofl'
  • --only-matching or -o outputs only the part that matches instead of the whole line.
  • --replace '$1' or -r replaces the output with the first capture group.


2

You can use the shell's own parameter expansion (bash, for example):

$ string="12 BBQ ,45 rofl, 89 lol"
$ echo ${string% rofl*}
12 BBQ ,45
$ string=${string% rofl*}
$ echo ${string##*,}
45
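The two expansions used here, ${var%pattern} and ${var##pattern}, are POSIX parameter expansions, so they also work in any POSIX shell (dash, for example) and start no external command. A sketch wrapping them in a function for use in a script, assuming, as in the sample string, that the number is preceded by a comma and followed by " rofl"; the function name number_before_rofl is hypothetical.

```shell
# Hypothetical helper; uses only POSIX parameter expansion.
# POSIX sh has no "local", so "s" is a global variable here.
number_before_rofl() {
  s=${1% rofl*}   # remove the shortest suffix starting with " rofl"
  s=${s##*,}      # remove the longest prefix ending with ","
  printf '%s\n' "$s"
}

number_before_rofl "12 BBQ ,45 rofl, 89 lol"   # prints 45
```

If " rofl" or the comma is missing, the corresponding expansion leaves the string unchanged, so the caller may want to validate the result.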


-1

You can certainly extract that part of a string, and that's a great way to parse out data. Regular expression syntax varies a lot, so check the documentation for the regex tool you're using. You might try a regular expression like:

[0-9]+ *[a-zA-Z]+ *, *([0-9]+) *[a-zA-Z]+ *, *[0-9]+ *[a-zA-Z]+

If your regex tool can do string replacement, replace the entire string with the captured group, and you can easily use that result.
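A sketch of that whole-string replacement with sed -E (extended regular expressions, supported by GNU and BSD sed). It uses a variant of the pattern with " *" allowed around each comma, because the sample string has a space before the first comma and after the second. Lines that do not match are printed unchanged; the function name extract_middle_number is hypothetical.

```shell
# Hypothetical helper; replaces the whole line with capture group 1.
extract_middle_number() {
  printf '%s\n' "$1" |
    sed -E 's/^[0-9]+ *[a-zA-Z]+ *, *([0-9]+) *[a-zA-Z]+ *, *[0-9]+ *[a-zA-Z]+$/\1/'
}

extract_middle_number "12 BBQ ,45 rofl, 89 lol"   # prints 45
```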

You didn't mention if you're using bash or some other shell. That would help get better answers when asking for help.


-2

You can use rextract to extract using a regular expression and reformat the result.

Example:

[$] echo "12 BBQ ,45 rofl, 89 lol" | ./rextract '[,]([\d]+) rofl' '${1}'
45

1 Comment

You need to add a disclaimer if a library is your own (something like "Disclaimer: I made this library"). And from GitHub, it does appear that this library/executable is your own.
