How to extract a value from a string using regex and a shell?

Question

I am in a shell, and I have this string: 12 BBQ ,45 rofl, 89 lol

Using the regexp: \d+ (?=rofl), I want 45 as a result.

Is it correct to use regex to extract data from a string? The best I have managed is to highlight the value in some online regex editors. Most of the time, it removes the value from my string.

I am investigating expr, but all I get is syntax errors.

How can I manage to extract 45 in a shell script?

IMHO, for this purpose, using regex is completely acceptable. – Mahesh Velaga, Jul 23, 2010 at 16:52
What tool do you use, what shell do you use, what's the exact command line you used, and what's the error you got? – Abel, Jul 23, 2010 at 17:00
A comprehensive answer from Unix.SE: unix.stackexchange.com/questions/193223/… – Anton Tarasenko, Nov 3, 2019 at 22:26

Answer by Matthew Flaschen · score 102

You can do this with GNU grep's Perl mode:

```
echo "12 BBQ ,45 rofl, 89 lol" | grep -P '\d+(?= rofl)' -o
echo "12 BBQ ,45 rofl, 89 lol" | grep --perl-regexp '\d+(?= rofl)' --only-matching
```

-P (--perl-regexp) treats the pattern as a Perl-compatible regular expression, and -o (--only-matching) prints only the matched part of the line. The space sits inside the lookahead, so it is not part of the match and the output is exactly 45.
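A minimal sketch of the same idea wrapped in a reusable function, assuming GNU grep built with PCRE support (BSD and macOS grep do not accept -P). The function name number_before is hypothetical, and the word argument is inserted into the pattern unescaped, so it should not contain regex metacharacters.

```shell
#!/bin/sh
# Hypothetical helper: print every number that is immediately followed by
# a space and the given word. Requires GNU grep with -P (PCRE) support.
number_before() {
    # $1 = input text, $2 = word that follows the number (used unescaped)
    # "\\d" inside double quotes becomes \d in the pattern grep receives.
    printf '%s\n' "$1" | grep -oP "\\d+(?= $2)"
}

result=$(number_before '12 BBQ ,45 rofl, 89 lol' rofl)
printf '%s\n' "$result"
```

printf is used instead of echo so that input starting with - or containing backslashes is passed through unchanged.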

answered Jul 23, 2010 at 16:52 by Matthew Flaschen, edited Mar 22, 2021 at 21:21 by Nathan

8 Comments

AlexKorovyansky Over a year ago

Is it possible to avoid using Perl-style regexes? -P has been removed from grep in OS X since Mountain Lion.

AlexKorovyansky Over a year ago

A possible alternative/workaround for OS X is to use GNU grep via Homebrew: heystephenwood.com/2013/09/install-gnu-grep-on-mac-osx.html.

Marcello DeSales Over a year ago

I can retrieve the port number of docker containers :D with docker port c62c1c7b9efb | grep -P '(\d+)$' -o

Charlie Over a year ago

Seems in busybox it's -E instead of -P

joanis Over a year ago

There is a suggested edit saying that with BSD grep (e.g., on a Mac), it would be -E instead of -P, but I believe that won't work: -P in GNU grep is for Perl mode, while -E in (most?) greps is extended mode, which is quite different. I just tested a different version's -E option, and it output nothing instead of the expected 45.

Answer by Steve Weet · score 37

Yes, regex can certainly be used to extract part of a string. Unfortunately, different flavours of *nix and different tools use slightly different regex variants.

This sed command should work on most flavours (tested on OS X and Red Hat):

```
echo '12 BBQ ,45 rofl, 89 lol' | sed 's/^.*,\([0-9][0-9]*\).*$/\1/g'
```
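Note that this command captures the digits after the last comma that is directly followed by a digit; it never looks at the word rofl. A sketch that ties the capture to the word, still using only POSIX basic regular expressions and assuming (as in the sample string) that the number is preceded by a comma:

```shell
#!/bin/sh
# POSIX sed: -n turns off automatic printing, and the trailing p prints the
# line only when the substitution succeeded, so lines without ",<digits> rofl"
# produce no output at all.
input='12 BBQ ,45 rofl, 89 lol'
result=$(printf '%s\n' "$input" | sed -n 's/.*,\([0-9][0-9]*\) rofl.*/\1/p')
printf '%s\n' "$result"
```

The greedy .*, first runs to the last comma, fails to find digits followed by " rofl" there, and backtracks to the comma before 45.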

answered Jul 23, 2010 at 17:00 by Steve Weet

4 Comments

Harold Fischer Over a year ago

You don't need the anchors if your regex begins and ends with .*

Harold Fischer Over a year ago

Also +1. Your answer is POSIX-compliant where the accepted answer is not, as the accepted answer uses the nonstandard -P grep option

Alex Over a year ago

I'm using this on macOS, where zsh is the default shell, by wrapping the command in /bin/sh -c "command". Works nicely!

MikeKulls Feb 11 at 0:00

@HaroldFischer you need it to gobble up the non-matched part of the string. sed only replaces what the regex matches, so you need the .* to match the start and end of the string.

Answer by Abel · score 9

It seems that you are asking multiple things. To answer them:

- Yes, it is OK to extract data from a string using regular expressions; that's what they're there for.
- You get errors: which ones, and what shell tool do you use?
- You can extract the number by catching it in capturing parentheses:
  ```
  .*(\d+) rofl.*
  ```
  and using $1 to get the string out (.* matches "the rest" before and after on the same line).

Using sed as an example, the idea becomes this, replacing every line in a file with only the matching number:

```
sed -e 's/.*(\d+) rofl.*/$1/g' inputFileName > outputFileName
```

or:

```
echo "12 BBQ ,45 rofl, 89 lol" | sed -e 's/.*(\d+) rofl.*/$1/g'
```
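These commands do not print 45 with GNU or BSD sed: in their default basic regular expressions ( and ) are literal characters, \d is not a POSIX character class, and a replacement refers to a group as \1 rather than $1. Even in a Perl-style engine, the greedy .* in front of (\d+) gives back only one digit, so the group would hold 5 rather than 45. A sketch of the same substitution, assuming a sed that supports -E (extended regular expressions), as GNU and BSD sed do:

```shell
#!/bin/sh
# -E selects extended regular expressions, so ( ) group and + repeats
# without backslashes. POSIX regex has no \d, so [0-9] is used, and the
# replacement refers to the captured group as \1.
# The [^0-9] before the group stops the greedy .* from eating leading digits.
input="12 BBQ ,45 rofl, 89 lol"
result=$(printf '%s\n' "$input" | sed -E 's/.*[^0-9]([0-9]+) rofl.*/\1/')
printf '%s\n' "$result"
```

One limitation of this sketch: a number at the very start of the line has no preceding non-digit character, so it is not matched.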

answered Jul 23, 2010 at 16:53 by Abel, edited Jul 23, 2010 at 16:59

10 Comments

Daenyth Over a year ago

You don't need either of the .* in your example. You only need those on edges if your regex is anchored. Unanchored, it will already match anywhere inside the string.

Abel Over a year ago

The OP asked to get only the number out, not to do a successful match. Adding .* is a simple way to match everything and replace it with what's in the capturing parentheses. Without them, the rest of the string remains intact, which is not what was asked (iiuc). Or did I miss something perhaps?

Daenyth Over a year ago

Woops, I missed that you were using sed for this. Carry on.

Harold Fischer Over a year ago

What implementation of sed are you using? $1 is craziness

Harold Fischer Over a year ago

@Abel I guess "craziness" is a bit much. I mean to say it is not remotely portable (as far as POSIX is concerned). In fact, your answer does not work on the seds that come with Ubuntu 14.04, Ubuntu 18.04 and FreeBSD 11 (both because of the non-portable \d and the fact that $1 is treated literally in this context).

Answer by Sjoerd · score 6

Using ripgrep's --replace option, you can print a capture group instead of the whole match:

```
rg --only-matching --replace '$1' '(\d+) rofl'
```

- --only-matching (-o) outputs only the part that matches instead of the whole line.
- --replace '$1' (-r '$1') replaces each match in the output with the first capture group.

answered Sep 30, 2020 at 8:01 by Sjoerd

Answer by ghostdog74 · score 2

You can use the shell itself (bash, for example):

```
$ string="12 BBQ ,45 rofl, 89 lol"
$ echo ${string% rofl*}
12 BBQ ,45
$ string=${string% rofl*}
$ echo ${string##*,}
45
```
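The same two expansions as a small script. Both % (remove the shortest matching suffix) and ## (remove the longest matching prefix) are POSIX, so this sketch should also run in dash or another /bin/sh. The case guard is an addition for this sketch: without it, a string that lacks the word rofl passes through the first expansion unchanged, and the second would then return whatever follows its last comma.

```shell
#!/bin/sh
string='12 BBQ ,45 rofl, 89 lol'

case $string in
    *' rofl'*)
        tmp=${string% rofl*}   # drop shortest suffix matching " rofl*": "12 BBQ ,45"
        result=${tmp##*,}      # drop longest prefix matching "*,": "45"
        ;;
    *)
        result=''              # no " rofl" in the string: nothing to extract
        ;;
esac

printf '%s\n' "$result"
```

No external commands are started, which makes this the cheapest option inside a loop.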

answered Jul 23, 2010 at 18:01 by ghostdog74

1 Comment

John Prawyn Over a year ago

Useful link - aty.sdsu.edu/bibliog/latex/debian/bash.html

Answer by Jay · score -1

You can certainly extract that part of a string, and that's a great way to parse out data. Regular expression syntax varies a lot, so you need to check the documentation for the regex tool you're using. You might try a regular expression like:

```
[0-9]+ *[a-zA-Z]+ *, *([0-9]+) *[a-zA-Z]+ *, *[0-9]+ *[a-zA-Z]+
```

If your regex tool can do string replacement, replace the entire string with the captured group, and you can then use that result directly.
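A sketch of the replace-the-whole-string approach with sed -E (GNU or BSD sed assumed). The pattern allows optional spaces on either side of each comma, which the sample string needs because it contains "BBQ ,45" and "rofl, 89":

```shell
#!/bin/sh
# Match the whole line and replace it with capture group 1 (the middle number).
# " *, *" allows optional spaces on either side of each comma.
# With -n and the p flag, lines that do not fit the pattern print nothing.
input='12 BBQ ,45 rofl, 89 lol'
result=$(printf '%s\n' "$input" |
    sed -n -E 's/^[0-9]+ *[a-zA-Z]+ *, *([0-9]+) *[a-zA-Z]+ *, *[0-9]+ *[a-zA-Z]+$/\1/p')
printf '%s\n' "$result"
```

Because the pattern describes the entire line, it is strict: any extra field or punctuation makes the line print nothing rather than a wrong number.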

You didn't mention whether you're using bash or some other shell. That would help you get better answers when asking for help.

answered Jul 23, 2010 at 16:57 by Jay, edited Apr 12, 2013 at 22:36 by Daniel Serodio

Answer by Tim Savannah · score -2

You can use rextract to extract using a regular expression and reformat the result.

Example:

```
[$] echo "12 BBQ ,45 rofl, 89 lol" | ./rextract '[,]([\d]+) rofl' '${1}'
45
```

answered Sep 13, 2016 at 3:07 by Tim Savannah, edited Sep 13, 2016 at 4:00 by Justin

1 Comment

Justin Over a year ago

You need to add a disclaimer if a library is your own (something like "Disclaimer: I made this library"). Judging from GitHub, it does appear that this library/executable is your own.