Extract number embedded in string

Question

So I run a curl command and grep for a keyword.

Here is the (sanitized) result:

...Dir');">Town / Village</a></th><th><a href="javascript:SetFilter(3,'ListPublicASDF','ASDFDir');">Phone Number</a></th></tr><tr class="rowodd"><td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"...

I want to get the number 42 - a command line one-liner would be great.

1. Search for the string helloThereId=
2. Extract the number right beside it (42 in the above case)

Does anyone have any tips for this? Maybe some regex for numbers? I'm afraid I don't have enough experience to construct an elegant solution.

Avinash Raj · Accepted Answer · 2015-04-01 12:49:58Z

4

You could use grep with the -P (--perl-regexp) option enabled, combined with -o so that only the matched part is printed.

$ grep -oP 'helloThereId=\K\d+' file
42
$ grep -oP '(?<=helloThereId=)\d+' file
42

\K here does the job of a positive lookbehind: it keeps the text matched so far out of the overall regex match, so only the digits that follow are printed.
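One practical difference between the two forms, sketched here under the assumption of GNU grep built with PCRE support: \K also works after a variable-length prefix, which a lookbehind generally cannot express. The file name sample.html and the optional spaces after the = are hypothetical, added only for illustration.

```shell
# Hypothetical sample: same shape as the question's output, but with spaces after "=".
printf '%s\n' "calldialog('ASDF','&Mode=view&helloThereId=  42',600,800);" > sample.html

# \s* makes the prefix variable-length; \K drops everything matched before it.
id=$(grep -oP 'helloThereId=\s*\K\d+' sample.html)
echo "$id"
```

If the prefix is always a fixed string, as in the question, both forms should behave the same.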


edited Apr 1, 2015 at 12:49

answered Apr 1, 2015 at 12:44

Avinash Raj


3 Comments

Eamorr Over a year ago

Very nice! It's highlighting 42, but it's still printing the big long string?

Avinash Raj Over a year ago

did you enable the -o parameter?

Eamorr Over a year ago

Actually, it works now. I was using "-P". It works perfectly with "-oP".

Community · Accepted Answer · 2017-05-23 11:50:47Z

2

If your grep version supports -P (as is true for the OP, given that they're on Linux, which comes with GNU grep), Avinash Raj's answer is the way to go.

For the potential benefit of future readers, here are alternatives:

If your grep doesn't support -P, but does support -o, here's a pragmatic solution that simply extracts the number from the overall match in a 2nd step, by splitting the input into fields by =, using cut:

grep -Eo 'helloThereId=[0-9]+' file | cut -d= -f2
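To see how the two steps fit together, here is a small sketch using a hypothetical sample.txt that contains two IDs on one line. grep -o prints each match on its own line, and cut then keeps only the part after the =. Splitting on = is safe here only because each match contains exactly one =.

```shell
# Hypothetical input with two occurrences on a single line.
printf '%s\n' "a&helloThereId=42' b&helloThereId=7'" > sample.txt

# Step 1: grep -Eo emits "helloThereId=42" and "helloThereId=7", one per line.
# Step 2: cut -d= -f2 keeps the second =-separated field of each line.
ids=$(grep -Eo 'helloThereId=[0-9]+' sample.txt | cut -d= -f2)
echo "$ids"
```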

Finally, if your grep supports neither -P nor -o, use sed. Here's a POSIX-compliant alternative that uses sed with a basic regular expression (hence the need to emulate + with \{1,\} and to escape the parentheses):

sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' file

edited May 23, 2017 at 11:50


answered Apr 1, 2015 at 13:10

mklement0


2 Comments

user4453924 Over a year ago

You could also use [0-9][0-9]* for the same effect without adding messy backslashes

mklement0 Over a year ago

@JID: Thanks; that's definitely an option in this simple case, but the \{1,\} is handy in more complex cases, and it needs better PR :) (GNU sed implements \+, which leads people to believe that it's portable, but it's not.)
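As a quick side-by-side of the two portable spellings of "one or more digits" discussed in these comments, the following sketch runs both against a hypothetical sample.txt. It avoids \+, which is described above as a GNU extension.

```shell
# Hypothetical input modeled on the question's output.
printf '%s\n' "x&Mode=view&helloThereId=42',600,800" > sample.txt

# Interval expression: \{1,\} means "one or more" in a POSIX BRE.
with_interval=$(sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' sample.txt)

# Same meaning without an interval: one digit followed by zero or more digits.
with_repeat=$(sed -n 's/.*helloThereId=\([0-9][0-9]*\).*/\1/p' sample.txt)

echo "$with_interval $with_repeat"
```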

Ed Morton · Accepted Answer · 2015-04-01 14:11:13Z

1

This will work with any sed on any UNIX OS, even the pre-POSIX default sed on Solaris:

$ sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' file
42

answered Apr 1, 2015 at 14:11

Ed Morton


4 Comments

mklement0 Over a year ago

So the pre-POSIX default sed on Solaris doesn't support \{1,\}? If not, @JID's suggestion should work: using [0-9][0-9]* to emulate +. (Your solution would match the empty string if helloThereId= is not followed by digits, and output \n.)

Ed Morton Over a year ago

Correct, putting a backslash before ERE metacharacters to enable their functionality in tools that support BREs was not part of the BRE language until fairly recently. If you stick a backslash before a { in Solaris sed it's still just a literal {. The OP has given us no reason to think there is ever a case where helloThereId= is not followed by a number so no reason to complicate the script. If such a case exists the OP should add an input case to show it.

mklement0 Over a year ago

Good to know, thanks. Re complicating the regex: fair point; anyone in true need of the more robust solution can glean it from these comments.

Ed Morton Over a year ago

and thanks for the info that \+ is not POSIX. Until you mentioned that I had assumed the POSIX decree was that you could stick a backslash in front of any ERE metachar to enable its functionality, I had no idea that \+ was GNU only (and yes I did have to check it for myself using GNU sed --posix before believing it!).
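The edge case raised in this comment thread can be checked with a small sketch. The file nodigits.txt is a hypothetical input in which helloThereId= is not followed by digits; the variable names are likewise only for illustration.

```shell
# Hypothetical input: the key is present, but no digits follow it.
printf '%s\n' "x&helloThereId=abc'" > nodigits.txt

# [0-9]* can match zero digits, so the substitution succeeds and prints an empty line.
loose=$(sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' nodigits.txt | wc -l | tr -d ' ')

# [0-9][0-9]* requires at least one digit, so nothing is printed.
strict=$(sed -n 's/.*helloThereId=\([0-9][0-9]*\).*/\1/p' nodigits.txt | wc -l | tr -d ' ')

echo "loose: $loose line(s), strict: $strict line(s)"
```

Whether the stricter pattern is worth it depends on whether such input can actually occur, as the comments above point out.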
