0

I run a curl command and grep its output for a keyword.

Here is the (sanitized) result:

...Dir');">Town / Village</a></th><th><a href="javascript:SetFilter(3,'ListPublicASDF','ASDFDir');">Phone Number</a></th></tr><tr class="rowodd"><td><a href="javascript:calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"...

I want to get the number 42 - a command line one-liner would be great.

  • search for the string helloThereId=
  • extract the number right beside it (42 in the above case)

Does anyone have any tips for this? Maybe some regex for numbers? I'm afraid I don't have enough experience to construct an elegant solution.

3 Answers

4

You could use grep with the -P (--perl-regexp) option, together with -o so that only the matched part is printed.

$ grep -oP 'helloThereId=\K\d+' file
42
$ grep -oP '(?<=helloThereId=)\d+' file
42

Here \K does the job of a positive lookbehind: it keeps the text matched so far out of the overall regex match.
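As a quick check, both patterns can be run against a sample line. This is only a sketch: it assumes GNU grep built with PCRE support (-P), and the variable names and the shortened sample line are made up for illustration.

```shell
# Shortened, hypothetical stand-in for the line returned by curl.
sample="calldialog('ASDF','&Mode=view&helloThereId=42',600,800);"

# \K drops everything matched before it from the reported match.
with_k=$(printf '%s\n' "$sample" | grep -oP 'helloThereId=\K\d+')

# The lookbehind asserts the prefix without consuming it.
with_lookbehind=$(printf '%s\n' "$sample" | grep -oP '(?<=helloThereId=)\d+')

printf '%s %s\n' "$with_k" "$with_lookbehind"
```

With the sample above, both variables should hold 42. In the original pipeline the same grep would simply read from curl instead of printf.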

3 Comments

Very nice! It's highlighting 42, but it's still printing the big long string?
Did you enable the -o option?
Actually, it works now. I was using "-P". It works perfectly with "-oP".
2

If your grep version supports -P (as is true for the OP, given that they're on Linux, which comes with GNU grep), Avinash Raj's answer is the way to go.

For the potential benefit of future readers, here are alternatives:

If your grep doesn't support -P, but does support -o, here's a pragmatic solution that simply extracts the number from the overall match in a 2nd step, by splitting the input into fields by =, using cut:

grep -Eo 'helloThereId=[0-9]+' file | cut -d= -f2
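To illustrate how the two-step pipeline behaves when a line carries more than one ID, here is a small sketch. The sample line and the variable names are invented for the example; it assumes a grep that supports -E and -o (GNU and BSD grep both do).

```shell
# Hypothetical input: two links on one line, each with its own ID.
sample='<a href="x?helloThereId=42">a</a> <a href="x?helloThereId=7">b</a>'

# Step 1: grep -o prints each match on its own line.
# Step 2: cut keeps the field after the "=" of every match.
ids=$(printf '%s\n' "$sample" | grep -Eo 'helloThereId=[0-9]+' | cut -d= -f2)

printf '%s\n' "$ids"
```

Because -o emits one line per match, this prints 42 and 7 on separate lines, so every ID on the line is recovered.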

Finally, if your grep supports neither -P nor -o, use sed. Here's a POSIX-compliant alternative using sed with a basic regular expression (hence the need to emulate + with \{1,\} and to escape the parentheses):

sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p' file
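One property of this sed command is worth sketching: it produces at most one value per input line, and because the leading .* is greedy, that value is the last ID on the line. The sample lines and variable names below are hypothetical.

```shell
# Hypothetical inputs: one line with two IDs, one line with none.
two_ids='<a href="x?helloThereId=42">a</a> <a href="x?helloThereId=7">b</a>'
no_id='<a href="x?Mode=view">c</a>'

# The greedy leading .* swallows the first ID, so only the last one is captured.
last_id=$(printf '%s\n' "$two_ids" | sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p')

# Without a match, -n plus the p flag means nothing is printed at all.
nothing=$(printf '%s\n' "$no_id" | sed -n 's/.*helloThereId=\([0-9]\{1,\}\).*/\1/p')

printf 'last_id=%s nothing=[%s]\n' "$last_id" "$nothing"
```

Here last_id should be 7 and nothing should be empty. If every ID on a line is needed, a grep -o based approach is the better fit.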

2 Comments

You could also use [0-9][0-9]* for the same effect without adding messy backslashes.
@JID: Thanks; that's definitely an option in this simple case, but the \{1,\} is handy in more complex cases, and it needs better PR :) (GNU sed implements \+, which leads people to believe that it's portable, but it's not.)
1

This will work with any sed on any UNIX OS, even the pre-POSIX default sed on Solaris:

$ sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' file
42

4 Comments

So the pre-POSIX default sed on Solaris doesn't support \{1,\}? If not, @JID's suggestion should work: using [0-9][0-9]* to emulate +. (Your solution would match the empty string if helloThereId= is not followed by digits, and output \n.)
Correct, putting a backslash before ERE metacharacters to enable their functionality in tools that support BREs was not part of the BRE language until fairly recently. If you stick a backslash before a { in Solaris sed it's still just a literal {. The OP has given us no reason to think there is ever a case where helloThereId= is not followed by a number so no reason to complicate the script. If such a case exists the OP should add an input case to show it.
Good to know, thanks. Re complicating the regex: fair point; anyone in true need of the more robust solution can glean it from these comments.
And thanks for the info that \+ is not POSIX. Until you mentioned it, I had assumed the POSIX decree was that you could stick a backslash in front of any ERE metachar to enable its functionality. I had no idea that \+ was GNU only (and yes, I did have to check it for myself using GNU sed --posix before believing it!).
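The difference discussed in these comments can be seen with a small sketch. It feeds sed a hypothetical line in which helloThereId= is not followed by digits and counts the output lines for [0-9]* versus [0-9][0-9]*; the sample line and variable names are assumptions for illustration.

```shell
# Hypothetical input where the ID is missing after the "=".
no_digits='<a href="x?helloThereId=&Mode=view">d</a>'

# [0-9]* also matches zero digits, so the substitution succeeds
# and an empty line is printed.
star_lines=$(printf '%s\n' "$no_digits" |
  sed -n 's/.*helloThereId=\([0-9]*\).*/\1/p' | wc -l | tr -d ' ')

# [0-9][0-9]* requires at least one digit, so nothing is printed.
plus_lines=$(printf '%s\n' "$no_digits" |
  sed -n 's/.*helloThereId=\([0-9][0-9]*\).*/\1/p' | wc -l | tr -d ' ')

printf 'star=%s plus=%s\n' "$star_lines" "$plus_lines"
```

The first command emits one (empty) line and the second emits none. When the ID is always followed by a number, as in the question, both forms return the same value.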
