Need to grep a specific string using curl

Question

I am trying to get the language code from web pages using curl.

I wrote the following, and it works:

curl -Ls yahoo.com | grep "lang=" | head -1 | cut -d ' ' -f 3 | cut -d"\"" -f 2

but on some sites the markup is different, and the same command does not work:

curl -Ls stick-it.app | grep "lang=" | head -1 | cut -d ' ' -f 3 | cut -d"\"" -f 2

Their page has:

<html dir="rtl" lang="he-IL">

I just need to get he-IL.

If there is another way to do this, I would appreciate it.

Use curl ... | grep -oP 'lang="\K[^"]+'

– anubhava, Commented Jun 8, 2021 at 16:31

Ed Morton · Accepted Answer · 2021-06-08 17:25:05Z

6

Using any sed in any shell on every Unix box:

$ curl -Ls yahoo.com | sed -n 's/^<html.* lang="\([^"]*\).*/\1/p'
en-US
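To check this command against the markup quoted in the question without fetching a page, you can feed sed a few sample lines instead of curl output. A minimal sketch; the function name html_lang_sed and the sample lines are made up for illustration, while the sed expression is the one above:

```shell
# Hypothetical helper name; the body is the sed command from this answer.
# Prints the lang value from lines that start with <html.
html_lang_sed() {
  sed -n 's/^<html.* lang="\([^"]*\).*/\1/p'
}

# Sample lines modelled on the markup quoted in the question (no network needed).
printf '%s\n' '<!DOCTYPE html>' '<html dir="rtl" lang="he-IL">' '<head>' | html_lang_sed
# prints: he-IL
```

Because of the ^ anchor, an indented <html tag will not match; drop the ^ if the pages you fetch indent it.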

edited Jun 8, 2021 at 17:25

answered Jun 8, 2021 at 17:18

Ed Morton


anubhava · Accepted Answer · 2021-06-08 17:53:11Z

2

If you have GNU grep, you can use -P (Perl-compatible regular expressions):

curl -Ls stick-it.app | grep -oP '\slang="\K[^"]+'

he-IL
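One effect of the leading \s is that lang= only matches when it follows whitespace, so attributes such as xml:lang= (seen on some XHTML-style pages) are skipped. A small offline sketch; the sample line is made up, and it assumes GNU grep built with PCRE support for -P:

```shell
# Made-up sample line carrying both xml:lang and lang.
sample='<html xml:lang="en" lang="he-IL">'

# Without \s, lang=" also matches inside xml:lang=", so two values are printed:
# en
# he-IL
printf '%s\n' "$sample" | grep -oP 'lang="\K[^"]+'

# With \s, lang=" must follow whitespace, so only the standalone attribute matches:
# he-IL
printf '%s\n' "$sample" | grep -oP '\slang="\K[^"]+'
```

The \K discards everything matched so far, which is why only the value (and not the preceding space or lang=") is printed by -o.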

answered Jun 8, 2021 at 17:53

anubhava


RavinderSingh13 · Accepted Answer · 2021-06-08 18:17:36Z

2

With awk's match function, you could also try the following.

your_curl_command | awk '
match($0,/^<html.*lang="[^"]*/){
  val=substr($0,RSTART,RLENGTH)
  sub(/.*lang="/,"",val)
  print val
}
'

Explanation: here is the same program with a comment on each line.

your_curl_command | awk '          ##Start of the awk program.
match($0,/^<html.*lang="[^"]*/){   ##match() finds the text from <html through lang=" and the characters before the next double quote.
  val=substr($0,RSTART,RLENGTH)    ##Store the matched substring in val.
  sub(/.*lang="/,"",val)           ##Remove everything up to and including lang=" from val.
  print val                        ##Print the language code.
}
'
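The same program can be tried offline by wrapping it in a shell function and piping sample lines into it. A sketch; html_lang_awk and the sample input are hypothetical. The program only uses POSIX awk features (match, RSTART, RLENGTH, substr, sub), so it should not need gawk:

```shell
# Hypothetical helper name; the awk program is the one from this answer.
html_lang_awk() {
  awk '
  match($0, /^<html.*lang="[^"]*/) {
    val = substr($0, RSTART, RLENGTH)   # e.g. <html dir="rtl" lang="he-IL
    sub(/.*lang="/, "", val)            # strip everything up to and including lang="
    print val
  }'
}

# Sample lines modelled on the markup quoted in the question.
printf '%s\n' '<!DOCTYPE html>' '<html dir="rtl" lang="he-IL">' '<head>' | html_lang_awk
# prints: he-IL
```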

edited Jun 8, 2021 at 18:17

answered Jun 8, 2021 at 17:55

RavinderSingh13


The fourth bird · Accepted Answer · 2021-06-08 20:34:37Z

0

Another variation, using GNU awk's three-argument match() with a capture group:

match(string, regexp [, array])

curl -Ls yahoo.com | awk 'match($0, /<html [^<>]*lang="([^"]*)"/, a) {print a[1]}'

Output

en-US

The pattern matches:

<html     Match literally (followed by a space)
[^<>]*    Match 0 or more characters other than < or >
lang="    Match literally
([^"]*)   Capture group 1 (a[1] in the example code): 0 or more characters other than "
"         Match the closing double quote

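Using [^<>]* instead of .* keeps the match inside the <html ...> tag. The sketch below illustrates this with a POSIX sed rendering of the same idea (BRE syntax, so it runs without gawk), applied to a made-up line of minified HTML where the <html> tag has no lang attribute but a later element on the same line does:

```shell
# Assumed input: several tags on one line, lang only on a later <div>.
page='<html class="js"><head></head><body><div lang="fr">Bonjour</div>'

# With .* the match runs past the end of the <html> tag and picks up the
# lang of the later element. Prints: fr
printf '%s\n' "$page" | sed -n 's/.*<html .*lang="\([^"]*\)".*/\1/p'

# With [^<>]* the match cannot leave the <html ...> tag, so nothing is
# printed for this input.
printf '%s\n' "$page" | sed -n 's/.*<html [^<>]*lang="\([^"]*\)".*/\1/p'
```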
edited Jun 8, 2021 at 20:34

answered Jun 8, 2021 at 20:23

The fourth bird
