4

I am trying to get the language code from web pages using curl.

I wrote the command below, and it works:

curl -Ls yahoo.com | grep "lang=" | head -1 | cut -d ' ' -f 3 | cut -d"\"" -f 2

But sometimes the markup is different, and the same pipeline fails, for example:

curl -Ls stick-it.app | grep "lang=" | head -1 | cut -d ' ' -f 3 | cut -d"\"" -f 2

That page writes the tag like this:

<html dir="rtl" lang="he-IL">

I just need to get he-IL.

If there is any other way to do this, I would appreciate it.

1
  • 3
    Use curl ... | grep -oP 'lang="\K[^"]+' Commented Jun 8, 2021 at 16:31

4 Answers

6

Using any sed in any shell on every Unix box:

$ curl -Ls yahoo.com | sed -n 's/^<html.* lang="\([^"]*\).*/\1/p'
en-US
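
As a rough check that does not need network access, the same sed expression can be fed sample tags, including the one from the question. This is only a sketch: html_lang is a hypothetical helper name, and it assumes the <html tag starts its line and that lang is preceded by a space.

```shell
# html_lang is a hypothetical helper name; it reads HTML on stdin.
html_lang() {
  # Print the lang value from lines that start with "<html", keeping the first one.
  sed -n 's/^<html.* lang="\([^"]*\).*/\1/p' | head -n 1
}

# Attribute order from the question: dir comes before lang.
printf '%s\n' '<!DOCTYPE html>' '<html dir="rtl" lang="he-IL">' | html_lang   # he-IL

# A tag with several attributes before lang.
printf '%s\n' '<html id="atomic" class="x" lang="en-US">' | html_lang         # en-US
```

Because the expression keys on the attribute name rather than on a field position, the order of the other attributes does not matter.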

2

If you have GNU grep, you can use -P (Perl-compatible regex):

curl -Ls stick-it.app | grep -oP '\slang="\K[^"]+'

he-IL
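
Note that grep -o prints every match, so a page with several lang attributes yields several lines. A possible variation, sketched here on sample input, restricts the match to the <html tag and keeps only the first hit. It assumes GNU grep built with PCRE support, and first_html_lang is a hypothetical name.

```shell
# Assumes GNU grep with -P (PCRE); first_html_lang is a hypothetical helper name.
first_html_lang() {
  # Match lang="..." only inside an <html ...> tag, then keep the first result.
  grep -oP '<html\b[^>]*\slang="\K[^"]+' | head -n 1
}

# The second line carries its own lang attribute, which is ignored here.
printf '%s\n' '<html dir="rtl" lang="he-IL">' '<p lang="en">hello</p>' | first_html_lang   # he-IL
```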


2

With awk's match function, one could also try the following.

your_curl_command | awk '
match($0,/^<html.*lang="[^"]*/){
  val=substr($0,RSTART,RLENGTH)
  sub(/.*lang="/,"",val)
  print val
}
'

Explanation: a commented version of the above.

your_curl_command | awk '          ##Start the awk program.
match($0,/^<html.*lang="[^"]*/){   ##Match from <html at the start of the line through lang=" up to, but not including, the next ".
  val=substr($0,RSTART,RLENGTH)    ##Store the matched substring in val.
  sub(/.*lang="/,"",val)           ##Remove everything up to and including lang=" from val.
  print val                        ##Print val.
}
'
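
The ^ anchor above requires the <html tag to start its line, which may not hold for minified pages. A minimal sketch of a variation that drops the anchor and stops after the first match follows; lang_anywhere is a hypothetical name, and the sample input is made up for illustration.

```shell
# lang_anywhere is a hypothetical helper name; it reads HTML on stdin.
lang_anywhere() {
  awk '
  match($0, /<html[^>]* lang="[^"]*/) {
    val = substr($0, RSTART, RLENGTH)   # text from <html up to the end of the lang value
    sub(/.* lang="/, "", val)           # drop everything up to and including lang="
    print val
    exit                                # stop after the first matching line
  }'
}

# The tag is in the middle of the line, as on a minified page.
printf '%s\n' '<!DOCTYPE html><html dir="ltr" lang="fr"><head>' | lang_anywhere   # fr
```

This uses only match, substr, and sub, so it should not depend on GNU awk.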


0

Another variation uses GNU awk and a pattern with a capture group, via the three-argument form of match:

match(string, regexp [, array])

curl -Ls yahoo.com | awk 'match($0, /<html [^<>]*lang="([^"]*)"/, a) {print a[1]}'

Output

en-US

The pattern matches

  • <html followed by a space: matched literally
  • [^<>]* Match zero or more characters other than < or >
  • lang=" Match literally
  • ([^"]*) Capture group 1 (available as a[1] in the example code), matching zero or more characters other than "
  • " Match the closing double quote
