
What is the simplest reliable way to extract the src attribute of the first <img> tag found in an arbitrary text string, without using any external libraries? That is, I want everything between the opening and closing quote characters of the <img> tag's src attribute.


I wrote this script, but it is not reliable in some cases:

  $string = $item['description'];
  $arr = explode('img', $string);
  $arr = explode('src', $arr[1]);
  $arr = explode('=', $arr[1]);
  $arr = explode('>', $arr[1]);

  $pos1 = strpos($arr[0], '"')+1;
  $pos2 = strrpos($arr[0], '"')-1;

  if (!$pos1) {
    $pos1 = strpos($arr[0], "'")+1;
    $pos2 = strrpos($arr[0], "'")-1;
  }

  if ($pos1 && $pos2) { 
    $result = substr($arr[0], $pos1, $pos2); 
  }
  else { $result = null; }
  • DOMDocument is not an external library, why not use it? Use getElementsByTagName(), grab the first item, and get the src with $img->getAttribute('src'). Commented Nov 24, 2016 at 10:44
  • Is it possible that DOMDocument will be unavailable on some servers? Commented Nov 24, 2016 at 10:46
  • DOMDocument is a PHP built-in class. Commented Nov 24, 2016 at 10:48
  • Ok, will try, thanks Commented Nov 24, 2016 at 10:49

4 Answers


If you want to get the values of all attributes of the img tag, you need two regular expressions.

1. Get the content of the img tag:

    /<\s*img([^<>]+)>/

2. Then apply this regex to the captured content with preg_match_all(), which already returns every match, so no g modifier is needed (PHP does not accept one):

    /\S+\s*=\s*[\'\"]([^\"\']+)[\'\"]/
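The two steps can be put together roughly as follows. This is only a sketch: the function name firstImgAttributes is made up for illustration, the case-insensitive flag is an addition, and a capture group around the attribute name has been added so the values can be looked up by name.

```php
<?php
// Hypothetical helper: attribute name => value pairs of the first <img> tag, or null.
function firstImgAttributes(string $html): ?array
{
    // Step 1: capture everything between "<img" and the closing ">".
    if (!preg_match('/<\s*img([^<>]+)>/i', $html, $tag)) {
        return null;
    }

    // Step 2: collect name="value" pairs from the captured content.
    // The group around the name is an addition to the regex from the answer.
    preg_match_all('/(\S+)\s*=\s*[\'"]([^"\']+)[\'"]/', $tag[1], $pairs, PREG_SET_ORDER);

    $attributes = [];
    foreach ($pairs as $pair) {
        $attributes[strtolower($pair[1])] = $pair[2];
    }
    return $attributes;
}

$attrs = firstImgAttributes('<p>Text <img alt="Logo" src="/images/logo.png"> more</p>');
echo $attrs['src'] ?? 'no src found', PHP_EOL;
```

Note that the value pattern stops at the first quote of either kind, so a double-quoted value that contains an apostrophe would not be captured in full.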

Here is your answer. First, match the tag with this regex:

<img(.*?)>

Then, to get the attribute values, run a second regex against the result of the first:

"(.*?)"

1 Comment

And if there are other attributes in the tag?
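A quick way to see what the comment is getting at, assuming the two patterns are run with preg_match() and preg_match_all() (the answer does not say which function to use, and the sample markup is made up):

```php
<?php
$html = '<img alt="Logo" src="/images/logo.png">';

// First call: $tag[1] holds the attribute text of the tag.
preg_match('/<img(.*?)>/', $html, $tag);

// Second call: the first quoted value, whichever attribute it belongs to.
preg_match('/"(.*?)"/', $tag[1], $first);

// Or every quoted value, but without the attribute names.
preg_match_all('/"(.*?)"/', $tag[1], $all);

echo $first[1], PHP_EOL;                 // the alt text here, not the src
echo implode(', ', $all[1]), PHP_EOL;    // both values, in document order
```

So the second pattern alone only yields the src when src happens to be the first quoted attribute.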

Try this:

<img\s+src\s?\=\s?\"(https?\:\/\/[\w\.\/]+)\".*\/>
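As a rough check of what this pattern accepts, the sketch below runs it through preg_match() against a few made-up tags. The pattern is the same one, rewritten with "~" delimiters so the slashes need no escaping; the sample strings are assumptions chosen for illustration.

```php
<?php
// Same pattern as above, with "~" delimiters instead of escaped slashes.
$pattern = '~<img\s+src\s?=\s?"(https?://[\w./]+)".*/>~';

$samples = [
    '<img src="http://example.com/a.png" alt="x" />',  // src first, absolute URL, self-closed
    '<img alt="x" src="http://example.com/a.png" />',  // src is not the first attribute
    '<img src="/images/a.png" />',                     // relative URL
    '<img src="http://example.com/a.png">',            // tag not closed with "/>"
];

foreach ($samples as $sample) {
    // $m[1] is the captured URL when the pattern matches.
    $src = preg_match($pattern, $sample, $m) ? $m[1] : null;
    echo $sample, ' => ', var_export($src, true), PHP_EOL;
}
```

Only the first sample matches. The pattern expects src to be the first attribute, an absolute http or https URL made of word characters, dots and slashes, double quotes, and a tag closed with "/>".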


The safest way is to use the built-in DOMDocument class (available since PHP 5). Use getElementsByTagName(), check that the length is greater than 0, and read the src value of the first item with getAttribute('src'):

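$html = "YOUR_HTML_STRING";
$dom = new DOMDocument('1.0', 'UTF-8');
// Parse the fragment without adding <html>/<body> wrappers or a doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$imgs = $dom->getElementsByTagName('img');
// Only read the attribute when at least one <img> element was found.
if ($imgs->length > 0) {
    echo $imgs->item(0)->getAttribute('src');
}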
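For reuse, the same idea could be wrapped in a function along these lines. This is a sketch under assumptions: the name firstImgSrc is hypothetical, and the libxml_use_internal_errors() calls are an addition that keeps warnings about malformed markup from being emitted while parsing.

```php
<?php
// Hypothetical helper: src of the first <img> in $html, or null when there is none.
function firstImgSrc(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() does not accept an empty string
    }

    // Collect parser warnings about sloppy markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) is null when the document contains no <img> element.
    $img = $dom->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }
    return $img->getAttribute('src');
}

echo firstImgSrc("<p>Intro <img class='hero' src='/images/hero.jpg'> text</p>"), PHP_EOL;
var_dump(firstImgSrc('<p>No image here</p>'));
```

Because the parser handles quoting and attribute order, this works with single or double quotes and with src in any position.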

