I have some HTML output that I'm pulling from an RSS feed. It looks something like this:

<div>
    <p>
        Some text
    </p>
    <iframe src="http://www.source.com"></iframe>
</div>

The problem is that I only need the "src" attribute of the iframe tag. Is there a way to get it with PHP? Regex, maybe?

Thanks in advance!

4 Answers

If you consistently get just the data you listed above, you could use a simple substring, using the string positions of src=" and "></iframe to specify which substring you want:

$html = '<div><p>Some text</p><iframe src="http://www.source.com"></iframe></div>';

$start = strpos($html, 'src="') + 5;
$length = strpos($html, '"></iframe') - $start;
$src = substr($html, $start, $length);

echo $src;

EDIT - fixed the code and split it into multiple lines. This could easily be a one-liner, but I thought it would be easier to understand broken into multiple lines.
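
If the feed might sometimes lack an iframe, or the tag might carry extra attributes after src, a slightly more defensive variation could look like the sketch below. It is only an illustration: it ends the substring at the next double quote instead of at "></iframe, and it assumes the first src=" in the string belongs to the iframe.

```php
<?php
// Sample input, same shape as the HTML in the question.
$html = '<div><p>Some text</p><iframe src="http://www.source.com"></iframe></div>';

$src = null;
$start = strpos($html, 'src="');
if ($start !== false) {
    $start += 5; // skip past src="
    // Find the closing quote of the attribute value.
    $end = strpos($html, '"', $start);
    if ($end !== false) {
        $src = substr($html, $start, $end - $start);
    }
}

// $src stays null when no src attribute was found.
var_dump($src);
```

Checking for false matters because strpos returns false when the needle is missing, and false + 5 would silently become 5.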

I'd recommend DOMDocument or SimpleXML.

Something like this might give you the idea.

var_dump(simplexml_load_string($rss_feed));
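
For the DOMDocument route, a minimal sketch could look like this. It assumes the HTML fragment has already been pulled out of the feed into a string (the $html variable here is just sample data), and it takes the src of the first iframe that has one.

```php
<?php
// Sample input; in practice this would be the HTML taken from the feed item.
$html = '<div><p>Some text</p><iframe src="http://www.source.com"></iframe></div>';

// Collect libxml warnings internally instead of emitting them for imperfect markup.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$src = null;
foreach ($doc->getElementsByTagName('iframe') as $iframe) {
    if ($iframe->hasAttribute('src')) {
        $src = $iframe->getAttribute('src');
        break; // only the first iframe is needed
    }
}

echo $src;
```

Unlike string slicing, this does not depend on attribute order or on the exact characters surrounding the tag.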

3 Comments

I think DOMDocument is going to be a little more robust than SimpleXML if the HTML is not perfectly formed. Also, I would guess you have to process the RSS and the HTML it contains separately, as the HTML should be encoded into entities for the RSS to be correct.
If you only want the src attribute, you shouldn't need something more robust. IMO, SimpleXML's simple nature is right on in this case.
As I said, it's the HTML being invalid XML I'm concerned about. Have a look at this SO post stackoverflow.com/questions/2890120/php-processing-invalid-xml if you still think it would be easier than just using DOMDocument which auto-corrects bad HTML.

I'm not an expert with regex, but an alternative way would be to use explode on the " marks and get array[1], like this:

$rssFeed = '<div>
    <p>
        Some text
    </p>
    <iframe src="http://www.source.com"></iframe>
</div>';

$rssArray = explode('"', $rssFeed);

echo $rssArray[1];

This requires your RSS feed to be very consistent, though. If the "Some text" part were to contain " marks, this would break and you'd get the wrong string.

You could look through the array for everything starting with http or www to work around errors, but again, that requires a very consistent RSS feed, so you have to judge for yourself whether this would do the job well enough.
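
A rough sketch of that workaround, assuming the URL always starts with http or www. (the sample input below adds a quoted word to the text on purpose):

```php
<?php
// Sample input with extra " marks in the text, which would break $rssArray[1].
$rssFeed = '<div><p>Some "quoted" text</p><iframe src="http://www.source.com"></iframe></div>';

$src = null;
foreach (explode('"', $rssFeed) as $piece) {
    // Keep the first piece that looks like a URL.
    if (strpos($piece, 'http') === 0 || strpos($piece, 'www.') === 0) {
        $src = $piece;
        break;
    }
}

echo $src;
```

This would still pick the wrong string if a quoted URL appears in the text before the iframe.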


You could parse this output with a little command line perl script. This can be quite robust depending on how general you make the regular expression.

For example,

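$html = '<div><p>Some text</p><iframe src="http://www.source.com"></iframe></div>'; // your HTML output

// Print only what is in between src=" and the " (the closing quote)
$command = 'echo ' . escapeshellarg($html) . ' | perl -ne \'print $1 if /src="([^"]*)"/\'';

$output = shell_exec($command);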
