So far I've been trying to get a simple way to stract a title from an HTML page.

This simple:

$url = "http://localhost";

Any function is fine for extracting the title tag, as long as it uses only built-in PHP functions or regular expressions. I do not want to use external classes such as simple_html_dom or Zend_Dom; I want to do it the simple way, with PHP only. Can anyone post sample code that extracts the title tag from localhost?

I've tried the DOMdocument() class and simple_xml_parse(), and neither worked.

I tried like this:

<?php $dom = new DOMdocument(); 
$dom->loadhtml('pag.html'); 
$items = $dom->getElementsByTagName('title');
foreach ($items as $title) { echo "title"; }
  • What do you mean by "stract"? Commented Apr 26, 2015 at 0:45
  • There is no way to automatically extract the title from an HTML page. Show us what you tried with DOMdocument and why you didn't have success. Commented Apr 26, 2015 at 0:46
  • @kojow7 I'm assuming OP meant "extract" Commented Apr 26, 2015 at 0:47
  • I tried like this: <?php $dom = new DOMdocument(); $dom->loadhtml('pag.html'); $items = $dom->getElementsByTagName('title'); foreach ($items as $title) { echo "title"; } And when I said "stract" I meant, parse Commented Apr 26, 2015 at 0:49
  • Did you try and see if you are actually getting a document back? As in try to echo InnerHTML for example? Commented Apr 26, 2015 at 0:56

1 Answer

With DOM:

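<?php
// loadHTML() expects an HTML string, so read the file contents first.
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents("1.html"));
$items = $doc->getElementsByTagName("title");
if ($items->length > 0) {
    // Print the text of the first <title> element.
    echo $items->item(0)->nodeValue;
}
?>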
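
For comparison with the attempt in the question: loadHTML() treats its argument as markup, so passing 'pag.html' parses that literal text instead of the file, and echo "title" prints the word rather than the node's value. The sketch below wraps the DOM approach in a helper that also keeps libxml parse warnings from being emitted. The name titleFromHtml is made up for this illustration and is not part of the answer.

```php
<?php
// Hypothetical helper: returns the text of the first <title>, or null if there is none.
function titleFromHtml(string $html): ?string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html); // expects markup; loadHTMLFile() is the variant that takes a path

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = $doc->getElementsByTagName('title');
    if ($items->length === 0) {
        return null;
    }
    return trim($items->item(0)->nodeValue);
}

$html = "<html><head><title>This is the title</title></head><body><h1>Hello</h1></body></html>";
echo titleFromHtml($html), "\n";
```

To read from a file, pass file_get_contents("1.html") as the argument, as in the code above.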

With Regular Expressions:

<?php
$html = file_get_contents('1.html');
// Capture the text between <title> and </title>; print it only if the pattern matched.
if (preg_match("/<title>([^<]*)<\/title>/im", $html, $matches)) {
    echo $matches[1];
}
?>
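
The pattern above only matches a bare <title> tag, so a tag that carries attributes would not match. A slightly more tolerant variant might look like the sketch below. The helper name titleFromHtmlRegex is hypothetical, and the result is still raw markup (entities are not decoded), so the DOM version remains the safer choice for arbitrary pages.

```php
<?php
// Hypothetical helper: regex-based extraction, returns null when nothing matches.
function titleFromHtmlRegex(string $html): ?string
{
    // <title\b[^>]*> allows attributes on the opening tag,
    // the s modifier lets . match newlines,
    // and the lazy .*? stops at the first closing </title>.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

echo titleFromHtmlRegex("<TITLE lang=\"en\">\n  This is the title\n</TITLE>"), "\n";
```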

1.html

<html>
<head>
    <title>This is the title</title>
</head>
<body>
<h1>Hello</h1>
</body>
</html>

Output:

This is the title

5 Comments

I'm obligated to post this when I read your regex part: stackoverflow.com/a/1732454/811240
@Mike there's a DOM alternative answer.
@Mike OP asked for a regular expression solution too. I gave the regex as an alternative to the DOM approach, which is the first code in my answer.
Thank you very much Adrian Cid, this was exactly the kind of answer I was looking for, plain and simple, no extra classes outside "PHP core" required... thank you very much
The m modifier on this pattern doesn't serve any purpose and can be safely removed.
