
The HTML snippet at a URL (www.foo.com/index.html):

...
<th class="name" align="left" scope="col">
<a class="foo" href="foo.html">foo</a>
</th>
...
<th class="name" align="left" scope="col">
<a class="bar" href="bar.html">bar</a>
</th>
...
<th class="name" align="left" scope="col">
<a class="ba" href="baz.html">baz</a>
</th>
......

Using PHP, I would like to get the text inside every element with the class "name" and convert it to JSON, so that it ends up like:

{"names":["foo","bar","baz"]}

This is what I have tried:

function linkExtractor($html){
    $nameArr = array();
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $names = //how do i get the elements?
    foreach($names as $name) {
        array_push($nameArr, $name);
    }
    return $imageArr;
}

echo json_encode(array("names" => linkExtractor($html)));
  • Why don't you try jQuery? Commented May 8, 2014 at 12:44
  • @Dwza Won't work, since the HTML is not being executed... Commented May 8, 2014 at 12:45
  • You normally do that with xpath. Please use the search before asking a question. Commented May 8, 2014 at 12:45
  • @hakre how in any way is that a duplicate? Commented May 8, 2014 at 12:49
  • @Maximilian: Exactly for this: //how do i get the elements? in your question. Commented May 8, 2014 at 12:51

2 Answers

Try this:

$html = "http://www.foo.com/index.html"; //is this right?
function linkExtractor($html, $classname){
    $nameArr = array();
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $names = $doc->xpath("//*[@class='" . $classname . "']");

    foreach($names as $name) {
        array_push($nameArr, $name);
    }
    return $imageArr;
}

echo json_encode(array("names" => linkExtractor($html,".name")));

15 Comments

and before you try this, rest assured that it won't work.
I am getting the error Missing argument 2 for linkExtractor(),
use edited version of answer ...
@Maximilian: That error only prevented you from getting the next fatal error. See the linked duplicate on how to actually run that xpath query.
Why does this not work? It seems like it should.
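
The snippet in this answer appears to fail for several independent reasons: DOMDocument has no xpath() method (queries go through a DOMXPath object), loadHTML() expects markup rather than a URL, ".name" is CSS selector syntax while the class attribute value is just "name", the loop collects DOMElement objects instead of their text, and the function returns the undefined $imageArr. A minimal corrected sketch follows, keeping the linkExtractor name from the thread. The sample markup is copied from the question, and the sketch assumes $className contains no quote characters.

```php
<?php
// Sketch of a corrected linkExtractor(); $className is the bare class
// name ("name"), not a CSS selector (".name").
function linkExtractor($html, $className) {
    $nameArr = array();
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath queries run through DOMXPath, not DOMDocument.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//*[@class='" . $className . "']");

    foreach ($nodes as $node) {
        // Store the text of each node, not the node object itself.
        $nameArr[] = trim($node->textContent);
    }
    return $nameArr;
}

// Sample markup taken from the question.
$html = '<table><tr>'
      . '<th class="name" align="left" scope="col"><a class="foo" href="foo.html">foo</a></th>'
      . '<th class="name" align="left" scope="col"><a class="bar" href="bar.html">bar</a></th>'
      . '<th class="name" align="left" scope="col"><a class="ba" href="baz.html">baz</a></th>'
      . '</tr></table>';

echo json_encode(array("names" => linkExtractor($html, "name")));
```

To work from the URL in the question, one option is to pass file_get_contents("http://www.foo.com/index.html") as $html, which requires allow_url_fopen to be enabled.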

Just so this comes to an end:

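$names = function($html) {
    $doc  = new DOMDocument();
    // Silence libxml warnings while loading, then restore the previous setting.
    $last = libxml_use_internal_errors(TRUE);
    $doc->loadHTML($html);
    libxml_use_internal_errors($last);
    $xp     = new DOMXPath($doc);
    $result = array();
    // Match every element whose class list contains the token "name".
    foreach ($xp->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' name ')]") as $node) {
        // Use the text as an array key so that duplicates collapse.
        $result[trim($node->textContent)] = 1;
    }
    return array_keys($result);
};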

echo json_encode(array("names" => $names($html)));

Output:

{"names":["foo","bar","baz"]}

Required PHP version: 5.3+
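
The XPath expression in this answer treats "name" as one whitespace-separated token of the class attribute, so it also matches elements carrying several classes, which an exact comparison such as @class='name' would miss. The sketch below contrasts the two queries. The markup and the helper name textsFor are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical helper: run an XPath query against an HTML string and
// return the trimmed text of every matching node.
function textsFor($html, $query) {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $texts = array();
    foreach ($xp->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Illustrative markup: an exact class, a multi-class element, and a near miss.
$html = '<div>'
      . '<p class="name">foo</p>'
      . '<p class="name highlight">bar</p>'
      . '<p class="nickname">baz</p>'
      . '</div>';

// Exact comparison: only class="name" matches.
$exact = textsFor($html, "//*[@class='name']");

// Token comparison: padding with spaces lets " name " match whole tokens only.
$token = textsFor($html, "//*[contains(concat(' ', normalize-space(@class), ' '), ' name ')]");

echo json_encode(array("exact" => $exact, "token" => $token));
// prints {"exact":["foo"],"token":["foo","bar"]}
```

Padding both sides with spaces is what keeps "nickname" out of the result; a plain contains(@class, 'name') would match it as well.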

3 Comments

this returns nothing.
like this {"names":[]}
If you see that output, the code itself works; the HTML is simply not what you showed in your question. As you can see, it works perfectly: 3v4l.org/3TUPb. So if you provide HTML that does not contain such elements (e.g. because it is plainly invalid and DOM refuses to load it), fix the HTML first. You probably just have an HTML problem, totally unrelated to traversing the nodes.
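
When the result is an empty list, it can help to look at what the parser actually received. The sketch below counts the matching nodes and collects the libxml messages. The helper name inspectHtml is hypothetical. Passing a URL string instead of markup is only one possible cause, suggested by the $html = "http://www.foo.com/index.html" line in the other answer; the thread does not confirm it.

```php
<?php
// Hypothetical diagnostic helper: report whether loading succeeded,
// how many nodes matched, and which parser messages were raised.
function inspectHtml($html) {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    $errors = libxml_get_errors();   // LibXMLError objects collected during loading
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $matches = $xp->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' name ')]");

    return array(
        "loaded"  => $loaded,
        "matches" => $matches->length,
        "errors"  => array_map(function ($e) { return trim($e->message); }, $errors),
    );
}

// A URL string is not markup: it is parsed as plain text, so nothing matches.
$fromUrlString = inspectHtml("http://www.foo.com/index.html");

// Actual markup containing a class="name" element yields one match.
$fromMarkup = inspectHtml('<table><tr><th class="name"><a href="foo.html">foo</a></th></tr></table>');

print_r($fromUrlString);
print_r($fromMarkup);
```

If "matches" is 0 for the real input, printing the first few hundred characters of $html is a quick way to check whether it holds the expected page at all.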
