
I have a set of HTML fragments to parse. I need to extract the contents of every div whose class name ends with 'uid-g-uid'. Here are some sample divs:

<div class="uid-g-uid">1121</div>

<div class="yskisghuid-g-uid">14234</div>

<div class="kif893jduid-g-uid">114235</div>

I have tried the combinations below, but neither worked:

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class=ends-with(., "uid-g-uid")]');

and also tried

$doc = new DOMDocument();
$bdy = 'HTML Content goes here...';
@$doc->loadHTML($bdy);
$xpath = new DomXpath($doc);
$div = $xpath->query('//*[@class="*uid-g-uid"]');

Please help!

    Try //*[ends-with(@class,'uid-g-uid')] Commented Apr 9, 2013 at 12:18

4 Answers

3

ends-with() requires XPath 2.0, so it won't work with DOMXPath, which only supports XPath 1.0. Something like this should work, though:

$xpath->query('//*["uid-g-uid" = substring(@class, string-length(@class) - 8)]');
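For completeness, here is a minimal sketch of that query run against the sample divs from the question. The fourth div, whose class only starts with the string, is an assumption added for illustration, to show that a non-suffix occurrence is rejected.

```php
<?php
// Sample markup from the question, plus one hypothetical non-matching div.
$html = <<<HTML
<div class="uid-g-uid">1121</div>
<div class="yskisghuid-g-uid">14234</div>
<div class="kif893jduid-g-uid">114235</div>
<div class="uid-g-uid-other">9999</div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// "uid-g-uid" is 9 characters long, so the suffix starts at string-length - 8.
$nodes = $xpath->query('//*["uid-g-uid" = substring(@class, string-length(@class) - 8)]');

$values = [];
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}
print_r($values);
```

The offset depends on the suffix length, so a different suffix needs a different number: the start position is string-length minus (suffix length - 1).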

2 Comments

Here is a sample div that it didn't parse correctly: <div id="yiv744740354uid-g-uid" style="">2201</div>
There is no class attribute in that div.
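If the suffix may appear in either the class or the id attribute, as in the div quoted in the comment above, one possible approach is to apply the same substring test to the union of both attributes. This is only a sketch: the third div is a hypothetical non-matching element added for illustration.

```php
<?php
$html = <<<HTML
<div class="yskisghuid-g-uid">14234</div>
<div id="yiv744740354uid-g-uid" style="">2201</div>
<div id="unrelated">0</div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// (@class | @id) selects both attributes; the inner predicate keeps only
// those whose value ends with the 9-character string "uid-g-uid".
$query = '//*[(@class | @id)[substring(., string-length(.) - 8) = "uid-g-uid"]]';

$found = [];
foreach ($xpath->query($query) as $node) {
    $found[] = $node->nodeValue;
}
print_r($found);
```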
2

You want to do an XPath 1.0 query that checks for a string that ends with a certain string. The ends-with() string function is not available in that version.

I can see several ways to do this. Since in your case the substring occurs only once, and then always at the end, you can simply use contains():

//*[contains(@class, "uid-g-uid")]

If the substring could also appear elsewhere in the value and you want to exclude those cases, check that nothing follows it:

//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]

If it could appear more than once, that won't work either, because substring-after() splits at the first occurrence. In that case, check directly whether the string ends with it:

//@class[substring(., string-length(.) - 8, 9) = "uid-g-uid"]/..

This is probably the most straightforward variant. Since the third argument of substring() is optional, and omitting it takes everything up to the end of the string, it can be shortened to:

//@class[substring(., string-length(.) - 8) = "uid-g-uid"]/..
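To make the differences between these variants concrete, here is a small sketch that runs three of them against the same markup. The class values and the single-letter contents are made up for illustration: "c" has the string at the start only, and "d" contains it twice with the second occurrence at the end.

```php
<?php
$html = <<<HTML
<div class="uid-g-uid">a</div>
<div class="kif893jduid-g-uid">b</div>
<div class="uid-g-uid-tail">c</div>
<div class="uid-g-uidxuid-g-uid">d</div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$queries = [
    'contains'        => '//*[contains(@class, "uid-g-uid")]',
    'substring-after' => '//*[contains(@class, "uid-g-uid") and substring-after(@class, "uid-g-uid") = ""]',
    'substring'       => '//@class[substring(., string-length(.) - 8) = "uid-g-uid"]/..',
];

$results = [];
foreach ($queries as $label => $query) {
    $matched = [];
    foreach ($xpath->query($query) as $node) {
        $matched[] = $node->nodeValue;
    }
    // Store the matched contents as a comma-separated list per variant.
    $results[$label] = implode(',', $matched);
    echo $label, ': ', $results[$label], "\n";
}
```

With this input, contains() matches all four divs, the substring-after() check matches only "a" and "b" (it misses "d" because it looks at what follows the first occurrence), and the substring() check matches "a", "b" and "d".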

1 Comment

This is actually the quality answer.
2

Since you're looking for an XPath function that is not available in XPath 1.0, you can use the DOMXPath::registerPhpFunctions feature, which lets you call PHP functions from within an XPath query. With it you can even call preg_match, like this:

$html = <<< EOF
<div class="uid-g-uid">1121</div>
<div class="yskisghuid-g-uid">14234</div>
<div class="kif893jduid-g-uid">114235</div>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);

// Register the php: namespace (required)
$xpath->registerNamespace("php", "http://php.net/xpath");

// Register PHP preg_match function
$xpath->registerPHPFunctions('preg_match');

// call PHP preg_match function on your xpath to make sure class ends
// with the string "uid-g-uid" using regex "/uid-g-uid$/"
$nlist = $xpath->evaluate('//div[php:functionString("preg_match",
                           "/uid-g-uid$/", @class) = 1]/text()');

$numnodes = $nlist->length; // no of divs matched
for($i=0; $i < $numnodes; $i++) { // run the loop on matched divs
   $node = $nlist->item($i);
   echo "val: " . $node->nodeValue . "\n";
}

Comments

1

Try this:

// First, use a regex to replace the matching class with a findable flag.
// [^"]* keeps the match inside a single double-quoted attribute value.
$bdy = preg_replace('/class="[^"]*uid-g-uid"/i', 'class="__FINDME__"', $bdy);

// Now search for the flag instead of the original class name
$dom = new DOMDocument();
@$dom->loadHTML($bdy);
$xpath = new DOMXPath($dom);

$divs = $xpath->evaluate("//div[@class = '__FINDME__']");
var_dump($divs->length); // debug: should be >= 1, otherwise nothing matched

for($j=0; $j<$divs->length; $j++)
{
    $div = $divs->item($j);
    $div_value = $div->nodeValue;
    .  
    .  
    .  
}

7 Comments

I am getting an error DOMXPath::evaluate(): xmlXPathCompOpEval: function ends-with not found
This is the basic scheme for getting it. You now just need to put the right class search in it.
Check the updated version. I have done it with a small regex trick.
Now, who would have __FINDME__ as a class? And if they do, change it to ____FINDME____ (or something more complex)!
@RaheelHasan That is very true! In fact, FINDME itself is unusual enough that it is unlikely to be a class anywhere, and a good HTML programmer would not make the mistake of using such a class anyway!