It's very simple: there is an HTML file, and in it there is a div with a variable id, like this:

<div id="abc_1"></div>

The integer part of the id varies, so it could be abc_892, abc_553, etc.

What is the best XPath query to select that div?

2 Answers

//div[starts-with(@id, "abc_")]
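
To see what this prefix test actually accepts, here is a rough sketch outside any XPath engine. It uses Python's standard html.parser to collect the id of every div and keeps those that pass the same check starts-with(@id, "abc_") performs. The class DivIdCollector and the function matching_div_ids are hypothetical helpers that emulate the predicate; they do not evaluate XPath.

```python
from html.parser import HTMLParser


class DivIdCollector(HTMLParser):
    """Collect the id attribute of every <div> start tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs.
        if tag == "div":
            for name, value in attrs:
                if name == "id" and value is not None:
                    self.ids.append(value)


def matching_div_ids(html: str, prefix: str = "abc_") -> list[str]:
    """Emulate //div[starts-with(@id, prefix)] by returning the matching ids."""
    collector = DivIdCollector()
    collector.feed(html)
    collector.close()
    return [i for i in collector.ids if i.startswith(prefix)]


sample = ('<div id="abc_1"></div><div id="abc_892"></div>'
          '<div id="abc_xyz"></div><div id="other"></div>')
print(matching_div_ids(sample))  # prints ['abc_1', 'abc_892', 'abc_xyz']
```

Note that abc_xyz comes back as well: a prefix test alone never looks at what follows the underscore.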

The currently accepted answer also selects unwanted elements such as:

<div id="abc_xyz"/>

Only div elements whose id starts with "abc_" and whose substring following the _ is a representation of an integer should be selected.

Use this XPath expression:

//div
   [@id[starts-with(., 'abc_') 
      and 
        floor(substring-after(.,'_')) 
       = 
        number(substring-after(.,'_')) 
       ]
   ]

This selects every div element that has an id attribute whose string value starts with "abc_" and whose substring after the _ is a valid representation of an integer.
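
For experimenting without an XPath engine, the predicate above can be emulated in Python. This is a minimal sketch, assuming the XPath 1.0 rule that a string converts to a number only if it is optional whitespace, an optional minus sign, digits with an optional fractional part, and optional whitespace; anything else becomes NaN. The helpers xpath_number, xpath_floor, substring_after and id_matches are hypothetical names that mimic the XPath functions of the same purpose.

```python
import math
import re

# Assumed XPath 1.0 string-to-number grammar: optional XML whitespace, an
# optional '-', then Digits ('.' Digits?)? or '.' Digits, then optional
# whitespace. No '+', no exponent, no "Infinity": anything else is NaN.
_XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(s: str) -> float:
    """Emulate XPath 1.0 number() applied to a string."""
    # float() itself tolerates the surrounding whitespace the regex allows.
    return float(s) if _XPATH_NUMBER.fullmatch(s) else math.nan


def xpath_floor(x: float) -> float:
    """Emulate XPath 1.0 floor(): NaN and infinities are returned unchanged."""
    return x if math.isnan(x) or math.isinf(x) else float(math.floor(x))


def substring_after(s: str, sep: str) -> str:
    """Emulate XPath 1.0 substring-after(): '' when sep does not occur."""
    return s.partition(sep)[2]


def id_matches(id_value: str) -> bool:
    """Emulate the predicate on @id:
    starts-with(., 'abc_') and
    floor(substring-after(., '_')) = number(substring-after(., '_'))
    """
    if not id_value.startswith("abc_"):
        return False
    n = xpath_number(substring_after(id_value, "_"))
    # NaN compares unequal to everything, itself included, so a
    # non-numeric suffix makes this comparison False.
    return xpath_floor(n) == n
```

With this emulation abc_1 and abc_892 pass, while abc_xyz, abc_1.5 and a bare abc_ (empty suffix, hence NaN) fail. Because number() also accepts a leading minus sign, an id such as abc_-3 passes too.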

Explanation:

Here we are using the fact that in XPath 1.0 this XPath expression:

floor($x) = number($x)

evaluates to true() exactly when $x, converted to a number, is an integer (a string that is not a valid number converts to NaN).

This can be proven easily:

  1. If $x is an integer, the above expression evaluates to true() by definition.

  2. If the above expression evaluates to true(), then neither side of the equality is NaN, because by definition NaN isn't equal to any value (including itself). This means that $x is a number (number($x) isn't NaN), and by definition a number $x that is equal to the integer floor($x) is an integer.

Alternative solution:

//div
   [@id[starts-with(., 'abc_') 
      and 
        'abc_' = translate(., '0123456789', '')
       ]
   ]
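
The alternative can be emulated the same way. This sketch assumes that XPath's translate() with an empty replacement string simply deletes every listed character, which Python's str.translate reproduces with str.maketrans("", "", "0123456789"); id_matches_translate is a hypothetical helper name.

```python
def id_matches_translate(id_value: str) -> bool:
    """Emulate the predicate on @id:
    starts-with(., 'abc_') and 'abc_' = translate(., '0123456789', '')
    """
    # An empty replacement deletes every digit from the id.
    digits_removed = id_value.translate(str.maketrans("", "", "0123456789"))
    return id_value.startswith("abc_") and digits_removed == "abc_"
```

The two expressions do not agree on every input: this one also accepts a bare abc_ (there is nothing to delete, so it already equals "abc_"), and it rejects abc_-3 and abc_1.0 because '-' and '.' survive the translation. Which behavior is preferable depends on the ids your documents actually contain.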

5 Comments

Could you explain why that works? I'm not very familiar with XPath, so I'm guessing floor() will return a value that is never equal to itself, like SQL's three-valued logic (e.g. NULL = NULL in SQL never evaluates to true)? Thanks.
@chris: Done. BTW there was a slight inaccuracy in the expression and this is fixed now.
@chris: You are welcome. Yes, XPath (even 1.0) is a very powerful language and tool for elegant solutions.
Well, as I said, I think that additional check is probably unnecessary, but I'm sure it'll be useful to some. Just out of curiosity, though, would //div[@id[translate(.,'0123456789','') = 'abc_']] not be faster?
@Flynn1179: Both ways are O(N), and whether one is faster depends on the implementation. An XPath engine's optimizer may or may not recognize and optimize a particular expression. I prefer floor($x) = $x because it is more readable and understandable and translates nicely into "type-checking".
