
Let's say I have this webpage and I'm considering the td element of the table containing the string Doe. Using Google Chrome I can get the CSS Path of that element:

#main > table:nth-child(6) > tbody > tr:nth-child(3) > td:nth-child(3)

Using that as a Jsoup CSS query returns the element I'm considering, as you can see here. Is it possible with Jsoup to obtain the above CSS path from an Element, or do I have to walk the tree manually to build it?

I know I could use the CSS query :containsOwn(text) with the element's own text, but that could also select other elements. The path, by contrast, includes only ids, classes and :nth-child(n).
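To make the difference concrete, here is a minimal sketch. The markup, the class name ContainsOwnAmbiguity and the helper countMatches are made up for illustration and are not the page from the question; only Jsoup.parse and select are real jsoup calls. The text "Doe" appears in both a paragraph and a table cell, so the text-based query matches more than one element while a structural path matches only the cell.

```java
import org.jsoup.Jsoup;

public class ContainsOwnAmbiguity {

    // Hypothetical markup: "Doe" occurs in a <p> and in a <td>.
    static final String HTML =
        "<div id=\"main\"><p>Contact: Doe</p><table><tbody>"
      + "<tr><td>John</td><td>Doe</td></tr>"
      + "</tbody></table></div>";

    // Hypothetical helper: parses the markup and counts the elements matched by the query.
    static int countMatches(String html, String query) {
        return Jsoup.parse(html).select(query).size();
    }

    public static void main(String[] args) {
        // Text-based query: matches every element whose own text contains "Doe".
        System.out.println(countMatches(HTML, ":containsOwn(Doe)"));
        // Structural path: pins down the second cell of the row.
        System.out.println(countMatches(HTML, "#main > table > tbody > tr > td:nth-child(2)"));
    }
}
```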

This would be pretty useful for coding a semantic parser in Jsoup that can extract similar elements.

  • What you are asking is not supported. It would be a nice feature though. Commented Sep 8, 2014 at 16:19
  • @alkis now jsoup supports it ;) Commented Sep 30, 2014 at 12:34
  • +1. Great job enrico. Commented Sep 30, 2014 at 12:37

1 Answer

Jsoup doesn't seem to provide such a feature out-of-the-box. So I coded it:

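// StringUtil lives in org.jsoup.helper in jsoup 1.8.x; later releases moved it to org.jsoup.internal.
import org.jsoup.helper.StringUtil;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public static String getCssPath(Element el) {
    if (el == null)
        return "";

    // An id is assumed to be unique, so the path can start here.
    if (!el.id().isEmpty())
        return "#" + el.id();

    // Otherwise start from tag.class1.class2
    StringBuilder selector = new StringBuilder(el.tagName());
    String classes = StringUtil.join(el.classNames(), ".");
    if (!classes.isEmpty())
        selector.append('.').append(classes);

    // Stop at the root element: the Document node's tag name is "#root",
    // which would otherwise be emitted and read back as an id selector.
    if (el.parent() == null || el.parent() instanceof Document)
        return selector.toString();

    // Add :nth-child(n) only when tag and classes are ambiguous among the siblings.
    selector.insert(0, " > ");
    if (el.parent().select(selector.toString()).size() > 1)
        selector.append(String.format(
                ":nth-child(%d)", el.elementSiblingIndex() + 1));

    return getCssPath(el.parent()) + selector.toString();
}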

I also created an issue and a pull request on the Jsoup repository to extend the Element class with that method. Comment on them or subscribe if you want it in Jsoup.

UPDATE

My pull request was merged into jsoup version 1.8.1. The Element class now has the method cssSelector, which returns a CSS path that can be used in a selector to retrieve the element:

Get a CSS selector that will uniquely select this element. If the element has an ID, returns #id; otherwise returns the parent (if any) CSS selector, followed by '>', followed by a unique selector for the element (tag.class.class:nth-child(n)).
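As a rough usage sketch, the round trip can be checked like this. The markup, the class name CssSelectorRoundTrip and the helper roundTrips are hypothetical; cssSelector, select and Jsoup.parse are the jsoup calls, and the example assumes jsoup 1.8.1 or later is on the classpath. The exact string returned may differ between jsoup versions, so the sketch checks only that the selector leads back to the same element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class CssSelectorRoundTrip {

    // Hypothetical markup standing in for the page from the question.
    static final String HTML =
        "<div id=\"main\"><table><tbody>"
      + "<tr><td>John</td><td>Doe</td></tr>"
      + "<tr><td>Jane</td><td>Roe</td></tr>"
      + "</tbody></table></div>";

    // Hypothetical helper: true if the generated selector matches el and nothing else.
    static boolean roundTrips(Document doc, Element el) {
        Elements found = doc.select(el.cssSelector());
        return found.size() == 1 && found.first() == el;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // Locate the cell by its text once, then keep its structural path instead.
        Element cell = doc.select("td:containsOwn(Doe)").first();
        System.out.println(cell.cssSelector());
        System.out.println(roundTrips(doc, cell));
    }
}
```

Storing the path rather than the text is what makes it possible to look for structurally similar elements later, for example by dropping the final :nth-child(n) to select every cell in the same position.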
