From the HTML below, I want to extract the link in the 'data-url' attribute using CSS selectors only (no methods).

<a class="btn" data-url="https://example.com">

BTW, I am trying to scrape a website using a scraper tool called webscraper.io, where the data to be extracted has to be specified using a CSS selector. Hence, I cannot use methods from a programming language.

  • CSS selectors match; they don't extract. Commented Aug 29, 2019 at 7:11
  • @QHarr: Some web scraping tools include special, non-standard selectors (in the CSS selector syntax) that will actually do the extracting for you. webscraper.io seems to have a completely different definition of a "selector" in addition to the traditional "CSS selector" though and it looks like the asker is going to need both, given the way this tool is designed. Commented Sep 2, 2019 at 3:56
  • @QHarr: But this question is exceptional - in the vast majority of web scraping questions, that distinction doesn't actually matter. Any time someone asks to "extract [...] using selectors/XPath" it's pretty much implied that they want to 1) match elements using selectors, and then 2) extract data from whatever gets matched. The distinction becomes important once someone says they can only pass in a selector/XPath string, and even then they're probably already aware of the distinction. Commented Sep 2, 2019 at 3:58
  • @BoltClock I sit corrected :-) Definitely worth knowing, thanks. Commented Sep 2, 2019 at 5:07

2 Answers

You can extract the URL from the tag by using the Element Attribute selector type and entering data-url in the attribute name field.

The following sitemap shows this setup:

{"_id":"stack-sample","startUrl":["http://elitesolution.co.in/sample/inde.html"],"selectors":[{"id":"a parent","type":"SelectorElementAttribute","parentSelectors":["_root"],"selector":"a","multiple":false,"extractAttribute":"data-url","delay":0}]}

The only thing I can think of is this:

.btn::after {
  content: ' ' attr(data-url);
}

Working Example:

.btn::after {
  content: ' ' attr(data-url);
}

<a class="btn" data-url="https://example.com">URL:</a>
