Selenium: How to handle invalid CSS selectors in DOM

Question

I'm scraping a website with Selenium and Python 3. The website only uses ids that are invalid as CSS selectors, like:

<input id="egg:bacon:SPAM" type="text"/>
<input id="egg:sausages:SPAM:SPAM" type="text"/>

(invalid parts are egg:bacon:SPAM & egg:sausages:SPAM:SPAM)

I tried to select these tags with:

driver.find_element_by_css_selector('input#egg:bacon:SPAM')

But of course I get selenium.common.exceptions.InvalidSelectorException.

I also tried XPath to get my tags, and it works with:

driver.find_element_by_xpath('//input[@id="egg:bacon:SPAM"]')

But my code is built on a homemade library that relies on CSS selectors. Adding XPath support would require ~200 lines of code (not counting unit tests, documentation, etc.) only to handle this wrong and non-generic behavior.

Plus, scraping this website is part of a bigger project where only this specific website uses that kind of CSS selector. Putting that much effort into a single website out of 10 makes me uncomfortable.

I could use something like find_element_by_css_selector('.foo > input:nth-child(2)'), but it's pretty brittle and any small update to the DOM could break the scraper.

Is there a clean way to handle invalid CSS selectors via Selenium using find_element_by_css_selector, or am I doomed to use XPath for this website?

Sers · Accepted Answer · 2020-02-21 10:41:04Z

2

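They are all valid. You need to escape the special characters or use a quoted attribute value. In Python, write the escaped form as a raw string (or double the backslashes) so that \: reaches Selenium unchanged:

driver.find_element_by_css_selector('input[id="egg:bacon:SPAM"]')
driver.find_element_by_css_selector(r'input#egg\:bacon\:SPAM')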
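For a library that builds CSS selectors from raw ids, the two forms above could be generated by small helpers along these lines. This is only a sketch: the function names are hypothetical, and the escaping is a simplified approximation (it ignores leading digits and control characters, which a complete CSS identifier escaper would treat specially).

```python
import re

# ASCII characters that may appear unescaped inside a CSS identifier.
# Simplification: leading digits and control characters are not handled.
_SAFE_CHAR = re.compile(r"[A-Za-z0-9_-]")


def escape_css_identifier(value: str) -> str:
    """Backslash-escape every ASCII character that is not a plain identifier character."""
    return "".join(
        ch if _SAFE_CHAR.match(ch) or ord(ch) >= 0x80 else "\\" + ch
        for ch in value
    )


def id_selector(tag: str, element_id: str) -> str:
    """Build the `tag#id` form with the id escaped (hypothetical helper)."""
    return f"{tag}#{escape_css_identifier(element_id)}"


def id_attribute_selector(tag: str, element_id: str) -> str:
    """Build the `tag[id="..."]` form, escaping backslashes and double quotes."""
    quoted = element_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'{tag}[id="{quoted}"]'


print(id_selector("input", "egg:bacon:SPAM"))
print(id_attribute_selector("input", "egg:sausages:SPAM:SPAM"))
```

Either returned string can then be passed to find_element_by_css_selector as usual, so the rest of a CSS-only code base does not need to change.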

answered Feb 21, 2020 at 10:41

Sers


undetected Selenium · Answer · 2020-02-21 10:56:12Z

1

To identify an element whose id attribute contains reserved characters, e.g. egg:bacon:SPAM or egg:sausages:SPAM:SPAM, you can use dynamic css-selectors built from attribute selectors with the following substring-matching operators:

^= : the attribute value starts with the given string
*= : the attribute value contains the given string
$= : the attribute value ends with the given string

Solution

You can use the following solutions:

To identify the element <input id="egg:bacon:SPAM" type="text"/>:

driver.find_element_by_css_selector("input[id^='egg'][id*='bacon'][id$='SPAM']")

To identify the element <input id="egg:sausages:SPAM:SPAM" type="text"/>:

driver.find_element_by_css_selector("input[id^='egg'][id*='sausages'][id$='SPAM']")
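Because these operators match substrings rather than the whole value, a selector such as input[id^='egg'][id*='bacon'][id$='SPAM'] can match more than one id on a page. The sketch below mimics the three operators on plain Python strings to show which of some sample ids would match. The function name `matches` and the sample list are illustrative assumptions, not part of Selenium.

```python
def matches(element_id: str, prefix: str, contains: str, suffix: str) -> bool:
    """Mimic [id^=prefix][id*=contains][id$=suffix] on a plain string."""
    # In CSS, a substring operator with an empty value matches nothing,
    # unlike Python's str.startswith(""), so guard against empty parts.
    if not (prefix and contains and suffix):
        return False
    return (
        element_id.startswith(prefix)
        and contains in element_id
        and element_id.endswith(suffix)
    )


# Sample ids, including two that differ only by a trailing ":SPAM".
ids = ["egg:bacon:SPAM", "egg:bacon:SPAM:SPAM", "egg:sausages:SPAM:SPAM"]

bacon_matches = [i for i in ids if matches(i, "egg", "bacon", "SPAM")]
print(bacon_matches)  # ['egg:bacon:SPAM', 'egg:bacon:SPAM:SPAM']
```

When both ids are present, find_element_by_css_selector returns only the first matching element in document order, so the exact-match forms (escaped id or quoted attribute value) are the safer choice in that situation.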

edited Feb 21, 2020 at 10:56

answered Feb 21, 2020 at 10:35

undetected Selenium

6 Comments

Arount Over a year ago

Super nice, it works. But I have a few inputs like egg:bacon:SPAM & egg:bacon:SPAM:SPAM on the same page. As I understand your answer, it uses a kind of regex (^, *, $), and I fear the example I gave in this comment would not be supported by this method. Also, do you have a doc or keyword so I can find documentation about this? (+1 anyway)

undetected Selenium Over a year ago

@Arount ^, * and $ aren't regular expressions as such :) but wildcards used with cssSelectors. Check out the updated answer and let me know the status.

Arount Over a year ago

Thanks, very nice to know and super helpful. I will still accept Sers' answer because it's less verbose (and a replace(':', '\\:') at the right place does the job), but I keep my upvote because it's a very good answer (and yeah, wildcards.. ooops :D)

Arount Over a year ago

Just for the record, I just had a situation where I had to use your wildcards, epic.

undetected Selenium Over a year ago

@Arount This answer is based on best practices which you have to adopt in the long run.
