I am trying to extract all the tags whose class name matches a regex pattern (frag-0-0, frag-1-0, etc.) from this link.

I am trying to retrieve them with the following code:

    driver = webdriver.Chrome(chromedriver)
    for frg in frgs:
        driver.get(URL + frg[1:])
        frags=driver.find_elements_by_id(re.compile('frag-[0-9]-0'))
        for frag in frags:
            for tag in frag.find_elements_by_css_selector('[class^=fragmark]'):
                lst.append([tag.get_attribute('class'), tag.text])
    driver.quit()
    return lst

But I get an error. What is the right way of doing this?

The error is as follows:

Traceback (most recent call last):
  File "vroni.py", line 119, in <module>
    op('Aaf')
  File "vroni.py", line 104, in op
    plags=getplags(cd)
  File "vroni.py", line 95, in getplags
    frags=driver.find_elements_by_id(re.compile('frag-[0-9]-0'))
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 281, in find_elements_by_id
    return self.find_elements(by=By.ID, value=id_)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 778, in find_elements
    'value': value})['value']
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 398, in execute
    data = utils.dump_json(params)
  File "/home/eadaradhiraj/Documents/webscrape/venv/local/lib/python2.7/site-packages/selenium/webdriver/remote/utils.py", line 34, in dump_json
    return json.dumps(json_struct)
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_sre.SRE_Pattern object at 0xb668b1b0> is not JSON serializable
  • Please post the error. (Commented Jul 5, 2016 at 20:40)
  • Please check the update. (Commented Jul 5, 2016 at 20:49)

2 Answers

Selenium's find_elements_by_id method expects a plain string, but re.compile returns a regular expression object. That object is meant for matching strings in Python through its match() and search() methods, for example:

reobject = re.compile(pattern)
result = reobject.match(string)
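
To make the failure concrete, here is a minimal sketch of what the traceback shows: the locator value ends up in json.dumps, and a compiled pattern is not a JSON type. The dictionary below is only an illustrative stand-in for the parameters Selenium builds, not its actual payload.

```python
import json
import re

pattern = re.compile('frag-[0-9]-0')

# Stand-in for the locator parameters (hypothetical shape, for illustration).
# json.dumps cannot encode a compiled pattern, so this raises TypeError.
try:
    json.dumps({'using': 'id', 'value': pattern})
    serialized = True
except TypeError:
    serialized = False

# The pattern object is meant to be applied to strings in Python instead.
matched = pattern.match('frag-3-0') is not None
missed = pattern.match('fragmark-3') is None
```

Passing pattern.pattern (the original string) would serialize without error, but the browser would then look for an element whose id is literally that text, so it would not behave as a regex.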

Generally I would advise against using regular expressions to locate elements. There is usually another way to find them, such as a class name, a CSS selector, or an XPath expression.
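
As a rough sketch of the CSS route, assuming the ids really look like frag-0-0, frag-1-0 and so on: a plain-string selector narrows the candidates in the browser, and the regex is then applied in Python to each candidate's id. The helper name find_frags is made up for this example, and widening the pattern to `[0-9]+` (more than one digit) is my assumption, not something taken from the question.

```python
import re

# Ids such as frag-0-0, frag-1-0, ... (multi-digit middle part is an assumption).
FRAG_ID = re.compile(r'frag-[0-9]+-0$')

# Plain-string CSS selector: id starts with "frag-" and ends with "-0".
FRAG_SELECTOR = '[id^="frag-"][id$="-0"]'


def find_frags(driver):
    """Return the elements whose id matches FRAG_ID (hypothetical helper)."""
    # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
    candidates = driver.find_elements("css selector", FRAG_SELECTOR)
    # The regex runs in Python, so it never has to be serialized to JSON.
    return [el for el in candidates
            if FRAG_ID.match(el.get_attribute("id") or "")]
```

With a real driver this would be called as `frags = find_frags(driver)` after `driver.get(...)`. The CSS selector alone is looser than the regex (it would also accept an id like frag-x-0), which is why the Python-side filter is kept.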

4 Comments

@EchchamaNayak that looks like a completely different question and problem now.
So should I ask about this separately?
@EchchamaNayak In this question your problem was that you used a regular expression object instead of a string. I believe that was solved. Now you are asking how to get certain elements using another method and, more importantly, another driver. So I would say yes.
OK Sure. Thanks again
The function find_elements_by_id takes a string as its argument, not a regular expression object. I'm not sure whether the function you're using can interpret a regex at all, even when it is passed as a string.

You might want to try XPath.
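
A possible XPath version, as a sketch only: XPath 1.0 has starts-with() but no ends-with(), so the suffix check is written with substring(). The name find_frags_xpath is hypothetical, and the expression assumes the ids begin with "frag-" and end with "-0"; it does not check that the middle part is a digit.

```python
# Matches any element whose id starts with "frag-" and whose last two
# characters are "-0" (substring() positions are 1-based in XPath).
FRAG_XPATH = ('//*[starts-with(@id, "frag-") and '
              'substring(@id, string-length(@id) - 1) = "-0"]')


def find_frags_xpath(driver):
    """Locate candidate elements with a plain-string XPath (hypothetical helper)."""
    # "xpath" is the string value of Selenium's By.XPATH.
    return driver.find_elements("xpath", FRAG_XPATH)
```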
