from selenium import webdriver

driver = webdriver.Chrome()
driver.get("url_goes_here")

p_id = driver.find_elements_by_tag_name("script")

This gets me the script I need. I don't need to execute it, since it has already run on the initial page load. It contains a variable named "task". How do I access its value with Selenium?

4 Answers

1

The regex module re can help you with that:

import re
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("url_goes_here")

p_id = driver.find_elements_by_tag_name("script")

for script in p_id:
    innerHTML = script.get_property('innerHTML')
    task = re.search(r'var task = (.*);', innerHTML)
    if task is not None:
        print(task.group(1))

This loops over the innerHTML of each script element, searches it with the pattern 'var task = (.*);', and captures whatever the group (.*) matches. If a match is found, the captured group is printed.
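To see the pattern in isolation, here is a minimal sketch that runs the search on a hard-coded string instead of a live page. The sample script text and the helper name extract_task are made up for illustration, and the non-greedy (.*?) is a small variation on the pattern above so that the capture stops at the first semicolon.

```python
import re

# Hypothetical script body standing in for script.get_property('innerHTML').
SAMPLE_INNER_HTML = """
var user = "alice";
var task = "review-42";
var done = false;
"""

# Non-greedy capture: stop at the first semicolon after "var task =".
TASK_PATTERN = re.compile(r"var task\s*=\s*(.*?);")


def extract_task(inner_html):
    """Return the raw text assigned to `task`, or None if it is absent."""
    match = TASK_PATTERN.search(inner_html)
    return match.group(1) if match else None


print(extract_task(SAMPLE_INNER_HTML))  # prints "review-42" (quotes included)
print(extract_task("var other = 1;"))   # prints None
```

Note that the captured group is the raw JavaScript source text, so a string value still carries its quotes and would need stripping or parsing before use.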

3 Comments

First off, you're the only person in this thread so far who seems to have gotten what I'm actually asking, namely to extract the inner contents of the appropriate <script> tag which I've already scraped, and go from there. So thanks! Now, as for this: script.get_property('innerHTML') It returns an empty string! :(
Strange... if you use .get_attribute instead of .get_property, do you still get an empty string?
I never got around to trying .get_attribute, but I eventually solved my problem by extracting the appropriate element with .find_elements_by_xpath rather than by tag name as in the original question, and THEN using your .get_property('innerHTML') suggestion. My issue is solved now, and since you were the only one who at least put me on the right path to its solution, I will mark your answer as correct! Thank you!!! :)
0

You can access the value of a tag or any other HTML element via .text or .getText().

1 Comment

I was aware of these methods, but the first one returns an empty string to me, no matter what I try it on. Whereas the second gives: AttributeError: 'WebElement' object has no attribute 'getText'
0

Since you are using find_elements_by_tag_name(), which returns a list of elements, iterate over that list, check whether element.text contains task, and print the text of that element.

p_id = driver.find_elements_by_tag_name("script")
for script in p_id:  # avoid shadowing the built-in id()
    if 'task' in script.text:
        print(script.text)

4 Comments

"Since you are using find_elements_by_tag_name() which returns list of elements.." Any better suggestion for scraping scripts specifically? Would searching by, let's say, XPath produce better formatted output or something? Second, this is the actual content of one element of the list produced by find_elements_by_tag_name(): <selenium.webdriver.remote.webelement.WebElement (session="53d2976784532dd4717abff68170b22a", element="8d9f432d-7a2e-47e0-8023-d6c092ee9620")> As you can see, it's not very legible.
@Alexander: I am extremely sorry if I haven't understood your requirement. I believe you are searching for the script tag that contains the text task, is that right?
The script tag is already FOUND and stored in a list. The question is how to extract the value of its variable "task" from the raw data stored in that list.
Well, for that you need to post the details of that script. A regular expression will definitely work to extract the value.
0

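Use XPath instead:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("url_goes_here")

# Replace "ADDXPATH" with an XPath that selects the target <script> element.
p_id = driver.find_element(By.XPATH, "ADDXPATH")

print(p_id.get_attribute('outerHTML'))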
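Combining an XPath lookup with a regex search, a sketch along the following lines may work on recent Selenium 4 releases, where the find_element(s)_by_* helpers are no longer available. The function name find_task, the XPath expression, and the assumption that the page declares the variable as var task = ...; are illustrative guesses, not details taken from the real page.

```python
import re

# Same value as selenium.webdriver.common.by.By.XPATH; spelled out here so
# the sketch has no import-time dependency on Selenium.
XPATH = "xpath"

# Assumed XPath: any <script> element whose text mentions "var task".
SCRIPT_XPATH = "//script[contains(., 'var task')]"


def find_task(driver):
    """Return the text assigned to `task` in the first matching script, or None."""
    for script in driver.find_elements(XPATH, SCRIPT_XPATH):
        # get_attribute may return None, so fall back to an empty string.
        inner_html = script.get_attribute("innerHTML") or ""
        match = re.search(r"var task\s*=\s*(.*?);", inner_html)
        if match:
            return match.group(1)
    return None
```

With a real browser, this would be called as find_task(driver) after driver.get("url_goes_here"). The XPath narrows the search to scripts that mention the variable, so the regex only runs on likely candidates.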
