I have been trying to extract posts from a forum found at this URL: https://www.thestudentroom.co.uk/showthread.php?t=7263973

The body of the text I am trying to extract is under:

<div class="post-content">

Yet I keep getting the following error whether I use get element to search by XPATH or CLASS_NAME:

NoSuchElementException

I have tried the following, and looked at several similar posts on SO, but can't find a solution that works for me. Any help would be appreciated.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window

def get_posts(url):
    driver = webdriver.Chrome(options=options)
    WebDriverWait(driver, 5)
    driver.get(url)
#   posts = driver.find_element(By.XPATH, '/html/body/div[2]/div/div[6]/div[1]/div[1]/div[6]/div[3]/div[2]/div[2]').text
    posts = driver.find_element(By.CLASS_NAME, 'post-content')
    return posts

SR_posts = get_posts(url="https://www.thestudentroom.co.uk/showthread.php?t=7263973")
SR_posts

Edit: added a picture of the HTML showing the 'post-content' class that contains the text.

[Image: HTML of webpage]

Edit 2: added a second picture, taken with Inspect element on the text body.

[Image: Inspect element of text body]

  • If I open that webpage, there is no post-content in any class. In fact, there is no post-content anywhere. Commented Dec 19, 2022 at 14:50
  • Hi @JakyRuby, I have added a screenshot of the inspect element to hopefully show what I mean. Commented Dec 19, 2022 at 15:03
  • Maybe the element isn't immediately available. Use Selenium to wait for its visibility. Commented Dec 19, 2022 at 15:04
  • I also could not find an element with the class name post-content on that page, not even after scrolling the page down. Commented Dec 19, 2022 at 15:06
  • @Prophet In the second picture I've included, I used Inspect element on the text and that brought me to the div class="post-content". Commented Dec 19, 2022 at 15:10

1 Answer

Try this to get the value of the post:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver


def get_posts(url):
    driver = webdriver.Chrome()
    driver.maximize_window()
    wait = WebDriverWait(driver, 5)
    driver.get(url)
    post = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='styles__PostContent-sc-1r7c0ap-3 kylDhV']/span")))
    return post

SR_post = get_posts(url = "https://www.thestudentroom.co.uk/showthread.php?t=7263973")
print(SR_post.text)

Advice:

  • Always wait for the element you want to interact with.
  • Use EC (expected conditions) to verify that the element is accessible in the way you need. My example uses EC.presence_of_element_located, but you may want something else, such as waiting for the element to be clickable.

I applied both pieces of advice in the code above.


2 Comments

That's great, thank you so much for your help. If it's not too much trouble, is it possible to return all of the posts on that forum link? Currently the code only returns the first post.
You are asking for the replies, not the post, and that is a totally different topic. You can now get the post text; from here you have to investigate how to get the replies. With the code in my answer you should be able to continue.
