I've used BeautifulSoup to find divs with a specific class in the page's HTML. I want to check whether each div contains a span with a particular class. If it does, I want to keep the div in the page; if it doesn't, I want to delete it, perhaps using Selenium.

To do that, I have two lists of selected elements (divs and spans). I tried checking whether the items of one list appear inside the other, and that partly worked. But how can I delete a found element from the page's source code?

Edit

I've edited the code after some discussion in the comments. With help, I was able to implement code that removes elements by executing JavaScript.

The code is running with no errors, but nothing is being deleted from the page.

# Import required modules
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

# Option to launch browser in incognito
options = Options()
options.add_argument("--incognito")
#options.add_argument("--headless")

# Using chrome driver (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete%20gr%C3%A1tis%20aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Raw string so the regex backslash reaches the browser unchanged
driver.execute_script(r"""
  for(let div of document.querySelectorAll('div._99s5')){
    let match = div.innerText.match(/(\d+) ads? use this creative and text/)
    let numAds = match ? parseInt(match[1]) : 0
    if(numAds < 10){
      div.querySelector(".tp-logo")?.remove()
    }
  }
""")
  • What do you mean by deleting the element? Editing the HTML source code? Commented Mar 10, 2022 at 18:46
  • Yes, that's right. Commented Mar 10, 2022 at 18:57
  • Do you want to edit the HTML locally and save it, knowing that you can't edit source code on a server from the client? Commented Mar 10, 2022 at 18:59
  • I could go with the solution that saves the HTML locally, in a file, for example. But can't I use Selenium and JavaScript to edit the HTML directly in the browser? Obviously that change would only be visible to whoever is running the program, and only for visualization purposes. Commented Mar 10, 2022 at 19:04
  • What is your end goal? Commented Mar 10, 2022 at 19:13

2 Answers

2

Since you're deleting them in JavaScript anyway:

# Raw string so the regex backslash reaches the browser unchanged
driver.execute_script(r"""
  for(let div of document.querySelectorAll('div._99s5')){
    let match = div.innerText.match(/(\d+) ads? use this creative and text/)
    let numAds = match ? parseInt(match[1]) : 0
    if(numAds < 10){
      div.querySelector(".tp-logo")?.remove()
    }
  }
""")

10 Comments

I've edited my post, but that was before I saw your answer. I've posted two options I was trying to implement using JavaScript. I think yours is better. But the elements weren't deleted from the page. The code ran without errors, but nothing happened in the browser.
Also, I didn't mention this in my question, so pardon me, but the string I'm looking for, "ads use this creative and text", isn't the only thing I'll check before deleting. This string is preceded by a number, like "15 ads use this creative and text". I have to check whether that number is greater than 10, for example. I have to take the whole string and extract only the number. I know the class of that element: it is a span with a specific class.
I've written this in pseudo-code; could you help me with the JavaScript part? driver.execute_script(""" for(let div of document.querySelectorAll('div._99s5')){ if(!div.innerText.match("ads use this creative and text")){ div.querySelector(".tp-logo")?.remove() } if((element with span class).replace(/\D/g, "") < 10){ div.querySelector(".tp-logo")?.remove() } } """)
Check my update
The code is running with no errors, but no element is being deleted from the page. I've added the full code I'm using to the original question. Can you check it to see if I'm missing something?
0

Note: The question and comments are a bit confusing, so it would help to improve them. Assuming you want to decompose() some elements, it is not clear why, or what should happen after that. So this answer only points out one approach.

To decompose() the elements that do not contain "ads use this creative and text", negate your selection and iterate over the ResultSet:

# Select the divs whose text does NOT contain the phrase, then drop them
for e in soup.select('div._99s5:not(:-soup-contains("ads use this creative and text"))'):
    e.decompose()

Now these elements will no longer be included in your soup and you could process it for your needs.
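
The comments also mention a number in front of the phrase (for example "15 ads use this creative and text") that should be compared against a threshold. A minimal sketch of that check on a div's text, using only the standard library, could look like the following. The helper names `ad_count` and `should_keep` are hypothetical, and keeping ads whose count is at least 10 is an assumption that mirrors the `numAds < 10` test in the JavaScript version.

```python
import re

# Hypothetical helpers; the phrase itself comes from the question.
AD_COUNT_RE = re.compile(r"(\d+) ads? use this creative and text")


def ad_count(text):
    """Return the number preceding the phrase, or 0 if the phrase is absent."""
    match = AD_COUNT_RE.search(text)
    return int(match.group(1)) if match else 0


def should_keep(text, min_ads=10):
    """Keep an ad only when its count reaches the threshold (an assumption)."""
    return ad_count(text) >= min_ads
```

With BeautifulSoup, this could be applied by looping over `soup.select('div._99s5')` and calling `e.decompose()` whenever `should_keep(e.get_text(" ", strip=True))` is false.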

5 Comments

About the reason why, or what to do after this action: my end goal is to keep on the page only the ads that contain "ads use this creative and text". Not all divs with class _99s5 contain this string. The string is also preceded by a number, and I'll check whether that number is greater than, let's say, 10, and in that case keep the ad on the page.
About the scrolling, that's done already.
Okay, so decompose() should work for you in both cases to "delete" these elements in your soup?
With decompose() I could save the HTML locally. I'll probably try to delete the elements in the browser, even if it's temporary. I think @pguardiario's response is closer to what I was looking for.
If the goal is to do all the processing in the browser, I would agree.
