Given HTML like the following:

<div>
     <div>
         <meta ... />
         <img />
     </div>
     <div id="main">
        <p class="foo">Hello, World</p>
        <div>
           <div class="bar">Hey, there!</div>
        </div>
     </div>
</div>

How can I select only the elements that contain text and output a generated, unique CSS selector for each of them?

For this example, that would be:

[
  { "html": "Hello, World", "selector": ".foo"},   # can be even more specific if there are other .foo's
  { "html": "Hey, there!", "selector": ".bar" }
]

I was playing with BeautifulSoup and html_sanitizer but wasn't getting great results.

1 Answer

This is straightforward with BeautifulSoup: visit every text node and climb to the nearest ancestor that has an id or a class.

from bs4 import BeautifulSoup

html = """
<div>
     <div>
         <meta ... />
         <img />
     </div>
     <div id="main">
        <p class="foo">Hello, World</p>
        <div>
           <div class="bar">Hey, there!</div>
        </div>
     </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

results = []

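# Visit every text node in the document, including whitespace-only ones.
for element in soup.find_all(string=True):
    # Climb to the nearest ancestor that carries an id or a class.
    parent = element.parent
    while parent and not (parent.has_attr('id') or parent.has_attr('class')):
        parent = parent.parent

    # Skip whitespace-only text and text with no identifiable ancestor.
    if parent and element.strip() != '':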
        if parent.has_attr('id'):
            results.append({
                "html": element.strip(),
                "selector": '#' + parent['id']
            })
        elif parent.has_attr('class'):
            results.append({
                "html": element.strip(),
                # Join all classes into one compound selector string, e.g. ".foo"
                "selector": ''.join('.' + cls for cls in parent['class'])
            })

print(results)
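
The loop above reports the nearest id or class but never checks that the selector is actually unique, which the question asks for. One possible way to add that check is sketched below. It is an illustration only: `is_unique`, `unique_selector`, `text_selectors`, and `SAMPLE_HTML` are hypothetical names, the sample is a trimmed copy of the markup from the question, and the sketch assumes ids and class names need no CSS escaping and that your bs4 version supports `select()` with `:nth-of-type`.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed copy of the question's markup (assumption: the <meta>/<img> block is irrelevant here).
SAMPLE_HTML = """
<div>
     <div id="main">
        <p class="foo">Hello, World</p>
        <div>
           <div class="bar">Hey, there!</div>
        </div>
     </div>
</div>
"""


def is_unique(soup, selector, tag):
    """True if `selector` matches `tag` and nothing else in `soup`."""
    matches = soup.select(selector)
    return len(matches) == 1 and matches[0] is tag


def unique_selector(soup, tag):
    """Best-effort unique selector for `tag` (hypothetical helper)."""
    # Cheapest candidates first: the id, then the full class list.
    # Assumes ids and class names need no CSS escaping.
    candidates = []
    if tag.get("id"):
        candidates.append("#" + tag["id"])
    if tag.get("class"):
        candidates.append("".join("." + cls for cls in tag["class"]))
    for candidate in candidates:
        if is_unique(soup, candidate, tag):
            return candidate

    # Fallback: grow a `name:nth-of-type(n)` path upward until it is unique.
    parts = []
    node = tag
    while isinstance(node, Tag) and node.parent is not None:
        position = 1 + len(node.find_previous_siblings(node.name))
        parts.insert(0, f"{node.name}:nth-of-type({position})")
        candidate = " > ".join(parts)
        if is_unique(soup, candidate, tag):
            return candidate
        node = node.parent
    # Full path from the top-level element; not verified to be unique.
    return " > ".join(parts)


def text_selectors(html):
    """Pair each element that directly contains text with a selector."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all(True):
        # Only direct text children count, so pure wrapper elements are skipped.
        text = "".join(tag.find_all(string=True, recursive=False)).strip()
        if text:
            results.append({"html": text, "selector": unique_selector(soup, tag)})
    return results


print(text_selectors(SAMPLE_HTML))
```

When several elements share a class, the class candidate fails the uniqueness check and the sketch falls back to a positional path, so two sibling `p.foo` elements would get `p:nth-of-type(1)` and `p:nth-of-type(2)`. Positional selectors are brittle if the page structure changes, so prefer ids or classes where they are unique.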