Given something like the following HTML:
<div>
  <div>
    <meta ... />
    <img />
  </div>
  <div id="main">
    <p class="foo">Hello, World</p>
    <div>
      <div class="bar">Hey, there!</div>
    </div>
  </div>
</div>
How would I go about selecting only the elements that contain text, and generating a unique CSS selector for each of them?
For this example, the output would be the following (the selectors can be more specific if there are other .foo elements):
[
  { "html": "Hello, World", "selector": ".foo" },
  { "html": "Hey, there!", "selector": ".bar" }
]
I was playing with BeautifulSoup and html_sanitizer but wasn't getting great results.
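For reference, here is a minimal sketch of one possible approach. It swaps in the standard library's html.parser so that it has no dependencies, and it makes some assumptions of its own. "Has text" is taken to mean the element has non-whitespace text as a direct child, which is why div#main is skipped. The selector strategy is also my own choice: a unique #id first, then a class that only one element carries, then a tag:nth-of-type(n) path anchored at the nearest unique id. The names TreeBuilder, unique_selector and text_elements are hypothetical helpers. Ids and class names are not CSS-escaped, and a fallback path with no id to anchor on is only unique relative to the parsed fragment.

```python
import json
from collections import Counter
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = []  # direct (non-nested) text chunks only

    @property
    def classes(self):
        return (self.attrs.get("class") or "").split()


class TreeBuilder(HTMLParser):
    """Hypothetical helper: builds a tiny element tree with the stdlib parser."""

    def __init__(self):
        super().__init__()
        self.root = Node(None, [], None)
        self.stack = [self.root]
        self.elements = []  # every element, in document order

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.stack[-1])
        self.stack[-1].children.append(node)
        self.elements.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Pop back to the nearest matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip() and self.stack[-1] is not self.root:
            self.stack[-1].text.append(data.strip())


def unique_selector(node, id_counts, class_counts):
    """Assumed strategy: unique #id, then a unique .class, then an nth-of-type path."""
    node_id = node.attrs.get("id")
    if node_id and id_counts[node_id] == 1:
        return "#" + node_id
    for cls in node.classes:
        if class_counts[cls] == 1:
            return "." + cls
    parts = []
    current = node
    while current.parent is not None:
        cur_id = current.attrs.get("id")
        if cur_id and id_counts[cur_id] == 1:
            parts.append("#" + cur_id)  # anchor the path at a unique id
            break
        same_tag = [c for c in current.parent.children if c.tag == current.tag]
        parts.append(f"{current.tag}:nth-of-type({same_tag.index(current) + 1})")
        current = current.parent
    return " > ".join(reversed(parts))


def text_elements(html):
    """Return [{"html": direct text, "selector": ...}] for elements with direct text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    id_counts = Counter(n.attrs["id"] for n in builder.elements if n.attrs.get("id"))
    class_counts = Counter(c for n in builder.elements for c in set(n.classes))
    return [
        {"html": " ".join(n.text), "selector": unique_selector(n, id_counts, class_counts)}
        for n in builder.elements
        if n.text
    ]


# The question's <meta ... /> attributes are elided, so this sample leaves them out.
SAMPLE = """
<div>
  <div>
    <meta />
    <img />
  </div>
  <div id="main">
    <p class="foo">Hello, World</p>
    <div>
      <div class="bar">Hey, there!</div>
    </div>
  </div>
</div>
"""

if __name__ == "__main__":
    print(json.dumps(text_elements(SAMPLE), indent=2))
```

Running it prints the two objects from the question (.foo and .bar). If both paragraphs shared class="foo", the fallback would produce "#main > p:nth-of-type(1)" and "#main > p:nth-of-type(2)" instead. The selector logic only needs a tree of tags, attributes and children, so it should carry over to a BeautifulSoup tree if you'd rather keep using that.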