I wrote a function that extracts all headers from a page based on the header tags (h1, h2, ...). Now I want to extend it with a feature that extracts text based on font size, say 20px or 1.5em, regardless of whether the text is a header. In other words, I want to retrieve any text written in a font-size greater than X, wherever it appears on the page. The function takes a JSON file as input, which contains arbitrary HTML (and whatever else a website might include, e.g. CSS).
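To make "greater than X" comparable across units such as 20px and 1.5em, the values first have to be brought to a common unit. Below is a minimal sketch of that step only. The function name `font_size_to_px` is hypothetical, and the 16px default for the parent and root font size is an assumption (it is the usual browser default, but a page can override it). Keyword sizes such as `large` are not handled and come back as `None`.

```python
import re

# Matches a number followed by one of a few common CSS length units.
_SIZE_RE = re.compile(r"^\s*(\d*\.?\d+)\s*(px|pt|em|rem|%)\s*$", re.IGNORECASE)


def font_size_to_px(value, parent_px=16.0, root_px=16.0):
    """Convert a CSS font-size string to pixels, or return None if unsupported.

    parent_px and root_px are assumptions: em and % are relative to the
    parent element's font size, rem to the root element's font size.
    """
    match = _SIZE_RE.match(value)
    if match is None:
        return None  # keywords like "large", calc(), etc. are not handled
    number = float(match.group(1))
    unit = match.group(2).lower()
    if unit == "px":
        return number
    if unit == "pt":
        return number * 96.0 / 72.0  # CSS defines 1in = 96px = 72pt
    if unit == "em":
        return number * parent_px
    if unit == "rem":
        return number * root_px
    return number / 100.0 * parent_px  # remaining unit is %
```

With the assumed 16px base, `font_size_to_px("1.5em")` and `font_size_to_px("20px")` can then be compared directly against a threshold X given in pixels.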

Based on the Beautiful Soup documentation on crummy.com, one possible option seems to be soup.fetch(); however, I haven't found many examples of using it for this purpose.

Since font-size may well be set in a CSS component (a stylesheet) rather than in the HTML itself, I'm not sure that bs4 is the right package for this. I assume the answer involves cssutils or tinycss, but I haven't been able to find the best way to use them for this task.
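To illustrate the part of the problem that does not need a CSS parser, here is a rough sketch that handles only inline `style="font-size: ..."` attributes. It deliberately uses the standard-library `html.parser` instead of bs4 so that it has no dependencies; the names `LargeTextParser` and `text_larger_than` are hypothetical. It assumes a 16px base size, understands only px and em, and expects every non-void element to be closed explicitly. Sizes set through `<style>` blocks or external stylesheets are ignored here; covering those would mean parsing the rules with something like cssutils or tinycss and matching their selectors against the elements, which this sketch does not attempt.

```python
import re
from html.parser import HTMLParser

# Only px and em are recognised in this sketch.
SIZE_RE = re.compile(r"font-size\s*:\s*(\d*\.?\d+)\s*(px|em)", re.IGNORECASE)

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LargeTextParser(HTMLParser):
    """Collect text whose inherited inline font-size exceeds min_px."""

    def __init__(self, min_px, base_px=16.0):
        super().__init__()
        self.min_px = min_px
        self.sizes = [base_px]  # effective font size of each open element
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        size = self.sizes[-1]  # inherit the parent's size by default
        match = SIZE_RE.search(dict(attrs).get("style") or "")
        if match:
            number, unit = float(match.group(1)), match.group(2).lower()
            # em is relative to the parent's font size; px is absolute.
            size = number if unit == "px" else number * size
        self.sizes.append(size)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.sizes) > 1:
            self.sizes.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.sizes[-1] > self.min_px:
            self.found.append(text)


def text_larger_than(html, min_px):
    """Return the text fragments rendered larger than min_px (inline styles only)."""
    parser = LargeTextParser(min_px)
    parser.feed(html)
    parser.close()
    return parser.found
```

For example, `text_larger_than('<p style="font-size: 24px">Big <span style="font-size:0.5em">small</span></p>', 18)` returns `['Big']`, because the nested span resolves to 12px.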

For reference, my code for the header tags was posted for review: https://codereview.stackexchange.com/questions/166671/extract-html-content-based-on-tags-specifically-headers/166674?noredirect=1#comment317280_166674

Posts I've checked:
  • What is the pythonic way to implement a css parser/replacer
  • Find all the span styles with font size larger than the most common one via beautiful soup python
  • Search in HTML page using Regex patterns with python
  • How to parse a web page containing CSS and HTML using python
  • how to extract text within font tag using beautifulsoup
  • Extract text with bold content from css selector

Thanks much,

  • How does it parse the text based on the font size? Do you mean that you know which header tag has what font size? Commented Jun 27, 2017 at 11:32
  • Thanks for the comment, @MoonCheesez. I mean that, regardless of the headers, I want a feature that retrieves any text written in a font-size greater than X. I'll edit for clarity, thanks. Commented Jun 27, 2017 at 11:49
