2

Is there any way to extract css values from website page using css class name. I want to get all css values and child class values using parent class css name.

For an Example:

Wepage Css :

.container {
    width: 80%;
  }
  .btn-wrap {
    padding: 3px;
    width: 25%;
    text-align: center;
  }
  .text-box {
    margin: 0 auto;
    width: 50%;
  }
  .frm-btn-grp {
    padding: 3px;
    width: 100%;
    text-align: center;

  .btn-success {
    border: 1px solid green;
    padding: 7px 24px;
    border-radius: 2px;
    color: white;
    background-color: green;
    width: 100px;
  }
  }

If i will give .frm-btn-grp as input It will return

.frm-btn-grp {
    padding: 3px;
    width: 100%;
    text-align: center;

  .btn-success {
    border: 1px solid green;
    padding: 7px 24px;
    border-radius: 2px;
    color: white;
    background-color: green;
    width: 100px;
   }
  }

Is this possible?

8
  • What have you tried so far? Commented Jan 10, 2020 at 16:07
  • 2
    More curiosity than anything... why are you doing this? Commented Jan 10, 2020 at 16:08
  • @roganjosh Now i want to extract all css values by manually.It will take more times.So i will looking some automation code.Is it possible? Commented Jan 10, 2020 at 16:12
  • That reiterates what you want to do, but I asked why. Commented Jan 10, 2020 at 16:14
  • @Bryan I didn't start anything. I had searched related this but still i have no idea how to start this idea.That's why i posted this question. Commented Jan 10, 2020 at 16:14

1 Answer 1

1

Here's some webscraping action:

import re
import urllib.request as ureq

sample_url = "https://stackoverflow.com/questions/59685137/how-to-extract-css-values-from-website-page"

with ureq.urlopen(sample_url) as req:
    data = req.read().decode('utf-8')

#- Split HTML by line ending; Look for 'text/css' matches
css_lines = [i.strip() for i in data.split('\n') if len(i) > 0 and 'text/css' in i]

#-- Create a simple regular expression to extract the css html
#-- Note: ?P<named_tag> allows for naming each section, but I think
#-- it only works on compiled regular expresions, which isn't a huge
#-- deal.
css_pat = r'href="(?P<css_url>.+)"'
p = re.compile(css_pat)

#-- Create a list and append it with our matches.
css_urls = []
for i in css_lines:
    tmp = p.search(i).group('css_url')
    if tmp:
        css_urls.append(tmp)

Output:

In[4]: css_urls
Out[4]: 
['https://cdn.sstatic.net/Shared/stacks.css?v=d0797a2dd6f2',
 'https://cdn.sstatic.net/Sites/stackoverflow/primary.css?v=f7becef1b212']

Then, you can do whatever. Iterate the urls to get all of the css data, open and join all the css files into one, etc.

with ureq.urlopen(css_urls[0]) as req:
    css_data = req.read().decode('utf-8')

#-- Here's a sample printout of a css file for this page
#-- I added some .replace() statments to make it prettier :-)
print(css_data[:500]
    .replace(',', ',\n')
    .replace('{', ' {\n\t')
    .replace(';', ';\n\t')
    .replace('}','\n\t}\n\n')
    )

Truncated output:

html,
body,
div,
span,
{...}
output,
ruby,
section,
summary,
time,
mark,
audio,
video {
        margin:0;
        padding:0;
        border:0;
        font:inherit;
        font-size:100%;
        vertical-align:baseline
        }

article,
a
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.