How to extract CSS values from a website page

Question

Is there any way to extract CSS values from a web page using a CSS class name? Given a parent class name, I want to get all of its CSS values, including those of its child classes.

For example:

Web page CSS:

.container {
  width: 80%;
}
.btn-wrap {
  padding: 3px;
  width: 25%;
  text-align: center;
}
.text-box {
  margin: 0 auto;
  width: 50%;
}
.frm-btn-grp {
  padding: 3px;
  width: 100%;
  text-align: center;

  .btn-success {
    border: 1px solid green;
    padding: 7px 24px;
    border-radius: 2px;
    color: white;
    background-color: green;
    width: 100px;
  }
}

If I give .frm-btn-grp as input, it should return:

.frm-btn-grp {
  padding: 3px;
  width: 100%;
  text-align: center;

  .btn-success {
    border: 1px solid green;
    padding: 7px 24px;
    border-radius: 2px;
    color: white;
    background-color: green;
    width: 100px;
  }
}

Is this possible?

@roganjosh Right now I would have to extract all the CSS values manually, which takes a lot of time, so I am looking for code to automate it. Is that possible?
– PrakashT, Commented Jan 10, 2020 at 16:12
@Bryan I haven't started on anything yet. I searched for related material but still have no idea how to begin, which is why I posted this question.
– PrakashT, Commented Jan 10, 2020 at 16:14

Mark M · Accepted Answer · 2020-01-11 13:24:14Z

Here's some webscraping action:

import re
import urllib.request as ureq

sample_url = "https://stackoverflow.com/questions/59685137/how-to-extract-css-values-from-website-page"

with ureq.urlopen(sample_url) as req:
    data = req.read().decode('utf-8')

#-- Split the HTML into lines and keep those that mention 'text/css'
css_lines = [i.strip() for i in data.split('\n') if 'text/css' in i]

#-- Simple regular expression to pull the stylesheet URL out of href="...".
#-- [^"]+ stops at the closing quote, so other attributes on the same
#-- line are not swallowed. (?P<css_url>...) names the group; named groups
#-- work with compiled patterns and with the module-level re functions.
css_pat = r'href="(?P<css_url>[^"]+)"'
p = re.compile(css_pat)

#-- Collect the matches; skip lines without an href (e.g. inline <style> tags),
#-- where p.search() returns None
css_urls = []
for i in css_lines:
    m = p.search(i)
    if m:
        css_urls.append(m.group('css_url'))

Output:

In[4]: css_urls
Out[4]: 
['https://cdn.sstatic.net/Shared/stacks.css?v=d0797a2dd6f2',
 'https://cdn.sstatic.net/Sites/stackoverflow/primary.css?v=f7becef1b212']
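
The line-splitting approach above depends on each stylesheet tag sitting on its own line and carrying type="text/css". As a hedged alternative, here is a minimal sketch that uses the standard library's html.parser to collect the href of every <link rel="stylesheet"> tag. The names StylesheetLinkParser and find_stylesheet_urls are made up for this example, and the base_url argument is an assumption added to resolve relative links:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetLinkParser(HTMLParser):
    """Collect href values of <link rel="stylesheet"> tags (hypothetical helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.css_urls = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "stylesheet" in rel and href:
            # Resolve relative hrefs against the page URL
            self.css_urls.append(urljoin(self.base_url, href))


def find_stylesheet_urls(html_text, base_url=""):
    """Return stylesheet URLs in document order."""
    parser = StylesheetLinkParser(base_url)
    parser.feed(html_text)
    return parser.css_urls
```

With the page from the example, this would be called as find_stylesheet_urls(data, sample_url). It does not see stylesheets that are injected by JavaScript or written inline in <style> tags.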

From there you can do whatever you need: iterate over the URLs to fetch all of the CSS data, join all the CSS files into one, and so on.

with ureq.urlopen(css_urls[0]) as req:
    css_data = req.read().decode('utf-8')

#-- Here's a sample printout of a CSS file for this page
#-- I added some .replace() statements to make it prettier :-)
print(css_data[:500]
    .replace(',', ',\n')
    .replace('{', ' {\n\t')
    .replace(';', ';\n\t')
    .replace('}','\n\t}\n\n')
    )

Truncated output:

html,
body,
div,
span,
{...}
output,
ruby,
section,
summary,
time,
mark,
audio,
video {
        margin:0;
        padding:0;
        border:0;
        font:inherit;
        font-size:100%;
        vertical-align:baseline
        }

article,
a
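
Once the CSS text is downloaded, the block for a given class still has to be pulled out, which is what the question asks for. The following is a naive sketch, not a real CSS parser: it finds the selector, then counts braces so that nested blocks (as in the question's .frm-btn-grp example) are returned along with the parent. The function name extract_rule is invented for this example, and it assumes the CSS contains no braces inside comments or strings:

```python
def extract_rule(css_text, selector):
    """Return the first block whose selector ends with `selector`,
    including any nested blocks, or None if it is not found.

    Naive sketch: ignores comments, strings, and at-rules.
    """
    pos = 0
    while True:
        start = css_text.find(selector, pos)
        if start == -1:
            return None
        open_brace = css_text.find("{", start)
        if open_brace == -1:
            return None

        # Only whitespace may sit between the selector and "{",
        # so ".btn" does not match ".btn-success {".
        between = css_text[start + len(selector):open_brace]
        prev = css_text[start - 1] if start > 0 else ""
        at_boundary = not (prev.isalnum() or prev in "-_")

        if between.strip() == "" and at_boundary:
            # Walk forward, tracking brace depth, until the block closes.
            depth = 0
            for i in range(open_brace, len(css_text)):
                if css_text[i] == "{":
                    depth += 1
                elif css_text[i] == "}":
                    depth -= 1
                    if depth == 0:
                        return css_text[start:i + 1]
            return None  # unbalanced braces

        pos = start + len(selector)
```

For example, extract_rule(css_data, ".frm-btn-grp") would return the text from .frm-btn-grp through its closing brace. Stylesheets served to browsers are often flat and minified, so child rules may appear as separate selectors such as .frm-btn-grp .btn-success rather than nested blocks; for anything beyond a quick script, a dedicated CSS parsing library is likely the safer choice.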
