I've been trying to use regular expressions on CSS files to extract certain attributes, but I'm having trouble matching from { to the first ; and so on. Here is what I've achieved so far:

Example CSS:

.pancake{height:200px;taste:delicious;}

I managed to get these two small patterns working:

This one gets everything from the dot up to the {:

(^\.[a-z]+)

This one gets everything between { and }:

{.+}

I tried reading about regular expressions, but I can't work out how to match more than one occurrence, or how to match up to a certain character within another match (sub-matching).

4
  • If you're trying to parse the CSS in any way (get certain attributes out of the text), I would suggest not using regex, but rather tokenizing the string. Commented Nov 23, 2014 at 1:33
  • That's another step! Thank you for the reminder :) But what I'm really trying to do now is select the correct data. Commented Nov 23, 2014 at 1:39
  • Do you only want to match the class pancake and its description? Commented Nov 23, 2014 at 1:44
  • Yeah! I wanted something like a key | value array. Commented Nov 23, 2014 at 4:41

3 Answers

3

As Jon said, parsing a CSS file using regular expressions is probably not a good idea. The CSS syntax presents many corner cases that you probably don't want to be handling by hand. I suggest you take a look at tinycss, a nice CSS parsing library.

You can use it like this:

import tinycss as tcss

stylesheet = ".pancake{height:200px;taste:delicious;}"
parser = tcss.make_parser()
parsed = parser.parse_stylesheet(stylesheet)

for rule in parsed.rules:
    # Print the selector, then each declaration as "name: value".
    print("".join(t.as_css() for t in rule.selector))
    for declaration in rule.declarations:
        print("\t{0}: {1}".format(declaration.name, declaration.value.as_css()))

2 Comments

Thank you for helping me with this learning exercise. I acknowledge that using a library would definitely get me to my final goal faster, but right now I am trying to learn how it works. Thanks nonetheless! Your answer was helpful :)
I discovered this library yesterday and it's excellent. However, it depends on the quality of the CSS. For example, we are very inconsistent in how we define colours, so sometimes we use #eee, sometimes "whitesmoke" and sometimes #eeeeee. So I can use tinycss (actually I am at the moment) but I either have to standardise my CSS colour or I still need a regex to match a colour in a style string. If that makes sense.
1
\b(\w+):(\w+)(?=;)

Try this if you want to use a regex. See the demo:

http://regex101.com/r/yP3iB0/1

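import re
# The Python 2-only ur'' prefix is replaced by r'', which works in Python 3.
p = re.compile(r'\b(\w+):(\w+)(?=;)')
test_str = ".pancake{height:200px;taste:delicious;}"

# Each match is a (property, value) tuple.
print(re.findall(p, test_str))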
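
Since the asker mentions wanting a key | value array, here is a minimal sketch of how the tuples returned by findall might be collected into a dict. The names pattern, css, pairs and properties are only illustrative, and the pattern is the one from this answer, so it still assumes that property names and values consist solely of \w characters (no hyphens, no #).

```python
import re

# Same pattern as above: name, colon, value, followed by a semicolon.
pattern = re.compile(r'\b(\w+):(\w+)(?=;)')
css = ".pancake{height:200px;taste:delicious;}"

# findall returns one (name, value) tuple per declaration it matches.
pairs = pattern.findall(css)

# dict() accepts an iterable of 2-tuples, giving a key/value mapping.
properties = dict(pairs)

print(pairs)
print(properties)
```

If the same property appears twice, the later value wins in the dict, which happens to mirror how CSS resolves duplicates inside a single rule.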

1 Comment

That is precisely what I needed. I altered two things, but you gave me the help I needed: \b(\w+\:)(\w+\;)
0
([^:;\s]+)\s?:\s?([^;\s]+)(?=;)

This pattern also matches property names such as background-color, which @vks's answer does not.

import re

p = re.compile(r'([^:;\s]+)\s?:\s?([^;\s]+)(?=;)')
re.findall(p, " background-color:#E6B8B7; text-align:center;")

Output:

[('background-color', '#E6B8B7'), ('text-align', 'center')]
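
On the "sub-matching" part of the question, one possible approach is to match in two stages: first capture each selector with its { ... } body, then run a declaration pattern over that body only. The following is a rough sketch under stated assumptions, not a real CSS parser: RULE, DECLARATION and parse_css are hypothetical names, the declaration pattern is a variation on the one above (it also accepts a last declaration with no trailing semicolon), and the input is assumed to contain no comments, nested blocks such as @media, or strings containing braces or semicolons.

```python
import re

# Stage 1: a selector (anything without braces), then a brace-delimited body.
RULE = re.compile(r'([^{}]+)\{([^{}]*)\}')

# Stage 2: "name: value" pairs, each ended by a semicolon or the end of the body.
DECLARATION = re.compile(r'([^:;\s]+)\s*:\s*([^;]+?)\s*(?:;|$)')

def parse_css(text):
    """Return {selector: {property: value}} for simple, flat CSS."""
    rules = {}
    for selector, body in RULE.findall(text):
        # The declaration pattern only ever sees the text between the braces.
        rules[selector.strip()] = dict(DECLARATION.findall(body))
    return rules

css = (".pancake{height:200px;taste:delicious;} "
       ".waffle{background-color:#E6B8B7; text-align:center}")
result = parse_css(css)
print(result)
```

Splitting the work this way keeps each pattern simple: the outer one only has to find the braces, and the inner one never has to worry about where a rule ends.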
