Replace substring in a string

Question

My function finds hex notations (hexadecimal CSS colors) in a string and replaces them with the short notation.
For example, #000000 can be represented as #000.
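For context, CSS defines the three-digit form #rgb as shorthand for the six-digit form with each digit doubled, so #000 means #000000 and #0a3 means #00aa33. A minimal sketch of that expansion (expand_short_hex is a hypothetical helper, not part of the question):

```python
def expand_short_hex(short):
    # Hypothetical helper: '#rgb' -> '#rrggbb' by doubling each hex digit
    return '#' + ''.join(ch * 2 for ch in short[1:])

print(expand_short_hex('#000'))  # #000000
print(expand_short_hex('#0a3'))  # #00aa33
```

So a six-digit color has a short form whenever each of its three digit pairs is doubled; #000000 is just the case where all six digits happen to match.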

import re

def to_short_hex(string):
    match = re.findall(r'#[\w\d]{6}\b', string)

    for i in match:
        if not re.findall(r'#' + i[1] + '{6}', i):
            match.pop(match.index(i))

    for i in match:
        string = string.replace(i, i[:-3])

    return string

to_short_hex('text #FFFFFF text #000000 #08088')

Out:

text #FFF text #000 #08088

Is there any way to optimize my code, e.g. using a list comprehension?

There is a recipe at ActiveState using a slightly longer regex: code.activestate.com/recipes/…
– John P, commented Feb 11, 2012 at 14:25

Ricardo Cárdenes · Accepted Answer · 2012-02-11 14:57:29Z

3

How about this? You could speed it up by inlining is6hexdigit into to_short_hex, but I wanted it to be more readable.

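hexdigits = "0123456789abcdef"

def is6hexdigit(sub):
    # True if sub is exactly six copies of one hex digit, e.g. "FFFFFF"
    l = sub.lower()
    # The length check avoids an IndexError on an empty piece, which
    # split('#') produces when the text starts with '#'
    return len(l) == 6 and (l[0] in hexdigits) and (l.count(l[0]) == 6)

def to_short_hex(may_have_hexes):
    # Split on '#' and drop three digits from any piece that starts with
    # six identical hex digits, then glue the pieces back with '#'
    replaced = ((sub[3:] if is6hexdigit(sub[:6]) else sub)
                for sub in may_have_hexes.split('#'))
    return '#'.join(replaced)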
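To see what the generator expression works on, here is the intermediate list that split('#') yields for the question's sample input, and how '#'.join reassembles the shortened pieces. This is a standalone illustration of the mechanics, not part of the answer:

```python
text = 'text #FFFFFF text #000000 #08088'
pieces = text.split('#')
# Every piece after the first began right after a '#'
print(pieces)  # ['text ', 'FFFFFF text ', '000000 ', '08088']

# Drop three of the six identical digits in the two matching pieces
shortened = [pieces[0], pieces[1][3:], pieces[2][3:], pieces[3]]
print('#'.join(shortened))  # text #FFF text #000 #08088
```

Because only sub[:6] is inspected, nothing checks what follows the six digits, so a seven-digit run in input like 'x #0000000' would also be cut down, giving 'x #0000'.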

edited Feb 11, 2012 at 14:57

answered Feb 11, 2012 at 14:39

Ricardo Cárdenes

9,194 reputation · 1 gold badge · 23 silver badges · 35 bronze badges


6 Comments

senderle Over a year ago

@GoingTharn, string is a module name, and the variable should probably be renamed for that reason.

WayHunter Over a year ago

@Ricardo Cárdenes, thx. I may be mistaken, but it's less readable for me :)

GoingTharn Over a year ago

Right, early in the morning here.

Ricardo Cárdenes Over a year ago

I used string just because the OP did, and the string module isn't used in the functions anyway, but yeah, I agree that it's not ideal.

Ricardo Cárdenes Over a year ago

@AlexanderGuiness: whenever you start using comprehensions, things tend to get less readable ;), but try substituting the whole is6hexdigit into the (... if ... else ...) and you'll get my point :D


Weeble · Accepted Answer · 2012-02-11 15:28:41Z

2

This is what re.sub is for! It's not a great idea to use a regex to find something and then do a further sequence of search-and-replace operations to change it. For one thing, it's easy to accidentally replace things you didn't mean to, and for another it does a lot of redundant work.
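To make the "replace things you didn't mean to" point concrete, here is a contrived input on which a two-pass findall-then-str.replace approach and a single re.sub pass disagree. The function names and the input string are made up for illustration; both use the question's rule (six identical digits):

```python
import re

HEX6 = r"#[0-9a-fA-F]{6}\b"

def find_then_replace(s):
    # Two passes: collect matches, then str.replace each one.
    # str.replace rewrites every occurrence of the text, even inside
    # longer tokens that the regex itself rejected.
    for m in re.findall(HEX6, s):
        if m[1:] == m[1] * 6:
            s = s.replace(m, m[:4])
    return s

def single_pass(s):
    # re.sub only touches the spans the regex actually matched
    def shorten(match):
        h = match.group(0)
        return h[:4] if h[1:] == h[1] * 6 else h
    return re.sub(HEX6, shorten, s)

text = "a #0000001 b #000000"
print(find_then_replace(text))  # a #0001 b #000
print(single_pass(text))        # a #0000001 b #000
```

The regex correctly skips '#0000001' (no word boundary after six digits), but the second pass still mangles it because '#000000' occurs inside it.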

Also, you might want to shorten '#aaccee' to '#ace'. This example does that too:

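import re

def to_short_hex(s):
    def shorten_match(match):
        hex_string = match.group(0)  # e.g. "#aaccee"
        # Digits 1,3,5 equal digits 2,4,6 exactly when every pair is doubled
        if hex_string[1::2] == hex_string[2::2]:
            return '#' + hex_string[1::2]
        return hex_string
    # re.sub calls shorten_match once per match and splices in its result
    return re.sub(r"#[\da-fA-F]{6}\b", shorten_match, s)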

Explanation

re.sub can take a function to apply to each match. It receives the match object and returns the string to substitute at that point.
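A minimal standalone illustration of that callback form; the pattern and the upper-casing rule are arbitrary examples, unrelated to the color logic:

```python
import re

def upper_match(match):
    # re.sub passes a Match object; group(0) is the full matched text
    return match.group(0).upper()

result = re.sub(r"#[0-9a-f]{3}\b", upper_match, "#abc and #def")
print(result)  # #ABC and #DEF
```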

Slice notation allows you to apply a stride. hex_string[1::2] takes every second character from the string, starting at index 1 and running to the end of the string. hex_string[2::2] takes every second character from the string, starting at index 2 and running to the end. So for the string "#aaccee", we get "ace" and "ace", which match. For the string "#123456", we get "135" and "246", which don't match.
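The same stride slices can be checked directly; pairs_match is a hypothetical name for the comparison that shorten_match performs inline:

```python
def pairs_match(h):
    # Compare digits 1,3,5 with digits 2,4,6 of a '#rrggbb' string
    return h[1::2] == h[2::2]

for h in ["#aaccee", "#123456"]:
    print(h, h[1::2], h[2::2], pairs_match(h))
# #aaccee ace ace True
# #123456 135 246 False
```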

answered Feb 11, 2012 at 15:28

Weeble

18.1k reputation · 4 gold badges · 68 silver badges · 87 bronze badges

1 Comment

WayHunter Over a year ago

You are right. I forgot about this notation: '#aaccee' to '#ace'

senderle · Accepted Answer · 2012-02-11 14:52:21Z

1

Using pop on a list while iterating over it is always a bad idea: each pop shifts the remaining items one position left, so the loop skips the item right after the one removed. Hence this isn't an optimization but a bug fix. I also edited the regex so that strings like '#34j342' are no longer accepted:

>>> import re
>>> def to_short_hex(s):
...     matches = re.findall(r'#[\dabcdefABCDEF]{6}\b', s)
...     # keep only colors made of one repeated digit
...     filtered = [m for m in matches if re.findall(r'#' + m[1] + '{6}', m)]
...     for m in filtered:
...         s = s.replace(m, m[:-3])
...     return s
... 
>>> to_short_hex('text #FFFFFF text #000000 #08088')
'text #FFF text #000 #08088'
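The pop-while-iterating bug can be reproduced with the question's original function (renamed to_short_hex_original here, with its parameter renamed to s). When '#123456' is popped, '#abcdef' slides into the slot the loop has already visited, so it is never tested and gets wrongly shortened. The input string is made up for illustration:

```python
import re

def to_short_hex_original(s):
    # The question's version: pops from `match` while iterating over it
    match = re.findall(r'#[\w\d]{6}\b', s)
    for i in match:
        if not re.findall(r'#' + i[1] + '{6}', i):
            # Shifts later items left, so the next item is skipped
            match.pop(match.index(i))
    for i in match:
        s = s.replace(i, i[:-3])
    return s

print(to_short_hex_original('a #123456 #abcdef'))  # a #123456 #abc
```

The list-comprehension version in this answer leaves that same string unchanged, since neither color consists of one repeated digit.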

Also, I think re.search is sufficient for the second regex, since it only needs to report whether there is a match.
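A sketch of the filter step using re.search as suggested; re.search returns a Match object or None, and only its truthiness matters here. The sample list is illustrative:

```python
import re

matches = ['#FFFFFF', '#123456', '#000000']
# Keep only colors made of one repeated digit, e.g. '#FFFFFF' matches '#F{6}'
filtered = [m for m in matches if re.search('#' + m[1] + '{6}', m)]
print(filtered)  # ['#FFFFFF', '#000000']
```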

answered Feb 11, 2012 at 14:52

senderle

152k reputation · 36 gold badges · 218 silver badges · 244 bronze badges
