Replace in string based on function output

Question

So, for input:

accessibility,random good bye

I want output:

a11y,r4m g2d bye

So, basically, I have to abbreviate all words of length greater than or equal to 4 in the following format: first_letter + length_of_all_letters_in_between + last_letter

I try to do this:

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", r"\1" + str(len(r"\2")) + r"\3", s)

But it does not work. In JS, I would easily do:

str.replace(/([A-Za-z])([A-Za-z]{2,})([A-Za-z])/g, function(m, $1, $2, $3){
   return $1 + $2.length + $3;
});

How do I do the same in Python?

EDIT: I cannot afford to lose any punctuation present in the original string.

re is a bit of an overkill for this, in my opinion. I'd just use mystring[0]+str(len(mystring)-2)+mystring[-1] and an if statement to decide when to apply it.
– Aleksander Lidtke, Commented May 30, 2015 at 10:23
@AleksanderLidtke I thought about it, but mystring contains several separate words (like accessibility,random good bye); it is not itself a single word.
– Gaurang Tandon, Commented May 30, 2015 at 10:24
@AleksanderLidtke, what about the comma? How are you separating the words?
– Padraic Cunningham, Commented May 30, 2015 at 10:27
mystring is just one word. If you have comma-separated words, you can just do mycomaseparatedstring.split(',') to get a list of the contents of mycomaseparatedstring separated by commas, then proceed as with mystring. Sorry, I thought this was clear; it was to me because I know Python, so perhaps I should have been clearer.
– Aleksander Lidtke, Commented May 30, 2015 at 11:24
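Aleksander Lidtke's split-based suggestion can be sketched as follows, assuming commas and single spaces are the only separators (the helper name abbreviate_word is hypothetical). Re-joining with the same separators keeps them, but any other punctuation would end up inside the "words" and get mangled:

```python
def abbreviate_word(word):
    # Hypothetical helper: words of 4+ characters become
    # first letter + number of inner characters + last letter
    if len(word) >= 4:
        return word[0] + str(len(word) - 2) + word[-1]
    return word

s = "accessibility,random good bye"
# Split on commas, then on spaces, abbreviate each piece,
# and re-join with the same separators so they are not lost
result = ",".join(
    " ".join(abbreviate_word(w) for w in chunk.split(" "))
    for chunk in s.split(",")
)
print(result)  # a11y,r4m g2d bye
```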

Cu3PO42 · 2015-05-30 10:40:47Z

8

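What you are doing in JavaScript is right: you are passing an anonymous function that is called for each match. What you do in Python is pass a constant string. r"\1" + str(len(r"\2")) + r"\3" is evaluated once, before re.sub is even called, and len(r"\2") is just the length of the two-character literal, i.e. 2, so the replacement is always r"\12\3" (which re then reads as a reference to a non-existent group 12). It is not a function that can be evaluated for each match!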
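To see this for yourself, build the replacement argument on its own and hand it to re.sub. In CPython the template parser treats \12 as a reference to group 12, which the three-group pattern lacks, so re.sub should raise re.error instead of producing any output:

```python
import re

pattern = r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])"

# This whole expression is evaluated once, before re.sub runs:
# len(r"\2") is the length of the two-character literal backslash + "2".
repl = r"\1" + str(len(r"\2")) + r"\3"
print(repl)  # \12\3

error_message = None
try:
    re.sub(pattern, repl, "accessibility,random good bye")
except re.error as exc:
    # "\12" is parsed as a reference to group 12, which the pattern does not have
    error_message = str(exc)
print(error_message)
```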

Anonymous functions (lambdas) in Python are more limited than in JS, since their body must be a single expression, but they do the job here:

>>> import re
>>> re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", lambda m: "{}{}{}".format(m.group(1), len(m.group(2)), m.group(3)), "accessibility, random good bye")
'a11y, r4m g2d bye'

re.sub calls the lambda once for each match, passing it a match object. I then retrieve the needed groups and build the replacement string from them.
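To make the once-per-match call visible, the lambda can be swapped for an ordinary named function that records every match it receives (the names abbreviate and seen are just for illustration):

```python
import re

seen = []

def abbreviate(m):
    # re.sub passes one match object per match; record it to show the call count
    seen.append(m.group(0))
    return m.group(1) + str(len(m.group(2))) + m.group(3)

result = re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", abbreviate, "accessibility,random good bye")
print(result)  # a11y,r4m g2d bye
print(seen)    # ['accessibility', 'random', 'good']
```

"bye" never reaches the function because three letters cannot satisfy the pattern's minimum of four.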

edited May 30, 2015 at 10:40

answered May 30, 2015 at 10:35

Cu3PO42



2 Comments

Cu3PO42 Over a year ago

@Kasra how is that? It does exactly what the author wanted and is a close analogy to his code in JS

Cu3PO42 Over a year ago

@Kasra indeed it does. This is completely punctuation agnostic.

Blckknght · Accepted Answer · 2015-05-30 10:40:50Z

3

The issue you're running into is that len(r'\2') is always 2, not the length of the second capturing group in your regular expression. You can use a lambda expression to create a function that works just like the code you would use in JavaScript:

import re

re.sub(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])",
       lambda m: m.group(1) + str(len(m.group(2))) + m.group(3),
       s)

The m argument to the lambda is a match object, and the calls to its group method are equivalent to the backreferences you were using before.
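One way to check that equivalence is Match.expand, which applies the same backslash substitution re.sub uses for template strings. A small sketch on a single word:

```python
import re

m = re.search(r"([A-Za-z])([A-Za-z]{2,})([A-Za-z])", "random")

# Match.expand applies backslash substitution to a template string,
# so each \N in a replacement string corresponds to m.group(N)
assert m.expand(r"\1") == m.group(1) == "r"
assert m.expand(r"\2") == m.group(2) == "ando"
assert m.expand(r"\3") == m.group(3) == "m"

# Unlike a template, a function can compute with the group text
print(m.group(1) + str(len(m.group(2))) + m.group(3))  # r4m
```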

It might be easier to just use a simple word matching pattern with no capturing groups (group() can still be called with no argument to get the whole matched text):

re.sub(r'\w{4,}', lambda m: m.group()[0] + str(len(m.group())-2) + m.group()[-1], s)
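Note that \w also matches digits and the underscore, so this pattern and one built from [A-Za-z] can disagree on identifiers or alphanumeric tokens. A quick comparison on a made-up input string:

```python
import re

def abbreviate(m):
    word = m.group()
    return word[0] + str(len(word) - 2) + word[-1]

s = "my_variable x2y3z4 accessibility"
with_w = re.sub(r"\w{4,}", abbreviate, s)
letters_only = re.sub(r"[A-Za-z]{4,}", abbreviate, s)
print(with_w)        # m9e x44 a11y
print(letters_only)  # my_v6e x2y3z4 a11y
```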

answered May 30, 2015 at 10:40

Blckknght


2 Comments

Cu3PO42 Over a year ago

Minor nitpick: the author used [A-Za-z] in his original solution, so you may want to use that instead of \w in your alternative solution.

Gaurang Tandon Over a year ago

Accepted for giving solution as well as highlighting my issue.

Padraic Cunningham · 2015-05-30 11:18:00Z

s = "accessibility,random good bye"
tmp, out = "", ""
for ch in s:
    if ch.isspace() or ch in {",", "."}:
        out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
        tmp = ""
    else:
        tmp += ch
out += "{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp
print(out)

a11y,r4m g2d bye

If only alphabetic characters should count as part of a word, use str.isalpha:

tmp, out = "", ""
for ch in s:
    if not ch.isalpha():
        out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > 3 else tmp + ch
        tmp = ""
    else:
        tmp += ch
out += "{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > 3 else tmp
print(out)

a11y,r4m g2d bye

The logic is the same for both versions; only the check differs. When not ch.isalpha() is True, we have found a non-alpha character, so we need to process the tmp string and add it to the out string. If len(tmp) is not greater than 3, as per the requirement, we just add the tmp string plus the current character to out unchanged.

We need a final out += "{}{}{}".format(...) outside the loop to handle a string that does not end in a comma, space, etc. If the string does end in a non-alpha character, tmp is empty at that point, so we would just be adding an empty string, which makes no difference to the output.
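The same run-by-run scan can also be written with itertools.groupby, which splits the string into maximal runs of alphabetic and non-alphabetic characters, so no final flush is needed. This is a swapped-in technique rather than the code above; abbreviate_runs and its n parameter are hypothetical names:

```python
from itertools import groupby

def abbreviate_runs(s, n=3):
    out = []
    # groupby yields consecutive runs where ch.isalpha() has the same value
    for is_word, chars in groupby(s, key=str.isalpha):
        run = "".join(chars)
        if is_word and len(run) > n:
            out.append(f"{run[0]}{len(run) - 2}{run[-1]}")
        else:
            # Short words and punctuation/whitespace runs pass through unchanged
            out.append(run)
    return "".join(out)

print(abbreviate_runs("accessibility,random   good bye !!    foobar?"))
# a11y,r4m   g2d bye !!    f4r?
```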

It will preserve punctuation and spaces:

s = "accessibility,random   good bye !!    foobar?"

def func(s, n=3):
    # Words longer than n characters are abbreviated
    tmp, out = "", ""
    for ch in s:
        if not ch.isalpha():
            out += "{}{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1], ch) if len(tmp) > n else tmp + ch
            tmp = ""
        else:
            tmp += ch
    # Flush the last word and append it to everything built so far
    return out + ("{}{}{}".format(tmp[0], len(tmp) - 2, tmp[-1]) if len(tmp) > n else tmp)

print(func(s, 3))
a11y,r4m   g2d bye !!    f4r?

Avinash Raj · 2015-05-30 11:51:27Z

1

Keep it simple...

>>> s = "accessibility,random good bye"
>>> re.sub(r'\B[A-Za-z]{2,}\B', lambda x: str(len(x.group())), s)
'a11y,r4m g2d bye'

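\B matches wherever there is no word boundary, i.e. between two word characters or between two non-word characters. Because only letters may appear between the two \B anchors, both anchors must sit inside a word, so the match covers every letter except the first and last.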
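Listing the zero-width \B matches on a small sample string shows both cases: positions inside words, and positions between two non-word characters such as the spaces and "!!" (including the end of the string, since its last character is non-word). Only the former can start or end the letters-only match:

```python
import re

text = "good bye !!"

# \B is zero-width: list every position where it matches
positions = [m.start() for m in re.finditer(r"\B", text)]
print(positions)  # [1, 2, 3, 6, 7, 9, 10, 11]

# With letters between the anchors, both \B must sit inside a word
spans = [m.span() for m in re.finditer(r"\B[A-Za-z]{2,}\B", text)]
print(spans)  # [(1, 3)]
```

"bye" yields no span because its inner part, "y", is shorter than the required two letters.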

edited May 30, 2015 at 11:51

answered May 30, 2015 at 11:43

Avinash Raj


1 Comment

Gaurang Tandon Over a year ago

Excellent! Never thought of that!

Kasravnd · 2015-05-30 21:32:32Z

Alternatively, you can pass a separate named function to re.sub and use the simple regex r"(\b[a-zA-Z]+\b)":

>>> def replacer(x): 
...    g=x.group(0)
...    if len(g)>3:
...        return '{}{}{}'.format(g[0],len(g)-2,g[-1])
...    else:
...        return g
... 
>>> re.sub(r"(\b[a-zA-Z]+\b)", replacer, s)
'a11y,r4m g2d bye'

To collect the replaced words in a list instead, you can use a list comprehension over re.finditer:

>>> from operator import sub
>>> rep=['{}{}{}'.format(i.group(0)[0],abs(sub(*i.span()))-2,i.group(0)[-1]) if len(i.group(0))>3 else i.group(0) for i in re.finditer(r'(\w+)',s)]
>>> rep
['a11y', 'r4m', 'g2d', 'bye']

re.finditer returns an iterator over all match objects; you can loop over it and get the start and end position of each match with its span() method. end - start is then the length of the word, which is what abs(sub(*i.span())) computes.
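A short check that span() gives (start, end), so that end - start equals the matched word's length, plus the same list built without operator.sub:

```python
import re

s = "accessibility,random good bye"
spans = []
for m in re.finditer(r"\w+", s):
    start, end = m.span()
    # end - start is the length of the matched word, same as len(m.group(0))
    assert end - start == len(m.group(0))
    spans.append((m.group(0), (start, end)))
print(spans)
# [('accessibility', (0, 13)), ('random', (14, 20)), ('good', (21, 25)), ('bye', (26, 29))]

rep = [w[0] + str(len(w) - 2) + w[-1] if len(w) > 3 else w
       for w in (m.group(0) for m in re.finditer(r"\w+", s))]
print(rep)  # ['a11y', 'r4m', 'g2d', 'bye']
```

Like the comprehension above, this returns only the words, so the separators from the original string are not part of the result.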

perreal · 2015-05-30 11:25:22Z

0

Using regex and comprehension:

import re
s = "accessibility,random good bye"
print("".join(w[0]+str(len(w)-2)+w[-1] if len(w) > 3 else w for w in re.split(r"(\W)", s)))

Gives:

a11y,r4m g2d bye
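The capturing group is what keeps the punctuation: re.split includes the text matched by a group in its result, so separators come back as their own items. With the single-character group (\W), each separator is at most one character and is never abbreviated; if the group matched whole runs, as in (\W+), a run such as "... " would itself be abbreviated. The helper abbreviate_split below is hypothetical, written only to compare the two patterns:

```python
import re

def abbreviate_split(s, sep_pattern):
    # The capturing group makes re.split keep the separators in the result
    parts = re.split(sep_pattern, s)
    return "".join(w[0] + str(len(w) - 2) + w[-1] if len(w) > 3 else w for w in parts)

print(re.split(r"(\W)", "accessibility,random good bye"))
# ['accessibility', ',', 'random', ' ', 'good', ' ', 'bye']

print(abbreviate_split("foo... bar", r"(\W)"))   # foo... bar
print(abbreviate_split("foo... bar", r"(\W+)"))  # foo.2 bar
```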

edited May 30, 2015 at 11:25

answered May 30, 2015 at 10:35

perreal


1 Comment

Blckknght Over a year ago

This will abbreviate any four-or-more character long run of non-word characters. Try s='foo... bar' to see for yourself!

pythondetective · 2015-05-30 10:38:55Z

-1

Have a look at the following code

sentence = "accessibility,random good bye"
sentence = sentence.replace(',', " ")
sentence_list = sentence.split(" ")
for item in sentence_list:
    if len(item) >= 4:
        print(item[0]+str(len(item[1:len(item)-1]))+item[len(item)-1])

This prints only the abbreviated words, one per line; the comma and any other punctuation are lost, so you would still need to take care of them separately.

answered May 30, 2015 at 10:38

pythondetective

