
I have strings that look like this (the desired result is shown after the =>):

oldString="this is my {{string-d}}" => "this is my {{(string-d)}}"
oldString2="this is my second{{ new_string-d }}" => "this is my second{{ (new_string-d) }}"
oldString3="this is my second new_string-d " => "this is my second (new_string-d) "
oldString4="this is my second new[123string]-d " => "this is my second (new[123string]-d) "

Whenever I see "-d", I want to add an opening bracket before the word it is attached to and a closing bracket right after the "-d".

I wrote code that looks for the pattern "-d" and partitions the string into three parts: the text before "-d", "-d" itself, and the text after "-d". Then I scan the part before "-d" backwards until I find whitespace or "{", stop there, and add the brackets. (P.S. I read these strings from many files and modify them there; the examples above just demonstrate what I'm trying to do.) My code looks like this:

if '-d' in oldString:
    p = oldString.partition('-d')
    v = p[p.index('-d')-1]
    beforeString=''
    for i in reversed(v):
        if i != ' ' or i != '{':
            beforeString=i+beforeString
            indexNew = v.index(i)
    outPutLine = v[:indexNew]+'('+v[indexNew:]
    newString = outPutLine + '-d' + ' )'
    print newString

The result of running the code is:

newString = "(this is my {{string-d )"

As you can see, the opening bracket is placed before "this" instead of before "string". Why is this happening? Also, I'm not sure this is the best way to do this kind of find and replace, so any suggestions would be much appreciated.

2 Answers

>>> import re
>>> oldString = "this is my {{string-d}}"
>>> oldString2 = "this is my second{{ new_string-d }}"
>>> re.sub(r"(\w*-d)", r"(\1)", oldString)
'this is my {{(string-d)}}'
>>> re.sub(r"(\w*-d)", r"(\1)", oldString2)
'this is my second{{ (new_string-d) }}'

Note that this matches "words" assuming that a word is composed of only letters, numbers, and underscores.


Here's a more thorough breakdown of what's happening:

  • An r before a string literal means the string is a "raw string". It prevents Python from interpreting backslashes as the start of an escape sequence. For instance, r"\n" is a backslash followed by the letter n, rather than a single newline character. I like to use raw strings for my regex patterns, even though it's not always necessary.
  • The parentheses surrounding \w*-d form a capturing group. It indicates to the regex engine that the contents of the group should be saved for later use.
  • The sequence \w means "any alphanumeric character or underscore".
  • * means "zero or more of the preceding item". \w* together means "zero or more alphanumeric characters or underscores".
  • -d means "a hyphen followed by the letter d".

All together, (\w*-d) means "zero or more alphanumeric characters or underscores, followed by a hyphen and the letter d. Save all of these characters for later use."

The second string describes what the matched data should be replaced with. "\1" means "the contents of the first captured group". The parentheses are just regular parentheses. All together, (\1) in this context means "take the saved content from the captured group, surround it in parentheses, and put it back into the string".
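To make the role of \1 more concrete, here is a minimal sketch that does the same replacement twice: once with the r"(\1)" replacement string and once with a replacement function, which re.sub also accepts. The helper name wrap_in_parens is made up for this illustration.

```python
import re

def wrap_in_parens(match):
    # match.group(1) is the text saved by the first capturing group,
    # i.e. the same text that \1 refers to in a replacement string.
    return "(" + match.group(1) + ")"

pattern = r"(\w*-d)"
text = "this is my second{{ new_string-d }}"

with_backref = re.sub(pattern, r"(\1)", text)
with_function = re.sub(pattern, wrap_in_parens, text)

print(with_backref)
print(with_function)
```

Both calls should print the same line; the function form is mostly useful when the replacement needs more logic than a fixed template.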


If you want to match more characters than just alphanumeric and underscore, you can replace \w with whatever collection of characters you want to match.

>>> re.sub(r"([\w\.\[\]]*-d)", r"(\1)", "{{startingHere[zero1].my_string-d }}")
'{{(startingHere[zero1].my_string-d) }}'
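Another option, sketched here under the assumption that a "word" is any run of characters that is not whitespace or a curly brace (which is close to the stopping rule described in the question), is a negated character class. The function name add_parens is only illustrative. With plain \w*, the question's "new[123string]-d" example would only get "-d" wrapped, because "]" is not a word character.

```python
import re

def add_parens(text):
    # [^\s{}]* = zero or more characters that are not whitespace, '{' or '}'
    return re.sub(r"([^\s{}]*-d)", r"(\1)", text)

print(add_parens("this is my {{string-d}}"))
print(add_parens("this is my second new[123string]-d "))
```

Like the \w version, this wraps every "-d" it finds, including one in the middle of a longer word, so it may need tightening for real data.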

If you also want to match words ending with "-d()", you can match a pair of parentheses with \(\) and mark it as optional using ?.

>>> re.sub(r"([\w\.\[\]]*-d(\(\))?)", r"(\1)", "{{startingHere[zero1].my_string-d() }}")
'{{(startingHere[zero1].my_string-d()) }}'
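
On the question of why the original loop puts the opening bracket before "this": the test i != ' ' or i != '{' is true for every character (no character equals both), so the loop never stops early, and v.index(i) returns the first occurrence of the last character visited, which is index 0. Below is a minimal sketch of the manual approach with those two points changed, assuming only the first "-d" in a line needs handling. The name bracket_before_d is a hypothetical helper, not something from the question.

```python
def bracket_before_d(line, marker='-d'):
    before, found, after = line.partition(marker)
    if not found:
        # No marker in this line: leave it unchanged.
        return line
    # Walk backwards by position until the previous character is a space or '{'.
    start = len(before)
    while start > 0 and before[start - 1] not in ' {':
        start -= 1
    return before[:start] + '(' + before[start:] + marker + ')' + after

print(bracket_before_d("this is my {{string-d}}"))
print(bracket_before_d("this is my second{{ new_string-d }}"))
```

Tracking a position instead of calling index on a character avoids landing on an earlier duplicate of that character.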

2 Comments

Thanks, Kevin, for your post! Could you please explain how the regular expression works? I'm new to regular expressions :)
Yeah, alright, I'll compose an explanation in a minute.

If you want the bracketing to only take place inside double curly braces, you need something like this:

re.sub(r'({{\s*)([^}]*-d)(\s*}})', r'\1(\2)\3', s)

Breaking that down a bit:

# the target pattern
r'({{\s*)([^}]*-d)(\s*}})'
# ^^^^^^^ capture group 1, opening {{ plus optional whitespace
#        ^^^^^^^^^ capture group 2, non-braces plus -d
#                 ^^^^^^^ capture group 3, optional whitespace plus closing }}

The replacement r'\1(\2)\3' just reassembles the groups, with parentheses around the middle one.

Putting it together:

import re

def quote_string_d(s):
    return re.sub(r'({{\s*)([^}]*-d)(\s*}})', r'\1(\2)\3', s)

print(quote_string_d("this is my {{string-d}}"))
print(quote_string_d("this is my second{{ new_string-d }}"))
print(quote_string_d("this should not be quoted other_string-d "))

Output:

this is my {{(string-d)}}
this is my second{{ (new_string-d) }}
this should not be quoted other_string-d 

Note the third instance does not get the parentheses, because it's not inside {{ }}.
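
Since the question mentions reading many files, here is a rough sketch of applying the same pattern to a list of lines with a precompiled regex. The names BRACED_D and quote_lines and the sample lines are invented for this illustration; reading the lines from actual files is left out.

```python
import re

# Compile once when the same pattern is applied to many lines.
BRACED_D = re.compile(r'({{\s*)([^}]*-d)(\s*}})')

def quote_lines(lines):
    # sub replaces every match in a line, so several {{ }} blocks are handled.
    return [BRACED_D.sub(r'\1(\2)\3', line) for line in lines]

sample = [
    "{{ a-d }} and {{ b-d }}",
    "plain c-d stays as it is",
    "{{ two words-d }}",
]
for line in quote_lines(sample):
    print(line)
```

Note that because group 2 accepts any non-brace character, the last sample line gets its whole content wrapped, spaces included; if that is not wanted, the group could be narrowed to something like \w*-d.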
