4

I'm working on part of a project that replaces http URLs with https URLs where possible.

The problem is that the regular expressions for this are written for the JavaScript regex engine, but I'm using them in Python. To be compatible, I would like to rewrite each regex during parsing into a valid Python regex.

For example, I am given this replacement pattern:

https://$1wikimediafoundation.org/

and I would like a pattern like this:

https://\1wikimediafoundation.org/

My problem is that I don't know how to do that (converting $ into \).


This code doesn't work:

'https://$1wikimediafoundation.org/'.replace('$', '\')

It generates the following error:

SyntaxError: EOL while scanning string literal

This code runs without error:

'https://$1wikimediafoundation.org/'.replace('$', '\\')

but generates the wrong output:

'https://\\1wikimediafoundation.org/'

Your substitution is correct; you're probably being confused by the way you display the result. Print it with print and you'll see only one backslash. Commented Sep 14, 2014 at 21:01

4 Answers

2

You can test your regex at https://regex101.com/ and then convert it to Python. Additionally, to replace a matched group, you can use the re.sub function along these lines:

re.sub(r"'([^']*)'", r'{\1}', col)

This replaces

'Protein_Expectation_Value_Log(e)', 'Protein_Intensity_Log(I)'

with

{Protein_Expectation_Value_Log(e)}, {Protein_Intensity_Log(I)}
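A runnable version of this substitution might look like the following sketch. It assumes col is a plain string holding the quoted, comma-separated names shown above; the name converted is made up for illustration.

```python
import re

# Assumption: col is a single string of quoted names, as in the example above.
col = "'Protein_Expectation_Value_Log(e)', 'Protein_Intensity_Log(I)'"

# Group 1 captures the text between single quotes; \1 re-inserts it inside braces.
converted = re.sub(r"'([^']*)'", r'{\1}', col)
print(converted)
```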

For more details, you can refer here.

1

Actually, it works:

>>> 'https://$1wikimediafoundation.org/'.replace('$', '\\')
'https://\\1wikimediafoundation.org/'
>>> print('https://$1wikimediafoundation.org/'.replace('$', '\\'))
https://\1wikimediafoundation.org/

When you evaluate 'https://$1wikimediafoundation.org/'.replace('$', '\\') in the interactive interpreter, it displays the __repr__ (the representation) of the string, in which special characters such as the backslash are shown escaped.

By printing it, you are using __str__, the readable version. (See this answer on __str__ vs __repr__)
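To check that the string really holds a single backslash and can be used directly as a replacement pattern, here is a minimal sketch. The rule pattern and the sample URL are assumptions invented for illustration; they are not taken from the question's rule set.

```python
import re

template = 'https://$1wikimediafoundation.org/'.replace('$', '\\')

# repr() shows the backslash doubled, but the string contains exactly one,
# and its length is unchanged by the replacement.
assert template.count('\\') == 1
assert len(template) == len('https://$1wikimediafoundation.org/')

# Hypothetical rule: capture an optional "www." prefix as group 1.
pattern = r'^http://(www\.)?wikimediafoundation\.org/'
url = 'http://www.wikimediafoundation.org/wiki/Home'

# re.sub reads the single backslash followed by 1 as a reference to group 1.
result = re.sub(pattern, template, url)
print(result)
```

The displayed form with two backslashes is only how the interpreter writes the string as a literal; re.sub receives the one-backslash string in both cases.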

1 Comment

My problem is that I want to change the representation of the string, not the readable version, because I will parse this string as a regular expression in the next step.
1

Try this:

'https://$1wikimediafoundation.org/'.replace('$', '\\')

Note that a raw string literal cannot end in a single backslash, so r'\' is a syntax error. The escaped form '\\' is the way to write a string that contains exactly one backslash.

0

Note that $& in replacement patterns should be converted to \g<0>, since \0 in a Python replacement pattern is the NUL character (\x00), not the whole match.
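Putting these points together, a converter for replacement strings could be sketched as follows. The function name js_to_py_replacement is hypothetical, and the sketch covers only $1 to $99, $& and $$. It emits the \g<n> form rather than \n so that a group reference followed by a literal digit stays unambiguous. It does not reproduce JavaScript's fallback of reading $10 as $1 followed by 0 when fewer than ten groups exist.

```python
import re

def js_to_py_replacement(js_template):
    """Convert a JavaScript-style replacement string to Python re.sub syntax.

    Hypothetical helper: handles $1..$99, $& and $$ only.
    """
    # Backslashes are literal in JavaScript replacement strings but start an
    # escape in Python ones, so double them before adding our own.
    escaped = js_template.replace('\\', '\\\\')

    def convert(match):
        token = match.group(0)
        if token == '$$':
            return '$'                       # $$ is a literal dollar sign
        if token == '$&':
            return '\\g<0>'                  # whole match
        return '\\g<' + token[1:] + '>'      # numbered group

    # A function replacement is used so its return value is inserted verbatim.
    return re.sub(r'\$\$|\$&|\$\d{1,2}', convert, escaped)

print(js_to_py_replacement('https://$1wikimediafoundation.org/'))
```

For example, re.sub(r'(b)', js_to_py_replacement('[$&|$1|$$]'), 'abc') yields 'a[b|b|$]c'.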
