
There are two named groups in my pattern, myFlag and id. I want to insert the text matched by myFlag immediately before the text matched by group id.

Here is my current code:

# i'm using Python 3.4.2
import re
import os
contents = b'''
xdlg::xdlg(x_app* pApp, CWnd* pParent)
    : customized_dlg((UINT)0, pParent, pApp)
    , m_pReaderApp(pApp)
    , m_info(pApp)
{

}
'''

pattern = rb'(?P<myFlag>[a-zA-Z0-9_]+)::(?P=myFlag).+:.+(?P<id>\(UINT\)0 *,)'
res = re.search(pattern, contents, re.DOTALL)
if None != res:
    print(res.groups()) # the output is (b'xdlg', b'(UINT)0,')

# 'replPattern' becomes b'(?P<myFlag>[a-zA-Z0-9_]+)::(?P=myFlag).+:.+((?P=myFlag)\\(UINT\\)0 *,)'
replPattern = pattern.replace(b'?P<id>', b'(?P=myFlag)', re.DOTALL)
print(replPattern)
contents = re.sub(pattern, replPattern, contents)
print(contents)

The expected results should be:

xdlg::xdlg(x_app* pApp, CWnd* pParent)
    : customized_dlg(xdlg(UINT)0, pParent, pApp)
    , m_pReaderApp(pApp)
    , m_info(pApp)
{

}

But the actual result is the same as the original:

 xdlg::xdlg(x_app* pApp, CWnd* pParent)
    : customized_dlg((UINT)0, pParent, pApp)
    , m_pReaderApp(pApp)
    , m_info(pApp)
{

}
  • Why are you replacing ?P<id> (no parens) with (?P=myFlag) (with parens)? Commented Jul 12, 2015 at 1:58
  • Besides, there is no third xdlg string in the input. What exactly are you trying to achieve here? Commented Jul 12, 2015 at 1:59
  • Because I want to replace the string with the 'myFlag' group, and the group has to be enclosed in parens (Python syntax). Commented Jul 12, 2015 at 2:00
  • What is the expected output here? What problem are you trying to solve? Commented Jul 12, 2015 at 2:00
  • You replaced the name of a named group with a back reference, the regex syntax makes no sense here. Commented Jul 12, 2015 at 2:01
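A minimal sketch of why the question's output was unchanged, using a shortened, hypothetical input rather than the original source. In the question, re.search is given re.DOTALL but re.sub is not, so '.' cannot cross the newline and nothing matches. Separately, the replacement argument of re.sub is a template, not a pattern: groups are referenced there as \g<myFlag>, while (?P=myFlag) is pattern-only syntax.

```python
import re

# Shortened, hypothetical stand-in for the question's input
text = "foo::foo\n  : bar((UINT)0, x)"
pattern = r'(?P<myFlag>\w+)::(?P=myFlag).+:.+(?P<id>\(UINT\)0 *,)'

# Without re.DOTALL, '.' stops at the newline, so the text comes back unchanged
no_flag = re.sub(pattern, 'X', text)

# With re.DOTALL, the whole span up to and including '(UINT)0,' is replaced
with_flag = re.sub(pattern, 'X', text, flags=re.DOTALL)

print(repr(no_flag))
print(repr(with_flag))
```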

1 Answer


The issue appears to be the pattern syntax — particularly the end:

0 *,)

That makes no sense really... fixing it seems to solve most of the issues, although I would recommend ditching DOTALL and going with MULTILINE instead:

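import re

# s holds the source text as a str (decode the bytes from the question first).
# The ur'' prefix is a syntax error in Python 3, so plain raw strings are used.
p = re.compile(r'([a-zA-Z0-9_]+)::\1(.*\n\W+:.*)(\(UINT\)0,.*)', re.MULTILINE)
sub = r"\1::\1\2\1\3"
result = re.sub(p, sub, s)

print(result)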

Result:

xdlg::xdlg(x_app* pApp, CWnd* pParent)
    : customized_dlg(xdlg(UINT)0, pParent, pApp)
    , m_pReaderApp(pApp)
    , m_info(pApp)
{

}

https://regex101.com/r/hG3lV7/1
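For readers who want to keep the named groups and the bytes input from the question, the same idea can be sketched as follows. This is only one possible variant, not the answer's own code: the group name between is an assumption introduced here to capture the text that has to be carried over, and the replacement refers to groups with \g<name>.

```python
import re

contents = b'''
xdlg::xdlg(x_app* pApp, CWnd* pParent)
    : customized_dlg((UINT)0, pParent, pApp)
    , m_pReaderApp(pApp)
    , m_info(pApp)
{

}
'''

# 'between' is a hypothetical extra group holding everything from the
# second class name up to the '(UINT)0,' that should get the prefix.
pattern = re.compile(
    rb'(?P<myFlag>[a-zA-Z0-9_]+)::(?P=myFlag)(?P<between>.+?)(?P<id>\(UINT\)0 *,)',
    re.DOTALL,
)

# Rebuild the match and put the myFlag text in front of the id group.
repl = rb'\g<myFlag>::\g<myFlag>\g<between>\g<myFlag>\g<id>'
result = pattern.sub(repl, contents)

print(result.decode())
```

The lazy .+? stops at the first (UINT)0, after the class name, which matters if the input contains more than one constructor.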


6 Comments

I tried your pattern and it worked. However, I'm confused: is it necessary to introduce 5 groups? Is it possible to solve this issue with 2 groups?
No, five groups aren't necessary (see edit), although I couldn't entirely make sense of the ones you had either. In my opinion, named groups in Python are somewhat worthless unless you have many groups that you need to keep track of across multiple replacements, etc. Otherwise it's just more code that essentially makes things more confusing.
It's not necessary, so remove it. I think the pattern can be further improved too; the answer in this instance was mainly meant to illustrate why it wasn't working.
How can I achieve the same effect with re.DOTALL?
re.DOTALL should work with the pattern above (still using three capture groups), although it could be simplified to something like ([a-zA-Z0-9_]+)::\1(.*)(\(UINT\)0,.*).
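A short sketch of that simplified re.DOTALL pattern, run against the input from the question as a str. The variable names are illustrative only. Note that with re.DOTALL the greedy (.*) runs to the last (UINT)0, in the whole input, so this form assumes a single occurrence.

```python
import re

s = '''
xdlg::xdlg(x_app* pApp, CWnd* pParent)
    : customized_dlg((UINT)0, pParent, pApp)
    , m_pReaderApp(pApp)
    , m_info(pApp)
{

}
'''

# With re.DOTALL, '.' also matches newlines, so group 2 can span lines
p = re.compile(r'([a-zA-Z0-9_]+)::\1(.*)(\(UINT\)0,.*)', re.DOTALL)

# Re-emit the match with group 1 inserted in front of group 3
result = p.sub(r'\1::\1\2\1\3', s)

print(result)
```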