I am trying to match a pattern that contains Unicode text, but re.findall always returns an empty list []. The same pattern works fine in KWrite.

I tried \u and \\u in place of \w, but neither worked. The Unicode part of the input can be any Unicode string.

import re

InputString = r"[[ਅਤੇ\CC_CCD]]_CCP"

Result = re.findall(r'[\[]+[\w]+\\\w+[\]]+[_]\w+',InputString,flags=re.U)

print(Result)

1 Answer

There is an extra character between ਅਤ and \ that \w+ cannot match. Its code point is U+0A47 (hex 0xA47, GURMUKHI VOWEL SIGN EE), so I have added [\u0A47] to the regex.
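To see which character trips up \w, it can help to list each code point of the word along with its Unicode category. The following is a minimal sketch, assuming Python 3 and the built-in re module; the names word and describe are illustrative and not part of the original post.

```python
import re
import unicodedata

# The Gurmukhi word from the question, written with escapes so each
# code point is visible.
word = "\u0A05\u0A24\u0A47"

def describe(text):
    # For each character: code point, Unicode name, general category,
    # and whether the regex class \w accepts it.
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch),
         re.fullmatch(r"\w", ch) is not None)
        for ch in text
    ]

for row in describe(word):
    print(row)
```

The last character is a combining vowel sign (category Mn) rather than a letter, so \w+ stops in front of it and the following \\ in the original pattern never lines up with the backslash.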

Try this Regex:

\[+\w+[\u0A47]\\\w+]]\w+

Explanation:

  • \[+ - matches 1+ occurrences of [
  • \w+ - matches 1+ occurrences of a word character
  • [\u0A47] - matches the single character U+0A47, which \w does not match
  • \\ - matches \
  • \w+ - matches 1+ occurrences of a word character
  • ]] - matches ]]
  • \w+ - matches 1+ occurrences of a word character

Python code
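A minimal sketch of the first regex in Python 3, using the input string from the question. The variable names are illustrative, and it assumes the re module's own handling of the \u0A47 escape inside a raw string (available since Python 3.3).

```python
import re

# Input from the question; the raw string keeps the backslash literal.
input_string = r"[[ਅਤੇ\CC_CCD]]_CCP"

# \w+ covers the two Gurmukhi letters, [\u0A47] covers the vowel sign
# that \w skips, and \\ matches the literal backslash.
pattern = re.compile(r"\[+\w+[\u0A47]\\\w+]]\w+")

result = pattern.findall(input_string)
print(result)
```

With this input the pattern matches the whole string, so the list holds a single element equal to input_string.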

The words are in the Gurmukhi script, whose Unicode block is U+0A00 to U+0A7F. So you can also match any run of Gurmukhi characters with a range:

\[+[\u0A00-\u0A7F]+\\\w+]]\w+
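The range-based regex can be tried on text with more than one occurrence. This is a sketch under assumptions: Python 3, and a made-up input that simply repeats the token from the question twice, separated by a space.

```python
import re

# Hypothetical input: the token from the question, repeated twice.
token = r"[[ਅਤੇ\CC_CCD]]_CCP"
text = token + " " + token

# [\u0A00-\u0A7F]+ accepts any run of characters from the Gurmukhi block,
# including combining vowel signs that \w would skip.
pattern = re.compile(r"\[+[\u0A00-\u0A7F]+\\\w+]]\w+")

# findall returns every non-overlapping match, not just the first one.
matches = pattern.findall(text)
print(matches)
```

Because the range covers the whole block, no individual vowel sign has to be listed by hand, which is what makes this variant work for arbitrary Gurmukhi words.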

4 Comments

This works. Can you explain why a '.' is needed after \w? Also, if the string contains multiple patterns, it returns only the first one. Link (regex101.com/r/vv2Qzl/4)
@UMR See the full updated answer for the explanation.
@UMR See the 2nd regex I have posted in the answer. It will match all the Gurmukhi characters.
Thanks, the second one worked fine for me. It matched all the patterns in the given text.
