How to find Unicode Pattern using Regex in Python3.7?

Question

I am trying to find a Unicode pattern, but it always returns an empty list ([]). I have tried the same pattern in KWrite and it worked fine.

I have tried \u and \\u in place of \w, but neither worked for me. The Unicode part of the string can be any Unicode text.

import re

InputString = r"[[ਅਤੇ\CC_CCD]]_CCP"
Result = re.findall(r'[\[]+[\w]+\\\w+[\]]+[_]\w+', InputString, flags=re.U)
print(Result)

Gurmanjot Singh · Accepted Answer · 2019-01-12 07:02:19Z

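There seems to be an extra character ੇ between ਤ and \ which \w+ cannot match: it is U+0A47 (GURMUKHI VOWEL SIGN EE), a combining vowel sign (Unicode category Mn) rather than a letter, and Python's \w only matches alphanumeric characters and the underscore. Its hex value is 0xA47, so I have added [\u0A47] to the regex.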
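A minimal diagnostic sketch, using only the standard library's re and unicodedata modules, that prints each code point of ਅਤੇ with its Unicode category and whether \w matches it on its own:

```python
import re
import unicodedata

word = "ਅਤੇ"  # the Gurmukhi part of the question's input string

# Report each code point, its Unicode category, and whether \w accepts it
for ch in word:
    matches_w = re.fullmatch(r"\w", ch) is not None
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  "
          f"category={unicodedata.category(ch)}  \\w={matches_w}")
```

On a standard CPython 3 build, only U+0A47 should be reported with \w=False; the two letters (category Lo) are accepted.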

Try this Regex:

\[+\w+[\u0A47]\\\w+]]\w+

Explanation:

\[+ - matches 1+ occurrences of [
\w+ - matches 1+ occurrences of a word character
[\u0A47] - matches the single character U+0A47 (ੇ)
\\ - matches \
\w+ - matches 1+ occurrences of a word character
]] - matches ]]
\w+ - matches 1+ occurrences of a word character (here _CCP, since the underscore is itself a word character)
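A minimal Python sketch applying this regex to the question's input with re.findall. The re.U flag from the question is not needed here, because str patterns in Python 3 are Unicode-aware by default:

```python
import re

input_string = r"[[ਅਤੇ\CC_CCD]]_CCP"

# \w does not match U+0A47 (GURMUKHI VOWEL SIGN EE), so it is listed explicitly
pattern = re.compile(r"\[+\w+[\u0A47]\\\w+]]\w+")

result = pattern.findall(input_string)
print(result)  # ['[[ਅਤੇ\\CC_CCD]]_CCP']
```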

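The words are written in the Gurmukhi script (used for Punjabi), whose Unicode block is U+0A00 to U+0A7F, so you can also use the following regex, which accepts any run of Gurmukhi characters, vowel signs included: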

\[+[\u0A00-\u0A7F]+\\\w+]]\w+
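The comments below note that a regex built around one specific vowel sign returns only the first pattern when a string contains several tagged words. A minimal sketch comparing both regexes on a hypothetical two-token string (the second token, [[ਘਰ\NN_NN]]_NP, is made up for illustration). It also includes an assumed script-agnostic variant, not part of the answer, that accepts any characters other than brackets, backslashes and whitespace before the backslash, for input that is not Gurmukhi:

```python
import re

# Hypothetical sample: the question's token plus a second, made-up token
text = r"[[ਅਤੇ\CC_CCD]]_CCP [[ਘਰ\NN_NN]]_NP"

single_mark = re.compile(r"\[+\w+[\u0A47]\\\w+]]\w+")          # first regex
gurmukhi_range = re.compile(r"\[+[\u0A00-\u0A7F]+\\\w+]]\w+")  # second regex

# Assumed variant (not from the answer): any run of characters except
# [, ], \ and whitespace is accepted before the backslash
any_script = re.compile(r"\[+[^\[\]\\\s]+\\\w+]]\w+")

for name, pattern in [("single_mark", single_mark),
                      ("gurmukhi_range", gurmukhi_range),
                      ("any_script", any_script)]:
    print(name, pattern.findall(text))
```

The single-mark regex only finds the first token, because ਘਰ contains no U+0A47; the range-based and script-agnostic patterns find both. The script-agnostic variant trades precision for generality, so it will also accept punctuation or digits before the backslash.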

edited Jan 12, 2019 at 7:02

answered Jan 12, 2019 at 6:24

Gurmanjot Singh

4 Comments

UMR Over a year ago

This works. Can you explain why a '.' is needed after \w? Also, if a string contains multiple patterns, it returns only the first one. Link (regex101.com/r/vv2Qzl/4)

Gurmanjot Singh Over a year ago

@UMR See the full updated answer for the explanation.

Gurmanjot Singh Over a year ago

@UMR See the 2nd regex I have posted in the answer. It will match all Gurmukhi characters.

UMR Over a year ago

Thanks, the second one worked fine for me. It matched all the patterns in the given text.
