1

I am searching some simple Cyrillic patterns in strings using python. The pattern I am using is like /[а-я]+/[а-я]+. When I search pattern by this code

import re
re.search('/[а-я]+/[а-я]+', '/бцршб/бйцбйц')

It cannot find anything. But when I write it like this.

import re
re.search(u'/[а-я]+/[а-я]+', u'/бцршб/бйцбйц')

It works. However in my case, the pattern and text are predefined in Database, so I couldn't find a way to convert them to the Unicode string. What is the solution in this case. Any help would be appreciated.

6
  • What do you mean by "predefined in storage"? Please post a complete, short program that demonstrates the problem you are having. Commented Jul 28, 2015 at 1:06
  • @jwodder You can try using decode() on strings, it would give you AttributeError: 'str' object has no attribute 'decode' . Commented Jul 28, 2015 at 1:08
  • 1
    @Anand: actually the behavior you describe is Python 3's where "str"s are already unicode objects. Commented Jul 28, 2015 at 1:12
  • Thank you guys. It works when decoding strings. So the code is like: import re pattern = '/[а-я]+/[а-я]+'.decode('utf-8') text = '/йцбйц/бйцбц'.decode('utf-8') re.search(pattern, text) Commented Jul 28, 2015 at 1:14
  • Oh ok , correct, I just tried in Python 2.x , decode() is what the OP needs. Commented Jul 28, 2015 at 1:15

1 Answer 1

1

Thank you guys. It works when decoding strings. So the code is like:

import re 
pattern = '/[а-я]+/[а-я]+'.decode('utf-8') 
text = '/йцбйц/бйцбц'.decode('utf-8') 
re.search(pattern, text)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.