
Coming from the land of Perl, I can do something like the following to test the membership of a string in a particular unicode block:

# test if string has any katakana script characters
my $japanese = "カタカナ";
if ($japanese =~ /\p{InKatakana}/) {
   print "string has katakana"
}

I've read that Python does not support Unicode blocks (true?), so what's the best way to implement this manually? For example, the Unicode block range for {InKatakana} should be U+30A0…U+30FF. How can I test the Unicode range in Python? Any other recommended solutions?

I would prefer not to go with an external wrapper like Ponyguruma to limit the number of dependencies for roll-out/maintenance.

2 Answers

>>> import re
>>> re.search(u'[\u30a0-\u30ff]', u'カタカナ')
<_sre.SRE_Match object at 0x7fa0dbb62578>
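
For readers on Python 3, the same range check might be wrapped in a small helper. This is only a sketch: the names `KATAKANA` and `has_katakana` are hypothetical, and it assumes the U+30A0 to U+30FF range quoted in the question is the block of interest.

```python
import re

# U+30A0..U+30FF is the Katakana block range given in the question.
# In Python 3, ordinary string literals are Unicode, so no u'' prefix is needed.
KATAKANA = re.compile("[\u30a0-\u30ff]")

def has_katakana(text):
    """Return True if text contains at least one character in U+30A0..U+30FF."""
    return KATAKANA.search(text) is not None

print(has_katakana("カタカナ"))  # True
print(has_katakana("ひらがな"))  # False (hiragana sits in a different block)
```

Compiling the pattern once avoids re-parsing it on every call, which may matter if the check runs over many strings.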

As Ignacio said, the regular expression approach works well. Don't forget to import re first. Note that this range (U+30A0 to U+30FF) only matches full-width katakana; half-width katakana are encoded in a different block.

import re
re.search(u'[\u30a0-\u30ff]', u'カタカナ')

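Or you might already have a UTF-8 encoded byte string on hand (in Python 2, a plain "..." literal is a byte string), in which case decode it first. In Python 3, str is already Unicode and has no decode method, so the string can be passed to re.search directly.

import re
x = "カタカナ"  # a byte string in Python 2
re.search(u'[\u30a0-\u30ff]', x.decode('utf-8'))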
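
If half-width katakana and byte-string input both need to be handled, the character class could be widened along these lines. This is a hedged sketch for Python 3: `contains_katakana` and `ANY_KATAKANA` are hypothetical names, and the half-width range U+FF65 to U+FF9F (from the Halfwidth and Fullwidth Forms block) is an assumption about which extra characters are wanted.

```python
import re

# Full-width Katakana block (U+30A0..U+30FF) plus the half-width katakana
# range (assumed here to be U+FF65..U+FF9F).
ANY_KATAKANA = re.compile("[\u30a0-\u30ff\uff65-\uff9f]")

def contains_katakana(value, encoding="utf-8"):
    """Accept str or bytes; bytes are decoded with `encoding` before matching."""
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return ANY_KATAKANA.search(value) is not None

print(contains_katakana("ｶﾀｶﾅ"))                      # True (half-width)
print(contains_katakana("カタカナ".encode("utf-8")))  # True (decoded bytes)
print(contains_katakana("abc"))                       # False
```

Decoding inside the helper keeps callers from having to know whether they hold text or raw bytes.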
