
I want to match a string that contains \xa0, like:

"\xa0" =~ /\xa0/

But this raises an error:

SyntaxError: (eval):2: invalid multibyte escape: /\xa0/

I tried to match using a Unicode escape instead:

"\xa0" =~ /\u00a0/

That raises an error too:

ArgumentError: invalid byte sequence in UTF-8

So, how can I match \xa0 in Ruby?


1 Answer


Not every byte sequence is a valid Unicode string (or, more specifically, valid UTF-8).

Your single-byte string, for example, is not:

str = "\xa0"

str.encoding        #=> #<Encoding:UTF-8>
str.valid_encoding? #=> false
str.codepoints      #   ArgumentError (invalid byte sequence in UTF-8)

To work with an arbitrary byte string, you have to set its encoding to binary (ASCII-8BIT):

str = "\xa0".b      # <-- note the .b

str.encoding        #=> #<Encoding:ASCII-8BIT>
str.valid_encoding? #=> true
str.codepoints      #=> [160]

and also set the regexp encoding to ASCII-8BIT (via the n modifier):

str =~ /\xa0/n
#=> 0
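
Putting the pieces together, here is a minimal sketch of the binary approach on a longer string. The second half is an assumption that goes beyond the question: if the 0xA0 byte is known to come from Latin-1 (ISO-8859-1) data, where it represents a non-breaking space, the string can be transcoded to UTF-8 and matched with \u00a0 instead. The sample string "foo\xa0bar" and the variable names are made up for illustration.

```ruby
# A lone 0xA0 byte is not a valid UTF-8 sequence.
raw = "\xa0"
raw_valid = raw.valid_encoding?   # false

# Option 1: treat the data as raw bytes and match with a binary regexp.
bytes = "foo\xa0bar".b            # .b returns a copy tagged ASCII-8BIT
binary_index = bytes =~ /\xa0/n   # byte offset of the match: 3

# Option 2 (assumption: the bytes are really Latin-1 text).
# Relabel the bytes as ISO-8859-1, then transcode to UTF-8,
# where 0xA0 becomes U+00A0 (non-breaking space).
latin1 = "foo\xa0bar".dup.force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode(Encoding::UTF_8)
utf8_index = utf8 =~ /\u00a0/     # character offset of the match: 3
```

Option 1 makes no claim about what the bytes mean, so it is the safer choice when the source encoding is unknown. Option 2 only makes sense if you are sure of the original encoding.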

1 Comment

@TangMonk because the byte 0x01 is a valid codepoint in Unicode.
