
How do I extend the String class and attach a method named to_bytes?

3 Answers


String#bytes returns the string's bytes as an array of integers:

"asd".bytes
=> [97, 115, 100]

In Ruby 1.9.3, #bytes returned an Enumerator, so you had to add .to_a to convert it to an Array. Since Ruby 2.0 it returns an Array directly, so .to_a is no longer needed.
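A quick way to check this on your own Ruby (assuming 2.0 or later) is to compare bytes with each_byte, which still returns an Enumerator when called without a block:

```ruby
s = "asd"

# Since Ruby 2.0, String#bytes returns a plain Array of integers.
p s.bytes            # => [97, 115, 100]
p s.bytes.class      # => Array

# each_byte without a block still returns an Enumerator...
p s.each_byte.class  # => Enumerator

# ...so .to_a is only needed on the Enumerator, not on bytes.
p s.each_byte.to_a   # => [97, 115, 100]

# each_byte with a block yields one integer per byte.
s.each_byte { |b| print b, " " }  # prints "97 115 100 "
puts
```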


The .to_a is superfluous nowadays ("asd".bytes.class == Array)

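Ruby already has String#each_byte and String#bytes methods. In Ruby 1.9 the two behaved the same; since Ruby 2.0, bytes returns an Array, while each_byte yields each byte to a block (or returns an Enumerator when called without one).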
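If you still want a method literally named to_bytes, as the question asks, a minimal sketch is to reopen String and delegate to the built-in bytes. The method body here is just an illustration, not an established convention:

```ruby
# Reopen the core String class and add a to_bytes method.
# This is a global change: every String in the process gets it.
class String
  # Returns the string's raw bytes as an Array of Integers (0..255),
  # simply delegating to the built-in String#bytes.
  def to_bytes
    bytes
  end
end

p "asd".to_bytes   # => [97, 115, 100]
p "фыв".to_bytes   # => [209, 132, 209, 139, 208, 178]
```

Reopening a core class affects every caller in the process. If that is a concern, Ruby 2.0+ refinements (refine/using) can scope the method to the files that opt in.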

Prior to Ruby 1.9, strings were effectively byte arrays, i.e. a character was assumed to be a single byte. That's fine for ASCII text and single-byte encodings like Windows-1252 and ISO-8859-1, but it fails badly with Unicode, which we see more and more often on the web. Ruby 1.9+ is encoding-aware: every string carries an encoding and is treated as a sequence of characters, each of which can be multiple bytes long.
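The difference between characters and bytes is easy to see on a UTF-8 string containing one non-ASCII character (this assumes the source file is UTF-8, which is the default since Ruby 2.0):

```ruby
s = "naïve"   # "ï" (U+00EF) takes two bytes in UTF-8

p s.encoding   # => #<Encoding:UTF-8>
p s.length     # => 5  (characters)
p s.bytesize   # => 6  (bytes)
p s.chars      # => ["n", "a", "ï", "v", "e"]
p s.bytes      # => [110, 97, 195, 175, 118, 101]
```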

So, if you are trying to manipulate text as single bytes, you'll need to ensure your input is ASCII, or at least uses a single-byte character set. If you might have multi-byte characters, use String#each_char, String#split(//), or String#unpack with the "U*" format.
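The three character-level alternatives can be compared on a multi-byte string. Note that unpack("U*") returns Unicode code points as integers rather than one-character strings:

```ruby
s = "фыв"   # three Cyrillic characters, two bytes each in UTF-8

# One-character strings, one per character:
p s.each_char.to_a   # => ["ф", "ы", "в"]
p s.split(//)        # => ["ф", "ы", "в"]

# Code points (Integers), one per character; "U*" decodes UTF-8:
p s.unpack("U*")     # => [1092, 1099, 1074]

# Compare with the raw bytes, two per character:
p s.bytes            # => [209, 132, 209, 139, 208, 178]
```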


What does // mean in String#split(//)?

// is an empty regular expression. Like the empty string '', it tells split to split between every character, so it returns an array of characters. You can usually use String#chars instead.
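A small sketch showing that the empty Regexp, the empty String, and chars give the same result, including for a multi-byte character:

```ruby
s = "héllo"

a = s.split(//)    # empty Regexp: split between every character
b = s.split("")    # empty String: same result
c = s.chars        # direct method; returns an Array since Ruby 2.0

p a                # => ["h", "é", "l", "l", "o"]
p a == b && b == c # => true
```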


"фыв".bytes.to_a => [209, 132, 209, 139, 208, 178] gives the bytes of a Unicode string, and the asker wanted bytes, not characters. I see no problem here. Or am I missing something?
Yes, and I said there are already each_byte and bytes methods available, so there is no need to extend String with a to_bytes method. Regarding Unicode characters and bytes: yes, you can easily convert a character into its component bytes, but you cannot manipulate those bytes as you would individual characters, because a single byte of a multi-byte character is not a character on its own; it is only meaningful as part of the complete byte sequence. Anyone who is not aware of that and expects to treat text as bytes these days will have a rude awakening when they encounter Unicode for the first time.
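To illustrate the point about multi-byte characters: slicing a single byte out of a two-byte UTF-8 character produces a string that is not valid UTF-8 on its own. A minimal sketch using String#byteslice (available since Ruby 1.9.3):

```ruby
s = "фыв"

first_byte = s.byteslice(0, 1)  # only the first of the two bytes of "ф"
p first_byte.bytes              # => [209]
p first_byte.valid_encoding?    # => false: half a character is not valid UTF-8

first_char = s.byteslice(0, 2)  # both bytes of "ф"
p first_char.valid_encoding?    # => true
p first_char == "ф"             # => true
```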

With the help of String#unpack you can convert a string into many formats: bytes, bits (MSB- or LSB-first), ASCII, or hex. See this article for details: http://blog.bigbinary.com/2011/07/20/ruby-pack-unpack.html. To convert a string into bytes:

"abcde".unpack('C*')   # 'C' = unsigned 8-bit, same result as "abcde".bytes
=> [97, 98, 99, 100, 101]
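The format directive matters once non-ASCII bytes appear: "c" is a signed 8-bit integer, so bytes above 127 come back negative, while "C" is unsigned and matches String#bytes. The same string can also be unpacked as a bit string or as hex:

```ruby
s = "é"   # UTF-8 bytes 0xC3 0xA9

p s.unpack("c*")   # => [-61, -87]               signed 8-bit
p s.unpack("C*")   # => [195, 169]               unsigned 8-bit, same as s.bytes
p s.unpack("B*")   # => ["1100001110101001"]     bits, MSB first
p s.unpack("b*")   # => ["1100001110010101"]     bits, LSB first
p s.unpack("H*")   # => ["c3a9"]                 hex, high nibble first
```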
