Java String encoding

Question

What's the difference between

"hello world".getBytes("UTF-8");

and

 Charset.forName("UTF-8").encode("hello world").array();

? The second snippet produces a byte array with trailing zero bytes in most cases.

Jon Skeet · Accepted Answer · 2014-09-04 16:41:12Z


Your second snippet uses ByteBuffer.array(), which just returns the array backing the ByteBuffer. That may well be longer than the content written to the ByteBuffer.
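To make the mismatch concrete, here is a minimal sketch that compares the number of encoded bytes with the length of the backing array. The class and method names are illustrative only, and it assumes the buffer returned by encode() is array-backed, which is what common JDKs return but is not something the API promises. The exact backing-array length is an implementation detail, so the code prints it rather than relying on a specific value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BackingArrayDemo {
    // Number of encoded bytes available to read (limit minus position).
    static int contentLength(String text) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(text);
        return buffer.remaining();
    }

    // Length of the array backing the buffer; this may be larger than the
    // content. Assumption: the buffer is array-backed (hasArray() is true).
    static int backingLength(String text) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(text);
        return buffer.array().length;
    }

    public static void main(String[] args) {
        System.out.println("content bytes: " + contentLength("hello world"));
        System.out.println("backing array: " + backingLength("hello world"));
    }
}
```

Any extra slots in the backing array were never written to, so they hold the default value 0, which is where the trailing zero bytes come from.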

Basically, I would use the first approach if you want a byte[] from a String :) You could use other ways of dealing with the ByteBuffer to convert it to a byte[], but given that String.getBytes(Charset) is available and convenient, I'd just use that...
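As a sketch of that first approach, the overload taking a Charset can be combined with the StandardCharsets.UTF_8 constant (available since Java 7), which also avoids the checked UnsupportedEncodingException thrown by the String-named overload. The class and method names here are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesDemo {
    // Encodes the text as UTF-8; the returned array is exactly as long as the content.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = utf8Bytes("hello world");
        System.out.println(bytes.length);           // 11
        System.out.println(Arrays.toString(bytes)); // [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
    }
}
```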

Sample code to retrieve the bytes from a ByteBuffer:

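import java.nio.ByteBuffer;
import java.nio.charset.Charset;

ByteBuffer buffer = Charset.forName("UTF-8").encode("hello world");
byte[] array = new byte[buffer.remaining()]; // bytes between position and limit
buffer.get(array);                           // copies only the encoded content
System.out.println(array.length); // 11
System.out.println(array[0]);     // 104 (encoded 'h')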
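If the buffer is still needed afterwards, one possible variation is to copy from a duplicate so the original position is left untouched. This is only a sketch; BufferBytes and toByteArray are hypothetical names, not part of the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferBytes {
    // Copies the bytes between position and limit into a new array.
    // duplicate() shares the content but has its own position and limit,
    // so the caller's buffer is not advanced by the read.
    static byte[] toByteArray(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode("hello world");
        byte[] bytes = toByteArray(buffer);
        System.out.println(bytes.length);       // 11
        System.out.println(buffer.remaining()); // 11, the original buffer is unchanged
    }
}
```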

edited Sep 4, 2014 at 16:41

answered Sep 4, 2014 at 16:19


2 Comments

Sandeep Chatterjee Over a year ago

Just observed that with byte[] b1 = "hello world".getBytes("UTF-8"); and byte[] b2 = Charset.forName("UTF-8").encode("hello world").array();, b1.length prints 11 and b2.length prints 12.

Jon Skeet Over a year ago

@Sandeep: Yes, because the ByteBuffer has presumably been allocated with a backing array of length 12. If you call limit() on the ByteBuffer instead, you'll get 11, the number of bytes actually written...
