I need to know the size, in bytes, of a String/encoding pair, but I cannot use the getBytes() method because 1) the String is very large, and duplicating it in a byte[] array would use a large amount of memory, and, more to the point, 2) getBytes() allocates a byte[] array sized as the length of the String * the maximum possible bytes per character. So if I have a String with 1.5B characters and UTF-16 encoding, getBytes() will try to allocate a 3GB array and fail, since arrays are limited to 2^31 - X elements (X is Java version specific).
So - is there some way to calculate the byte size of a String/encoding pair directly from the String object?
UPDATE:
Here's a working implementation of jtahlborn's answer:
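// requires: import java.io.OutputStream;
private class CountingOutputStream extends OutputStream {
    // long rather than int: the sizes in question can exceed Integer.MAX_VALUE
    long total;

    @Override
    public void write(int i) {
        // the encoding writer is expected to use the array overloads only
        throw new RuntimeException("don't use");
    }

    @Override
    public void write(byte[] b) {
        total += b.length;
    }

    @Override
    public void write(byte[] b, int offset, int len) {
        total += len;
    }
}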
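For completeness, here is a rough sketch of how a counting stream like this might be driven. The names EncodedSize, encodedSize and CHUNK are my own placeholders, not part of jtahlborn's answer. The sketch assumes the String is fed to an OutputStreamWriter in slices, so that no single large temporary buffer is needed, and it avoids ending a slice in the middle of a surrogate pair.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedSize {
    // Hypothetical slice size, in chars; any moderate value should do.
    private static final int CHUNK = 8192;

    /** Discards everything written to it and only counts the bytes. */
    static final class CountingOutputStream extends OutputStream {
        long total; // long, so totals above Integer.MAX_VALUE do not overflow

        @Override
        public void write(int b) {
            total++;
        }

        @Override
        public void write(byte[] b, int offset, int len) {
            total += len;
        }
    }

    /** Returns the number of bytes s would occupy when encoded with charset. */
    static long encodedSize(String s, Charset charset) {
        CountingOutputStream counter = new CountingOutputStream();
        try (Writer writer = new OutputStreamWriter(counter, charset)) {
            int off = 0;
            while (off < s.length()) {
                // Compute the slice end in long arithmetic to avoid int overflow.
                int end = (int) Math.min((long) off + CHUNK, s.length());
                // Do not end a slice between the two halves of a surrogate pair.
                if (end < s.length() && Character.isHighSurrogate(s.charAt(end - 1))) {
                    end--;
                }
                writer.write(s, off, end - off);
                off = end;
            }
        } catch (IOException e) {
            // The counting stream itself never throws, so this is not expected.
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point.
        return counter.total;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // 5 chars, one of them outside ASCII
        System.out.println(encodedSize(s, StandardCharsets.UTF_8));    // prints 6
        System.out.println(encodedSize(s, StandardCharsets.UTF_16LE)); // prints 10
    }
}
```

Note that the plain "UTF-16" charset also writes a byte order mark, so the count for it includes those 2 extra bytes, just as getBytes("UTF-16") would.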
getBytes() does not create an array bigger than it needs to be. It creates an array of the correct size for the given string; it does not create an array of length "length of the String * the maximum possible bytes per character". Also, string.length() does not return the number of characters in a string; it returns the number of code units. For UTF-16, a code unit is 16 bits, and each character takes either 1 or 2 code units, depending on the character. Therefore, either I don't understand the second point in your question, or your assumption is not correct.
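To illustrate the code-unit point only, here is a small sketch (the class name CodeUnitsDemo is just a placeholder). It compares length(), the code point count, and the encoded sizes for a string containing one character outside the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

final class CodeUnitsDemo {
    public static void main(String[] args) {
        // 'a' followed by U+1F600, which needs a surrogate pair in UTF-16
        String s = "a\uD83D\uDE00";

        System.out.println(s.length());                                   // prints 3 (code units)
        System.out.println(s.codePointCount(0, s.length()));              // prints 2 (code points)
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // prints 5
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length); // prints 6
    }
}
```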