I would like to put a string into a byte array, but the string may be too big to fit. In the case where it's too large, I would like to put as much of the string as possible into the array. Is there an efficient way to find out how many characters will fit?
4 Answers
To truncate a string so that it fits a UTF-8 byte array without splitting in the middle of a character, I use this:
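    // Requires: using System.Text;
    static string Truncate(string s, int maxLength) {
        // Fast path: the whole string already fits.
        if (Encoding.UTF8.GetByteCount(s) <= maxLength)
            return s;
        var cs = s.ToCharArray();
        int length = 0; // bytes used so far
        int i = 0;      // index of the next char to consider
        while (i < cs.Length) {
            // Treat a valid high/low surrogate pair as one two-char unit so it is never split.
            int charSize = 1;
            if (i < (cs.Length - 1) && char.IsHighSurrogate(cs[i]) && char.IsLowSurrogate(cs[i + 1]))
                charSize = 2;
            int byteSize = Encoding.UTF8.GetByteCount(cs, i, charSize);
            if ((byteSize + length) <= maxLength) {
                i = i + charSize;
                length += byteSize;
            }
            else
                break;
        }
        return s.Substring(0, i);
    }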
The returned string can then be safely transferred to a byte array of length maxLength.
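For comparison, here is a minimal sketch of a different technique, not the code above: encode the whole string first, then back the cut point up until it no longer lands on a UTF-8 continuation byte (one of the form 10xxxxxx). The names `Utf8Truncation` and `TruncateUtf8` are made up for illustration, and the sketch assumes `maxBytes` is not negative.

```csharp
using System;
using System.Text;

static class Utf8Truncation
{
    // Hypothetical helper: returns the longest prefix of the UTF-8 encoding of s
    // that fits in maxBytes without cutting a character in half.
    public static byte[] TruncateUtf8(string s, int maxBytes)
    {
        byte[] all = Encoding.UTF8.GetBytes(s);
        if (all.Length <= maxBytes)
            return all;

        // all[cut] is the first byte that would be dropped. If it is a
        // continuation byte (10xxxxxx), the cut is mid-character, so move back.
        int cut = maxBytes;
        while (cut > 0 && (all[cut] & 0xC0) == 0x80)
            cut--;

        byte[] result = new byte[cut];
        Array.Copy(all, result, cut);
        return result;
    }
}
```

This allocates the full encoding up front, so it trades memory for simplicity. Calling `Utf8Truncation.TruncateUtf8("a\u20ACb", 3)` keeps only the single byte for `a`, because the three-byte euro sign would not fit completely.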
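You should be using the Encoding class to do your conversion to a byte array, correct? All Encoding objects have an overridden method GetMaxCharCount, which will give you "The maximum number of characters produced by decoding the specified number of bytes." You should be able to use this value to trim your string before encoding it. Keep in mind that this is an upper bound on the character count, so a string trimmed to that length can still encode to more bytes than the array holds; check the result with GetByteCount before copying.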
An efficient way is to find the worst-case number of bytes needed per character with
    Encoding.GetMaxByteCount(1);
then divide the size of your byte array by the result to get a character count that is guaranteed to fit, and convert that many characters with
    public virtual int GetBytes(
        string s,
        int charIndex,
        int charCount,
        byte[] bytes,
        int byteIndex
    )
This is fast but pessimistic, so part of the array may go unused. If you want to waste less space, use
    Encoding.GetByteCount(string);
which returns the exact byte count, but that is a slower method because it has to examine every character.
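A minimal sketch of the worst-case approach might look like the following. `PessimisticCopy` and `CopyPrefix` are made-up names, and the sketch makes no attempt to avoid ending the prefix on half of a surrogate pair; with the default `Encoding.UTF8` such a lone surrogate is encoded as the replacement character rather than causing an overflow.

```csharp
using System;
using System.Text;

static class PessimisticCopy
{
    // Hypothetical helper: copies a prefix of s that is guaranteed to fit
    // into buffer and returns the number of bytes actually written.
    public static int CopyPrefix(string s, byte[] buffer, Encoding encoding)
    {
        // Worst-case bytes needed for one character in this encoding.
        int worstCasePerChar = encoding.GetMaxByteCount(1);

        // Number of characters that fit even in the worst case.
        int safeChars = Math.Min(s.Length, buffer.Length / worstCasePerChar);

        return encoding.GetBytes(s, 0, safeChars, buffer, 0);
    }
}
```

Because the estimate is a worst case, the buffer is usually left mostly empty for ASCII-heavy text; that is the price of skipping the exact count.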
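The Encoding class in .NET has a method called GetByteCount which can take in a string or char[]. If you pass in a single character, it will tell you how many bytes are needed for that character in whichever encoding you are using. Note that a character outside the Basic Multilingual Plane is stored as a surrogate pair of two chars, and both must be passed together to get a meaningful count.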
The method GetMaxByteCount is faster, but it does a worst case calculation which could return a higher number than is actually needed.
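To make the difference concrete, here is a small sketch comparing the two calls for UTF-8. The class and method names are made up for illustration; the characters are written as escapes so the example does not depend on the source file's encoding.

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    // Exact number of UTF-8 bytes needed for the given text.
    public static int Exact(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Worst-case estimate for the same number of chars; never below Exact(text).
    public static int WorstCase(string text)
    {
        return Encoding.UTF8.GetMaxByteCount(text.Length);
    }

    public static void Main()
    {
        Console.WriteLine(Exact("a"));             // 1
        Console.WriteLine(Exact("\u00E9"));        // 2 (e with acute accent)
        Console.WriteLine(Exact("\u20AC"));        // 3 (euro sign)
        Console.WriteLine(Exact("\uD83D\uDE00"));  // 4 (surrogate pair, two chars)
        Console.WriteLine(WorstCase("a") >= Exact("a")); // True
    }
}
```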