4

The String(byte[] bytes) constructor and the String.getBytes() method are not implemented by the String class in GWT's JRE emulation.

Does anybody know of an implementation? I do not want to use char[], but it seems like there is no other solution.

4
  • What are you trying to achieve? Where are you getting byte[] from? Commented Jan 3, 2011 at 14:14
  • I implemented a space-efficient serialization protocol for Swing clients, and I am trying to adapt this protocol for GWT clients. Commented Jan 3, 2011 at 14:25
  • What is the character encoding of your byte array, or do you want the conversion to be flexible? Commented Jan 4, 2011 at 23:20
  • If I can support UTF-8, that would be okay. Commented Jan 4, 2011 at 23:23

3 Answers

2

If you create large arrays in Chrome, you might run into an "Uncaught RangeError: Maximum call stack size exceeded" exception. The code from LINEMAN78 can be modified to use a StringBuilder, which avoids this issue.

public static String getString(byte[] bytes, int bytesPerChar)
{
    if (bytes == null) throw new IllegalArgumentException("bytes cannot be null");
    if (bytesPerChar < 1) throw new IllegalArgumentException("bytesPerChar must be at least 1");

    final int length = bytes.length / bytesPerChar;
    final StringBuilder retValue = new StringBuilder();

    for (int i = 0; i < length; i++)
    {
        char thisChar = 0;

        for (int j = 0; j < bytesPerChar; j++)
        {
            int shift = (bytesPerChar - 1 - j) * 8;
            thisChar |= (0x000000FF << shift) & (((int) bytes[i * bytesPerChar + j]) << shift);
        }

        retValue.append(thisChar);
    }

    return retValue.toString();
}
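
For the common case of bytesPerChar = 2, the inner loop above amounts to reading big-endian 16-bit code units. The following is a minimal sketch of that special case, not a replacement for the method above; the class and method names (Utf16BeSketch, decode) are made up for illustration, and it uses only StringBuilder and basic bit operations. Note that a Java char holds 16 bits, so values of bytesPerChar above 2 cannot produce wider characters.

```java
/** Hypothetical helper (name invented for this example): fixed-width, 2-bytes-per-char decoding. */
final class Utf16BeSketch {
    private Utf16BeSketch() {}

    /** Reads the array as big-endian 16-bit code units; a trailing odd byte is ignored. */
    static String decode(byte[] bytes) {
        if (bytes == null) throw new IllegalArgumentException("bytes cannot be null");
        StringBuilder sb = new StringBuilder(bytes.length / 2);
        for (int i = 0; i + 1 < bytes.length; i += 2) {
            // Mask with 0xFF so that negative byte values are treated as unsigned.
            int high = bytes[i] & 0xFF;
            int low = bytes[i + 1] & 0xFF;
            sb.append((char) ((high << 8) | low));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = { 0x00, 0x41, 0x00, (byte) 0xF1, (byte) 0xFF, (byte) 0xEA };
        System.out.println(decode(data).equals("A\u00f1\uffea"));
    }
}
```

This only round-trips data that was written two bytes per character in the same byte order; it is not a UTF-8 decoder.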

2

The following code should work; just specify the number of bytes per character.

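import java.util.logging.Logger;

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.logging.client.HasWidgetsLogHandler;
import com.google.gwt.user.client.ui.RootPanel;
import com.google.gwt.user.client.ui.VerticalPanel;

public class GwtPlayground implements EntryPoint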
{
    static final Logger logger = Logger.getLogger("");

    @Override
    public void onModuleLoad()
    {
        VerticalPanel loggerArea = new VerticalPanel();
        logger.addHandler(new HasWidgetsLogHandler(loggerArea));
        RootPanel.get().add(loggerArea);

        String original = new String("A" + "\uffea" + "\u00f1" + "\u00fc" + "C");

        logger.info("original = " + original);
        byte[] utfBytes = getBytes(original, 2);

        String roundTrip = getString(utfBytes, 2);
        logger.info("roundTrip = " + roundTrip);
    }

    public static byte[] getBytes(String string, int bytesPerChar)
    {
        char[] chars = string.toCharArray();
        byte[] toReturn = new byte[chars.length * bytesPerChar];
        for (int i = 0; i < chars.length; i++)
        {
            for (int j = 0; j < bytesPerChar; j++)
                toReturn[i * bytesPerChar + j] = (byte) (chars[i] >>> (8 * (bytesPerChar - 1 - j)));
        }
        return toReturn;
    }

    public static String getString(byte[] bytes, int bytesPerChar)
    {
        char[] chars = new char[bytes.length / bytesPerChar];
        for (int i = 0; i < chars.length; i++)
        {
            for (int j = 0; j < bytesPerChar; j++)
            {
                int shift = (bytesPerChar - 1 - j) * 8;
                chars[i] |= (0x000000FF << shift) & (((int) bytes[i * bytesPerChar + j]) << shift);
            }
        }
        return new String(chars);
    }
}

As @Per Wiklander pointed out, this doesn't truly support UTF-8. Here is a true UTF-8 decoder ported from C:

private static class UTF8Decoder
{
    final byte[] the_input;
    int the_index, the_length;

    protected UTF8Decoder( byte[] bytes )
    {
        super();
        this.the_input = bytes;
        this.the_index = 0;
        this.the_length = bytes.length;
    }


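    /*
    Get the next byte as an unsigned value (0 to 255).
    Throws IllegalArgumentException if the input ends in the middle of a character.
    */
    int get()
    {
        if( the_index >= the_length )
            throw new IllegalArgumentException( "Truncated UTF-8 input" );
        int c = the_input[the_index] & 0xFF;
        the_index += 1;
        return c;
    }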


    /*
        Get the 6-bit payload of the next continuation byte.
        Throws IllegalArgumentException if it is not a continuation byte.
    */
    int cont()
    {
        int c = get();
        if( (c & 0xC0) == 0x80 )
            return (c & 0x3F);
        else
            throw new IllegalArgumentException( "Failed to pass strict UTF-8" );
    }

    CharSequence getStringUTF8()
    {
        StringBuilder sb = new StringBuilder( the_input.length ); // allocate a maximum size
        while( the_index < the_length )
        {
            int c; /* the first byte of the character */
            int r; /* the result */

            c = get();
            /*
                Zero continuation (0 to 127)
            */
            if( (c & 0x80) == 0 )
            {
                sb.append( (char) c );
            }
            /*
                One continuation (128 to 2047)
            */
            else if( (c & 0xE0) == 0xC0 )
            {
                int c1 = cont();
                if( c1 >= 0 )
                {
                    r = ((c & 0x1F) << 6) | c1;
                    if( r >= 128 )
                        sb.append( (char) r );
                    else
                        throw new IllegalArgumentException();
                }
            }
            /*
            Two continuation (2048 to 55295 and 57344 to 65535)
            */
            else if( (c & 0xF0) == 0xE0 )
            {
                int c1 = cont();
                int c2 = cont();
                if( (c1 | c2) >= 0 )
                {
                    r = ((c & 0x0F) << 12) | (c1 << 6) | c2;
                    if( r >= 2048 && (r < 55296 || r > 57343) )
                        sb.append( (char) r );
                    else
                        throw new IllegalArgumentException();
                }
            }
            /*
            Three continuation (65536 to 1114111)
            */
            else if( (c & 0xF8) == 0xF0 )
            {
                int c1 = cont();
                int c2 = cont();
                int c3 = cont();
                r = ((c & 0x07) << 18) | (c1 << 12) | (c2 << 6) | c3;
                if( r < 65536 || r > 1114111 )
                    throw new IllegalArgumentException();
                // A char holds only 16 bits, so emit the code point as a UTF-16 surrogate pair.
                r -= 65536;
                sb.append( (char) (0xD800 | (r >> 10)) );
                sb.append( (char) (0xDC00 | (r & 0x3FF)) );
            }
            else
                throw new IllegalArgumentException( "Failed strict UTF8 parsing" );
        }
        return sb;
    }
}
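
The decoder above covers only the byte[] to String direction. For the other direction (a stand-in for String.getBytes() with UTF-8), something along these lines might work. This is a sketch, not a tested GWT library routine: the class name Utf8Encoder and its encode method are invented for this example, and it deliberately sticks to String.charAt, String.length and plain arrays on the assumption that those are available in GWT's emulated JRE.

```java
/** Hypothetical helper (not part of GWT): encodes a String as UTF-8 without String.getBytes(). */
final class Utf8Encoder {
    private Utf8Encoder() {}

    static byte[] encode(String s) {
        // Worst case is 3 bytes per UTF-16 code unit (a surrogate pair is 2 units but only 4 bytes).
        byte[] buf = new byte[s.length() * 3];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            int cp = s.charAt(i);
            if (cp >= 0xD800 && cp <= 0xDBFF) {
                // High surrogate: combine it with the following low surrogate.
                if (i + 1 >= s.length())
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                int low = s.charAt(i + 1);
                if (low < 0xDC00 || low > 0xDFFF)
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i++; // the low surrogate has been consumed
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            }
            if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
                buf[n++] = (byte) cp;
            } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
                buf[n++] = (byte) (0xC0 | (cp >> 6));
                buf[n++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                buf[n++] = (byte) (0xE0 | (cp >> 12));
                buf[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[n++] = (byte) (0x80 | (cp & 0x3F));
            } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                buf[n++] = (byte) (0xF0 | (cp >> 18));
                buf[n++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                buf[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[n++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        // Trim to the number of bytes actually written (manual copy, no library calls).
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) out[i] = buf[i];
        return out;
    }
}
```

For example, Utf8Encoder.encode("\u00f1") yields the two bytes 0xC3 0xB1, and feeding the result back through a strict decoder should reproduce the original string.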

3 Comments

Well, the problem is that we don't know how many bytes are needed to encode a character in UTF-8. If I use UTF-16, it is okay; we know that every character is represented with 2 bytes.
UTF-8 by definition is 1 byte per character. Hence the 8 for 8 bits and the 16 for 16 bits. That is why I made the number of bytes variable.
@LINEMAN78 That is only correct for characters that map to ascii. I'm quoting Joel Spolsky here: "In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes." joelonsoftware.com/articles/Unicode.html

1

Good question. I didn't realize it before.

As far as I know, there are only two main ways to convert a byte array to a String:

  1. The one you mentioned
  2. The java.io package, which you can't use on the client side

Here is my implementation. I think it may be helpful to you.

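public static String convertByteArrayToString(byte[] byteArray) {
    // StringBuilder avoids creating a new String on every iteration.
    StringBuilder sb = new StringBuilder(byteArray.length);

    for (int i = 0; i < byteArray.length; i++) {
        // Mask with 0xFF so bytes above 127 are not sign-extended.
        // Each byte becomes one char, so this is not a UTF-8 decoder.
        sb.append((char) (byteArray[i] & 0xFF));
    }

    return sb.toString();
}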

You can test it:

byte[] byteArray = new byte[] { 87, 79, 87, 46, 46, 46 };

System.out.println(convertByteArrayToString(byteArray));
System.out.println(new String(byteArray));
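
One detail worth checking in any byte-by-byte conversion like this is how bytes above 127 are turned into chars, because Java's byte is signed. The sketch below (class and method names are made up for illustration) compares a plain cast with a masked cast for the single byte 0xF1. Even the masked form only maps each byte to one character, so it does not decode multi-byte UTF-8 sequences.

```java
/** Hypothetical demo class: shows the effect of sign extension when casting byte to char. */
final class ByteCastDemo {
    private ByteCastDemo() {}

    /** Plain cast: the byte is sign-extended to int first, then narrowed to 16 bits. */
    static char signExtended(byte b) {
        return (char) b;
    }

    /** Masked cast: the byte is treated as an unsigned value from 0 to 255. */
    static char masked(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF1;
        System.out.println(Integer.toHexString(signExtended(b))); // prints fff1
        System.out.println(Integer.toHexString(masked(b)));       // prints f1
    }
}
```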

1 Comment

Your method works only for ASCII characters, not for Unicode characters. I suggest you read joelonsoftware.com/articles/Unicode.html
