Java int array to StringBuilder

Question

How do I convert an int array containing a UTF-8 string into a StringBuilder in a while loop? For example:
int array: 71, 73, 70, 56, 57, 97, 149, 0, 55, 0, 247...
resulting string: GIF89a• €÷€ € €€ÀÜÀ¦Êð*?ª*?ÿ...
The string contains Latin, Cyrillic, and Asian characters, as well as various symbols and numbers.

    do {
        buffer.append((char) num[++i]); // appends each int as a single char
    } while ((byte) buffer.charAt(buffer.length() - 1) != -1); // stop once the last char's low byte is 0xFF

This approach garbles all non-Latin characters.
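
Casting each int to char turns every byte into its own character, so a multi-byte UTF-8 sequence (two bytes for Cyrillic, usually three for Asian scripts) comes out as several unrelated Latin-1 characters. A minimal sketch of a loop that instead collects the raw bytes and decodes them once. It assumes, as the loop condition above suggests, that the data ends with a sentinel whose low byte is 0xFF (a byte that never occurs in valid UTF-8); the class and method names are hypothetical, and unlike the original loop the sentinel is not appended:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SentinelDecoder {

    // Hypothetical helper: reads values from num until one whose low byte is 0xFF
    // (the same stop condition as the loop in the question), then decodes the
    // collected bytes as UTF-8. The sentinel itself is not included.
    static StringBuilder decode(int[] num) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < num.length && (byte) num[i] != -1) {
            bytes.write(num[i]); // write(int) keeps only the low 8 bits
            i++;
        }
        // Decode the whole sequence at once so multi-byte characters stay intact.
        return new StringBuilder(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // "Privet" in Cyrillic, encoded as UTF-8, followed by a 255 sentinel
        int[] num = {208, 159, 209, 128, 208, 184, 208, 178, 208, 181, 209, 130, 255};
        System.out.println(decode(num));
    }
}
```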

Could you show the data for the entire buffer?

– Sergey Kalinichenko, Commented Jun 7, 2012 at 20:30
+1 for getting weird symbols in the question. :)

– Asif, Commented Jun 7, 2012 at 20:34

Malcolm Smith · Accepted Answer · 2012-06-07 20:35:55Z

3

First, convert the int[] to a byte[]:

    //intArray contains your data...
    byte[] utf8bytes = new byte[intArray.length];
    for(int i = 0; i < intArray.length; i++)
    {
        utf8bytes[i] = (byte) intArray[i];
    }

Then create a String from the bytes, specifying UTF-8 as the encoding:

    String asString = new String(utf8bytes, "UTF-8");
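
If the characters must be appended to a StringBuilder one at a time in a while loop, as the question asks, a reader can do the UTF-8 decoding incrementally. A minimal sketch with hypothetical class and method names; it uses StandardCharsets.UTF_8 (Java 7+), which also avoids the checked UnsupportedEncodingException that the String(byte[], String) constructor declares:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class LoopDecoder {

    static StringBuilder decode(int[] intArray) throws IOException {
        // Same conversion as above: keep the low 8 bits of each int.
        byte[] utf8bytes = new byte[intArray.length];
        for (int i = 0; i < intArray.length; i++) {
            utf8bytes[i] = (byte) intArray[i];
        }
        StringBuilder buffer = new StringBuilder();
        // The reader decodes UTF-8 as it goes, so a character whose bytes span
        // several array elements is still appended as one char (or a surrogate pair).
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) { // -1 here means end of stream
                buffer.append((char) c);
            }
        }
        return buffer;
    }
}
```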

answered Jun 7, 2012 at 20:35

Malcolm Smith



3 Comments

Dmitriy Over a year ago

Does an int hold 1 byte instead of 4?

Malcolm Smith Over a year ago

From your (admittedly small) selection of example values, it looked like you were dealing with an array of ints below 256, which can easily be cast to bytes. If you did have 4 bytes packed into each int, most of the values would be very large in absolute terms. If that is the case, you could unpack them into separate bytes using bit masks and logical shifts.
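
For the packed case, a sketch of the unpacking with masks and logical shifts. It assumes big-endian packing (the first byte sits in the highest 8 bits of each int); the class and method names are hypothetical:

```java
final class IntUnpacker {

    // Splits each int into four bytes, most significant byte first.
    // >>> is a logical shift, so the sign bit is not smeared into the result;
    // & 0xFF isolates the 8 bits we want before narrowing to byte.
    static byte[] unpackBigEndian(int[] packed) {
        byte[] out = new byte[packed.length * 4];
        for (int i = 0; i < packed.length; i++) {
            int v = packed[i];
            out[4 * i]     = (byte) ((v >>> 24) & 0xFF);
            out[4 * i + 1] = (byte) ((v >>> 16) & 0xFF);
            out[4 * i + 2] = (byte) ((v >>> 8) & 0xFF);
            out[4 * i + 3] = (byte) (v & 0xFF);
        }
        return out;
    }
}
```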

Dmitriy Over a year ago

    utf8bytes[0] = (byte) (intArray[i] >>> 24);
    utf8bytes[1] = (byte) (intArray[i] >>> 16);
    utf8bytes[2] = (byte) (intArray[i] >>> 8);
    utf8bytes[3] = (byte) intArray[i];

After each Latin character, 3 space characters are added. After each Cyrillic character, 2 space characters are added.
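
The reported padding (3 extra characters per Latin letter, 2 per Cyrillic one) is consistent with each int holding the UTF-8 bytes of a single character, right-aligned with zero bytes in front: 71 unpacks to 00 00 00 47, and a two-byte Cyrillic character such as 0xD09F to 00 00 D0 9F. That layout is an inference from the symptom, not something confirmed in the thread. Under that assumption, a sketch that drops the leading zero bytes of each int before decoding; the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class PackedUtf8 {

    // Assumption: each int holds the UTF-8 bytes of one character,
    // right-aligned, with unused high bytes set to zero.
    static String decode(int[] packed) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int v : packed) {
            boolean leading = true;
            for (int shift = 24; shift >= 0; shift -= 8) {
                int b = (v >>> shift) & 0xFF;
                // Skip zero padding in the high bytes, but always keep the lowest
                // byte so a value of 0 still produces one byte.
                if (leading && b == 0 && shift > 0) {
                    continue;
                }
                leading = false;
                bytes.write(b);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```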

Edwin Buck · Accepted Answer · 2012-06-07 20:37:25Z

0

You are reading in a GIF89a file as one integer per byte, and then printing it out as if it were a text string. The main problem is that the bytes in that file are binary image data, not encoded text, so most of them do not map to meaningful characters. Where there is no sensible mapping, you get whatever your text encoding assigns to those byte values (which looks to me like a lot of garbage).
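
To see what a UTF-8 decoder does with such bytes, a small sketch that decodes the example values from the question. Java's String(byte[], Charset) constructor replaces malformed input with U+FFFD (the replacement character), so the ASCII header "GIF89a" survives while the stray byte 149 does not. The class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

final class GarbageDemo {

    static String decodeLeniently(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        // Malformed sequences are silently replaced with U+FFFD.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int[] sample = {71, 73, 70, 56, 57, 97, 149, 0, 55, 0, 247};
        String s = decodeLeniently(sample);
        // Print code points so the output does not depend on the console encoding.
        s.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```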

Graphical data does not map cleanly to text. There are 256 possible byte values, and sometimes one or more bytes represent a single character, but the English alphabet has only 26 letters, in upper and lower case. Add the ten digits and a handful of punctuation marks and you get about 80 characters in common use in an essay. The other 160+ byte values are control codes, markers for multi-byte sequences, or mappings to characters that support foreign languages.

That garbage is the closest your current character set can come to mapping those bytes to characters. If you want better output, try reading a file that actually contains character data.
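
One way to tell whether a byte sequence could be UTF-8 text at all is a strict decoder that reports malformed input instead of replacing it. A sketch, with hypothetical class and method names; passing the check does not prove the data is text (control bytes such as 0 are valid UTF-8), but failing it rules out well-formed UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {

    // Returns true only if every byte sequence is well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // e.g. a lone continuation byte such as 0x95
        }
    }
}
```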

edited Jun 7, 2012 at 20:37

answered Jun 7, 2012 at 20:31

Edwin Buck


1 Comment

Dmitriy Over a year ago

No, this is just an example; the program is not designed for reading files. It will work with text messages in Russian and Asian languages.
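
Since the real input is text such as Russian or Asian messages, it may help to check the whole pipeline with a round trip: encode a String as UTF-8, store each byte as an unsigned int (0..255, the same range as the example values in the question), then decode it back. A sketch with hypothetical class and method names:

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {

    // Encodes text as UTF-8 and widens each byte to an int in 0..255.
    static int[] toUnsignedInts(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF; // mask to avoid sign extension (-48 becomes 208)
        }
        return out;
    }

    // Inverse: narrows each int back to a byte and decodes as UTF-8.
    static String fromUnsignedInts(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```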
