
This code is supposed to convert character strings to binary, but for a few strings it returns a String with 16 binary digits, not 8 as I expected.

public class aaa {
    public static void main(String argv[]) {
        String nux = "ª";
        String nux2 = "Ø";
        String nux3 = "(";
        byte[] bites = nux.getBytes();
        byte[] bites2 = nux2.getBytes();
        byte[] bites3 = nux3.getBytes();
        System.out.println(AsciiToBinary(nux));
        System.out.println(AsciiToBinary(nux2));
        System.out.println(AsciiToBinary(nux3));
        System.out.println("number of bytes :" + bites.length);
        System.out.println("number of bytes :" + bites2.length);
        System.out.println("number of bytes :" + bites3.length);
    }

    // Encodes the string with the platform's default charset and
    // prints each resulting byte as 8 binary digits.
    public static String AsciiToBinary(String asciiString) {
        byte[] bytes = asciiString.getBytes();
        StringBuilder binary = new StringBuilder();
        for (byte b : bytes) {
            int val = b;
            for (int i = 0; i < 8; i++) {
                binary.append((val & 128) == 0 ? 0 : 1);
                val <<= 1;
            }
            binary.append(' ');
        }
        return binary.toString();
    }
}

For the first two strings, I don't understand why they return 2 bytes, since they are single-character strings.

I compiled it here: https://ideone.com/AbxBZ9

This returns:

11000010 10101010 
11000011 10011000 
00101000 
number of bytes :2
number of bytes :2
number of bytes :1

I am using this code: Convert A String (like testing123) To Binary In Java

NetBeans IDE 8.1

  • What makes you think that the number of characters is the same as the number of bytes? There are tens of thousands of symbols out there; they can't all be represented with a single byte. It strongly depends on the encoding you use, but multi-byte encodings are rather common. Commented Jan 30, 2016 at 22:45
  • Note that getBytes can take an argument for the character set you want to use (a short sketch of this follows these comments). Commented Jan 30, 2016 at 22:47
  • ASCII only has 128 symbols, and those 128 symbols are encoded identically by UTF-8, ISO-8859-1, and other popular encodings; so as long as you do not use non-English symbols, you may think that everything is just ASCII. Commented Jan 30, 2016 at 22:50
  • There are more possible characters than possible byte values. So clearly not all characters can be encoded in a single byte. Commented Mar 2, 2016 at 0:18
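
A minimal sketch of the point made in these comments (an addition, not part of the original thread; the class name is just for illustration, and it assumes Java 7+ for java.nio.charset.StandardCharsets): the same single character can occupy a different number of bytes depending on which encoding you request.

    import java.nio.charset.StandardCharsets;

    public class EncodingLengths {
        public static void main(String[] args) {
            String s = "ª"; // U+00AA, a single char in Java

            // The byte count depends entirely on the charset used to encode it.
            System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 1
            System.out.println(s.getBytes(StandardCharsets.UTF_8).length);      // 2
            System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length);   // 2
        }
    }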

2 Answers

6

A character is not always 1 byte long. Think about it: many languages, such as Chinese or Japanese, have thousands of characters; how would you map all of those characters to single bytes?

You are using UTF-8 (one of the many, many ways of mapping characters to bytes). Looking up a UTF-8 character table and searching for the sequence 11000010 10101010, I arrive at:

U+00AA  ª   11000010 10101010

which is the UTF-8 encoding for ª. UTF-8 is often the default character encoding (charset) in Java, but you cannot rely on this; that is why you should always specify a charset when converting strings to bytes or vice versa.
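
A minimal sketch of what "always specify a charset" looks like in practice (an addition, not part of the original answer; the class name is illustrative and it assumes Java 7+ for java.nio.charset.StandardCharsets). It reproduces the 11000010 10101010 output for ª regardless of the platform default:

    import java.nio.charset.StandardCharsets;

    public class ExplicitCharsetDemo {
        public static void main(String[] args) {
            // Ask for UTF-8 explicitly instead of relying on the platform default.
            byte[] bytes = "ª".getBytes(StandardCharsets.UTF_8);

            StringBuilder binary = new StringBuilder();
            for (byte b : bytes) {
                // Mask to 0..255 and left-pad to 8 binary digits.
                binary.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
                binary.append(' ');
            }
            System.out.println(binary); // 11000010 10101010
        }
    }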


-1

You can understand why some characters are two bytes by running this simple code:

    // integer - binary 
    System.out.println(Byte.MIN_VALUE);             
    // -128 - 0b11111111111111111111111110000000

    System.out.println(Byte.MAX_VALUE);             
    // 127 - 0b1111111

    System.out.println((int) Character.MIN_VALUE);  
    // 0   - 0b0

    System.out.println((int) Character.MAX_VALUE);  
    // 65535 - 0b1111111111111111

As you can see, Byte.MAX_VALUE needs just 7 bits, so it fits in 1 byte (01111111).

If you cast Character.MIN_VALUE to an integer, it is 0; its binary form also fits in 1 byte (00000000).

But what about Character.MAX_VALUE?

In binary it is 1111111111111111, which is 65535 in decimal, and it needs 2 bytes (11111111 11111111).

So characters whose decimal value is between 0 and 65535 need 1 or 2 bytes to represent.

Hope you understand.
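
A minimal sketch that prints these sizes directly (an addition, not part of the original answer; the class name is illustrative, and it uses only standard java.lang constants):

    public class CharSizes {
        public static void main(String[] args) {
            System.out.println(Byte.SIZE);       // 8  -> a byte holds 8 bits
            System.out.println(Character.SIZE);  // 16 -> a Java char holds 16 bits (a UTF-16 code unit)

            // Character.MAX_VALUE needs all 16 of those bits:
            System.out.println(Integer.toBinaryString(Character.MAX_VALUE)); // 1111111111111111
            System.out.println((int) Character.MAX_VALUE);                   // 65535
        }
    }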

3 Comments

Your code only proves that Character.MAX_VALUE requires at least 2 bytes, but does not explain why some chars fit in a byte and others don't. The binary value of Byte.MIN_VALUE is also not that useful (0b11111111111111111111111110000000 is -128 only when interpreted as 4-byte two's complement; in that sense, 0b10000000 is clearer and shorter: -128 in 1-byte two's complement).
So why don't you edit my post? I said what I know. Instead of downvoting my or other posts, try to edit them; we are here to share our knowledge. I did, now it's your turn :)
If you edit your post so that it answers the question, I will be happy to change my vote. I have not looked at your other posts. You are responsible for editing your own posts if they are wrong or don't answer the question: share knowledge, but take responsibility to make sure it is accurate.
