Encoding and decoding UTF-8 byte arrays from and to strings

Question

I'm working on a cross-platform encryption system. One of the requirements is to easily encrypt and decrypt strings in our application code.

The encryption class works flawlessly, but I'm having trouble with string encoding on the Java side.

Currently, I have the following static methods:

public static String encrypt(String key, String data)
{
    byte[] decoded_key;
    byte[] decoded_data;
    try
    {
        decoded_key = key.getBytes("UTF-8");
        decoded_data = data.getBytes("UTF-8");
    }
    catch (Exception e)
    {
        //Not Supposed to happen.
        throw new RuntimeException();
    }

    if(decoded_key.length != 16) 
        throw new IllegalArgumentException("Key length must be of 16 bytes. Given is " + decoded_key.length + ".");

    try
    {
        return(IOUtils.toString(encrypt(decoded_key, decoded_data), "UTF-8"));
    }
    catch (Exception e)
    {
        //Not Supposed to happen.
        throw new RuntimeException();
    }
}

public static String decrypt(String key, String data)
{
    byte[] decoded_key;
    byte[] decoded_data;
    try
    {
        decoded_key = key.getBytes("UTF-8");
        decoded_data = data.getBytes("UTF-8");
    }
    catch (Exception e)
    {
        //Not Supposed to happen.
        throw new RuntimeException();
    }

    if(decoded_key.length != 16) 
        throw new IllegalArgumentException("Key length must be of 16 bytes. Given is " + decoded_key.length + ".");

    try
    {
        return(IOUtils.toString(decrypt(decoded_key, decoded_data), "UTF-8"));
    }
    catch (Exception e)
    {
        //Not Supposed to happen.
        throw new RuntimeException();
    }
}

My unit tests are failing when decrypting. I ran a test comparing a byte array of encoded UTF-8 data, encoded_data, with IOUtils.toString(encoded_data, "UTF-8").getBytes("UTF-8"), and for some reason the two arrays turned out to be completely different. No wonder my decryption algorithm is failing.

What is the proper procedure to convert from a Java string to a UTF-8 byte array and back to a Java string?

Why are you representing your "keys" as Strings in the first place? Presumably they are arbitrary bytes? You are probably corrupting your keys by converting them into Strings in the first place.
– jtahlborn, Commented May 16, 2013 at 15:38
You're missing my point. How are you maintaining the keys as strings without corrupting them?
– jtahlborn, Commented May 16, 2013 at 15:49
@jtahlborn Maybe the key is alphanumeric? I would see a problem if he tries to convert the encrypted raw bytes to a string.
– Alex, Commented May 16, 2013 at 15:49
@jtahlborn If I was maintaining string integrity I wouldn't be asking this question in the first place. It's not the keys giving me trouble here. Alex is correct, the string is always going to be alphanumeric.
– elite5472, Commented May 16, 2013 at 15:53

jtahlborn · Accepted Answer · 2013-05-16 16:11:48Z

4

The problem is that you are converting your encrypted data to a String. Encrypted data is binary, not text. UTF-8 is a character encoding with specific rules about which byte sequences are valid, and arbitrary binary data is generally not valid UTF-8. When you decode the encrypted bytes into a String, each invalid sequence is replaced with the replacement character (U+FFFD, often displayed as ?), so encoding that String again gives you different bytes from the ones you started with.
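To see the effect in isolation, here is a minimal sketch. The class name LossyRoundTrip and the sample bytes are made up for illustration, and it uses new String(...) from the JDK rather than the IOUtils call from the question. It decodes a few bytes that are not valid UTF-8 and then encodes the resulting String again:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: class name and sample bytes are assumptions, not from the question.
final class LossyRoundTrip {

    /** Decodes the bytes as UTF-8, then encodes the resulting String back to UTF-8. */
    static byte[] roundTrip(byte[] raw) {
        // Malformed input is silently replaced with U+FFFD here; no exception is thrown.
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in valid UTF-8; 0x41 is the letter 'A'.
        byte[] cipherLike = {(byte) 0xFF, (byte) 0xFE, 0x41};
        byte[] back = roundTrip(cipherLike);

        System.out.println("original:   " + Arrays.toString(cipherLike));
        System.out.println("round trip: " + Arrays.toString(back));
        System.out.println("identical:  " + Arrays.equals(cipherLike, back));
    }
}
```

Real text survives this round trip unchanged, which is why the problem only shows up once the bytes come out of the cipher.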

If you want to convert arbitrary binary data (such as encrypted data) into a String, you need to use a binary-to-text encoding like Base64.
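A rough sketch of how the String-based wrappers could look with Base64 follows. It assumes Java 8 or later for java.util.Base64 (on older versions a library such as Apache Commons Codec would be needed instead). The class name Base64Transport is hypothetical, and xorWithKey is only a placeholder for the real byte[]-based encrypt/decrypt methods, which are not shown in the question. It is not real encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch under assumptions: Java 8+, hypothetical class name, placeholder cipher.
final class Base64Transport {

    // Placeholder for the real byte[]-based cipher (NOT secure, illustration only).
    // XOR is its own inverse, so the same method "encrypts" and "decrypts".
    static byte[] xorWithKey(byte[] key, byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length] ^ 0x80);
        }
        return out;
    }

    static byte[] keyBytes(String key) {
        byte[] decodedKey = key.getBytes(StandardCharsets.UTF_8);
        if (decodedKey.length != 16) {
            throw new IllegalArgumentException(
                "Key length must be of 16 bytes. Given is " + decodedKey.length + ".");
        }
        return decodedKey;
    }

    /** Text in, Base64 text out: the ciphertext is never decoded as UTF-8. */
    static String encrypt(String key, String data) {
        byte[] plain = data.getBytes(StandardCharsets.UTF_8);
        byte[] cipher = xorWithKey(keyBytes(key), plain);
        return Base64.getEncoder().encodeToString(cipher);
    }

    /** Base64 text in, original text out. */
    static String decrypt(String key, String base64Data) {
        // decode() throws IllegalArgumentException if the input is not valid Base64.
        byte[] cipher = Base64.getDecoder().decode(base64Data);
        byte[] plain = xorWithKey(keyBytes(key), cipher);
        return new String(plain, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "0123456789abcdef";
        String encrypted = encrypt(key, "some text");
        System.out.println(encrypted);
        System.out.println(decrypt(key, encrypted));
    }
}
```

The point of the design is that UTF-8 is only applied to the plaintext on both ends, while the encrypted bytes only ever travel as Base64. Using StandardCharsets.UTF_8 instead of the "UTF-8" name also removes the checked exception, so the try/catch blocks are no longer needed.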

answered May 16, 2013 at 16:11

jtahlborn

1 Comment

elite5472 Over a year ago

I added an example to your answer. Thanks for the help.

Alex · Answer · 2013-05-16 15:53:31Z

0

I would first check with a unit test that the output of your encrypt method matches what you expect.

Also, it's a good idea to apply Base64 after the encryption so you can safely convert the result to a string.

Another common issue is converting ints to bytes as if bytes were unsigned. In Java a byte is signed, so its range is -128 to 127.
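A small illustration of the signed-byte point, with a made-up class name (UnsignedBytes). Masking with 0xFF is the usual way to read a byte as a value from 0 to 255; on Java 8 and later, Byte.toUnsignedInt does the same thing.

```java
// Illustrative only: the class and method names are assumptions.
final class UnsignedBytes {

    /** Interprets a signed Java byte as an unsigned value in the range 0..255. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;               // 200 does not fit in a signed byte
        System.out.println(b);             // prints -56
        System.out.println(toUnsigned(b)); // prints 200
    }
}
```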

edited May 16, 2013 at 15:53

answered May 16, 2013 at 15:37

Alex

8 Comments

jtahlborn Over a year ago

the methods aren't recursive, they take Strings, they call methods taking byte[]s.

Alex Over a year ago

Oops, you are right. That's a strange use of polymorphism over there. I imagine there is some duplicated code or checks in the other method.

elite5472 Over a year ago

It's not strange at all, they are both decrypt, but for different types of parameters. The string version prepares everything for the byte array method.

Alex Over a year ago

@elite5472 if(decoded_key.length != 16) should be checked on your byte accepting method. And I'm pretty sure you are already doing it in there as well.

elite5472 Over a year ago

@Alex this isn't production code. I'm still working on it. I needed to check if string encoding is working correctly before anything else.
