17

I'm using org.apache.commons.codec.binary.Base64 to decode a string that is UTF-8. Sometimes I get a Base64-encoded string which, after decoding, looks like, for example, ^@k��@@. How can I check whether the Base64 is correct, or whether the decoded bytes are a valid UTF-8 string?

To clarify, I'm using:

public static String base64Decode(String str) {
    try {
        return new String(base64Decode(str.getBytes(Constants.UTF_8)), Constants.UTF_8);
    } catch (UnsupportedEncodingException e) {
         ...
    }
}

public static byte[] base64Decode(byte[] byteArray) {
    return Base64.decodeBase64(byteArray);
}
4
  • What do you mean by a String is "UTF-8"? A String object doesn't know about encodings and charsets. Commented Jan 17, 2011 at 17:46
  • 1
    @Michael Konietzka: I think that is unnecessary nitpicking. Base64 encodes a sequence of bytes. I think the OP is clearly saying that the byte sequence is assumed to be the UTF-8 encoding of a Unicode string, not that a java.lang.String is directly encoded as Base64 (which, as you say, would not make sense). Commented Jan 17, 2011 at 18:33
  • @finnw Sorry, I don't know how to explain it clearly. I receive a Base64-encoded string and I want to check that it is correct. I want to catch the case where the string decodes to garbage, because everything I receive should be something like a name. Commented Jan 18, 2011 at 7:56
  • Maybe I just have to check that the Base64 doesn't contain any spaces or other disallowed characters? Commented Jan 18, 2011 at 10:22
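The alphabet check suggested in the last comment can be sketched with a regular expression. This is only a sketch under assumptions: the class and method names (Base64Syntax, isWellFormed) are made up for illustration, and it accepts only the standard alphabet with mandatory "=" padding and no whitespace or line breaks. As far as I know the commons-codec decoder skips characters outside the alphabet rather than rejecting them, so a check like this has to run before decoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of commons-codec.
class Base64Syntax {

    // Groups of four alphabet characters, optionally followed by one padded group.
    private static final Pattern STRICT_BASE64 = Pattern.compile(
            "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$");

    /** Returns true if the text is syntactically valid, padded, standard Base64. */
    static boolean isWellFormed(String text) {
        return text != null && STRICT_BASE64.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("aGVsbG8="));   // prints "true"
        System.out.println(isWellFormed("aGVs bG8="));  // prints "false" (contains a space)
    }
}
```

Note that this only validates the syntax. A well-formed Base64 string can still decode to bytes that are not valid UTF-8, so it does not by itself detect "garbage" content.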

3 Answers

32

You should specify the charset when converting a String to byte[] and vice versa.

byte[] bytes = string.getBytes("UTF-8");
// feed bytes to Base64

and

// get bytes from Base64
String string = new String(bytes, "UTF-8");

Otherwise the platform default encoding will be used, which is not necessarily UTF-8.
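Specifying the charset does not by itself answer the "is it valid UTF-8?" part of the question: new String(bytes, "UTF-8") never fails on malformed input, it silently substitutes the replacement character U+FFFD. One way to detect bad input is a CharsetDecoder configured to report errors. The following is a minimal sketch, not a definitive implementation. It assumes Java 8+ and swaps in java.util.Base64 for commons-codec so that it has no external dependency; the class and method names (StrictUtf8, decodeStrict) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: strict Base64 decoding followed by strict UTF-8 decoding.
class StrictUtf8 {

    /**
     * Decodes Base64 text that is expected to carry UTF-8 bytes.
     * Throws IllegalArgumentException if the Base64 or the UTF-8 is malformed.
     */
    static String decodeStrict(String base64) {
        // The basic java.util.Base64 decoder throws IllegalArgumentException
        // on characters outside the Base64 alphabet.
        byte[] bytes = Base64.getDecoder().decode(base64);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Decoded bytes are not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeStrict("aGVsbG8=")); // prints "hello"
    }
}
```

Even this only proves that the bytes form valid UTF-8. Bytes produced in another encoding can happen to be valid UTF-8 too, so a check that the result "looks like a name" still needs application-level rules.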

6 Comments

That string does not look like UTF8 misinterpreted as a single-byte encoding. Could it be GB18030 misinterpreted as UTF8?
@finnw: The answer indeed assumes that the original string is UTF-8, as explicitly mentioned by the OP. If this is actually not the case, then the problem is to be solved somewhere else.
@BalusC: What do you mean by a String is UTF8? UTF-8 is an encoding.
@Michael: the string must have been constructed somehow. For example, if you create the string based on data returned by a Reader, you need to ensure as well that the Reader is reading the source using UTF-8. I however understand your nitpick, I should probably have worded my previous comment better, e.g. "source" instead of "string".
I.e., "国家标准" is neither UTF-8 nor GB18030; it is just a String object. But it can be encoded with UTF-8 or GB18030, because these encodings can encode all Unicode code points. Of course, the decoding system must apply the same character encoding to the bytes as the encoding system did. Yes, I am nitpicking on this issue, because the question mentions "a string is utf-8", which needs clarification: there is no such thing as a "UTF-8 String". You can encode a String into a byte array using UTF-8, but then there is just a byte[].
1

Try this:

var B64 = {
    alphabet: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=',
    lookup: null,
    ie: /MSIE /.test(navigator.userAgent),
    ieo: /MSIE [67]/.test(navigator.userAgent),
    encode: function (s) {
        var buffer = B64.toUtf8(s),
            position = -1,
            len = buffer.length,
            nan1, nan2, enc = [, , , ];
        if (B64.ie) {
            var result = [];
            while (++position < len) {
                nan1 = buffer[position + 1], nan2 = buffer[position + 2];
                enc[0] = buffer[position] >> 2;
                enc[1] = ((buffer[position] & 3) << 4) | (buffer[++position] >> 4);
                if (isNaN(nan1)) enc[2] = enc[3] = 64;
                else {
                    enc[2] = ((buffer[position] & 15) << 2) | (buffer[++position] >> 6);
                    enc[3] = (isNaN(nan2)) ? 64 : buffer[position] & 63;
                }
                result.push(B64.alphabet[enc[0]], B64.alphabet[enc[1]], B64.alphabet[enc[2]], B64.alphabet[enc[3]]);
            }
            return result.join('');
        } else {
            result = '';
            while (++position < len) {
                nan1 = buffer[position + 1], nan2 = buffer[position + 2];
                enc[0] = buffer[position] >> 2;
                enc[1] = ((buffer[position] & 3) << 4) | (buffer[++position] >> 4);
                if (isNaN(nan1)) enc[2] = enc[3] = 64;
                else {
                    enc[2] = ((buffer[position] & 15) << 2) | (buffer[++position] >> 6);
                    enc[3] = (isNaN(nan2)) ? 64 : buffer[position] & 63;
                }
                result += B64.alphabet[enc[0]] + B64.alphabet[enc[1]] + B64.alphabet[enc[2]] + B64.alphabet[enc[3]];
            }
            return result;
        }
    },
    decode: function (s) {
        var buffer = B64.fromUtf8(s),
            position = 0,
            len = buffer.length;
        if (B64.ieo) {
            result = [];
            while (position < len) {
                if (buffer[position] < 128) result.push(String.fromCharCode(buffer[position++]));
                else if (buffer[position] > 191 && buffer[position] < 224) result.push(String.fromCharCode(((buffer[position++] & 31) << 6) | (buffer[position++] & 63)));
                else result.push(String.fromCharCode(((buffer[position++] & 15) << 12) | ((buffer[position++] & 63) << 6) | (buffer[position++] & 63)));
            }
            return result.join('');
        } else {
            result = '';
            while (position < len) {
                if (buffer[position] < 128) result += String.fromCharCode(buffer[position++]);
                else if (buffer[position] > 191 && buffer[position] < 224) result += String.fromCharCode(((buffer[position++] & 31) << 6) | (buffer[position++] & 63));
                else result += String.fromCharCode(((buffer[position++] & 15) << 12) | ((buffer[position++] & 63) << 6) | (buffer[position++] & 63));
            }
            return result;
        }
    },
    toUtf8: function (s) {
        var position = -1,
            len = s.length,
            chr, buffer = [];
        if (/^[\x00-\x7f]*$/.test(s)) while (++position < len)
        buffer.push(s.charCodeAt(position));
        else while (++position < len) {
            chr = s.charCodeAt(position);
            if (chr < 128) buffer.push(chr);
            else if (chr < 2048) buffer.push((chr >> 6) | 192, (chr & 63) | 128);
            else buffer.push((chr >> 12) | 224, ((chr >> 6) & 63) | 128, (chr & 63) | 128);
        }
        return buffer;
    },
    fromUtf8: function (s) {
        var position = -1,
            len, buffer = [],
            enc = [, , , ];
        if (!B64.lookup) {
            len = B64.alphabet.length;
            B64.lookup = {};
            while (++position < len)
            B64.lookup[B64.alphabet[position]] = position;
            position = -1;
        }
        len = s.length;
        while (position < len) {
            enc[0] = B64.lookup[s.charAt(++position)];
            enc[1] = B64.lookup[s.charAt(++position)];
            buffer.push((enc[0] << 2) | (enc[1] >> 4));
            enc[2] = B64.lookup[s.charAt(++position)];
            if (enc[2] == 64) break;
            buffer.push(((enc[1] & 15) << 4) | (enc[2] >> 2));
            enc[3] = B64.lookup[s.charAt(++position)];
            if (enc[3] == 64) break;
            buffer.push(((enc[2] & 3) << 6) | enc[3]);
        }
        return buffer;
    }
};

1 Comment

This one worked perfectly for me. I understand it got a negative vote because it's a JavaScript answer to a Java question.
0

I created this method:

public static String descodificarDeBase64(String stringCondificado){
    try {
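        // Pass the charset when building the String too; otherwise the platform default is used.
        return new String(Base64.decode(stringCondificado.getBytes("UTF-8"), Base64.DEFAULT), "UTF-8");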
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
        return "";
    }
}

This lets me decode Base64 text containing Spanish characters such as á, ñ, í, ü.

Example:

descodificarDeBase64("wr9xdcOpIHRhbD8=");

will return: ¿qué tal?

1 Comment

Base64.DEFAULT is undefined
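Regarding the comment above: Base64.DEFAULT is a constant of android.util.Base64, so the method in this answer only compiles on Android. On a plain JVM, a rough equivalent can be sketched with java.util.Base64 (an assumption on my part: Java 8 or later). The class and method names here (JvmBase64Decode, decodeBase64ToString) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical non-Android counterpart of descodificarDeBase64.
class JvmBase64Decode {

    /** Decodes Base64 text and interprets the resulting bytes as UTF-8. */
    static String decodeBase64ToString(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        // StandardCharsets avoids the checked UnsupportedEncodingException.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeBase64ToString("aGVsbG8=")); // prints "hello"
    }
}
```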