Encoding String values with base64

Question

So I was working on a homework assignment for my CSC420 class. The professor wanted us to use Java code to encrypt two string values that the user would enter. I was able to do it, no real issue there; the main problem is that the sample output he gave us, so that we would know if we got the "right answer", is somehow different from mine. I have attached my code below, my output, and his output; if someone could tell me what I am doing wrong, that would be greatly appreciated.

package Homework;

import static java.nio.charset.StandardCharsets.UTF_8;
import java.util.Base64;
import java.util.Base64.Encoder;
import java.util.Scanner;

public class HW4 {
    public static String b64enc(String string) throws Exception {
        Encoder encoder = Base64.getEncoder();
        byte[] data = string.getBytes(UTF_8);
        String encodedString = encoder.encodeToString(data);
        return encodedString;
    }

    public static void main(String[] args) throws Exception {
        Scanner scan1 = new Scanner(System.in);
        System.out.println("Please enter the first String: ");
        String string1 = scan1.nextLine();
        System.out.println("Please enter the second string: ");
        String string2 = scan1.nextLine();
        scan1.close();
        String encodedString = b64enc(string1 + string2);
        System.out.println(encodedString);
    }
}

[Screenshots: my output and the professor's sample output]

First, I was off with my first statement. Second, I tried encoding hellohi as a direct input using your code and it works fine. I then added the Scanner input and that works fine as well. I'm wondering if it might be an encoding issue at the OS level. – MadProgrammer, Commented Mar 3, 2020 at 21:54
How can hello+hi and hi+hello in the second screenshot produce the same Base64 output? Did you ever check the Base64 codes on base64decode.org? – jps, Commented Mar 3, 2020 at 21:58
If you go to https://www.base64decode.org and enter your Base64-encoded string there, you'll see that your encoding works correctly. – Sergej Masljukow, Commented Mar 3, 2020 at 21:59
@MadProgrammer Is there really anything I can do at this point, or should I just turn it in and see what the professor has to say? – anon, Commented Mar 3, 2020 at 21:59
@Zethos, try emailing the TA/Professor. The second screenshot is definitely wrong. – Harshal Parekh, Commented Mar 3, 2020 at 22:01

rzwitserloot · Accepted Answer · 2020-03-03 21:59:24Z

The fact that your prof's program produces the same control value for 'hihello' and 'hellohi' is telling. Base64-encoding a string doesn't throw away any information (you can always get back to the original with it; that is the point), so it is impossible for 2 different inputs to generate the same output.
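To illustrate, here is a minimal sketch using the same `java.util.Base64` encoder as the question's `b64enc` (the class name `Base64Lossless` is just for illustration). The two orderings give different encodings, and each one decodes back to exactly its input:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Lossless {
    // Base64-encode the UTF-8 bytes of a string (same steps as b64enc in the question).
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse the encoding: Base64 text -> bytes -> UTF-8 string.
    static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String a = encode("hello" + "hi");
        String b = encode("hi" + "hello");
        System.out.println(a);           // aGVsbG9oaQ==
        System.out.println(b);           // aGloZWxsbw==
        System.out.println(a.equals(b)); // false
        System.out.println(decode(a));   // hellohi
        System.out.println(decode(b));   // hihello
    }
}
```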

I conclude that you must have misread the instructions. You're looking for an algorithm that explains why entering the strings in a different order nevertheless produces the same 'encoded value'; 'concatenate them and then Base64 the result' wouldn't.

answered Mar 3, 2020 at 21:59

rzwitserloot

3 Comments

anon Over a year ago

I believe that I read the assignment properly; all he stated was to write code that would decrypt two string values that were given. I will email him today to see if what I have is truly fine or if I did, in fact, misconstrue something he requested.

anon Over a year ago

Yeah, if memory serves correctly, Base64 just uses 64 characters from ASCII that can be represented in plain text, right?

rzwitserloot Over a year ago

Yes. It maps 3 bytes (each byte can be any of 256 different values, for a total of 256*256*256 = 16777216 combinations) onto 4 characters picked from a set of 64 options (a-z, A-Z, 0-9 and 2 others; which ones depends a bit on the flavour of Base64): 64*64*64*64 = 16777216. That's why it works.
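A small sketch of both points using `java.util.Base64`: three ASCII bytes become exactly four characters, and the two "extra" characters differ between the standard encoder and the URL-safe one (`getUrlEncoder()`). The class and helper names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Flavours {
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // 3 bytes of input and 4 Base64 characters cover the same number of values.
        System.out.println(256L * 256 * 256);   // 16777216
        System.out.println(64L * 64 * 64 * 64); // 16777216

        // Exactly 3 bytes in, exactly 4 characters out (no '=' padding needed).
        System.out.println(standard("Man".getBytes(StandardCharsets.US_ASCII))); // TWFu

        // These bytes produce the 6-bit values 62 and 63, where the flavours differ.
        byte[] data = { (byte) 0xfb, (byte) 0xff };
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8=
    }
}
```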

user85421 · Accepted Answer · 2020-03-04 09:23:57Z

I think I do understand the assignment now.

The code is supposed to encode one message string using another string as the key. Base64 is used only to encode the result: the result is binary data and can (and will) contain non-printable bytes, so it is represented as text that can, for example, be emailed to the teacher.

First we note that the order of the strings does not matter, so there is no real distinction between which one is the key and which one is the message.

Next we can decode the example results, for example using the Linux command base64 (I used Git Bash, but there are also online services available for this). I also piped the result to od (a hex dump utility) to see the hexadecimal values:

$ echo "AAxMTE8=" | base64 -d | od -t x1 -c

which returns

0000000  00  0c  4c  4c  4f
         \0  \f   L   L   O

Note that it is 5 bytes long, the same as the longer of the two input strings, so we can assume that the strings are not being concatenated (which would change the length) but that the bytes of each string are being combined in some way. Also, each character uses one byte, so the encoding is probably UTF-8 or even ASCII.
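If you don't have a shell with `base64` and `od` handy, the same check can be sketched in Java with `Base64.getDecoder()`. The `toHex` helper is a hypothetical name that simply mimics the hex line of `od -t x1`:

```java
import java.util.Base64;

final class DecodeSample {
    // Decode Base64 text and render each byte as two hex digits, like `od -t x1`.
    static String toHex(String b64) {
        StringBuilder sb = new StringBuilder();
        for (byte b : Base64.getDecoder().decode(b64)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Base64.getDecoder().decode("AAxMTE8=").length); // 5
        System.out.println(toHex("AAxMTE8="));                             // 00 0c 4c 4c 4f
    }
}
```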

Further, we see that the result ends with "LLO", the uppercase version of the end of the longer input "hello". It looks like the positions of the bytes were not changed, just their values combined by some operation. Let's consider some operations that could be used to combine the bytes:

Subtraction or division: won't work since order of input does not matter;
Addition or multiplication: not good because of possible overflow/underflow
Bitwise AND, NAND, OR or NOR: won't work, information loss (e.g. x AND 0 is always 0)
Bitwise XOR (exclusive OR): (almost) perfect, easy to encrypt, easy to decrypt, order does not matter (but not very strong)
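A quick sanity check of the properties claimed for XOR (and the information loss of AND), using two of the bytes from the example. The `combine` helper is just an illustrative name:

```java
final class XorProperties {
    // Combine two byte values with bitwise exclusive OR.
    static int combine(int x, int y) {
        return x ^ y;
    }

    public static void main(String[] args) {
        int e = 0x65; // 'e' from "hello"
        int i = 0x69; // 'i' from "hi"
        int c = combine(e, i);
        System.out.printf("0x%02x%n", c);             // 0x0c
        System.out.println(c == combine(i, e));       // true: order does not matter
        System.out.println(combine(c, i) == e);       // true: XOR with the same key undoes it
        System.out.println((e & 0x00) == (i & 0x00)); // true: AND with 0 loses the input
    }
}
```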

Let's check what happens with XOR:

input1: "hello" == [ 0x68, 0x65, 0x6c, 0x6c, 0x6f ] // using ASCII/UTF-8
input2: "hi"    == [ 0x68, 0x69 ]                   // "
result: "∅∅LLO" == [ 0x00, 0x0c, 0x4c, 0x4c, 0x4f ] // ∅ not printable

0x68 ^ 0x68 == 0x00  // correct!
0x65 ^ 0x69 == 0x0c  // "
0x6c ^  X   == 0x4c  // what is X?
0x6f ^  X   == 0x4f  // "

Now we just need to work out what happens with the last 3 bytes, where the shorter input has run out; that is, what X is. Since A ^ X == B implies A ^ B == X, it is easy to find that 0x6c ^ 0x20 == 0x4c and 0x6f ^ 0x20 == 0x4f. So we conclude that the shorter string must be padded with 0x20, the space character ' '.
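The same A ^ B == X reasoning can be run over every position at once: XOR the decoded sample result with the bytes of "hello", and what remains is the other input including its padding. A minimal sketch, assuming ASCII input and two arrays of equal length (`recoverKey` is a hypothetical helper name):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class RecoverPadding {
    // Since A ^ X == B implies A ^ B == X, XOR the two arrays position by position.
    // Assumes both arrays have the same length.
    static byte[] recoverKey(byte[] a, byte[] b) {
        byte[] x = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            x[i] = (byte) (a[i] ^ b[i]);
        }
        return x;
    }

    public static void main(String[] args) {
        byte[] hello = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] result = Base64.getDecoder().decode("AAxMTE8=");
        String key = new String(recoverKey(hello, result), StandardCharsets.US_ASCII);
        // Brackets make the trailing spaces visible.
        System.out.println("[" + key + "]"); // [hi   ]
    }
}
```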

The algorithm must be something like this: make both input strings the same length by appending spaces (' ') to the shorter one, convert both inputs to byte arrays, combine the bytes of the two arrays using exclusive OR, and encode the result using Base64.
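Putting the steps together, a minimal sketch of that guessed algorithm might look like the following. It is a reconstruction from the sample output, not the professor's confirmed code; the class and method names are hypothetical, and the inputs are assumed to be ASCII so that one character is one byte. Because XOR is its own inverse, `decode` is the same operation applied again with the key:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class XorBase64 {
    // Byte at position i, or a space (0x20) once the shorter input has run out.
    private static byte byteAt(byte[] bytes, int i) {
        return i < bytes.length ? bytes[i] : (byte) ' ';
    }

    // XOR the two inputs byte by byte (shorter one padded with spaces), then Base64 the result.
    static String encode(String s1, String s2) {
        byte[] a = s1.getBytes(StandardCharsets.UTF_8);
        byte[] b = s2.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[Math.max(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (byteAt(a, i) ^ byteAt(b, i));
        }
        return Base64.getEncoder().encodeToString(out);
    }

    // Undo encode(): Base64-decode, then XOR again with the space-padded key.
    static String decode(String encoded, String key) {
        byte[] data = Base64.getDecoder().decode(encoded);
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ byteAt(k, i));
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("hello", "hi"));    // AAxMTE8=
        System.out.println(encode("hi", "hello"));    // AAxMTE8=
        System.out.println(decode("AAxMTE8=", "hi")); // hello
    }
}
```

With the sample inputs, both argument orders give AAxMTE8=. One side effect of the space padding: if the key is longer than the message, the decoded message comes back with trailing spaces (decoding with "hello" as the key yields "hi   ").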

Amazing, so the whole Q&A here boils down to "This is my professor's answer, can someone explain what his question was?" ;-)
@jps Sort of, I guess, yeah. My main concern was why my professor got a different output from mine; I am new to this type of coding, so I wanted to make sure I wasn't screwing up somehow.