
I am working on a project that requires me to convert a UTF-8 string stored in a Windows text file into a continuous binary string and store it in another Windows text file, then read that binary string back, convert it to the original UTF-8 string, and store the result in a text file. I have converted the UTF-8 string to a binary string, but I have no idea how to reverse the process.

Here's my program to convert the UTF-8 string to binary strings:

package filetobits;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FileToBits {

    public static void main(String[] args) throws IOException, FileNotFoundException {

        FileReader inputStream = new FileReader("C:\\FileTesting\\UTF8.txt");
        FileWriter outputStream = new FileWriter("C:\\FileTesting\\BinaryStrings.txt");

        int c;

        while ((c = inputStream.read()) != -1) {

            outputStream.write(Integer.toBinaryString(c));
            outputStream.write(System.lineSeparator());
        }
        inputStream.close();
        outputStream.close();
    }
}

Here's my input (16 characters):

¼¹¨'I.p

Here's my output:

1111111111111101 10000 1111111111111101 11111 1111111111111101 100111 1001001 101110 1110000 111100 1111111111111101 1100001 101100 101001 1111111111111101 1111111111111101

I need help converting these binary strings back into a single UTF-8 string and storing it in a text file.

I achieved what I want with the following code:

    String str = "";
    FileReader inputStream = new FileReader("C:\\FileTesting\\Encrypted.txt");
    FileWriter outputStream = new FileWriter("C:\\FileTesting\\EncryptedBin.txt");
    int c;
    while ((c = inputStream.read()) != -1) {
        String s = String.format("%16s", Integer.toBinaryString(c)).replace(' ', '0');
        for (int i = 0; i < s.length() / 16; i++) {
            int a = Integer.parseInt(s.substring(16 * i, (i + 1) * 16), 2);
            str += (char) (a);
        }
    }

The problem is that I can't add extra 0s to pad every binary string to a length of 16, because I need to store this data in an image (for my image steganography project), so the shorter the binary string, the better.

I need to get the same output as the code above produces, but without padding every binary string to a length of 16.

PS: I am kind of lost when it comes to character encodings. Does storing UTF-8 characters in a Windows .txt file convert them to ANSI or something?

  • I think this is what you needed docs.oracle.com/javase/7/docs/api/java/lang/… and in reverse process you can simply create a new String instance from bytes. Commented Jan 28, 2017 at 12:43
  • Possible duplicate of How to parse String as Binary and convert it to UTF-8 equivalent in Java? Commented Jan 28, 2017 at 12:45
  • @SabirKhan I'm sorry, but it's not a duplicate. I can't afford to write extra bits to make every character a 16-bit binary value. Commented Jan 28, 2017 at 12:51
  • Exactly where did you see that UTF-8 requires 16 bits per character? The number of bytes required per character varies and depending what these byte values are, the decoder knows how many bytes to read to recreate a character. By the way, are you ACTUALLY required to write the binary string of your UTF-8 string to a file, or is this an intermediate step of your own to make sure the conversion is correct before you embed the bits to your image? Because in that case, you don't even need a bit string, just the bytes array. Commented Jan 28, 2017 at 13:04
  • @Reti43 You're right, I don't need to convert it into a binary string. I just don't know how to embed these bytes directly into the LSBs of the pixels, so I thought it would be easier to first convert them into a string of 0s and 1s and then embed it bit by bit. Commented Jan 28, 2017 at 13:09
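The bit-by-bit embedding discussed in these comments does not need an intermediate string of 0s and 1s. Below is a minimal sketch, assuming the carrier is a plain int array of channel values (0..255) rather than a real image; the class name LsbSketch and the methods embed and extract are hypothetical, and reading or writing actual pixels is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LsbSketch {
    // Hypothetical helper: writes each payload bit (most significant bit first)
    // into the least significant bit of one carrier value.
    static void embed(int[] carrier, byte[] payload) {
        if (carrier.length < payload.length * 8) {
            throw new IllegalArgumentException("carrier too small for payload");
        }
        for (int i = 0; i < payload.length * 8; i++) {
            int bit = (payload[i / 8] >> (7 - i % 8)) & 1;
            carrier[i] = (carrier[i] & ~1) | bit; // clear the LSB, then set it to the payload bit
        }
    }

    // Hypothetical helper: rebuilds byteCount bytes from the carrier's LSBs.
    static byte[] extract(int[] carrier, int byteCount) {
        byte[] payload = new byte[byteCount];
        for (int i = 0; i < byteCount * 8; i++) {
            payload[i / 8] = (byte) ((payload[i / 8] << 1) | (carrier[i] & 1));
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] payload = "h\u00E9".getBytes(StandardCharsets.UTF_8); // 3 bytes in UTF-8
        int[] carrier = new int[payload.length * 8];
        Arrays.fill(carrier, 200); // stand-in for pixel channel values
        embed(carrier, payload);
        String recovered = new String(extract(carrier, payload.length), StandardCharsets.UTF_8);
        System.out.println(recovered.equals("h\u00E9"));
    }
}
```

The extractor has to know how many bytes to read, so in practice the payload length would need to be stored or agreed on separately; that part is not shown here.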

1 Answer

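A byte has 8 bits. As a first step, ignore the UTF-8 issue and just fill a byte[] with the data from your binary string, 8 bits per byte.

Once you have a byte[] data, you can use new String(data, StandardCharsets.UTF_8) to decode it into a String. Pass the charset explicitly: new String(data) uses the platform default charset, which on Windows is often not UTF-8. Java Strings themselves are held in memory as UTF-16; UTF-8 only comes into play when converting to and from bytes.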
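The two steps above can be sketched as follows, assuming the text is already available as a Java String. The class name Utf8Bits and the methods toBits and fromBits are hypothetical names chosen for illustration. Every byte is written as exactly 8 bits, so the bit string is continuous and needs no separators or 16-bit padding.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bits {
    // Encodes the text as UTF-8 and writes exactly 8 binary digits per byte.
    static String toBits(String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder bits = new StringBuilder(data.length * 8);
        for (byte b : data) {
            // (b & 0xFF) is 0..255; adding 0x100 yields 9 binary digits,
            // and dropping the leading 1 leaves a zero-padded 8-digit group.
            bits.append(Integer.toBinaryString((b & 0xFF) + 0x100).substring(1));
        }
        return bits.toString();
    }

    // Splits the bit string into 8-bit groups, fills a byte[], and decodes it as UTF-8.
    static String fromBits(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("bit string length must be a multiple of 8");
        }
        byte[] data = new byte[bits.length() / 8];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) Integer.parseInt(bits.substring(8 * i, 8 * i + 8), 2);
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00BC caf\u00E9"; // sample text containing non-ASCII characters
        String bits = toBits(original);
        System.out.println(bits);
        System.out.println(fromBits(bits).equals(original));
    }
}
```

For this to round-trip a file, the file has to be read with the right charset in the first place; a FileReader that decodes with the wrong charset replaces undecodable bytes with U+FFFD, which is the 1111111111111101 value that appears in the question's output.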


3 Comments

By the way, when you write a file in Java without specifying a charset, the platform default charset is used. That default is UTF-8 only from Java 18 onward; on older versions on Windows it is typically a legacy code page. You can choose the encoding explicitly by passing a charset when you create the writer, for example new OutputStreamWriter(new FileOutputStream(file), Charset.forName("ISO-8859-1")).
But doesn't a UTF-8 String contain 16-bit characters? How can I store it in a byte (8-bit) array?
UTF-8 stores each character as a sequence of 1 to 4 bytes. The 1-byte sequences are identical to 7-bit ASCII. The differences are only in the range 128..255, where 8-bit extended-ASCII code pages use a single byte but UTF-8 uses a multi-byte sequence. When you store a 1-byte UTF-8 character in a byte[], it occupies one element of the array. When you store a multi-byte character, several elements of the byte[] are used for that single character.
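The variable length described above is easy to check. The following is a small illustrative sketch; the class name Utf8Lengths and the method utf8Length are hypothetical, and the sample characters are arbitrary picks, one for each sequence length.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII: 1 byte
            "\u00BC",       // U+00BC: 2 bytes
            "\u20AC",       // U+20AC: 3 bytes
            "\uD83D\uDE00"  // U+1F600, a surrogate pair in Java: 4 bytes
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": " + s.length() + " Java char(s), "
                    + utf8Length(s) + " UTF-8 byte(s)");
        }
    }
}
```

The Java char count is the number of 16-bit UTF-16 units in the String, which is a separate matter from the number of bytes the same text takes up as UTF-8.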
