I am working with a strange CSV file. When I output a line with console.log, it displays properly:

console.log(typeof csv[1]);
// string
console.log(csv[1]);
// 510,,A_BC,4042594,...

But when I try to work with the string, the output is full of spurious "Unicode" escapes, for example "\u0000_":

console.log(csv[1].split(","));
// [
      '\u00005\u00001\u00000\u0000',
      '\u0000',
      '\u0000A\u0000_\u0000B\u0000C\u0000',
      ...

console.log(csv[1].toString().substr(0, 4));
// 51

How can I work with a "proper" string? Of course I can remove all the \u0000 by hand, but I would prefer a "clean" solution.

Please note that it's not valid Unicode, as \u0000_ simply does not exist.

It's a PGP-encrypted CSV; I got the string with the following code:

var privateKey = await openpgp.decryptKey({
    privateKey: await openpgp.readPrivateKey({ binaryKey: fs.readFileSync(__dirname+"/../keys/xxxx.key")}),
    passphrase
});

fs.readFile(__dirname+"/../upload/"+file, async function(err,datas){

            if(err)
            {
                console.log(err);
                return ecb("[readFile]"+err);
            }
            else
            {
                const message = await openpgp.readMessage({
                    binaryMessage: datas // parse the binary (non-armored) message
                });
                const { data: decrypted, signatures } = await openpgp.decrypt({
                    message,
                    decryptionKeys: privateKey
                });

                var csv = decrypted.toString().split("\n");

Thanks in advance

  • Please include the code that reads the CSV. I guess you have loaded the file with the wrong text encoding. Commented Mar 16, 2022 at 14:08
  • It's a PGP-encrypted CSV, see the edit. Commented Mar 16, 2022 at 14:15
  • PGP will encrypt text in any encoding. decrypted.toString() does the text decoding (UTF-8), but your CSV is not UTF-8. Commented Mar 16, 2022 at 14:21

1 Answer

You might be reading the CSV with the wrong encoding, so I advise you to check that first.

The real issue with your string is that its characters are separated by null bytes.
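A null byte next to every character is the usual signature of UTF-16 text that has been decoded as UTF-8. The sketch below reproduces the output from the question under the assumption that the decrypted file is UTF-16LE; the question does not state the encoding, and the header line "x" is a made-up placeholder.

```javascript
// Assumption: the CSV is UTF-16LE. "x" stands in for an unknown header line;
// the second line is the sample row shown in the question.
const bytes = Buffer.from("x\n510,,A_BC", "utf16le");

// Decoding as UTF-8 turns the zero high byte of every code unit into \u0000.
const wrong = bytes.toString("utf8").split("\n")[1].split(",");

console.log(wrong);
```

Because the line break itself is followed by a null byte, every line after the first starts with \u0000, which matches the leading \u0000 in each field of the question's output.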

You can remove them by iterating through the characters in the string and dropping those whose code point is 0.

function removeNullBytes(str){
  return str.split("").filter(char => !!char.codePointAt(0)).join("")
}

const example = '\u00005\u00001\u00000\u0000';

console.log(removeNullBytes(example))
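If the file really is UTF-16LE (an assumption, consistent with the null-byte pattern but not confirmed in the question), decoding the raw bytes with the right encoding avoids the cleanup step entirely. A minimal sketch, where decodeUtf16le is a hypothetical helper and the Buffer stands in for the decrypted bytes:

```javascript
// Hypothetical helper: decode raw bytes as UTF-16LE instead of UTF-8.
function decodeUtf16le(bytes) {
  // TextDecoder strips a leading byte order mark (BOM) by default.
  return new TextDecoder("utf-16le").decode(bytes);
}

// Stand-in for the decrypted bytes; the header line is made up.
const decryptedBytes = Buffer.from("id,,name\n510,,A_BC", "utf16le");

const csv = decodeUtf16le(decryptedBytes).split("\n");
console.log(csv[1].split(",")); // [ '510', '', 'A_BC' ]
```

With OpenPGP.js this means asking openpgp.decrypt for raw bytes rather than a UTF-8 string; I believe the option is format: 'binary', but check the documentation for your version.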

3 Comments

That doesn't work: if you look at the output, '\u00005\u00001\u00000\u0000' is in fact the string 510.
@Daphoque I just realized why. Your string is actually separated by null bytes
@Daphoque just updated, check if it works for you now