nodejs strange unicode output from string

Question

I am working with an odd CSV file. When I output a line with console.log, it displays properly:

console.log(typeof csv[1]);
// string
console.log(csv[1]);
// 510,,A_BC,4042594,...

But when I try to work with the string, the output contains unexpected "unicode" escapes, for example "\u0000_":

console.log(csv[1].split(","));
// [
      '\u00005\u00001\u00000\u0000',
      '\u0000',
      '\u0000A\u0000_\u0000B\u0000C\u0000',
      ...

console.log(csv[1].toString().substr(0, 4));
// 51

How can I work with a "proper" string? Of course I could remove all the \u0000 characters by hand, but I would prefer a "clean" solution.

Please note that it is not valid Unicode, as \u0000_ simply does not exist.

It is a PGP-encrypted CSV. I got the string with the following code:

var privateKey = await openpgp.decryptKey({
    privateKey: await openpgp.readPrivateKey({ binaryKey: fs.readFileSync(__dirname+"/../keys/xxxx.key")}),
    passphrase
});

fs.readFile(__dirname+"/../upload/"+file, async function(err,datas){

            if(err)
            {
                console.log(err);
                return ecb("[readFile]"+err);
            }
            else
            {
                const message = await openpgp.readMessage({
                    binaryMessage: datas // parse the binary (non-armored) message
                });
                const { data: decrypted, signatures } = await openpgp.decrypt({
                    message,
                    decryptionKeys: privateKey
                });

                var csv = decrypted.toString().split("\n");

Thanks in advance

Please include the code that reads the CSV. I guess you have loaded the file with the wrong text encoding.
– windm, Commented Mar 16, 2022 at 14:08
PGP encrypts the raw bytes, whatever their text encoding. decrypted.toString() does the text decoding (as UTF-8), but your CSV is not UTF-8.
– windm, Commented Mar 16, 2022 at 14:21
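
A minimal sketch of what that comment suggests, assuming the CSV is UTF-16LE (the "\u0000 next to every character" pattern is typical of it, but the question does not confirm the encoding). The helper name decodeUtf16le and the sample data are hypothetical. It also assumes you can get the decrypted payload as raw bytes rather than as an already-decoded string; with OpenPGP.js that may mean asking decrypt for binary output, which you should verify against the documentation for your version.

```javascript
// Assumption: `bytes` is the decrypted payload as raw bytes (Buffer or Uint8Array)
// and the text is UTF-16LE. Neither is confirmed by the question.
function decodeUtf16le(bytes) {
  const text = Buffer.from(bytes).toString("utf16le");
  // Buffer does not strip a byte order mark, so drop a leading U+FEFF if present.
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Simulated payload standing in for the decrypted file contents.
const sample = Buffer.from("\ufeff510,,A_BC,4042594\r\n511,,D_EF,1\r\n", "utf16le");

const rows = decodeUtf16le(sample)
  .split(/\r?\n/)      // tolerate both Windows and Unix line endings
  .filter(Boolean)     // skip the empty string after the final newline
  .map(line => line.split(","));

console.log(rows[0]);
```

Decoding the bytes with the right encoding avoids the null characters entirely, so no cleanup pass is needed afterwards.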

Abir Taheer · Accepted Answer · 2022-03-16 14:41:20Z

1

You might be reading the CSV with the wrong encoding, so I advise you to check that first.

The immediate issue with your string is that its characters are separated by null characters (\u0000).

You can remove them by iterating through the characters in the string and dropping the ones whose code point is 0.

function removeNullBytes(str){
  return str.split("").filter(char => !!char.codePointAt(0)).join("")
}

const example = '\u00005\u00001\u00000\u0000';

console.log(removeNullBytes(example))
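
To illustrate where such null characters can come from, here is a small sketch that encodes a sample line as UTF-16LE and then decodes the bytes as UTF-8. That the original file is UTF-16LE is an assumption, and the variable names are made up for the demonstration. The last step re-interprets the mis-decoded string as UTF-16LE; this round trip is only reliable when the content is plain ASCII, because other characters are damaged by the wrong UTF-8 decode.

```javascript
const original = "510,,A_BC";

// Each ASCII character becomes two bytes in UTF-16LE: the character, then 0x00.
const utf16Bytes = Buffer.from(original, "utf16le");

// Decoding those bytes as UTF-8 keeps every 0x00 as a U+0000 character.
const misdecoded = utf16Bytes.toString("utf8");
console.log(JSON.stringify(misdecoded));

// For ASCII-only content, the bytes can be recovered and decoded correctly.
const fixed = Buffer.from(misdecoded, "utf8").toString("utf16le");
console.log(fixed === original);
```

Filtering out U+0000 gives the same result for ASCII-only data, but decoding with the correct encoding also handles non-ASCII characters, provided it is applied to the original bytes.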

edited Mar 16, 2022 at 14:41

answered Mar 16, 2022 at 14:11


3 Comments

Daphoque Over a year ago

It's not good. If you look at the output, '\u00005\u00001\u00000\u0000' is in fact the string 510.

Abir Taheer Over a year ago

@Daphoque I just realized why. Your string is actually separated by null bytes

Abir Taheer Over a year ago

@Daphoque just updated, check if it works for you now
