Converting ArrayBuffer to String then back to ArrayBuffer using TextDecoder/TextEncoder returning a different result

Question

I have an ArrayBuffer returned by reading memory with Frida. I'm converting the ArrayBuffer to a string and then back to an ArrayBuffer using TextDecoder and TextEncoder, but the data is being altered in the process: after decoding and re-encoding, the buffer always comes out larger. Is some character being decoded in an expansive fashion?

How can I decode an ArrayBuffer to a String, then back to an ArrayBuffer without losing integrity?

Example code:

var arrayBuff = Memory.readByteArray(pointer,2000); //Get a 2,000 byte ArrayBuffer

console.log(arrayBuff.byteLength); //Always returns 2,000

var textDecoder = new TextDecoder("utf-8");
var textEncoder = new TextEncoder("utf-8");

//Decode and encode same data without making any changes
var decoded = textDecoder.decode(arrayBuff);
var encoded = textEncoder.encode(decoded);

console.log(encoded.byteLength); //Varies between runs, but is always greater than 2,000
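A minimal reproduction outside Frida. It assumes Node.js 11+ or a browser, where TextDecoder and TextEncoder are globals, and the sample bytes are hypothetical stand-ins for the memory read. Bytes that are not valid UTF-8 decode to U+FFFD, and each U+FFFD re-encodes as three bytes.

```javascript
// Minimal reproduction (Node.js 11+ or a browser). Sample bytes are hypothetical:
// "1", "2", an invalid byte (0xFF), NUL, and a truncated two-byte lead (0xC3).
const bytes = new Uint8Array([0x31, 0x32, 0xff, 0x00, 0xc3]);

const decoded = new TextDecoder("utf-8").decode(bytes);
// Each invalid or truncated sequence is replaced with U+FFFD.
// Code points: 31 32 fffd 0 fffd
console.log(Array.from(decoded, ch => ch.codePointAt(0).toString(16)));

const encoded = new TextEncoder().encode(decoded);
// U+FFFD is re-encoded as three bytes (EF BF BD), so the result grows.
console.log(bytes.length, encoded.length); // 5 9
```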

Are you trying to encode an arbitrary byte array as text and then convert it back to the same byte sequence, or does the byte array contain legal UTF-8 encoded text? If it's the former, you simply can't do this. – spender, Commented May 6, 2018 at 9:08
I would like it to be the same byte sequence. My end goal is to replace a value inside the ArrayBuffer: if it contains the string "1234", I want to change it to "1111" and create a new ArrayBuffer so I can write it back into memory. I was doing this as a cursory test and noticed it could never work because of the size mismatch. – Hem, Commented May 6, 2018 at 9:11

GOTO 0 · Accepted Answer · 2018-05-06 09:40:43Z

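TextDecoder and TextEncoder are designed to work with text. When TextDecoder meets a byte sequence that isn't valid UTF-8, it replaces it with U+FFFD (the replacement character), which TextEncoder then writes back as three bytes, so arbitrary binary data generally grows and the original bytes cannot be recovered. To convert an arbitrary byte sequence into a string and back, it's best to treat each byte as a single character.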

var arrayBuff = Memory.readByteArray(pointer,2000); //Get a 2,000 byte ArrayBuffer

console.log(arrayBuff.byteLength); //Always returns 2,000

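//Decode and encode same data without making any changes
//Each byte becomes one character with a char code from 0 to 255
var decoded = String.fromCharCode(...new Uint8Array(arrayBuff));
//Each character's char code becomes one byte again
var encoded = Uint8Array.from([...decoded].map(ch => ch.charCodeAt(0))).buffer;

console.log(encoded.byteLength); //Always 2,000, with the same bytes as arrayBuff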

The decoded string has exactly the same length as the input buffer and can easily be manipulated with regular expressions, string methods, etc. But beware that characters UTF-8 encodes as two or more bytes (e.g. "π", stored as the bytes 0xCF 0x80) won't be recognizable anymore, because each of their bytes turns into a separate character.
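A sketch of the asker's end goal (replacing "1234" with "1111") built on the answer's byte-per-character idea. The helper names bufferToBinaryString and binaryStringToBuffer and the sample bytes are hypothetical. The string is built in chunks because spreading a very large array into String.fromCharCode can exceed an engine's argument limit. It runs in Node.js or a browser; writing the result back into process memory with Frida is not shown.

```javascript
// Hypothetical helpers: one character per byte, char codes 0-255.
function bufferToBinaryString(buffer) {
  const bytes = new Uint8Array(buffer);
  const CHUNK = 0x8000; // spread in chunks to stay under argument-count limits
  let result = "";
  for (let i = 0; i < bytes.length; i += CHUNK) {
    result += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return result;
}

function binaryStringToBuffer(str) {
  // Every char code is 0-255 here, so each maps back to exactly one byte.
  return Uint8Array.from(str, ch => ch.charCodeAt(0)).buffer;
}

// Hypothetical memory contents: ASCII "id=1234;" followed by a non-UTF-8 byte.
const original = new Uint8Array([
  0x69, 0x64, 0x3d, 0x31, 0x32, 0x33, 0x34, 0x3b, 0xff,
]).buffer;

const text = bufferToBinaryString(original);
// A same-length replacement keeps every other byte at its original offset.
const patched = binaryStringToBuffer(text.replace("1234", "1111"));
console.log(original.byteLength, patched.byteLength); // 9 9

// Caveat: "π" is stored as the bytes 0xCF 0x80, so searching for "π" fails;
// searching for those two bytes as characters works.
const piText = bufferToBinaryString(new Uint8Array([0xcf, 0x80, 0x3d, 0x33]).buffer);
console.log(piText.includes("π")); // false
console.log(piText.includes("\u00cf\u0080")); // true
```

Keeping the replacement the same length matters here, since a longer or shorter value would shift every following byte in memory.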

Thanks. I had to alter this a bit since Frida doesn't allow using the "from" method, but you can just do it through a loop and it will work. The in-memory replacement I was attempting now works as I intended. – Hem
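A sketch of the loop-based variant the comment describes, written in ES5 style (no spread syntax, arrow functions, or Uint8Array.from) on the assumption that the target runtime might lack those features. The function names are hypothetical.

```javascript
// ES5-style byte-per-character round trip using plain loops.
function bufferToBinaryString(buffer) {
  var bytes = new Uint8Array(buffer);
  var result = "";
  for (var i = 0; i < bytes.length; i++) {
    result += String.fromCharCode(bytes[i]);
  }
  return result;
}

function binaryStringToBuffer(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    // Assumes every char code is 0-255, which holds for strings
    // produced by bufferToBinaryString.
    bytes[i] = str.charCodeAt(i);
  }
  return bytes.buffer;
}

// Usage: round-trip bytes that include values that are not valid UTF-8.
var input = new Uint8Array([0x00, 0x7f, 0x80, 0xff]).buffer;
var output = binaryStringToBuffer(bufferToBinaryString(input));
console.log(output.byteLength); // 4
```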
