I have an ArrayBuffer returned by reading memory with Frida. I'm converting the ArrayBuffer to a string and then back to an ArrayBuffer using TextDecoder and TextEncoder, but the data is altered in the process: the ArrayBuffer length after decoding and re-encoding always comes out larger. Are some characters being decoded in an expansive fashion?

How can I decode an ArrayBuffer to a String, then back to an ArrayBuffer without losing integrity?

Example code:

var arrayBuff = Memory.readByteArray(pointer,2000); //Get a 2,000 byte ArrayBuffer

console.log(arrayBuff.byteLength); //Always returns 2,000

var textDecoder = new TextDecoder("utf-8");
var textEncoder = new TextEncoder(); //TextEncoder always encodes to UTF-8 and takes no encoding argument

//Decode and encode same data without making any changes
var decoded = textDecoder.decode(arrayBuff);
var encoded = textEncoder.encode(decoded);

console.log(encoded.byteLength); //Varies, but is always greater than 2,000
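The growth comes from how TextDecoder treats bytes that are not valid UTF-8: with the default (non-fatal) settings, invalid bytes are decoded as the replacement character U+FFFD, which TextEncoder then writes back as three bytes (EF BF BD). The sketch below reproduces this in plain JavaScript (Node or a browser, not Frida) on a few made-up bytes; the byte values are arbitrary test data, not real process memory.

```javascript
// Arbitrary test bytes (hypothetical data, not read through Frida).
// 0xFF and 0xFE can never appear anywhere in valid UTF-8.
const original = new Uint8Array([0x41, 0xff, 0x42, 0xfe]);

// Default (non-fatal) decoding substitutes U+FFFD for each invalid byte here.
const decoded = new TextDecoder("utf-8").decode(original);
console.log(Array.from(decoded, ch => ch.charCodeAt(0).toString(16)).join(" "));
// 41 fffd 42 fffd

// TextEncoder always emits UTF-8; each U+FFFD becomes the three bytes EF BF BD.
const reencoded = new TextEncoder().encode(decoded);
console.log(original.length, reencoded.length); // 4 8

// With fatal: true the decoder throws instead of substituting U+FFFD.
let threw = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(original);
} catch (e) {
  threw = true;
}
console.log(threw); // true
```

The `fatal` option is a way to detect that a buffer is not valid UTF-8, but it cannot make an arbitrary byte sequence survive the text round trip.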
  • Are you trying to encode an arbitrary byte array as text and then convert it back to the same byte sequence, or does the byte array contain legal UTF-8 encoded text? If it's the former, you simply can't do this. Commented May 6, 2018 at 9:08
  • I would like it to be the same byte sequence. My end goal is to replace a value inside the ArrayBuffer. So if it contains the string "1234", I want to make it "1111" and create a new ArrayBuffer so I can replace it in memory. I was doing this as a cursory test and noticed it would never work because of the sizing mismatch. Commented May 6, 2018 at 9:11
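If the goal is only to overwrite a same-length ASCII value such as "1234", the bytes can also be searched and patched directly, with no string conversion at all. This is a minimal sketch in plain JavaScript runnable outside Frida; `replaceAsciiInPlace` is a hypothetical helper, and it assumes the value is stored in memory as single-byte ASCII (a UTF-16 string would need a different byte pattern).

```javascript
// Hypothetical helper: overwrites every occurrence of an ASCII search string
// inside a byte array, in place, without decoding the bytes as text.
// Assumes both strings are ASCII (codes below 0x80), i.e. one byte per char.
function replaceAsciiInPlace(bytes, search, replacement) {
  if (search.length === 0 || search.length !== replacement.length) {
    throw new Error("search must be non-empty and the same length as replacement");
  }
  let count = 0;
  outer: for (let i = 0; i + search.length <= bytes.length; i++) {
    for (let j = 0; j < search.length; j++) {
      if (bytes[i + j] !== search.charCodeAt(j)) continue outer;
    }
    for (let j = 0; j < replacement.length; j++) {
      bytes[i + j] = replacement.charCodeAt(j);
    }
    count++;
    i += search.length - 1; // resume scanning after the bytes just written
  }
  return count;
}

// Usage with made-up bytes: non-text bytes around the ASCII text "1234".
const buf = new Uint8Array([0xff, 0x31, 0x32, 0x33, 0x34, 0x00]);
const replaced = replaceAsciiInPlace(buf, "1234", "1111");
console.log(replaced); // 1
console.log(Array.from(buf).join(",")); // 255,49,49,49,49,0
```

Inside Frida, `bytes` would be a `Uint8Array` view over the buffer returned by `Memory.readByteArray`; writing the patched buffer back to memory is a separate step not shown here.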

1 Answer

TextDecoder and TextEncoder are designed to work with text. To convert an arbitrary byte sequence into a string and back, it's best to treat each byte as a single character.

var arrayBuff = Memory.readByteArray(pointer,2000); //Get a 2,000 byte ArrayBuffer

console.log(arrayBuff.byteLength); //Always returns 2,000

//Decode and encode same data without making any changes
var decoded = String.fromCharCode(...new Uint8Array(arrayBuff));
var encoded = Uint8Array.from(decoded, ch => ch.charCodeAt(0)).buffer;

console.log(encoded.byteLength); //Also 2,000
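Spreading the whole `Uint8Array` into `String.fromCharCode` passes one argument per byte. That is fine for 2,000 bytes, but engines cap the number of arguments a single call can receive, so much larger reads may throw (typically a `RangeError`). A minimal sketch, assuming an arbitrarily chosen chunk size of 8192 and a hypothetical helper name, builds the same one-character-per-byte string in slices:

```javascript
// Arbitrary chunk size, kept far below typical engine argument-count limits.
const CHUNK_SIZE = 8192;

// Hypothetical helper: one character (code 0-255) per byte, built in slices
// so no single call receives more than CHUNK_SIZE arguments.
function bytesToByteString(buffer) {
  const bytes = new Uint8Array(buffer);
  let result = "";
  for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
    // subarray() is a view (no copy); apply() passes its elements as arguments.
    result += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK_SIZE));
  }
  return result;
}

// Usage with 100,000 arbitrary test bytes.
const big = new Uint8Array(100000).map((_, i) => (i * 7) & 0xff);
const str = bytesToByteString(big.buffer);
console.log(str.length); // 100000
console.log(str.charCodeAt(99999) === big[99999]); // true
```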

The decoded string will have exactly the same length as the input buffer and can easily be manipulated with regular expressions, string methods, etc. But beware that characters that occupy two or more bytes in memory (e.g. "π") won't be recognizable anymore: each one turns into a sequence of characters, one for the code of each individual byte.
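A sketch of the "1234" to "1111" replacement described in the question's comments, run on made-up bytes rather than real Frida memory. It also shows the "π" caveat: the character's two UTF-8 bytes decode to two separate characters, so searching the decoded string for "π" itself would not match.

```javascript
// Hypothetical stand-in for the Frida buffer: the ASCII text "1234"
// surrounded by bytes that are not valid UTF-8 text.
const bytes = new Uint8Array([0x00, 0xff, 0x31, 0x32, 0x33, 0x34, 0xfe, 0x00]);

// One character per byte, as in the answer's code.
const decoded = String.fromCharCode(...bytes);

// A replacement of equal length leaves every other byte at its offset.
const modified = decoded.replace("1234", "1111");
const encoded = Uint8Array.from(modified, ch => ch.charCodeAt(0));
console.log(encoded.length); // 8
console.log(Array.from(encoded).join(",")); // 0,255,49,49,49,49,254,0

// "π" (U+03C0) is 0xCF 0x80 in UTF-8, so it shows up as two characters.
const piDecoded = String.fromCharCode(...new TextEncoder().encode("π"));
console.log(piDecoded.length); // 2
console.log(piDecoded === "\u00cf\u0080"); // true
```

The replacement text must consist of characters with codes 0 to 255: a `Uint8Array` keeps only the low 8 bits of each value, so anything larger would be silently truncated.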

Thanks. I had to alter this a bit since Frida doesn't allow using the "from" method, but you can just do it through a loop and it will work. The in-memory replacement I was attempting works as I intended now.
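The comment doesn't show the loop it used. A minimal ES5-style sketch, assuming the runtime lacks `Uint8Array.from`, spread syntax, and arrow functions (which features a particular Frida version lacks isn't stated in the thread); both function names are hypothetical:

```javascript
// ES5-only sketch: no Uint8Array.from, spread, arrow functions, or let/const.
function bufferToByteString(buffer) {
  var bytes = new Uint8Array(buffer);
  var chars = [];
  for (var i = 0; i < bytes.length; i++) {
    chars.push(String.fromCharCode(bytes[i])); // one char per byte
  }
  return chars.join("");
}

function byteStringToBuffer(str) {
  var bytes = new Uint8Array(str.length);
  for (var i = 0; i < str.length; i++) {
    // Uint8Array stores values modulo 256, so chars above U+00FF get truncated.
    bytes[i] = str.charCodeAt(i);
  }
  return bytes.buffer;
}

// Usage: arbitrary test bytes standing in for Memory.readByteArray output.
var sample = new Uint8Array([0x00, 0x7f, 0x80, 0xff, 0x31, 0x32, 0x33, 0x34]);
var text = bufferToByteString(sample.buffer);
var back = new Uint8Array(byteStringToBuffer(text.replace("1234", "1111")));
console.log(back.length); // 8
```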