I have a site that needs to encrypt and store binary files that are uploaded to the server. The uploading and storage works fine, but I'm getting this error when trying to write the encrypted file:

Encoding::UndefinedConversionError ("\xDD" from ASCII-8BIT to UTF-8):

The code that causes it looks like this:

fd_in = IO.sysopen(self[:name].tempfile.path, "rb")
file_in = IO.open(fd_in)
fd_out = IO.sysopen(self[:name].tempfile.path + ".encrypted", "wb")
file_out = IO.open(fd_out)
cipher = OpenSSL::Cipher::Cipher.new('aes-256-cbc')
cipher.encrypt
cipher.key = cipher_key
cipher.iv = cipher_iv
while chunk = file_in.read(1024)
  file_out << cipher.update(chunk)
end
file_out << cipher.final

The line that causes the error is the file_out << cipher.update(chunk) in the while loop. I've looked into this online and found some reports of similar ASCII/UTF conversion issues, but they all appear to be based on coercing string input, not stream file input. I'm using Ruby 1.9.2 which I believe affects default string encoding.

My rationale as to why (I think) I need to use a stream-based approach: the files tend to be large and I do not want to load the entire file (input or output) into memory to process it.

Any help is appreciated. Thanks.

  • I've found that using .force_encoding("UTF-8") on the #update and #final calls resolves the issue. If anyone can weigh in on whether this is actually the right way to do it and if (why?) UTF-8 is acceptable, I'd love to know. Commented Jul 25, 2011 at 2:24
  • For what it's worth, I also checked Encoding.default_external and Encoding.default_internal and both are UTF-8. Commented Jul 25, 2011 at 2:34
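
As a rough illustration of what the force_encoding workaround from the first comment does: force_encoding only changes the encoding label on a string and leaves its bytes untouched, whereas a real transcode from ASCII-8BIT to UTF-8 fails on bytes such as \xDD. The sample bytes below are made up for the demonstration and do not come from the question.

```ruby
# Three arbitrary bytes, tagged as binary (ASCII-8BIT), like cipher output.
raw = [0xDD, 0x00, 0xFF].pack('C*')

# Transcoding binary data to UTF-8 fails: \xDD has no defined conversion.
begin
  raw.encode('UTF-8')
  transcode_error = nil
rescue Encoding::UndefinedConversionError => e
  transcode_error = e
end

# force_encoding relabels the string; the bytes stay exactly the same.
relabelled = raw.dup.force_encoding('UTF-8')
same_bytes = relabelled.bytes.to_a == raw.bytes.to_a

# The relabelled string is not valid UTF-8; it is only labelled as such.
valid_utf8 = relabelled.valid_encoding?
```

So the workaround writes the right bytes, but only by mislabelling binary data as text.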

1 Answer

When encrypting or decrypting, you want to treat input and output as raw bytes and avoid, at all costs, any transcoding caused by associating an encoding with your data. So you should open your files in binary mode, both for reading and for writing.

You actually did this with IO.sysopen, but you did not pass the "b" flag again when calling IO.open, so the IO objects you read from and write to are not in binary mode.
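
This can be checked with IO#binmode?. The following is a minimal sketch, assuming a scratch Tempfile stands in for the uploaded file; the variable names are illustrative and not taken from the question.

```ruby
require 'tempfile'

# Hypothetical scratch file standing in for the upload's tempfile.
tmp = Tempfile.new('upload')
tmp.close

# Wrap the descriptor without a mode: the "b" given to sysopen is not carried over.
fd = IO.sysopen(tmp.path, 'wb')
io_text = IO.open(fd)
text_binmode = io_text.binmode?
io_text.close

# Wrap the descriptor and repeat the mode: the IO object is now binary.
fd = IO.sysopen(tmp.path, 'wb')
io_bin = IO.open(fd, 'wb')
bin_binmode = io_bin.binmode?
io_bin.close

puts "IO.open(fd)       binmode? #{text_binmode}"
puts "IO.open(fd, 'wb') binmode? #{bin_binmode}"
```

If you prefer to keep the IO.sysopen calls, passing the same mode string to IO.open should have the same effect as switching to File.open.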

Your code should work if you try this instead:

fin = File.open("TODO", "rb")
fout = File.open("TODO.encrypted", "wb")
cipher = OpenSSL::Cipher::Cipher.new('aes-256-cbc')
cipher.encrypt
cipher.key = key
cipher.iv = iv
while chunk = fin.read(1024)
  fout << cipher.update(chunk)
end
fout << cipher.final
fin.close
fout.close
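
To check that the streamed output really is intact, a round trip can be run over a scratch file. This is only a sketch under a few assumptions: the stream_cipher helper, the random 5000-byte input, and the generated key and IV are all made up for the test, and it uses OpenSSL::Cipher.new because newer versions of the openssl library no longer provide OpenSSL::Cipher::Cipher.

```ruby
require 'openssl'
require 'tempfile'

# Hypothetical helper: stream src through the cipher into dst in 1024-byte chunks.
# Both files are opened in binary mode, and the block form closes them for us.
def stream_cipher(cipher, src, dst)
  File.open(src, 'rb') do |fin|
    File.open(dst, 'wb') do |fout|
      while (chunk = fin.read(1024))
        fout << cipher.update(chunk)
      end
      fout << cipher.final
    end
  end
end

# Made-up input: 5000 random bytes written to a scratch file.
plain = Tempfile.new('plain')
plain.binmode
original = OpenSSL::Random.random_bytes(5000)
plain.write(original)
plain.close

# Encrypt with a freshly generated key and IV (assumed, for the demo only).
enc = OpenSSL::Cipher.new('aes-256-cbc')
enc.encrypt
key = enc.random_key
iv = enc.random_iv
encrypted_path = plain.path + '.encrypted'
stream_cipher(enc, plain.path, encrypted_path)
encrypted_size = File.size(encrypted_path)

# Decrypt with the same key and IV, then compare with the original bytes.
dec = OpenSSL::Cipher.new('aes-256-cbc')
dec.decrypt
dec.key = key
dec.iv = iv
decrypted_path = plain.path + '.decrypted'
stream_cipher(dec, encrypted_path, decrypted_path)
round_trip = File.binread(decrypted_path)

File.delete(encrypted_path, decrypted_path)
puts "round trip ok: #{round_trip == original}"
```

Only one chunk is held in memory at a time, which keeps the stream-based approach the question asks for.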

1 Comment

Ah - I assumed (incorrectly) that IO#open would use the mode of the file descriptor from #sysopen. Thanks!
