
I have a file loaded in memory as a bytes object. How can I get exactly the same result as if I had loaded it from disk with open?

Consider the following:

import io

type(downloaded_bytes)  # bytes
f = io.StringIO(downloaded_bytes.decode('utf-8')).readlines()
f2 = open(r"file.log", "r").readlines()
f == f2  # False

The main thing I noticed when inspecting the two results is that the line breaks differ. For example, in f2, a line reads like this:

'Initializing error logging... Success\n',

While in the bytes derived file, f, the same line reads:

'Initializing error logging... Success\r\n',

In other places, where f2 has \n (the expected line break), the bytes-derived f has \r instead.

How might I force f to be exactly like f2 here?


3 Answers


If you want to disable line-ending translation while still working with str, the correct solution is to pass newline='' to open. It still decodes the input to str and recognizes any form of line separator (\r\n, \n or \r) as a line break, but it doesn't normalize the separator to a plain \n:

with open(r"file.log", newline='') as f2in:  # a with statement guarantees the file is closed
    f2 = f2in.readlines()
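A minimal sketch of this, assuming a made-up sample in place of the real downloaded_bytes, written to a temporary file that stands in for file.log. One caveat: io.StringIO defaults to newline='\n', which splits lines only on \n, so a bare \r would leave two lines glued together on the StringIO side. Passing newline='' to StringIO as well makes both sides split and preserve line endings the same way:

```python
import io
import os
import tempfile

# Hypothetical sample standing in for downloaded_bytes: a CRLF line,
# a bare-CR line and an LF line.
downloaded_bytes = b"Initializing error logging... Success\r\nNext line\rLast line\n"

# Write the raw bytes to a temporary file so they can be read back with open().
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(downloaded_bytes)
    path = tmp.name

try:
    # newline='' splits on \r\n, \r and \n but returns the endings untranslated.
    with open(path, newline='', encoding='utf-8') as f2in:
        f2 = f2in.readlines()
finally:
    os.remove(path)

# The same newline='' setting on StringIO gives identical line splitting.
f = io.StringIO(downloaded_bytes.decode('utf-8'), newline='').readlines()

print(f == f2)  # True
```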

Alternatively, to get rid of the \r in the downloaded bytes rather than preserving it in the file read from disk, the simplest solution is to perform the line-ending translation yourself. Text-mode open() translates both \r\n and a bare \r to \n on every platform, so replace both (replacing os.linesep alone only handles \r\n, and only on Windows):

f = io.StringIO(downloaded_bytes.decode('utf-8').replace('\r\n', '\n').replace('\r', '\n')).readlines()
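Instead of chaining replace calls, io.StringIO accepts the same newline argument as open(); with newline=None it applies the universal-newline translation that text-mode open() uses by default. A sketch, again assuming a hypothetical sample rather than the real download:

```python
import io
import os
import tempfile

# Hypothetical stand-in for downloaded_bytes with CRLF, bare CR and LF endings.
downloaded_bytes = b"Initializing error logging... Success\r\nNext line\rLast line\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(downloaded_bytes)
    path = tmp.name

try:
    # Default text mode (newline=None): '\r\n' and '\r' are read as '\n'.
    with open(path, encoding='utf-8') as f2in:
        f2 = f2in.readlines()
finally:
    os.remove(path)

# newline=None makes StringIO apply the same universal-newline translation.
f = io.StringIO(downloaded_bytes.decode('utf-8'), newline=None).readlines()

print(f == f2)  # True
print(f)  # ['Initializing error logging... Success\n', 'Next line\n', 'Last line\n']
```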

You're running on Windows, which by convention uses '\r\n' to terminate lines; in "text mode", Python translates those line endings to '\n' as it reads. Open your file in "binary mode" instead:

f2 = open(r"file.log", "rb").readlines()

Note the trailing b in the second argument to open(). Then no line-ending translation happens, and readlines() returns bytes objects rather than str.
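A sketch comparing both sides in binary mode, with a hypothetical sample written to a temporary file standing in for file.log. io.BytesIO is the in-memory counterpart of a file opened with "rb", and both split lines only on b'\n':

```python
import io
import os
import tempfile

# Hypothetical stand-in for downloaded_bytes.
downloaded_bytes = b"Initializing error logging... Success\r\nNext line\rLast line\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(downloaded_bytes)
    path = tmp.name

try:
    # Binary mode: no decoding and no line-ending translation.
    with open(path, "rb") as f2in:
        f2 = f2in.readlines()
finally:
    os.remove(path)

# BytesIO behaves like the "rb" file: raw bytes, split only on b'\n'.
f = io.BytesIO(downloaded_bytes).readlines()

print(f == f2)  # True
print(f2)  # [b'Initializing error logging... Success\r\n', b'Next line\rLast line\n']
```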

3 Comments

Indeed, I edited my post, but I want f to be exactly like f2.
Unfortunately, this also disables the automatic decoding from bytes to str. If you want to compare the raw bytes, this is a better approach, but if the goal is to get the same lines you'd see from the file, you'd still want str at the end.
I don't know - and can't guess - what the OP wants, exactly. They'll have to be more explicit, or settle for a variety of answers that may or may not address what they actually want ;-)

Well, don't use StringIO for binary data; use BytesIO!

from io import BytesIO

f = BytesIO(downloaded_bytes)
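If you need str lines rather than bytes, one option (not part of this answer as written) is to wrap the BytesIO in io.TextIOWrapper, which decodes and, with its default newline=None, translates '\r\n' and '\r' to '\n' the same way open() does in text mode. A sketch with a hypothetical sample:

```python
import io

# Hypothetical stand-in for downloaded_bytes.
downloaded_bytes = b"Initializing error logging... Success\r\nNext line\rLast line\n"

# TextIOWrapper over BytesIO mimics open(path, "r", encoding="utf-8").
f = io.TextIOWrapper(io.BytesIO(downloaded_bytes), encoding="utf-8").readlines()
print(f)  # ['Initializing error logging... Success\n', 'Next line\n', 'Last line\n']
```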
