Offset when reading binary file in python

Question

I have an OSM PBF file which I am trying to parse. The format specification states, and viewing the file in Sublime Text confirms, that the first four bytes are:

0000 000d

Why then, if I run a very simple Python program:

PBFfile = open(r'MyFilePath.osm.pbf')
PBFfile.read(4)[3].encode('hex')

does it return 0a (the next byte in the sequence) not the expected 0d? Is there an obvious explanation?

I am on Windows 7, Python 2.7.5 32 bit.

On Windows, '\r' is stripped from text file records: in text mode each '\r\n' pair is read back as a single '\n'. Open the file in binary mode: open(filename, 'rb')
– cdarke, Commented Feb 20, 2015 at 11:04
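The effect described in this comment can be reproduced on any platform with a small sketch. It is only a stand-in for the asker's setup: it uses io.open with universal-newline translation and a latin-1 decoding to imitate what Windows text mode does in Python 2, and the sample bytes and temporary file are assumptions made for illustration, not the contents of a real PBF file.

```python
import io
import os
import tempfile

# Assumed sample data: the four bytes from the question, followed by 0x0a.
raw = b'\x00\x00\x00\x0d\x0a\x09OSMHeader'

# Write the sample to a temporary file (hypothetical stand-in for the PBF file).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

# Text mode with newline translation: '\r\n' is read back as '\n'.
with io.open(path, 'r', encoding='latin-1', newline=None) as f:
    text_mode = f.read(4)

# Binary mode: the bytes come back exactly as stored.
with io.open(path, 'rb') as f:
    binary_mode = f.read(4)

os.remove(path)

print('%02x' % ord(text_mode[3]))            # 0a
print('%02x' % bytearray(binary_mode)[3])    # 0d
```

The fourth item read in text mode is 0a because the 0d 0a pair was collapsed into one character, which matches the symptom in the question.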

shx2 · Accepted Answer · 2015-02-20 11:03:08Z


You're opening the file in "text mode", which on Windows performs newline translation you don't want for binary data (see the documentation for open()).

To fix this, open the file in binary mode:

PBFfile = open(r'MyFilePath.osm.pbf', 'rb')
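As a minimal sketch of putting this to use, the following reads the four-byte prefix in binary mode and decodes it as a big-endian unsigned integer with struct. The helper name read_header_length and the temporary demo file are hypothetical, and treating the prefix as a big-endian length is an assumption about the PBF layout that should be checked against the format specification.

```python
import os
import struct
import tempfile


def read_header_length(path):
    """Read the first four bytes of a file as a big-endian unsigned int."""
    # 'rb' turns off newline translation, so 0x0d is returned untouched.
    with open(path, 'rb') as f:
        prefix = f.read(4)
    if len(prefix) != 4:
        raise ValueError('file is shorter than 4 bytes')
    return struct.unpack('>I', prefix)[0]


# Hypothetical demo file starting with the bytes from the question.
fd, demo_path = tempfile.mkstemp(suffix='.osm.pbf')
with os.fdopen(fd, 'wb') as f:
    f.write(b'\x00\x00\x00\x0d\x0a\x09OSMHeader')

header_length = read_header_length(demo_path)
os.remove(demo_path)

print(header_length)  # 13
```

Using a with block also closes the file promptly, which the one-line open() call does not do on its own.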


1 Comment

Stev_k Over a year ago

perfect, thank you. I never realised it was so important. Guess it's a Windows thing :(
