Difference between open and codecs.open in Python

Question

There are two ways to open a text file in Python:

f = open(filename)

And

import codecs
f = codecs.open(filename, encoding="utf-8")

When is codecs.open preferable to open?

Note that codecs.open() is obsolete in 3.x, since open() gains an encoding argument.
– Ignacio Vazquez-Abrams, Commented Mar 9, 2011 at 19:05
There's also a 3rd way (in Python 2.x at least): `f = file(filename)`
– Adam Parkin, Commented Nov 1, 2012 at 15:53
@IgnacioVazquez-Abrams Is there any link saying that codecs.open() is obsolete? I don't see this in the Python 3 docs: docs.python.org/3.7/library/codecs.html
– varela, Commented Apr 17, 2019 at 12:25
@varela: the Python documentation page you mentioned says: "the builtin open() and the associated io module are the recommended approach for working with encoded text files"
– Luciano Ramalho, Commented May 10, 2019 at 2:10

3 revs · Accepted Answer · 2017-05-01 01:25:42Z

95

Since Python 2.6, a good practice is to use io.open(), which also takes an encoding argument, like the now obsolete codecs.open(). In Python 3, io.open is an alias for the open() built-in. So io.open() works in Python 2.6 and all later versions, including Python 3.4. See docs: http://docs.python.org/3.4/library/io.html

Now, for the original question: when reading text (including "plain text", HTML, XML and JSON) in Python 2 you should always use io.open() with an explicit encoding, or open() with an explicit encoding in Python 3. Doing so means you get correctly decoded Unicode, or get an error right off the bat, making it much easier to debug.
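A minimal sketch of that advice, assuming Python 3 (where io.open and the built-in open are the same function). The temporary file and the sample text are illustrative only:

```python
import io
import os
import tempfile

sample = "Don't be naïve: Façade costs €5"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write with an explicit encoding so the bytes on disk are well defined.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Reading with the matching encoding returns the original Unicode text.
with io.open(path, encoding="utf-8") as f:
    decoded = f.read()

# Reading with the wrong encoding fails right away instead of
# handing back silently corrupted text.
try:
    with io.open(path, encoding="ascii") as f:
        f.read()
    wrong_encoding_error = None
except UnicodeDecodeError as exc:
    wrong_encoding_error = exc
```

The point of the last block is that a mismatch surfaces at the read call, which is where you want to find out about it.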

Pure ASCII "plain text" is a myth from the distant past. Proper English text uses curly quotes, em-dashes, bullets, € (euro signs) and even diaeresis (¨). Don't be naïve! (And let's not forget the Façade design pattern!)

Because pure ASCII is not a real option, open() without an explicit encoding is only useful to read binary files.

edited May 1, 2017 at 1:25

community wiki

3 revs
Luciano Ramalho


4 Comments

Bdoserror Over a year ago

@ForeverWintr The answer is pretty clearly in there: use io.open() for text, and open() only for binary. The implication is that codecs.open() is not preferred at all.

ForeverWintr Over a year ago

@Bdoserror, There is an answer in there, clearly, but it's not an answer to the question that was asked. The question was about the difference between open and codecs.open, and specifically when the latter is preferable to the former. An answer that doesn't so much as mention codecs.open can't answer that question.

Bdoserror Over a year ago

@ForeverWintr If the OP asked the wrong question (i.e. with the assumption that codecs.open() was correct to use) then there is no "correct" answer about when to use it. The answer is to use io.open() instead. It's like if I ask "when should I use a wrench to drive a nail into a wall?". The right answer is "use a hammer".

Marc Over a year ago

sometimes there is an unspoken question. In this one it is, "I don't know what codecs.open is for if open seems to do the same thing! What is it?" -- many thanks for explaining both!

Adam Parkin · Accepted Answer · 2012-11-01 16:14:34Z

23

Personally, I always use codecs.open unless there's a clearly identified need to use open**. The reason is that there have been so many times when I've been bitten by having UTF-8 input sneak into my programs. "Oh, I just know it'll always be ASCII" tends to be an assumption that gets broken often.

Assuming UTF-8 as the default encoding tends to be the safer choice in my experience, since ASCII can be treated as UTF-8, but the converse is not true. And in those cases when I truly do know that the input is ASCII, I still use codecs.open, as I'm a firm believer in "explicit is better than implicit".

** In Python 2.x. As the comment on the question states, in Python 3 open replaces codecs.open.
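The "ASCII can be treated as UTF-8, but not the converse" point can be checked directly. This is just a small illustration, assuming Python 3; the byte strings are made-up samples:

```python
ascii_bytes = b"plain text"
utf8_bytes = "na\u00efve".encode("utf-8")  # "naïve" encoded as UTF-8

# Every ASCII byte sequence is also valid UTF-8, so this always works.
as_text = ascii_bytes.decode("utf-8")

# The converse fails as soon as a non-ASCII character shows up.
try:
    utf8_bytes.decode("ascii")
    converse_ok = True
except UnicodeDecodeError:
    converse_ok = False
```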

answered Nov 1, 2012 at 16:14

Adam Parkin


3 Comments

cblab Over a year ago

What I don't really get is why open can sometimes handle UTF-8 encoded non-Latin characters of the Unicode set very well, and sometimes fails miserably ...

radtek Over a year ago

This makes sense to me. io.open does not take an encoding param from what I can see in python 2.7.5

jochietoch Over a year ago

@radtek, you are right that this is undocumented; however (at least in 2.7.12) io.open accepts encoding and newline parameters and interprets them as Python 3 does. Unlike codecs.open, a file opened with io.open will raise TypeError: write() argument 1 must be unicode, not str even in Python 2.7 if you attempt to write str (bytes) to it. A file opened with codecs.open will instead attempt implicit conversion to unicode, often leading to confusing UnicodeDecodeErrors.

Mandible79 · Accepted Answer · 2013-11-12 18:16:03Z

12

In Python 2 there are unicode strings and bytestrings. If you just use bytestrings, you can read/write to a file opened with open() just fine. After all, the strings are just bytes.

The problem comes when, say, you have a unicode string and you do the following:

>>> example = u'Μου αρέσει Ελληνικά'
>>> open('sample.txt', 'w').write(example)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

The write fails because Python 2 implicitly tries to encode the unicode string with the default ASCII codec. So here you either explicitly encode your unicode string in UTF-8 or you use codecs.open to do it for you transparently.

If you're only ever using bytestrings then no problems:

>>> example = 'Μου αρέσει Ελληνικά'
>>> open('sample.txt', 'w').write(example)
>>>

It gets more involved than this, because when you concatenate a unicode string and a bytestring with the + operator you get a unicode string. It's easy to get bitten by that one.

Also codecs.open doesn't like bytestrings with non-ASCII chars being passed in:

>>> codecs.open('test', 'w', encoding='utf-8').write('Μου αρέσει')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

The advice about strings for input/output is normally "convert to unicode as early as possible and back to bytestrings as late as possible". Using codecs.open allows you to do the latter very easily.

Just be careful that you are giving it unicode strings and not bytestrings that may have non-ASCII characters.
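For comparison, here is a rough Python 3 sketch of the same two options (encode by hand, or let the file object do it). It uses io.open rather than codecs.open, which is my substitution and not what the examples above ran, and the file names are hypothetical:

```python
import io
import os
import tempfile

example = "Μου αρέσει Ελληνικά"
tmpdir = tempfile.mkdtemp()
manual_path = os.path.join(tmpdir, "manual.txt")
auto_path = os.path.join(tmpdir, "auto.txt")

# Option 1: encode explicitly and write the resulting bytes in binary mode.
with io.open(manual_path, "wb") as f:
    f.write(example.encode("utf-8"))

# Option 2: open in text mode with an encoding and let the file encode for you.
with io.open(auto_path, "w", encoding="utf-8") as f:
    f.write(example)

# Both files should end up with identical bytes on disk.
with io.open(manual_path, "rb") as f:
    manual_bytes = f.read()
with io.open(auto_path, "rb") as f:
    auto_bytes = f.read()
```

Either way the text stays Unicode inside the program and only becomes bytes at the point of writing.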

answered Nov 12, 2013 at 18:16

Mandible79


2 Comments

Chris Johnson Over a year ago

Can you explain your second example? It appears to be identical to your first example, so why would the result be any different?

Mandible79 Over a year ago

Note the use of the u'' in the first example. This means I created a unicode string, not a bytestring. This is the difference between the two examples. In the second example I am creating a bytestring and writing out one of those to a file is just fine. A unicode string is not fine if you're using characters outside of ASCII.

heretolearn · Accepted Answer · 2019-01-10 13:18:10Z

7

codecs.open, I suppose, is just a remnant from the Python 2 days, when the built-in open had a much simpler interface and fewer capabilities. In Python 2, the built-in open doesn't take an encoding argument, so if you wanted something other than binary mode or the default encoding, you were supposed to use codecs.open.

In Python 2.6, the io module came to the rescue to make things a bit simpler. According to the official documentation:

New in version 2.6.

The io module provides the Python interfaces to stream handling.
Under Python 2.x, this is proposed as an alternative to the
built-in file object, but in Python 3.x it is the default
interface to access files and streams.

Having said that, the only use I can think of for codecs.open today is backward compatibility. In all other scenarios (unless you are using Python < 2.6), it is preferable to use io.open. Also, in Python 3.x io.open is the same as the built-in open.

Note:

There is a syntactical difference between codecs.open and io.open as well.

codecs.open:

open(filename, mode='rb', encoding=None, errors='strict', buffering=1)

io.open:

open(file, mode='r', buffering=-1, encoding=None,
     errors=None, newline=None, closefd=True, opener=None)
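Beyond the signatures, the two functions also behave differently at runtime. The sketch below, assuming Python 3 and a throwaway file, compares the returned object types and newline handling. The warnings filter is only there because newer Python 3 releases may emit a DeprecationWarning for codecs.open:

```python
import codecs
import io
import os
import tempfile
import warnings

path = os.path.join(tempfile.mkdtemp(), "newlines.txt")
with io.open(path, "wb") as f:
    f.write(b"one\r\ntwo")

with warnings.catch_warnings():
    # Assumption: silence a possible deprecation warning on recent versions.
    warnings.simplefilter("ignore", DeprecationWarning)
    # codecs.open wraps a binary-mode file, so no newline translation happens.
    with codecs.open(path, "r", encoding="utf-8") as f:
        codecs_type = type(f).__name__
        codecs_text = f.read()

# io.open in text mode uses universal newlines by default (newline=None).
with io.open(path, "r", encoding="utf-8") as f:
    io_type = type(f).__name__
    io_text = f.read()
```

Here codecs_text keeps the "\r\n" while io_text has it translated to "\n", which is one more reason not to treat the two as drop-in replacements.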

edited Jan 10, 2019 at 13:18

answered Dec 12, 2018 at 13:08

heretolearn


1 Comment

wombatonfire Over a year ago

Not only do codecs.open and io.open differ in terms of syntax, they also return objects of different types. Also, codecs.open always works with files in binary mode.

Geo · Accepted Answer · 2011-03-09 18:57:58Z

5

When you need to open a file that has a certain encoding, you would use the codecs module.

answered Mar 9, 2011 at 18:57

Geo


1 Comment

cblab Over a year ago

I guess all the text files have a certain encoding, somehow (:

wihlke · Accepted Answer · 2019-05-16 15:10:51Z

5

When you want to load a binary file, use f = io.open(filename, 'rb').
For opening a text file, always use f = io.open(filename, encoding='utf-8') with explicit encoding.

In Python 3, however, open does the same thing as io.open and can be used instead.
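To make the binary/text distinction concrete, here is a small sketch assuming Python 3; the file name and the sample word are only for illustration:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "word.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write("na\u00efve")  # "naïve", five characters

# Binary mode ("rb") returns the raw bytes; "ï" takes two bytes in UTF-8.
with io.open(path, "rb") as f:
    raw = f.read()

# Text mode with an explicit encoding returns decoded characters.
with io.open(path, encoding="utf-8") as f:
    text = f.read()
```

Note that the mode string needs a read/write/append letter: a bare 'b' is rejected with a ValueError.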

Note: codecs.open is planned to become deprecated and replaced by io.open, which was introduced in Python 2.6. I would only use it if the code needs to be compatible with earlier Python versions. For more information on codecs and Unicode in Python, see the Unicode HOWTO.

edited May 16, 2019 at 15:10

answered Sep 11, 2018 at 21:57

wihlke


2 Comments

wombatonfire Over a year ago

1. Why can't I open a file in binary mode with io.open or codecs.open? 2. codecs.open is not deprecated yet, read the discussion on the page you linked to.

wihlke Over a year ago

Good points! 1. You can use either, but I would again advise against codecs.open unless you're on Python 2.5 or older. 2. I updated my answer to reflect that the deprecation did not take place immediately, but is planned for the future.

Cat Plus Plus · Accepted Answer · 2011-03-09 18:59:36Z

2

When you're working with text files and want transparent encoding and decoding into Unicode objects.

answered Mar 9, 2011 at 18:59

Cat Plus Plus

