Python cross-version byte-code decompiler

uncompyle6

A native Python cross-version decompiler and fragment decompiler. The successor to decompyle, uncompyle, and uncompyle2.

I gave a talk on this at BlackHat Asia 2024.

Introduction

uncompyle6 translates Python bytecode back into equivalent Python source code. It accepts bytecodes from Python version 1.0 to version 3.8, spanning over 24 years of Python releases. We include Dropbox’s Python 2.5 bytecode and some PyPy bytecodes.

Why this?

Ok, I’ll say it: this software is amazing. It is more than your normal hacky decompiler. Using compiler technology, the program creates a parse tree of the program from the instructions; nodes at the upper levels look a little like what might come from a Python AST. So we can really classify and understand what’s going on in sections of Python bytecode.

Building on this, another thing that makes this different from other CPython bytecode decompilers is the ability to deparse just fragments of source code and give source-code information around a given bytecode offset.

I use the tree fragments to deparse fragments of code at run time inside my trepan debuggers. For that, bytecode offsets are recorded and associated with fragments of the source code. This purpose, although compatible with the original intention, is still a little bit different. See this for more information.

Python fragment deparsing, given an instruction offset, is useful in showing stack traces and can be incorporated into any program that wants to show a location in more detail than just a line number at runtime. This code can also be used when source code information does not exist and there is just bytecode. Again, my debuggers make use of this.
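To make "a location in more detail than just a line number" concrete, here is a minimal sketch that uses only the standard library, not uncompyle6's own fragment API. It recovers the bytecode offset of the instruction that raised an exception and looks up that instruction with dis. The helper names instruction_at, failing, and locate_failure are hypothetical and exist only for this illustration.

```python
import dis
import sys


def instruction_at(code, offset):
    """Return the dis.Instruction that starts at `offset`, or None."""
    for instr in dis.get_instructions(code):
        if instr.offset == offset:
            return instr
    return None


def failing():
    denominator = 0
    return 1 / denominator  # raises ZeroDivisionError


def locate_failure():
    """Run failing() and return (line number, bytecode offset, opcode name)."""
    try:
        failing()
    except ZeroDivisionError:
        tb = sys.exc_info()[2]
        # Walk to the innermost traceback entry: the frame that raised.
        while tb.tb_next is not None:
            tb = tb.tb_next
        # tb_lasti is the byte offset of the last attempted instruction.
        instr = instruction_at(tb.tb_frame.f_code, tb.tb_lasti)
        return tb.tb_lineno, tb.tb_lasti, instr.opname if instr else None


if __name__ == "__main__":
    print(locate_failure())
```

A fragment deparser goes one step further than this sketch: given the same offset, it maps the instruction back to the piece of source text it came from.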

There were (and still are) several decompyle, uncompyle, uncompyle2, uncompyle3 forks around. Many of them come basically from the same code base, and (almost?) all of them are no longer actively maintained. One was really good at decompiling Python 1.5-2.3, another is really good at Python 2.7, but only that. Another handles Python 3.2 only; another patched that and handled only 3.3. You get the idea. This code pulls all of these forks together and moves forward. There is some serious refactoring and cleanup in this code base over those old forks. Even more experimental refactoring is going on in decompyle3.

This demonstrably does the best in decompiling Python across all Python versions. And even when there is another project that only provides decompilation for a subset of Python versions, we generally do demonstrably better for those as well.

How can we tell? By taking Python bytecode that comes distributed with that version of Python and decompiling it. Among those that successfully decompile, we can then make sure the resulting programs are syntactically correct by running the Python interpreter for that bytecode version. Finally, in cases where the program has a test for itself, we can run the check on the decompiled code.
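The syntax-check step can be sketched roughly as follows. This is an illustration, not the project's test harness: it checks the text against whatever interpreter runs the sketch, whereas the process described above uses the interpreter matching the bytecode version. The function name is_syntactically_valid is an assumption made for this example.

```python
def is_syntactically_valid(source_text, filename="<decompiled>"):
    """Return True if `source_text` compiles under the running interpreter."""
    try:
        # compile() parses and compiles the text without executing it.
        compile(source_text, filename, "exec")
    except (SyntaxError, ValueError):
        # ValueError covers source containing null bytes on older Pythons.
        return False
    return True


if __name__ == "__main__":
    print(is_syntactically_valid("x = 1\n"))
    print(is_syntactically_valid("def f(:\n    pass\n"))
```

Passing this check only shows that the output parses. It says nothing about whether the decompiled program behaves like the original, which is why running a program's own tests is the stronger check.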

We use automated processes to find bugs. In the issue trackers for other decompilers, you will find several bugs we’ve found along the way. Very few of them are fixed in the other decompilers.

Requirements

The code in the git repository can be run from Python 2.4 to the latest Python version, except Python 3.0 through 3.2. Volunteers are welcome to address these deficiencies if there is a desire to do so.

The way it does this, though, is by segregating consecutive Python versions into git branches:

master

Python 3.11 and up

python-3.6-to-3.10

Python 3.6 through 3.10 (uses type annotations)

python-3.3-to-3.5

Python 3.3 through 3.5 (Generic Python 3)

python-2.4

Python 2.4 through 2.7 (Generic Python 2)

PyPy 3-2.4 and later works as well.

The bytecode files it can read have been tested on Python bytecodes from versions 1.4, 2.1-2.7, and 3.0-3.8 and later PyPy versions.

Installation

You can install from PyPI using the name uncompyle6:

pip install uncompyle6

To install from source code, this project uses setup.py, so it follows the standard Python routine:

$ pip install -e .  # set up to run from source tree

or:

$ python setup.py install # may need sudo

A GNU Makefile is also provided, so make install (possibly as root or sudo) will do the steps above.

Running Tests

make check

A GNU makefile has been added to smooth over setting up and running the right command, and running tests from fastest to slowest.

If you have remake installed, you can see the list of all tasks including tests via remake --tasks

Usage

Run

$ uncompyle6 *compiled-python-file-pyc-or-pyo*

For usage help:

$ uncompyle6 -h

Verification

In older versions of Python, it was possible to verify bytecode by decompiling it and then compiling using the Python interpreter for that bytecode version. Having done this, the bytecode produced could be compared with the original bytecode. However, as Python’s code generation got better, this was no longer feasible.
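The round-trip idea can be sketched like this, assuming both pieces of source are compiled by the same interpreter. It is not uncompyle6's verification code; code_equal and roundtrip_matches are hypothetical names, and the comparison deliberately ignores line numbers and file names.

```python
import types


def code_equal(a, b):
    """Compare two code objects by instructions, names, and constants."""
    if a.co_code != b.co_code or a.co_names != b.co_names:
        return False
    if len(a.co_consts) != len(b.co_consts):
        return False
    for const_a, const_b in zip(a.co_consts, b.co_consts):
        both_code = (isinstance(const_a, types.CodeType)
                     and isinstance(const_b, types.CodeType))
        if both_code:
            # Nested functions and classes are stored as code constants.
            if not code_equal(const_a, const_b):
                return False
        elif const_a != const_b:
            return False
    return True


def roundtrip_matches(original_source, decompiled_source):
    """Compile both texts and report whether the bytecode is the same."""
    original = compile(original_source, "<original>", "exec")
    decompiled = compile(decompiled_source, "<decompiled>", "exec")
    return code_equal(original, decompiled)


if __name__ == "__main__":
    print(roundtrip_matches("x = 1  # comment\n", "x = 1\n"))
    print(roundtrip_matches("a = b + c\n", "a = b - c\n"))
```

In practice the original is available only as bytecode, so the left-hand side would be loaded from the .pyc file instead of being compiled from text.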

If you want Python syntax verification of the correctness of the decompilation process, add the --syntax-verify option. However, since Python syntax changes between versions, you should use this option only if the bytecode is the right bytecode for the Python interpreter that will be checking the syntax.

You can also cross-compare the results with another version of uncompyle6 since there are sometimes regressions in decompiling specific bytecode, as the overall quality improves.

For Python 3.7 and 3.8, the code in decompyle3 is generally better.

Or try another specific Python decompiler like uncompyle2, unpyc37, or pycdc. Since the latter two work differently, bugs here often aren’t present in those, and vice versa.

There is an interesting class of these programs that is readily available to give stronger verification: those programs that, when run, test themselves. Our test suite includes these.

And Python comes with another set of programs like this: its test suite for the standard library. We have some code in test/stdlib to facilitate this kind of checking too.

Known Bugs/Restrictions

The biggest known and possibly fixable (but hard) problem has to do with handling control flow. (Python has probably the most diverse and screwy set of compound statements I’ve ever seen; there are “else” clauses on loops and try blocks that I suspect many programmers don’t know about.)
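For readers who have not met them, here is a small illustration of the “else” clauses mentioned above. The function names are made up for this example.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Runs only when the loop finished without hitting `break`.
        result = None
    return result


def parse_int(text):
    """Describe whether `text` parses as an integer."""
    try:
        value = int(text)
    except ValueError:
        outcome = "invalid"
    else:
        # Runs only when the try body raised no exception.
        outcome = "ok: %d" % value
    return outcome


if __name__ == "__main__":
    print(find_first_negative([1, -2, 3]))
    print(find_first_negative([1, 2]))
    print(parse_int("42"))
    print(parse_int("forty-two"))
```

In bytecode, these clauses show up only as jumps and fall-throughs, so a decompiler has to infer from the jump targets which compound statement produced them.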

All of the Python decompilers that I have looked at have problems decompiling Python’s control flow. In some cases, we can detect an erroneous decompilation and report that.

Python support is pretty good for Python 2.

On the lower end of Python versions, decompilation seems pretty good, although we don’t have any automated testing in place for Python’s distributed tests. Also, we don’t have a Python interpreter for versions 1.6 and 2.0.

In the Python 3 series, Python support is strongest around 3.4 or 3.3 and drops off as you move further away from those versions. Python 3.0 is weird in that it, in some ways, resembles 2.6 more than it does 3.1 or 2.7. Python 3.6 changes things drastically by using word codes rather than byte codes. As a result, the jump offset field in a jump instruction argument has been reduced in size. This makes the EXTENDED_ARG instructions now more prevalent in jump instructions; previously they had been rare. Perhaps to compensate for the additional EXTENDED_ARG instructions, additional jump optimization has been added. So, in sum, handling control flow by ad hoc means, as is currently done, fares worse.
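The effect of word codes on jumps can be observed with the standard dis module. The sketch below assumes it runs on Python 3.6 or later; the function name and the generated source are invented for illustration. It compiles an if statement whose body is long enough that the conditional jump over it cannot fit its argument in one byte.

```python
import dis


def long_jump_demo(statement_count=300):
    """Return (bytecode size, opcode names that follow an EXTENDED_ARG)."""
    # An `if` whose body is far too long for a one-byte jump argument.
    source = "if flag:\n" + "    v = 0\n" * statement_count
    code = compile(source, "<demo>", "exec")
    instructions = list(dis.get_instructions(code))
    followers = [
        nxt.opname
        for cur, nxt in zip(instructions, instructions[1:])
        if cur.opname == "EXTENDED_ARG"
    ]
    return len(code.co_code), followers


if __name__ == "__main__":
    size, followers = long_jump_demo()
    print(size % 2 == 0)  # word codes: every code unit is two bytes wide
    print(followers)
```

The list of followers should contain a conditional jump, whose exact name varies between Python versions. A decompiler that pattern-matches on jump instructions has to account for these prefixes.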

Between Python 3.5, 3.6, 3.7, there have been major changes to the MAKE_FUNCTION and CALL_FUNCTION instructions.

Python 3.8 removes the SETUP_LOOP, SETUP_EXCEPT, BREAK_LOOP, and CONTINUE_LOOP instructions, which may make control-flow detection harder in the absence of the more sophisticated control-flow analysis that is planned. We’ll see.
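Which of these block-marking opcodes the running interpreter still defines can be checked with dis.opmap. This is a small sketch; the constant and function names are hypothetical.

```python
import dis
import sys

# Opcodes that Python 3.8 removed, as listed in the paragraph above.
BLOCK_OPCODES = ("SETUP_LOOP", "SETUP_EXCEPT", "BREAK_LOOP", "CONTINUE_LOOP")


def block_opcodes_present():
    """Return those block-related opcodes that this interpreter defines."""
    return [name for name in BLOCK_OPCODES if name in dis.opmap]


if __name__ == "__main__":
    print(sys.version_info[:2], block_opcodes_present())
```

On 3.8 and later the list is empty, so loop and exception-handler boundaries have to be recovered from jumps and other information instead.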

Currently, not all Python magic numbers are supported. Specifically, in some versions of Python, notably Python 3.6, the magic number has changed several times within a version.

We support only released versions, not candidate versions. Note, however, that the magic of a released version is usually the same as the last candidate version prior to release.
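For context, the magic number is the first four bytes of a compiled file: a two-byte little-endian integer followed by a carriage return and newline. The sketch below, which uses only the standard library, writes a throwaway .pyc file and reads its magic back. The names pyc_magic and magic_demo are made up for this example.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_magic(pyc_path):
    """Return the 4-byte magic at the start of a compiled Python file."""
    with open(pyc_path, "rb") as f:
        return f.read(4)


def magic_demo():
    """Compile a tiny module and return (magic bytes, magic as an integer)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        pyc = os.path.join(tmp, "example.pyc")
        with open(src, "w") as f:
            f.write("x = 1\n")
        py_compile.compile(src, cfile=pyc, doraise=True)
        magic = pyc_magic(pyc)
    return magic, int.from_bytes(magic[:2], "little")


if __name__ == "__main__":
    magic, number = magic_demo()
    print(number, magic == importlib.util.MAGIC_NUMBER)
```

A decompiler needs a table from this integer to a Python version before it can interpret the rest of the file, which is why an unlisted magic number stops it early.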

There are also customized Python interpreters, notably Dropbox, which use their own magic and encrypt bytecode. With the exception of Dropbox’s old Python 2.5 interpreter, this kind of thing is not handled.

We also don’t handle PJOrion or otherwise obfuscated code. For PJOrion try: PJOrion Deobfuscator to unscramble the bytecode to get valid bytecode before trying this tool; pydecipher might help with that.

This program can’t decompile Microsoft Windows EXE files created by Py2EXE, although we can probably decompile the code after you extract the bytecode properly. Pydeinstaller may help with unpacking PyInstaller bundles.

Handling pathologically long lists of expressions or statements is slow. We don’t handle Cython or MicroPython, which don’t use bytecode.

There are numerous bugs in decompilation. And that’s true for every other CPython decompiler I have encountered, even the ones that claimed to be “perfect” on some particular version like 2.4.

As Python progresses, decompilation also gets harder because the compilation is more sophisticated and the language itself is more sophisticated. I suspect that there will be fewer ad-hoc attempts like unpyc37 (which is based on a 3.3 decompiler) simply because it is harder to do so. The good news, at least from my standpoint, is that I think I understand what’s needed to address the problems in a more robust way. But right now, until such time as the project is better funded, I do not intend to make any serious effort to support Python versions 3.8 or 3.9, including bugs that might come in. I imagine at some point I may be interested in it.

You can easily find bugs by running the tests against the standard test suite that Python uses to check itself. At any given time, there are dozens of known problems that are pretty well isolated and that could be solved if one were to put in the time to do so. The problem is that there aren’t that many people who have been working on bug fixing.

Some of the bugs in 3.7 and 3.8 are simply a matter of back-porting the fixes in decompyle3. Any volunteers?

You may run across a bug that you want to report. Please do so after reading How to report a bug and follow the instructions when opening an issue.

Be aware that it might not get my attention for a while. If you sponsor or support the project in some way, I’ll prioritize your issues above the queue of other things I might be doing instead. In rare situations, I can do a hand decompilation of bytecode for a fee. However, this is expensive, usually beyond what most people are willing to spend.
