
I want my Python script to behave differently depending on the type of a file. I can't rely on the filename extension, because it may be missing or misleading. I could call the Unix file utility and parse its output, but for portability I would rather use something built into Python.

So is there anything in Python that uses heuristics to deduce a file's type from its contents?

1 Answer

Probably others as well. "magic" is the magic keyword to search for. ;-)


Comments

libmagic isn't reliable for every file. It mainly looks for a "magic number" in the file header. Text files, such as source code, have no header, so libmagic has to fall back on guesswork, and it can be very wrong about them.
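To make that concrete, here is a minimal sketch of the kind of guess a sniffer falls back on when there is no header: it only tries to tell "probably text" from "probably binary", and it cannot tell, say, Python source from C source. The function names, the 8192-byte sample size, and the UTF-8-only policy are assumptions for illustration, not what libmagic actually does.

```python
import codecs


def looks_like_text(data: bytes) -> bool:
    """Heuristic guess: True if data plausibly is UTF-8 text, False if it looks binary."""
    if b"\x00" in data:
        # NUL bytes are very rare in text files but common in binary formats.
        return False
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the end of the sample, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        # Assumed policy: anything that is not valid UTF-8 counts as binary,
        # so Latin-1 or UTF-16 text files would be misclassified.
        return False
    return True


def file_looks_like_text(path: str, sample_size: int = 8192) -> bool:
    """Apply looks_like_text to the first sample_size bytes of the file at path."""
    with open(path, "rb") as f:
        return looks_like_text(f.read(sample_size))
```

This illustrates the comment's point: the best such a heuristic can say about a header-less file is "text", and even that is a guess.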
That is the danger of any content-sniffing approach. Often the set of ‘acceptable’ file types is smaller than the list libmagic knows, in which case ad-hoc sniffing at the application level can be a better bet; for the general case, though, there's not much you can do about it.
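A minimal sketch of such app-level sniffing, assuming the application only needs to recognize a handful of formats: it compares the leading bytes against a small, hypothetical whitelist of well-known signatures and dispatches to a per-type handler. The table, the handler names, and the choice to reject everything else are illustrative assumptions. Note that the ZIP signature also matches formats built on ZIP (such as .docx or .jar), so a real whitelist may need deeper checks.

```python
from typing import Callable, Dict, Optional

# Hypothetical whitelist of the only formats this app accepts:
# (byte offset, signature, label).
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "png"),
    (0, b"GIF87a", "gif"),
    (0, b"GIF89a", "gif"),
    (0, b"%PDF-", "pdf"),
    (0, b"PK\x03\x04", "zip"),
    (0, b"\x1f\x8b", "gzip"),
    (257, b"ustar", "tar"),  # POSIX/GNU tar magic sits inside the first header block
]

# Read just enough bytes to test every signature in the table.
HEADER_LEN = max(offset + len(sig) for offset, sig, _ in SIGNATURES)


def sniff(header: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None if unknown."""
    for offset, sig, label in SIGNATURES:
        if header[offset:offset + len(sig)] == sig:
            return label
    return None


def sniff_file(path: str) -> Optional[str]:
    """Sniff a file by its leading bytes only; the filename is never consulted."""
    with open(path, "rb") as f:
        return sniff(f.read(HEADER_LEN))


# Hypothetical per-type behaviour, standing in for whatever the script should do.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "png": lambda path: f"image: {path}",
    "gif": lambda path: f"image: {path}",
    "pdf": lambda path: f"document: {path}",
    "zip": lambda path: f"archive: {path}",
    "gzip": lambda path: f"archive: {path}",
    "tar": lambda path: f"archive: {path}",
}


def handle(path: str) -> str:
    """Dispatch on the sniffed type, rejecting anything outside the whitelist."""
    kind = sniff_file(path)
    if kind is None:
        raise ValueError(f"unsupported file type: {path}")
    return HANDLERS[kind](path)
```

For example, handle("upload.bin") returns "document: upload.bin" for a file whose first bytes are %PDF-, whatever its name. Keeping the table small is the point: every entry is a format the application actually knows how to process.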
libmagic is what file uses, so it's very, very hard to find a closer match to file.
Update 2014: Both of these are dead. I think filemagic is the current library for this functionality.
Update 2014: My bad. python-magic is alive and well.
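Since python-magic is the library named above, here is a hedged sketch of calling it, assuming it is installed (pip install python-magic) together with the libmagic C library it wraps. Passing mime=True asks for a MIME type such as "image/png" instead of the human-readable description that file prints. The guarded import and the None fallback are additions for illustration; filemagic also installs a module named magic with a different API, which is why the sketch checks for from_buffer.

```python
from typing import Optional

try:
    import magic  # python-magic; raises ImportError if libmagic itself is missing
    # filemagic also provides a `magic` module, but without these functions.
    HAVE_PYTHON_MAGIC = hasattr(magic, "from_buffer") and hasattr(magic, "from_file")
except (ImportError, OSError):
    magic = None
    HAVE_PYTHON_MAGIC = False


def guess_mime(path: str) -> Optional[str]:
    """libmagic's MIME guess for the file at path, or None without python-magic."""
    if not HAVE_PYTHON_MAGIC:
        return None
    # libmagic inspects the file's contents; the extension plays no part.
    return magic.from_file(path, mime=True)


def guess_mime_buffer(data: bytes) -> Optional[str]:
    """Same guess for bytes already in memory, or None without python-magic."""
    if not HAVE_PYTHON_MAGIC:
        return None
    return magic.from_buffer(data, mime=True)
```

Returning None rather than failing lets a script fall back to its own sniffing when libmagic is not available, which matters for the portability concern in the question.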