I love Python, but I have always been a little frustrated by how it handles installing and locating modules. The situation is further complicated by Python Eggs, which I find to be an off-putting name for an endlessly frustrating feature. In particular, I have never really figured out where native support for eggs (just .zip files really) in Python ends and special functionality provided by setuptools begins. In this post I am going to try to figure most of this out in what I hope is a sane order.
As a note, I know a lot of this is made much simpler by just using virtualenv, but these issues still come up.
First, a really great introduction to Python modules/packages in general:
http://mxm-mad-science.blogspot.com/2008/02/python-eggs-simple-introduction.html
He deftly handles a point of confusion I always seem to run into:
A module and a package are not the same thing.
If you are importing a package, which is represented by a directory (on the Python path) containing an __init__.py file, you cannot assume that
import x
x.y
will work for every y that this works for
from x import y
This is because when you import x, you are importing the package x, as defined by its __init__.py file. Unless y is explicitly imported in that file like this:
from . import y
It will not just work.
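To convince myself, here is a minimal sketch (Python 3). It builds a throwaway package in a temp directory and tries both import styles. The names pkg_demo and y are made up for the example.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package: pkg_demo/__init__.py (empty) and pkg_demo/y.py.
# "pkg_demo" and "y" are made-up names for this example.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg_demo")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "y.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import pkg_demo

# The empty __init__.py never imported y, so the attribute is not there yet.
attribute_before = hasattr(pkg_demo, "y")

# Importing the submodule explicitly loads it and binds it on the package.
from pkg_demo import y

attribute_after = hasattr(pkg_demo, "y")

print(attribute_before, attribute_after, y.VALUE)
```

This prints False True 42. The attribute only shows up on the package once something has actually imported the submodule.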
At least I am not the only one who was confused by this.
OK, that is out of the way. Which site-packages will my Python use?
It turns out this is defined at install time, and the defaults vary by system. In particular, look for the details of prefix and exec_prefix at this link http://docs.python.org/install/index.html#how-installation-works.
I am still trying to figure out the details, but it also seems that the Python site module is loaded on interpreter startup and can add some system specific locations to the python path. Check this link out for details http://docs.python.org/library/site.html.
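A quick way to see what your own interpreter ended up with is to ask it. This is only a sketch; the output differs on every system, and I am assuming site.getsitepackages() may be missing in some older virtualenv setups, hence the fallback.

```python
import site
import sys

# The install-time locations the docs talk about.
print("prefix:     ", sys.prefix)
print("exec_prefix:", sys.exec_prefix)

# site.getsitepackages() may be missing in some older virtualenv setups,
# so fall back to an empty list rather than crash.
get_site_packages = getattr(site, "getsitepackages", lambda: [])
site_dirs = list(get_site_packages())
print("site-packages candidates:", site_dirs)

# What the interpreter will actually search, in order.
search_path = list(sys.path)
for entry in search_path:
    print("  ", entry)
```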
It's also worth pointing out that Python modules that contain C extensions are by definition dependent on your processor architecture. On some systems/installers they will therefore be placed in an architecture-specific location (probably /usr/lib64 instead of /usr/lib).
Holy moly, this is confusing! I ran into a situation today where I installed a module with a C extension, and because the C extension made the module architecture dependent, it was installed (by yum on CentOS) in the /usr/lib64/python2.7/site-packages directory rather than under /usr/lib. However, Python by default wasn't looking there. Thanks!
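If you suspect you are in the same boat, the sysconfig module can tell you where this interpreter expects pure-Python packages ("purelib") and packages with compiled extensions ("platlib") to live. The sketch below then checks whether each one is actually on sys.path. The helper on_sys_path is my own, not something from the standard library.

```python
import os
import sys
import sysconfig

# "purelib" is where pure-Python packages go; "platlib" is where packages
# with compiled extensions go. On many systems they are the same directory.
purelib = sysconfig.get_path("purelib")
platlib = sysconfig.get_path("platlib")


def on_sys_path(directory):
    """Return True if directory is one of the entries in sys.path."""
    target = os.path.normcase(os.path.realpath(directory))
    return any(
        os.path.normcase(os.path.realpath(entry)) == target
        for entry in sys.path
        if entry
    )


for label, directory in (("purelib", purelib), ("platlib", platlib)):
    print(label, directory, "on sys.path:", on_sys_path(directory))
```

If platlib comes back as a lib64 directory and the last column says False, that would match the situation I ran into.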
OK, I found my site-packages directory, where else might it look for modules?
.pth files
If you have a .pth file in your site-packages directory, the directories (and eggs, yuck!) listed in it, one per line, will be added to the path. Setuptools usually puts an easy-install.pth file in there, but you can add your own.
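Here is a small sketch of the mechanism without touching the real site-packages. It uses site.addsitedir(), which processes a directory's .pth files the same way the site module does for site-packages at startup. The temp directories and the extra.pth file name are made up for the example.

```python
import os
import site
import sys
import tempfile

# Two throwaway directories: one plays the role of site-packages,
# the other is some extra location we want on the path.
fake_site_dir = tempfile.mkdtemp()
extra_dir = tempfile.mkdtemp()

# A .pth file is plain text: one path per line.
# "extra.pth" is a made-up name; any *.pth name works.
with open(os.path.join(fake_site_dir, "extra.pth"), "w") as f:
    f.write(extra_dir + "\n")

# site.addsitedir() appends the directory itself to sys.path and then
# processes the .pth files it finds there. Lines that point at paths
# which do not exist are skipped.
site.addsitedir(fake_site_dir)

print(extra_dir in sys.path)
```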
PYTHONPATH environment variable
You can also add locations to the path by adding them to the PYTHONPATH environment variable (just like CLASSPATH in Java).
http://www.stereoplex.com/blog/understanding-imports-and-pythonpath
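To see the effect, the sketch below starts a second interpreter with PYTHONPATH set and asks it for its sys.path. The temp directory is just a stand-in for wherever your code lives.

```python
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()

# Copy the current environment and set PYTHONPATH for the child only.
# (Any existing PYTHONPATH is replaced in the child, which is fine for a demo.)
child_env = dict(os.environ)
child_env["PYTHONPATH"] = extra_dir

# Ask a fresh interpreter to print its sys.path, one entry per line.
output = subprocess.check_output(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=child_env,
    universal_newlines=True,
)
child_path = output.splitlines()

# Compare resolved paths so symlinked temp directories do not confuse things.
found = any(
    os.path.realpath(entry) == os.path.realpath(extra_dir)
    for entry in child_path
    if entry
)
print(found)
```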
eggs
This is where I am not even all that sure. I think that natively (without setuptools) Python can support looking in eggs (which are just zip files) for code. So if you list the egg itself on the path, either directly or through a .pth file, it will work. As far as I can tell, just dropping an egg into site-packages is not enough on its own; the egg file has to be an entry on the path.
From the source, http://peak.telecommunity.com/DevCenter/PythonEggs#Using-eggs
If you have a pure-Python .egg file that doesn't use any in-package data files, and you don't mind manually placing it on sys.path or PYTHONPATH, you can use the egg without installing setuptools. For eggs containing C extensions, however, or those that need access to non-Python data files contained in the egg, you'll need the pkg_resources module from setuptools installed.
So I guess that answers that. I am sure you have run into situations where pkg_resources was missing. It's likely that was caused by a project needing some of the more specialized features of an egg. I've heard, for example, that some applications will be installable via their setup.py file without setuptools installed; they just won't run without it.
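The pure-Python case is easy to try without setuptools at all. This sketch (Python 3) builds a tiny zip archive with one module in it, names it like an egg, puts the archive itself on sys.path, and imports from it. The names demo.egg and egg_demo_mod are made up; a real egg also carries metadata, which I am skipping here.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny "egg": a zip archive with one pure-Python module inside.
# "demo.egg" and "egg_demo_mod" are made-up names for this example.
egg_path = os.path.join(tempfile.mkdtemp(), "demo.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("egg_demo_mod.py", "GREETING = 'hello from the egg'\n")

# The archive itself goes on sys.path, not the directory that contains it.
sys.path.insert(0, egg_path)
importlib.invalidate_caches()

import egg_demo_mod

print(egg_demo_mod.GREETING)
print(egg_demo_mod.__file__)
```

The second line of output shows a path that runs through the .egg file, which is how you can tell the module really came out of the archive.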
Wherein I ramble on about how "import" might work with eggs without taking the time to actually find out the answer
The same site points out that you can import an egg (and I assume they mean an egg with data files or C extensions) with this
from pkg_resources import require
require("FooBar>=1.2")
However this site says:
You may use an egg simply by pointing PYTHONPATH or sys.path at it and importing as you normally would, thanks to the import hook changes in recent versions of Python (you need 2.3.5+ or 2.4). If you wish to take this approach, you do not need to bother with setuptools or ez_setup.py at all.
So I am a bit confused as to exactly what happened in 2.3.5 that enables full support for eggs within Python natively. If they are just hooks, do you need to install a library to take advantage of those hooks? Confusing. I will try to figure this out.
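My best guess is that the "import hook" in question is the standard library's zipimport module, which comes with the interpreter and is registered in sys.path_hooks by default. If that is right, nothing extra needs to be installed for plain zip imports. A quick way to check on your own interpreter:

```python
import sys
import zipimport

# sys.path_hooks is the list of callables Python tries for each sys.path
# entry to find something that knows how to import from it.
hook_names = [getattr(hook, "__name__", repr(hook)) for hook in sys.path_hooks]
print(hook_names)

# zipimport ships with the interpreter; no third-party install is needed.
has_zip_hook = zipimport.zipimporter in sys.path_hooks
print("zipimporter registered:", has_zip_hook)
```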
.egg-info directories
Ha, I have no idea what this is. Someone here does though. It's still overly complicated though.
Oh, wait! Can we add just one more edge-case here for good measure? (the Python egg cache)
Some eggs that are zipped (but not all, mind you), for reasons that I have never fully understood (there is an explanation in the note of this Stack Overflow answer), must actually be unzipped and placed somewhere at runtime. By default this is done in $HOME/.python-eggs. You can override this with the PYTHON_EGG_CACHE environment variable. This is the source of endless problems during deployment: you think everything is working, but then you deploy, and your web server doesn't have permission to access wherever the egg cache is located.
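One way I could imagine guarding against this is to set the variable yourself, early in your application's entry point, and fail loudly if the directory is not writable. This is only a sketch: ensure_egg_cache is a made-up helper, the temp directory is a placeholder, and I am assuming it runs before anything imports an egg that needs extracting.

```python
import os
import tempfile


def ensure_egg_cache(cache_dir):
    """Create cache_dir if needed, check it is writable, and export it.

    Returns the directory that was set in PYTHON_EGG_CACHE.
    """
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    if not os.access(cache_dir, os.W_OK):
        raise RuntimeError("egg cache %r is not writable" % cache_dir)
    os.environ["PYTHON_EGG_CACHE"] = cache_dir
    return cache_dir


# Hypothetical location; in a real deployment pick a directory that the
# web server user owns.
cache_dir = ensure_egg_cache(os.path.join(tempfile.mkdtemp(), "egg-cache"))
print(os.environ["PYTHON_EGG_CACHE"])
```

Failing at startup with a clear message beats a permissions traceback on the first request.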
distutils, setuptools, ez_setup.py, easy_install.py, distribute, and distutils2
This is slightly off-topic, but relevant in that it relates to how you might install Python modules. The original Python module for packaging was called distutils. It is very limited, and PEAK came out with setuptools to enhance its functionality. This introduced the whole egg thing. It included a bootstrap utility called ez_setup.py that will install setuptools and a utility called easy_install.py. easy_install.py could then be used from the command line to install Python modules from PyPI (the cheeseshop).
Setuptools is nice, but the documentation on the PEAK website is infamously terrible, and the project has stagnated. Distribute is a fork of setuptools that adds some functionality. To use Distribute within a project, there is a file called distribute_setup.py that you can package with your code and call from your setup.py to install it.
Then comes distutils2. I did some reading on the web, and the discussion seems to suggest that distutils2 is a rewrite of distutils, but done by the team that was working on Distribute. Allegedly resources moved from working on Distribute to distutils2, and distutils2 is the future.
Interestingly enough, as of the time I wrote this, there seem to be two different repos for distutils2: one on bitbucket and one on python.org. The latest commit on either of those repos was 2 months ago. Distribute, on the other hand, the supposedly less active project, has one repo on bitbucket and the last commit was 11 days ago. So, in short, I have no idea what is going on.
Thanks for reading, please leave any comments or corrections.