Thursday, January 24, 2013

Theano + Pylearn2

Theano is an awesome Python library for efficiently handling multi-dimensional arrays. It is aimed at a machine learning research/academic audience. Pylearn2 is a library designed to make machine learning research easy. The latter is built on top of the former.

Since both are in active development, some bumps in the road are to be expected. The first few appeared during installation.



Before I start, I should mention that I'm working on OSX 10.7.5.

First attempt: FAILURE!

It is always a good idea to use virtualenv for any kind of Python project, so I set up a virtual environment as usual:
virtualenv new_env --no-site-packages
cd new_env
source bin/activate
Since Theano is available on PyPI, pip is the way to go:
pip install theano
This attempted to install Theano and its dependencies. The first dependency (NumPy) was successfully installed. Unfortunately, the second dependency (SciPy) failed:
...
ImportError: No module named numpy.distutils.core
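
Before retrying, it can help to see which dependencies the active environment is actually able to import. The following is a minimal sketch; the helper name missing_modules is hypothetical (it is not part of pip, virtualenv or Theano), and it only assumes that the dependencies are imported under the names numpy and scipy:

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not importable from the current environment.
            missing.append(name)
    return missing
```

Calling missing_modules(["numpy", "scipy"]) inside the activated virtualenv returns an empty list if both are importable, or the names of whichever ones are not.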

Second attempt: FAILURE!

Fortunately, I did have NumPy, SciPy and the other dependencies installed on my system, so I decided to use the system-wide versions.
cd ..
rm -Rf new_env
virtualenv new_env --system-site-packages
cd new_env
source bin/activate
pip install theano
This time Theano was successfully installed. Now it was time to install Pylearn2. Unfortunately, Pylearn2 is not available on PyPI, so you need to get the source code and install it using setup.py:
git clone git://github.com/lisa-lab/pylearn2.git
cd pylearn2
python setup.py install
The installation succeeded. Now it was time to test it:
python
Inside the Python interpreter:
import pylearn2
Unfortunately, this threw an error:
...
from theano.printing import hex_digest 
ImportError: cannot import name hex_digest
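
When an import fails like this in an environment created with --system-site-packages, it is worth checking which copy of a package is being picked up and what version it reports. A rough sketch, where describe_module is a hypothetical helper and the fields it reports are my own choice:

```python
import importlib


def describe_module(name):
    """Return the reported version and file location of an importable module."""
    module = importlib.import_module(name)
    return {
        "name": name,
        # Not every module defines __version__, so fall back to None.
        "version": getattr(module, "__version__", None),
        # Built-in modules have no __file__, so fall back to None there too.
        "file": getattr(module, "__file__", None),
    }
```

For example, describe_module("theano") shows whether the imported package lives inside the virtualenv or in the system-wide site-packages directory.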

Third attempt: SUCCESS!

I did some exploring of Theano's source code on GitHub, and the hex_digest function was there in the printing.py module. Inside the Python interpreter I typed:
import theano
help(theano.printing)
but I couldn't find hex_digest in there. So I assumed there was some mismatch between the version on PyPI and the one on GitHub (although they both claim to be 0.6.0rc2), and decided to install Theano directly from GitHub. Just to be on the safe side, I started a new virtualenv from scratch:
cd ..
rm -Rf new_env
virtualenv new_env --system-site-packages
cd new_env
source bin/activate
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git
pip install git+git://github.com/lisa-lab/pylearn2.git
(btw, the last line is a handy way of installing pylearn2 from GitHub using pip, which is simpler and cleaner than downloading the source and running setup.py). Finally, success!
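
A quicker alternative to reading through help() output is to ask Python directly whether a module exposes a given name. This is only a sketch: has_attribute is a hypothetical helper, and theano.printing / hex_digest are simply the names taken from the error message above.

```python
import importlib


def has_attribute(module_name, attribute):
    """Return True if `module_name` can be imported and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing from this environment.
        return False
    return hasattr(module, attribute)
```

Running has_attribute("theano.printing", "hex_digest") in the new virtualenv tells you whether the installed Theano provides the function Pylearn2 tries to import.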

4 comments:

  1. The reason you ran into trouble with the hex_digest thing was that Pylearn2 often uses a newer version of Theano than the latest stable release. Often adding a new feature to Pylearn2 requires adding a new feature to Theano, so if you want to use Pylearn2 we recommend using the development version of Theano. In the future we'll probably start making bundled stable releases of (Theano, Pylearn2) pairs, but at the moment there has never been an official release of Pylearn2.

  2. Aha! That explains it all. Thanks for the clarification Ian!
