Running ADMM LASSO example on Mac OS X Mountain Lion
July 5, 2013 at 2:13 pm | Posted in Big Data | Tags: ADMM, Classification, LASSO, Lion, Mac OS X, machine learning, Mountain Lion, Regression
Being able to process massive datasets for machine learning is becoming increasingly important. By massive datasets I mean data that won’t fit into RAM on a single machine (even with sparse representations or using the hashing trick). There have been a number of initiatives in the academic and research arena that attempt to address the problem; one very interesting one is the Alternating Direction Method of Multipliers (ADMM). It’s an old idea that has been resurrected in this paper by Stephen Boyd’s team at Stanford. A quick Google search for ‘Alternating Direction Method of Multipliers’ shows a recent surge of academic papers as people have started to take the ideas on board.
That paper comes with some example code including a complete small-scale example of distributed L1 regularized least squares using MPI. The code was tested on Mac OS X 10.6, Debian 6, and Ubuntu 10.04. It requires installation of an MPI implementation but the authors state that OpenMPI is installed with Mac OS X 10.5 and later. So it sounds like it would be easy to run on my new iMac. Well it turns out that from Mac OS X 10.7 (Lion) this is no longer true (see here). So here are the augmented instructions for Mac OS X 10.8 that worked for me; they come with the usual ‘your mileage may vary’ caveat.
Before You Start
I assume that Xcode is already installed (freely available from the App Store; I’m using 4.6.3) and that the command line tools are installed (Xcode | Preferences | install Command Line Tools). Typing gcc in the terminal gives me
i686-apple-darwin11-llvm-gcc-4.2.
You should, of course, always download from a reputable site and verify the checksum (e.g. using md5 or gpg). Safari seems to be set up to automatically uncompress .tar.gz files to .tar. Very helpful Safari but now I can’t checksum the downloaded file! To prevent this behaviour go to Safari | Preferences | General tab and untick ‘Open “safe” files after downloading’. Yes I found that ironic too.
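The checksum step can be sketched as a small shell helper. This is only an illustration: file_md5 and verify_md5 are names made up here, and the expected digest has to come from the download site. It assumes either md5 (as shipped with Mac OS X) or md5sum (common on Linux) is available.

```shell
#!/bin/sh
# Print the MD5 digest of the file given as $1.
# Uses md5 where it exists (Mac OS X), otherwise falls back to md5sum.
file_md5() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | awk '{ print $1 }'
    fi
}

# Return 0 if the digest of file $1 equals the expected digest $2.
verify_md5() {
    actual=$(file_md5 "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)" >&2
        return 1
    fi
}

# Usage (the digest is a placeholder; take the real value from the download site):
#   verify_md5 gsl-1.15.tar.gz <published-md5>
```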
Install GNU Scientific Library
First you need to download and install the GNU Scientific Library (GSL). I used the mirror suggested by the GSL site. Download the latest release, which in my case was 1.15 (gsl-1.15.tar.gz). Now do the following:
tar zxf gsl-1.15.tar.gz
mv gsl-1.15 ~
cd ~/gsl-1.15
export CC=clang
./configure
make
make check > log 2>&1
The ‘make check’ call runs some tests on the build. Originally I didn’t have the export CC=clang line and some of the tests failed, so it seems worthwhile to do the checks.
Review the file called log and, if everything passed with no failures, proceed as follows:
sudo make install
This will place GSL in /usr/local and requires admin privileges. You should be able to pass --prefix to ./configure to put it elsewhere, but I didn’t try that.
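Reviewing the log by eye gets tedious, so here is a rough sketch of how the check before ‘sudo make install’ might be scripted. It assumes the log contains automake-style result lines beginning with PASS: and FAIL:, which may not match your build exactly; check_log is a hypothetical helper name.

```shell
#!/bin/sh
# Summarise a 'make check' log and return non-zero if any test failed.
# Assumption: results appear as lines starting with "PASS:" or "FAIL:".
check_log() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # grep -c prints 0 (and exits non-zero) when nothing matches, hence '|| true'.
    passes=$(grep -c '^PASS:' "$1" || true)
    fails=$(grep -c '^FAIL:' "$1" || true)
    echo "passed: $passes, failed: $fails"
    [ "$fails" -eq 0 ]
}

# Usage:
#   check_log log && sudo make install
```

A count of zero passes and zero failures most likely means the patterns do not match the log format, so it is worth opening the file in that case.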
Install OpenMPI
Go to http://www.open-mpi.org and download the latest stable release of Open MPI (at the time of writing that was 1.6.5). Then the following sequence will install it (again I’m installing to /usr/local):
tar zxf openmpi-1.6.5.tar.gz
mv openmpi-1.6.5 ~
cd ~/openmpi-1.6.5
./configure --prefix=/usr/local
make
sudo make install
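After the install it is worth confirming that the Open MPI wrappers are actually on your PATH before moving on. A minimal sketch follows; check_tools is a made-up helper and not part of Open MPI.

```shell
#!/bin/sh
# Report whether each named command can be found on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "found $tool at $path"
        else
            echo "missing $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage after installing Open MPI:
#   check_tools mpicc mpirun && mpirun --version
```

If either tool is reported missing, check that /usr/local/bin is on your PATH.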
Download and Run Distributed LASSO
The link to the ADMM source code is on the page ‘MPI example for alternating direction method of multipliers’ along with instructions for installing:
- Download and expand the mpi_lasso tar ball. The package contains a Makefile, the solver, and a standard library for reading in matrix data.
- Edit the Makefile to ensure that the GSLROOT variable is set to point to the location where you installed GSL, and that the ARCH variable is set appropriately (most likely to i386 or x86_64). On some machines, it may be necessary to remove the use of the -arch flag entirely.
- Run make. This produces a binary called lasso.
Incidentally the Makefile seems to contain additional instructions to build a component called ‘gam’. gam.c is not included in the download so I just removed all references to gam. Here is what my Makefile looks like:
GSLROOT=/usr/local
# use this if on 64-bit machine with 64-bit GSL libraries
ARCH=x86_64
# use this if on 32-bit machine with 32-bit GSL libraries
# ARCH=i386

MPICC=mpicc
CC=gcc
CFLAGS=-Wall -std=c99 -arch $(ARCH) -I$(GSLROOT)/include
LDFLAGS=-L$(GSLROOT)/lib -lgsl -lgslcblas -lm

all: lasso

lasso: lasso.o mmio.o
	$(MPICC) $(CFLAGS) $(LDFLAGS) lasso.o mmio.o -o lasso

lasso.o: lasso.c mmio.o
	$(MPICC) $(CFLAGS) -c lasso.c

mmio.o: mmio.c
	$(CC) $(CFLAGS) -c mmio.c

clean:
	rm -vf *.o lasso
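Instead of editing GSLROOT and ARCH in the Makefile, the values can also be passed on the make command line, since variables given there override the ones set in the file. The sketch below guesses ARCH from the output of uname -m. pick_arch is a hypothetical helper, and the mapping assumes your GSL libraries were built for the machine's native architecture.

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to a value for the
# Makefile's ARCH variable. Only the two values the Makefile mentions
# are handled; anything else is reported as unsupported.
pick_arch() {
    case "$1" in
        x86_64) echo x86_64 ;;
        i386|i486|i586|i686) echo i386 ;;
        *) echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}

# Usage: command-line variables override those set in the Makefile.
#   make GSLROOT=/usr/local ARCH="$(pick_arch "$(uname -m)")"
```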
A typical execution using the provided data set and using 4 processes on the same machine is
mpirun -np 4 lasso
The output should look like this:
[0] reading data/A1.dat
[1] reading data/A2.dat
[2] reading data/A3.dat
[3] reading data/A4.dat
[3] reading data/b4.dat
[1] reading data/b2.dat
[0] reading data/b1.dat
[2] reading data/b3.dat
using lambda: 0.5000
  #    r norm   eps_pri    s norm  eps_dual objective
  0    0.0000    0.0430    0.1692    0.0045   12.0262
  1    3.8267    0.0340    0.9591    0.0427   11.8101
  2    2.6698    0.0349    1.5638    0.0687   12.1617
  3    1.5666    0.0476    1.6647    0.0831   13.2944
  4    0.8126    0.0614    1.4461    0.0886   14.8081
  5    0.6825    0.0721    1.1210    0.0886   16.1636
  6    0.7332    0.0793    0.8389    0.0862   17.0764
  7    0.6889    0.0838    0.6616    0.0831   17.5325
  8    0.5750    0.0867    0.5551    0.0802   17.6658
  9    0.4539    0.0885    0.4675    0.0778   17.6560
 10    0.3842    0.0897    0.3936    0.0759   17.5914
 11    0.3121    0.0905    0.3389    0.0744   17.5154
 12    0.2606    0.0912    0.2913    0.0733   17.4330
 13    0.2245    0.0917    0.2558    0.0725   17.3519
 14    0.1847    0.0923    0.2276    0.0720   17.2874
 15    0.1622    0.0928    0.2076    0.0716   17.2312
 16    0.1335    0.0934    0.1858    0.0713   17.1980
 17    0.1214    0.0939    0.1689    0.0712   17.1803
 18    0.1045    0.0944    0.1548    0.0710   17.1723
 19    0.0931    0.0950    0.1344    0.0708   17.1768
 20    0.0919    0.0954    0.1243    0.0707   17.1824
 21    0.0723    0.0958    0.1152    0.0705   17.1867
 22    0.0638    0.0962    0.1079    0.0704   17.1896
 23    0.0570    0.0965    0.1019    0.0702   17.1900
 24    0.0507    0.0968    0.0964    0.0701   17.1898
 25    0.0460    0.0971    0.0917    0.0700   17.1885
 26    0.0416    0.0973    0.0874    0.0699   17.1866
 27    0.0382    0.0976    0.0834    0.0698   17.1846
 28    0.0354    0.0978    0.0798    0.0697   17.1827
 29    0.0329    0.0980    0.0762    0.0697   17.1815
 30    0.0311    0.0983    0.0701    0.0696   17.1858
 31    0.0355    0.0985    0.0667    0.0696   17.1890
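The column names suggest the usual ADMM stopping rule from the paper: iterate until the primal residual norm (r norm) falls below eps_pri and the dual residual norm (s norm) falls below eps_dual. That reading is an assumption based on the output rather than on the solver source, but it is consistent with the run above stopping at iteration 31. The sketch below finds the first such row in a saved copy of the output. first_converged and the file name run.log are made up for illustration.

```shell
#!/bin/sh
# Print the first iteration whose row satisfies
#   r norm <= eps_pri  and  s norm <= eps_dual.
# Rows are expected as six whitespace-separated fields:
#   iteration, r norm, eps_pri, s norm, eps_dual, objective.
# Returns non-zero if no row meets both conditions.
first_converged() {
    awk 'NF == 6 && $1 ~ /^[0-9]+$/ && $2 + 0 <= $3 + 0 && $4 + 0 <= $5 + 0 {
             print $1; found = 1; exit
         }
         END { exit !found }' "$1"
}

# Usage:
#   mpirun -np 4 lasso > run.log
#   first_converged run.log
```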
If you open up the file data/solution.dat it will contain the optimal z (which equals x) parameters, most of which should be zero.
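To check the sparsity claim without reading the file by eye, something like the following could be used. It assumes data/solution.dat holds one numeric value per line, possibly preceded by comment lines starting with %; I have not confirmed the exact layout, so adjust if your file looks different. count_nonzero is a hypothetical helper name.

```shell
#!/bin/sh
# Count the non-zero entries in a solution file.
# Assumption: one numeric value per line; blank lines and lines
# starting with % are skipped.
count_nonzero() {
    awk '/^%/ || NF == 0 { next }
         { total++; if ($1 + 0 != 0) nonzero++ }
         END { printf "%d of %d entries are non-zero\n", nonzero, total }' "$1"
}

# Usage:
#   count_nonzero data/solution.dat
```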