OpenMPI on Raspberry Pi
Raspberry Pi is an effort to bring a bare-bones computer to the workbenches of schools and hobbyists for as little as $25 or, with Ethernet and more memory, $35. It is essentially a micro-motherboard with a 700 MHz ARM1176JZFS CPU, 128-256 MB of memory, and 10/100 Mbps Ethernet. It includes several connectors to support devices such as USB keyboards, video, and audio. You can learn more at http://www.raspberrypi.org/faqs. They are very close to going to market.
I’m not sure how I stumbled over the Raspberry Pi project, but like any good Linux geek, my first thought when seeing the microboard was: I want a Beowulf cluster of those things.
MPI and PVM
There are two primary methods of building parallel computing systems with distributed (not shared) memory: MPI (Message Passing Interface) and PVM (Parallel Virtual Machine). Both are built around a message-passing model. An MPI group has compiled a top-ten list of reasons to prefer MPI over PVM. MPI wins the Google Fight by 81,000 to 30,000. PVM appears to be the better choice in heterogeneous networks, MPI in homogeneous networks.
Building OpenMPI on Raspberry Pi
The Raspberry Pi runs on an ARM1176JZFS CPU, which gave me some concern. ARM support had been dropping from a lot of software projects due to its declining share of the personal computing market. But with the resurgence of ARM processors in tablets and smartphones, support seems to be returning, as seen in new ARM Linux distros and recent ARM support in OpenMPI.
The ARM naming convention seems a bit odd. The Raspberry Pi’s ARM1176JZFS is considered ARMv6 technology (circa 2002). In the Debian6 qemu emulator I am borrowing, the processor is listed as ‘armv6l’. Out of the box, OpenMPI supports ARMv7. Nevertheless, it is possible to build OpenMPI on the armv6l emulator.
My build notes follow and can be downloaded here:
# Download and Install Virtual Box
https://www.virtualbox.org/

# Download this Ubuntu 10.04 RaspberryPi VM image
# (this is NOT the RISC emulator)
# wget RaspberryPi.VirtualBox.zip
wget http://rpi.descartes.co.uk/sim-emu/RaspberryPi.VirtualBox.zip

# Load the image into Virtual Box
# through the "Import Appliance" menu

# Start the RaspberryPi Ubuntu VM
# Login with: rpi/password

# Open the LXTerminal

# Install ssh
ub-shell> sudo apt-get install ssh

# Download the debian6.tar.gz QEMU ARMv6 image
ub-shell> wget http://rpi.descartes.co.uk/sim-emu/debian6.tar.gz

# Unpack the image
ub-shell> tar xvzf debian6.tar.gz

# Edit the startup script
# by inserting at end of command
# the following option: -redir tcp:2222::22
ub-shell> cd debian6
ub-shell> vi launchDebian

# After editing, start the qemu emulator
ub-shell> ./launchDebian

# You can watch the startup by launching xtightvncviewer
# The server connection should be to 'localhost' or '127.0.0.1'
# You can log into this connection with
# the username and password pi/suse,
# but I prefer a ssh terminal login.

# Networking seems to be off when running the qemu in a VirtualBox VM
# on top of the real host. I spent a short while trying to untangle
# that nested mess before deciding to move on. TBD. Instead, download
# the needed files on the Ubuntu VM and secure copy them to
# the Debian6 ARM emulator.
# User and password: pi/suse.

# Also, the keyboard mapping is a bit off.
# I haven't looked into remapping yet.
# For now, know that the quote _"_ maps to _@_

# wget openmpi-1.5.4.tar.bz2
ub-shell> wget http://www.open-mpi.org/software/ompi/v1.5/downloads/openmpi-1.5.4.tar.bz2
ub-shell> scp -P 2222 openmpi-1.5.4.tar.bz2 pi@localhost:

# Open a ssh terminal to the Debian6 emulator,
# user and password: pi/suse
ub-shell> ssh -p 2222 pi@localhost

# On the Debian6 ARMv6 emulator,
# Unpack the openmpi tarball
deb-shell> bunzip2 openmpi-1.5.4.tar.bz2
deb-shell> tar xvf openmpi-1.5.4.tar
deb-shell> cd openmpi-1.5.4

# You have to make 3 changes to the standard distribution
# 1) Delete all references to the RISC instruction 'dmb'
# 2) Modify the 'configure' file to include an 'armv6' option
# 3) Compile with CFLAGS=-march=armv6

## 1) Using 'vi', make the following edits to these three files
deb-shell> vi ./opal/asm/generated/atomic-local.s
# delete all dmb instructions
deb-shell> vi ./opal/asm/base/ARM.asm
# delete all dmb instructions
deb-shell> vi ./opal/include/opal/sys/arm/atomic.h
# change the lines:
    #if OPAL_WANT_SMP_LOCKS
    #define MB() __asm__ __volatile__ ("dmb" : : : "memory")
    #define RMB() __asm__ __volatile__ ("dmb" : : : "memory")
    #define WMB() __asm__ __volatile__ ("dmb" : : : "memory")
    #else
    #define MB()
    #define RMB()
    #define WMB()
    #endif
# to read:
    #define MB()
    #define RMB()
    #define WMB()

## 2) Using 'vi',
# add the following to the 'configure' file at line 26946 of 171183
deb-shell> vi configure
# goto line 26946,
# there should be an 'alpha-' section above
# and an 'armv7' below
# insert the following
# armv6*)
# ompi_cv_asm_arch="ARM"
# OPAL_ASM_SUPPORT_64BIT=0
# OMPI_GCC_INLINE_ASSIGN='"mov %0, #0" : "=&r"(ret)'
# ;;

## 3) compile and install with the following CFLAGS
deb-shell> CFLAGS=-march=armv6
deb-shell> ./configure CFLAGS=-march=armv6
deb-shell> make
deb-shell> sudo make install
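Step 1 of the notes (deleting the dmb instructions) could also be done without opening vi. What follows is only a sketch, and it assumes that each dmb in the two assembly files sits on a line of its own, which I have not checked against every release. The helper name strip_dmb is made up for this example. The atomic.h macros are deliberately left out, because those lines need rewriting rather than deleting.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenMPI): delete every line that holds
# only a 'dmb' instruction, keeping a .bak copy of the original file.
# Assumption: each dmb sits on a line of its own.
strip_dmb() {
    sed -i.bak '/^[[:space:]]*dmb[[:space:]]*$/d' "$1"
}

# Run from the top of the unpacked openmpi-1.5.4 tree.
for f in ./opal/asm/generated/atomic-local.s ./opal/asm/base/ARM.asm; do
    if [ -f "$f" ]; then
        strip_dmb "$f"
        # Report any remaining mention of dmb so it can be checked by hand.
        grep -n 'dmb' "$f" || echo "$f: no dmb left"
    fi
done
```

If grep still reports a dmb after this, the assumption above does not hold for that file and the remaining lines are best edited by hand.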
While reading up on MPI, I found some little test programs (Monte Carlo estimations of pi) for running an MPI-enabled program at the Center for High Performance Computing (http://chpc.wustl.edu/mpi-c.html). Fair warning: you might want to drop the number of iterations from 1e10 to 1e7! (Note: mtobias should not be confused with mtobis.) The three-orders-of-magnitude drop in performance might be worrisome if I weren’t running in two layers of emulation. Performance estimates to come later.
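For anyone who has not met the method: a Monte Carlo estimate of pi throws random points at the unit square and counts how many land inside the quarter circle, and that fraction approaches pi/4. The sketch below is a serial stand-in written with awk, not the CHPC program, and the function name estimate_pi and its arguments are made up for illustration. In an MPI version each rank would run the same loop on its share of the samples and the hit counts would be summed across ranks.

```shell
#!/bin/sh
# estimate_pi SAMPLES [SEED]
# Hypothetical helper: serial Monte Carlo estimate of pi.
estimate_pi() {
    awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
        srand(seed)
        hits = 0
        for (i = 0; i < n; i++) {
            x = rand(); y = rand()
            # Count points that fall inside the quarter circle of radius 1.
            if (x * x + y * y <= 1) hits++
        }
        # hits/n approximates pi/4.
        printf "%.4f\n", 4 * hits / n
    }'
}

estimate_pi 100000
```

The error shrinks only with the square root of the sample count, which is why these programs ask for so many iterations and why trimming the count is the easy way to get a quick answer on slow hardware.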
There is a little compile wrapper (mpicc) in OpenMPI that deals with the include and library paths.
deb-shell> mpicc mcmpi.c -o mcmpi
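Once the binary is built, it is started through the OpenMPI launcher, mpirun, where -np sets the number of processes. A small sketch, assuming mcmpi.c is in the current directory and that both tools are on the PATH after the install; the function name build_and_run is hypothetical.

```shell
#!/bin/sh
# build_and_run [NPROCS]
# Hypothetical helper: compile mcmpi.c with the mpicc wrapper and launch it
# with NPROCS processes (default 2). Returns 1 if either tool is missing.
build_and_run() {
    np="${1:-2}"
    for tool in mpicc mpirun; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool (is the OpenMPI bin directory on PATH?)" >&2
            return 1
        fi
    done
    mpicc mcmpi.c -o mcmpi && mpirun -np "$np" ./mcmpi
}

# Example: build_and_run 4
```

On a single emulated CPU the extra processes only take turns, so this checks that the MPI plumbing works rather than showing any speedup.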
Parallel Processing on the UVic_ESCM
It turns out the UVic_ESCM already has parallel processing support built into it. The pseudo UVic_ESCM “Makefile”, mk.ver, includes instructions for a parallel run on IBM AIX. This calls a run file named “run_parallel_loadleveler”. LoadLeveler is an IBM Tivoli product which depends in part (I believe) on the OpenMPI ORTE component.
In addition, Silva and Schmittner appear to be describing some of their own parallelization efforts in this presentation, as well as their integration of a more complete atmospheric model: A parallel Atmosphere-Ocean Global Circulation Model of intermediate complexity for Earth system climate research. You might remember Schmittner from a paper last fall constraining climate sensitivity.
If the Debian 6 QEMU armv6l is a good emulation of the Raspberry Pi ARM1176JZFS, then we have demonstrated that we could run OpenMPI on the Raspberry Pi. But that doesn’t mean we should. Raspberry Pi nodes are cheap ($35 for the board, $1 for a bulk 512 MB SD card, $1 for a bulk RJ45 connector and a bit of CAT5 cable). You can get a lot of them for very little money. But increasing the number of nodes only helps if the program you are running has a high degree of parallelization. How parallelized is UVic_ESCM? I don’t know. If the program speed can only be increased by a factor of 4 through parallelization, then it is unlikely that a cluster of these ARM chips is going to perform better than a good, modern PC or laptop. On the other hand, if there is a high degree of parallelization in the program, then maybe a cluster makes sense. Fortunately, the chips are cheap enough to experiment without a high initial investment. And maybe we can find some numbers that will allow us to estimate performance.
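One way to put numbers on this is Amdahl's law, which I am bringing in here as a rough model; nothing above measures it for UVic_ESCM. The parallel fraction p and the node count N are illustrative symbols, and the model ignores communication cost, which on 10/100 Ethernet would only make things worse. Reading "a factor of 4 at most" as the limit for very large N gives p = 0.75.

```latex
% Amdahl's law: speedup on N nodes when a fraction p of the run time parallelizes
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% A ceiling of 4 corresponds to p = 0.75. With 16 nodes:
S(16) = \frac{1}{0.25 + 0.75/16} = \frac{1}{0.296875} \approx 3.37

% For comparison, p = 0.99 on the same 16 nodes:
S(16) = \frac{1}{0.01 + 0.99/16} = \frac{1}{0.071875} \approx 13.9
```

So with p = 0.75, sixteen boards buy a little over a threefold speedup, and each board starts out far slower than a desktop core. Only at a much higher parallel fraction does adding cheap nodes begin to pay off.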
The Ubuntu Raspberry Pi VM image and Debian 6 QEMU ARMv6 emulator were assembled and made available by user ‘nmcc’ on the Raspberry Pi forums.
The armv7 -v- armv6 issues were noted by Robie Basak on the Debian Bugs list, who also suggested the -march=armv6 CFLAGS option.