
OpenMPI on Raspberry Pi

2012 February 26

Raspberry Pi is an effort to bring a bare-bones computer to the workbenches of schools and hobbyists for as little as $25 or, with Ethernet and more memory, $35. It is essentially a micro-motherboard with a 700 MHz ARM1176JZF-S CPU, 128-256 MB of memory, and 10/100 Mbps Ethernet. It includes several connectors to support devices such as USB keyboards, video, and audio. You can learn more at http://www.raspberrypi.org/faqs. The board is very close to going to market.

I’m not sure how I stumbled across the Raspberry Pi project, but like any good Linux geek, my first thought on seeing the microboard was: I want a Beowulf cluster of those things.

MPI and PVM

There are two primary approaches to building parallel computing systems with distributed (not shared) memory: MPI (Message Passing Interface) and PVM (Parallel Virtual Machine). Both are built around a message-passing model. An MPI group has compiled a top-ten list of reasons for preferring MPI over PVM, and MPI wins the Google Fight by 81,000 to 30,000. PVM appears to be the better choice for heterogeneous networks, MPI for homogeneous ones.

Building OpenMPI on Raspberry Pi

The Raspberry Pi runs on an ARM1176JZF-S CPU, which gave me some concern. ARM support had been disappearing from a lot of software projects as ARM’s share of the personal computing market declined. But with the resurgence of ARM processors in tablets and smartphones, support seems to be returning, as seen in new ARM Linux distros and recent ARM support in OpenMPI.

ARM naming conventions can be confusing. The Raspberry Pi’s ARM1176JZF-S is an ARM11-family core, but it implements the ARMv6 architecture (introduced in the early 2000s). In the Debian 6 QEMU emulator I am borrowing, the processor is listed as ‘armv6l’ (the trailing ‘l’ means little-endian). Out of the box, OpenMPI supports ARMv7 but not ARMv6. Nevertheless, it is possible to build OpenMPI on the armv6l emulator.

My build notes follow and can be downloaded here:
http://rhinohide.org/co2/models/tools/openmpi/doc/raspberrypi-openmpi.txt

# Download and Install VirtualBox

https://www.virtualbox.org/

# Download this Ubuntu 10.04 RaspberryPi VM image
# (this is NOT the RISC emulator)
# wget RaspberryPi.VirtualBox.zip
wget http://rpi.descartes.co.uk/sim-emu/RaspberryPi.VirtualBox.zip

# Load the image into VirtualBox
# through the "Import Appliance" menu

# Start the RaspberryPi Ubuntu VM
# Login with: rpi/password

# Open the LXTerminal

# Install ssh
ub-shell> sudo apt-get install ssh

# Download the debian6.tar.gz QEMU ARMv6 image
ub-shell> wget http://rpi.descartes.co.uk/sim-emu/debian6.tar.gz

# Unpack the image
ub-shell> tar xvzf debian6.tar.gz

# Edit the startup script by appending
# the following option to the end of the qemu command:
#   -redir tcp:2222::22
# (forwards port 2222 on the Ubuntu VM to ssh port 22 in the emulator)
ub-shell> cd debian6
ub-shell> vi launchDebian

# After editing, start the qemu emulator
ub-shell> ./launchDebian

# You can watch the startup by launching xtightvncviewer
# The server connection should be to 'localhost' or '127.0.0.1'
# You can log into this connection with 
# the username and password pi/suse,
# but I prefer a ssh terminal login.

# Networking does not seem to work when running QEMU inside a
# VirtualBox VM on top of the real host. I spent a short while trying
# to untangle that nested mess before deciding to move on (TBD).
# Instead, download the needed files on the Ubuntu VM and secure-copy
# them to the Debian 6 ARM emulator.
# User and password: pi/suse.

# Also, the keyboard mapping is a bit off.
# I haven't looked into remapping yet.
# For now, know that the quote key (") maps to @.

# wget openmpi-1.5.4.tar.bz2
ub-shell> wget http://www.open-mpi.org/software/ompi/v1.5/downloads/openmpi-1.5.4.tar.bz2
ub-shell> scp -P 2222 openmpi-1.5.4.tar.bz2 pi@localhost:

# Open a ssh terminal to the Debian6 emulator, 
# user and password: pi/suse 
ub-shell> ssh -p 2222 pi@localhost

# On the Debian6 ARMv6 emulator,
# Unpack the openmpi tarball
deb-shell> bunzip2 openmpi-1.5.4.tar.bz2
deb-shell> tar xvf openmpi-1.5.4.tar
deb-shell> cd openmpi-1.5.4

# You have to make 3 changes to the standard distribution:
# 1) Delete all references to the ARMv7 instruction 'dmb'
#    (data memory barrier), which the ARMv6 core does not support
# 2) Modify the 'configure' file to include an 'armv6' option
# 3) Compile with CFLAGS=-march=armv6

## 1) Using 'vi', make the following edits to these three files
deb-shell> vi ./opal/asm/generated/atomic-local.s
#	delete all dmb instructions
deb-shell> vi ./opal/asm/base/ARM.asm
#	delete all dmb instructions
deb-shell> vi ./opal/include/opal/sys/arm/atomic.h
#	change the lines:
		#if OPAL_WANT_SMP_LOCKS
			#define MB()  __asm__ __volatile__ ("dmb" : : : "memory")
			#define RMB() __asm__ __volatile__ ("dmb" : : : "memory")
			#define WMB() __asm__ __volatile__ ("dmb" : : : "memory")
		#else
			#define MB()
			#define RMB()
			#define WMB()
		#endif

#	to read:
			#define MB()
			#define RMB()
			#define WMB()
		
## 2) Using 'vi', 
# add the following to the 'configure' file at line 26946 of 171183
deb-shell> vi configure
# goto line 26946, 
# there should be an 'alpha-' section above 
# and an 'armv7' below
# insert the following
#        armv6*)
#            ompi_cv_asm_arch="ARM"
#            OPAL_ASM_SUPPORT_64BIT=0
#            OMPI_GCC_INLINE_ASSIGN='"mov %0, #0" : "=&r"(ret)'
#            ;;

## 3) Configure with CFLAGS=-march=armv6, then compile and install
deb-shell> ./configure CFLAGS=-march=armv6
deb-shell> make
deb-shell> sudo make install

While reading up on MPI, I found some small test programs (Monte Carlo estimations of pi) for running an MPI-enabled program at the Center for High Performance Computing (http://chpc.wustl.edu/mpi-c.html). Fair warning: you might want to drop the number of iterations from 1e10 to 1e7! (Note: mtobias should not be confused with mtobis.) A three-orders-of-magnitude cut in iterations might be worrisome if I weren’t running inside two layers of emulation. Performance estimates to come later.

OpenMPI includes a small compiler wrapper, mpicc, that takes care of the include and library paths:
deb-shell> mpicc mcmpi.c -o mcmpi

Parallel Processing on the UVic_ESCM

It turns out the UVic_ESCM already has parallel processing support built in. The pseudo-“Makefile” for UVic_ESCM, mk.ver, includes instructions for a parallel run on IBM AIX, which call a run file named “run_parallel_loadleveler”. LoadLeveler is an IBM Tivoli product which depends in part (I believe) on the OpenMPI ORTE component.

In addition, Silva and Schmittner appear to describe some of their own parallelization efforts, as well as their integration of a more complete atmospheric model, in this presentation: “A parallel Atmosphere-Ocean Global Circulation Model of intermediate complexity for Earth system climate research”. You might remember Schmittner from a paper last fall constraining climate sensitivity.

What’s Next?

If the Debian 6 QEMU armv6l is a good emulation of the Raspberry Pi’s ARM1176JZF-S, then we have demonstrated that we could run OpenMPI on the Raspberry Pi. But that doesn’t mean we should. Raspberry Pi nodes are cheap ($35 for the board, $1 for a bulk 512MB SD card, $1 for a bulk RJ45 connector and a bit of CAT5 cable), so you can get a lot of them for very little money. But increasing the number of nodes only helps if the program you are running has a high degree of parallelization. How parallelized is UVic_ESCM? I don’t know. If parallelization can only speed the program up by a factor of 4, then it is unlikely that a cluster of these ARM chips will perform better than a good, modern PC or laptop. On the other hand, if the program is highly parallel, then maybe a cluster makes sense. Fortunately, the chips are cheap enough to experiment with without a high initial investment. And maybe we can find some numbers that will allow us to estimate performance.

Acknowledgments

The Ubuntu Raspberry Pi VM image and Debian 6 QEMU ARMv6 emulator were assembled and made available by user ‘nmcc’ on the Raspberry Pi forums.
http://www.raspberrypi.org/forum/general-discussion/simulator-emulator-downloads

The ARMv7 vs. ARMv6 issues were noted by Robie Basak on the Debian bug tracker; he also suggested the -march=armv6 CFLAGS option.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=657625

  1. Ken
    2012 March 5 at 1:49 pm

    I really wanted to follow along with this, but I found that several of the lines are cut off by margins, and I can’t tell what they are. Too bad, would have been a good tutorial. I can’t even log in to the debian because the password is cut off. I tried to guess, but didn’t get it right.

    Can you fix the document please.

  2. 2012 March 5 at 6:20 pm

    Thank you for your suggestions. I have reformatted the above text to make it more readable, but there are some remaining issues with the width.

    As an alternative, download the build notes here:
    http://rhinohide.org/co2/models/tools/openmpi/doc/raspberrypi-openmpi.txt

  3. Gord
    2012 March 9 at 2:06 am

    Good work! For planning purposes, could you give us an indication of compile time and what host hardware you used for this? Hours? Or minutes? :)
    I’m short of hardware right now and I don’t want to start something huge and get frustrated by a low-end dev box that will make me pull my hair out.

    With the long lead time of the Raspberry Pi, especially for quantity orders, I may prefer to wait a month or three if it needs some muscle to tinker with MPI in VMs/emulators properly.

    (the text-based build notes are easier to follow, folks)

  4. 2012 March 9 at 7:40 am

    The host box is a Dell 1747 i3 Q720 with 4MB 4GB RAM and 1.6 GHz clock.
    I am running the Ubuntu VM in VirtualBox.
    The Debian QEMU is running on the Ubuntu VM.

    It took something like 5 and a half hours for OpenMPI to compile in QEMU.

  5. Gord
    2012 March 9 at 8:52 am

    Thanks for that, not too demanding then, so worth the keyboard time for an overnite compile.
    Your unusual VM-in-VM technique is one I’ve used a few times myself to cure Host/Guest networking strangeness. Many thanks for the detailed build info.

    You must be roughly my age: 4MB for a desktop or dev box seems perfectly reasonable to me too. I make that subtle mistake almost every day :)

  6. 2012 March 9 at 9:23 am

    Eh! :D

    True ’nuff about the age, but the confusion is more likely due to the amount of time I’ve spent reading about L1/L2/L3 cache these days.

  7. mattreid9956
    2012 November 4 at 4:32 pm

    This is really cool! I must get a Pi soon and test this out!
