Status Report

November 18, 2009
by Evan Silberman (ejs07)

While John has been working on Erlush (more of a rabbit hole than he expected, I’m sure), I’ve been bouncing around between a few different ideas for interacting with Erlush from other language environments and for launching Erlush programs across cluster nodes. Coming up to the end of the semester, here’s a look at the current state of affairs in both of these areas:

  • Foreign language interfaces: I futzed around for a while with the (probably doomed-from-the-get-go) idea of using standard input/output and the cluster’s filesystem as transports, but then John told me that he had figured out how to use Erlang’s message-passing/marshaling system. He wrote a Python library for writing fitness evaluators for Erlush, and I used Erlectricity to do something similar in Ruby. Both of these work on computers where everything is properly installed. (As of this writing, that does not include the cluster, but we’ll get there.)
  • Launching jobs on the cluster: I’ve pretty much finished a fairly straightforward Ruby program that generates Tractor scripts that can be spooled to the Tractor job queue. It needs to be more configurable, and I need to work with John to figure out how to collect results from the worker nodes when they finish running.
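The generator described above is written in Ruby; as a language-neutral illustration, here is a hypothetical Python sketch of the same idea: turn a list of shell commands into one Tractor job with one task per command. The script layout (an `##AlfredToDo 3.0` header and nested `Job`/`Task`/`RemoteCmd` blocks in Tcl brace syntax) follows the Alfred-style job script format as best understood here and should be checked against the Tractor documentation; the `PixarRender` service key and the `./run_erlush_experiment --seed` command are assumptions for illustration only.

```python
"""Hypothetical sketch: emit an Alfred/Tractor-style job script with one
task per command. Verify the syntax against Tractor's own docs."""


def tcl_brace(text):
    # Tcl brace quoting. This sketch refuses braces instead of escaping them.
    if "{" in text or "}" in text:
        raise ValueError("braces in %r are not supported by this sketch" % text)
    return "{" + text + "}"


def tractor_script(title, commands, service="PixarRender"):
    # service="PixarRender" is an assumed, site-specific service key.
    lines = ["##AlfredToDo 3.0", "Job -title %s -subtasks {" % tcl_brace(title)]
    for i, cmd in enumerate(commands, start=1):
        lines.append("    Task %s -cmds {" % tcl_brace("run %d" % i))
        lines.append("        RemoteCmd %s -service %s"
                     % (tcl_brace(cmd), tcl_brace(service)))
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical experiment: ten independent runs of one evolver.
    cmds = ["./run_erlush_experiment --seed %d" % seed for seed in range(10)]
    print(tractor_script("factorial GP", cmds), end="")
```

Writing each run as its own task lets the queue place runs on whichever nodes are free; collecting results afterwards is a separate problem, as noted above.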

While it may seem like we’ve been treading water a bit here, I think we’re making real progress, and as our understanding of the best way to put together this infrastructure has evolved, we’ve come a lot closer to our ultimate goal of making running GP experiments on the cluster a one-command operation. Or really, an edit-a-configuration-file-then-run-a-command operation. And we’re really and truly nearly there!


Erlush evolves its first non-trivial program!

November 4, 2009
by John Schanck (jms07)

It computes factorial. Here it is:

[{exec,pop}, [[[{exec,do_star_times},{integer,yank}],[[-9,{integer,yankdup}]]], [{boolean,stackdepth}, [[{integer,flush},{exec,b},{exec,k}], {exec,y}, [[[[{boolean,rot},{exec,do_star_times}]]], {integer,divide}, {integer,yank}]], [[[[{exec,do_star_count},{integer,lessthan}]]]]]], {integer,equal}, [{exec,do_star_times}, [[{exec,s},{exec,do_star_times}],[{integer,max},{boolean,yankdup}],{exec,c}], {exec,k}, {integer,multiply}]]

And the same program after simplification (by hand):

[{exec,do_star_times}, [[{exec,do_star_times}],{exec,c},[[],{exec,c}]], {exec,k}, {integer,multiply}]

And again in nicer, lispy, Push3 syntax:

(EXEC.DO*TIMES ((EXEC.DO*TIMES) EXEC.C (() EXEC.C)) EXEC.K INTEGER.*)
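The translation between the two listings is mechanical, so it can be sketched in a few lines. The following is a hypothetical Python sketch, not Erlush code: Erlang tuples such as `{exec,do_star_times}` are modeled as Python tuples of strings, Erlang lists as Python lists, and the name table (`_star_` becomes `*`, `multiply` becomes `*`, and so on) is inferred only from the instruction names that appear in this post.

```python
"""Hypothetical sketch: print an Erlush-style program term in Push3 syntax.
Erlang {stack,name} tuples are modeled as Python (stack, name) tuples."""

# Only names seen in this post; anything else is upper-cased as-is.
SYMBOLS = {"multiply": "*", "divide": "/", "lessthan": "<", "equal": "="}


def to_push3(term):
    if isinstance(term, list):
        # Erlang lists become parenthesized Push3 lists.
        return "(" + " ".join(to_push3(t) for t in term) + ")"
    if isinstance(term, tuple):
        stack, name = term
        name = SYMBOLS.get(name, name.upper().replace("_STAR_", "*"))
        return stack.upper() + "." + name
    if isinstance(term, bool):
        return "TRUE" if term else "FALSE"
    return str(term)  # integer literals such as -9


simplified = [("exec", "do_star_times"),
              [[("exec", "do_star_times")], ("exec", "c"), [[], ("exec", "c")]],
              ("exec", "k"), ("integer", "multiply")]
print(to_push3(simplified))
```

Run on the simplified program, this reproduces the Push3 listing above exactly.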

I was pretty excited to see C combinators in there; perhaps they’re a worthwhile addition to the language. No worries if you can’t figure out what’s going on here: I’ve single-stepped through it with Erlush in debugger mode and still can’t say for sure that I know how it works. I did, however, discover that it’s exploiting a bug in my implementation of EXEC.DO*TIMES, which is discouraging and exciting at the same time.

More to come.

John

UPDATE: I fixed the EXEC.DO*TIMES bug and reran the factorial evolver, and got (after simplification and Push3ifying):

(EXEC.STACKDEPTH INTEGER.DUP INTEGER./ EXEC.DO*RANGE INTEGER.*)

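Which is much clearer. The first three instructions just put a 1 on the integer stack on top of the input (EXEC.STACKDEPTH pushes the depth of the exec stack, which is nonzero at that point, and INTEGER.DUP INTEGER./ divides that number by itself). Then EXEC.DO*RANGE INTEGER.* iterates from the input value down to 1, multiplying the loop counter and the value beneath it each time.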
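To make that walkthrough concrete, here is a minimal Python sketch of a Push3-style interpreter that knows only the five instructions in the evolved program. It follows the Push3 description of these instructions as understood here: in particular, EXEC.DO*RANGE treats the top integer as the destination index and the second as the current index, pushes the counter, and re-queues itself until the two meet. It is not Erlush’s implementation, and `run` and `FACTORIAL` are hypothetical names.

```python
"""Minimal Push3-style sketch: just enough to run
(EXEC.STACKDEPTH INTEGER.DUP INTEGER./ EXEC.DO*RANGE INTEGER.*)."""


def run(program, inputs):
    exec_stack = [list(program)]  # the top of each stack is the end of the list
    int_stack = list(inputs)
    while exec_stack:
        item = exec_stack.pop()
        if isinstance(item, list):
            exec_stack.extend(reversed(item))  # first element ends up on top
        elif isinstance(item, int):
            int_stack.append(item)  # integer literals go to the integer stack
        elif item == "EXEC.STACKDEPTH":
            int_stack.append(len(exec_stack))
        elif item == "INTEGER.DUP":
            if int_stack:
                int_stack.append(int_stack[-1])
        elif item == "INTEGER./":
            # Division by zero (or too few arguments) is a no-op.
            if len(int_stack) >= 2 and int_stack[-1] != 0:
                b, a = int_stack.pop(), int_stack.pop()
                q = abs(a) // abs(b)  # truncate toward zero
                int_stack.append(q if (a >= 0) == (b > 0) else -q)
        elif item == "INTEGER.*":
            if len(int_stack) >= 2:
                b, a = int_stack.pop(), int_stack.pop()
                int_stack.append(a * b)
        elif item == "EXEC.DO*RANGE":
            if len(int_stack) >= 2 and exec_stack:
                body = exec_stack.pop()
                dest = int_stack.pop()      # top integer: destination index
                current = int_stack.pop()   # second integer: current index
                int_stack.append(current)   # expose the loop counter
                if current != dest:
                    step = 1 if dest > current else -1
                    # Recursive call, executed after the body.
                    exec_stack.append([current + step, dest, "EXEC.DO*RANGE", body])
                exec_stack.append(body)
        # Any other instruction is silently ignored in this sketch.
    return int_stack


FACTORIAL = ["EXEC.STACKDEPTH", "INTEGER.DUP", "INTEGER./",
             "EXEC.DO*RANGE", "INTEGER.*"]
print(run(FACTORIAL, [5]))
```

On the first pass only the counter is on the integer stack, so INTEGER.* is a no-op; every later pass multiplies the running product by the next counter value, so an input of 5 leaves `[120]`.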


headless: A tool for running programs… headlessly

October 17, 2009
by John Schanck (jms07)

I talked to Josiah earlier today and got him to install Xvfb (X Virtual FrameBuffer) on the cluster, so now you can run Processing and other graphical applications there. Xvfb emulates a graphical display, so as far as your programs are concerned, they’re running in a full-fledged graphical environment. All you have to do is launch your application with my “headless” script. Headless simply creates a virtual display with Xvfb and tells your program to use it instead of the default display.
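The mechanism described above (start a virtual display, point the program at it) fits in a short script. This is a hypothetical Python sketch of the idea, not the actual headless script: it assumes the usual X conventions that display `:n` is named through the `DISPLAY` environment variable and that a running server holds `/tmp/.X<n>-lock`, and the helper names and the 1024x768x24 screen size are illustrative choices.

```python
"""Hypothetical sketch of a 'headless' wrapper: start Xvfb, run a command
with DISPLAY pointing at it, then shut the virtual display down."""
import os
import subprocess
import sys
import time


def free_display(start=99, limit=200):
    # A running X server conventionally holds /tmp/.X<n>-lock.
    # (Racy if two wrappers start at the same moment; fine for a sketch.)
    for n in range(start, limit):
        if not os.path.exists("/tmp/.X%d-lock" % n):
            return n
    raise RuntimeError("no free X display number found")


def xvfb_command(display, geometry="1024x768x24"):
    # Screen 0 of the virtual display: width x height x color depth.
    return ["Xvfb", ":%d" % display, "-screen", "0", geometry, "-nolisten", "tcp"]


def child_env(display, base=None):
    # Copy the environment and redirect X clients to the virtual display.
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = ":%d" % display
    return env


def run_headless(argv):
    display = free_display()
    server = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(1)  # crude: give Xvfb a moment to accept clients
        return subprocess.call(argv, env=child_env(display))
    finally:
        server.terminate()
        server.wait()


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_headless(sys.argv[1:]))
```

Saved as `headless.py`, it would be invoked the same way as the real script, e.g. `python headless.py ./doawesome 1 33 7`. Many Linux distributions also ship a similar wrapper named `xvfb-run`.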

To use headless:
Download the script and copy it to your home directory on the cluster:

scp -p -r headless YourUserName@fly.hampshire.edu:~/

Now say your graphical program is called doawesome and takes three arguments. Instead of telling Tractor or multiquery to issue the command:
./doawesome 1 33 7
You tell it to use the headless script instead:
./headless ./doawesome 1 33 7

One more tip for Processing users:
When you’re ready to run your sketch on the cluster, open it in Processing and click File > Export Application, then select Linux and hit Export. This should create a directory called application.linux inside your sketch’s directory, containing a shell script that will launch your sketch. To run it on the cluster, just copy this application.linux directory to your home directory:

scp -p -r YourSketch/application.linux YourUserName@fly.hampshire.edu:~/

Then issue jobs that have the headless script launch the shell script inside ~/application.linux:

multiquery './headless ~/application.linux/doawesome'


Looking for job queue alternatives

October 12, 2009
by Evan Silberman (ejs07)

I spent some time today looking into alternative job queuing solutions for running stuff on the cluster. After some unnecessarily difficult detective work, I figured out that the system apparently used most often with Rocks is Maui, which is released as open source by a commercial enterprise selling some hella complicated stuff. (All this Rocks stuff is hilariously difficult to interpret, by the way. There’s really no documentation that tells you what anything is for, or how to do anything other than install things.) It would be rad and everything to implement a somewhat more open and accessible system for cluster job queuing than Tractor, but this is not an area where anything is straightforward and easy. Given Hampshire’s resources, its existing support relationship with Pixar, and the institutional familiarity with Alfred and Tractor, writing a system that targets Tractor is the only thing that makes sense to me right now as a first step.


Tractor: a job queue system from Pixar

October 7, 2009
by Evan Silberman (ejs07)

Shauna and I met with Josiah Erickson on Monday, and we learned the very useful fact that the cluster does indeed have a load-balancing job queue system operating on it called Tractor. It’s one of the tools distributed with RenderMan, but the scripts one writes to spool tasks are quite generic and should be useful for a variety of applications. When I actually start writing the GP experiment manager (very soon now), it will queue jobs by producing Tractor scripts as output and spooling them. There are a couple of quirks with Tractor as it is presently configured that tend to result in permissions errors, but hopefully by working with Josiah and being clever with scripting it will be possible for users to employ the system without having to edit permissions by hand. Or at least, not too much.


Launching things on other nodes

September 29, 2009
by Evan Silberman (ejs07)

is not all that hard, actually, so huzzah for that. I modified John’s script slightly to launch dummy worker processes on the compute nodes that sleep for ten seconds and then write a file to my home directory on the NFS share. It looks like running a single command via ssh on each desired node will accomplish much of what we intend to do. Not that this is shocking; I just wanted to convince myself that I could do it.
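The pattern described above, one ssh command per node, can be sketched briefly. This is a hypothetical Python sketch, not the modified script: it assumes key-based (passwordless) ssh from the head node to the compute nodes, and the helper names, node names, and marker-file command are placeholders that mirror the dummy workers described above.

```python
"""Hypothetical sketch: start one command on each node over ssh in
parallel, then collect each ssh exit status."""
import subprocess


def ssh_command(node, command):
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", node, command]


def launch_all(nodes, command):
    # Start every ssh process first so the nodes run concurrently...
    procs = {node: subprocess.Popen(ssh_command(node, command)) for node in nodes}
    # ...then wait for each one and map node -> exit status.
    return {node: proc.wait() for node, proc in procs.items()}


if __name__ == "__main__":
    nodes = []  # fill in, e.g. ["compute-0-2", "compute-0-3"]
    # Dummy worker: sleep, then leave a marker file in the NFS-shared
    # home directory. $(hostname) is expanded by the remote shell.
    dummy = 'sleep 10; touch ~/done-"$(hostname)"'
    print(launch_all(nodes, dummy))
```

Because the command is passed as a single argument list (no local shell), `$(hostname)` reaches the remote side unexpanded, so each node names its own marker file.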


gquery: A tool for getting sorted lists of cluster nodes

September 28, 2009
by John Schanck (jms07)

I wrote a little script that uses Ganglia (the cluster monitoring toolkit that was already installed on the cluster) to generate lists of nodes sorted by lowest CPU usage. It can also sort by free memory and several other metrics.

Example usage (get 10 nodes, sorted by lowest CPU usage):

[jms07@fly ~]$ ./gquery -c 10
compute-0-2
compute-0-3
compute-0-4
compute-0-5
compute-0-6
compute-0-7
compute-0-8
compute-0-9
compute-1-5
compute-1-10
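For readers curious how such a ranking can be built, here is a hypothetical Python sketch of the idea, not the gquery script itself. It assumes Ganglia’s gmond daemon is reachable on its usual TCP port 8649, where it writes an XML dump of `HOST` elements containing `METRIC` elements (standard metric names include `cpu_idle`, `mem_free`, and `load_one`); the function names are illustrative. Sorting by `cpu_idle` with the highest value first gives the nodes with the lowest CPU usage.

```python
"""Hypothetical sketch of the idea behind gquery: read Ganglia's XML dump
and rank hosts by one metric."""
import socket
import xml.etree.ElementTree as ET


def fetch_ganglia_xml(host="localhost", port=8649):
    # gmond writes its XML state to a connecting client, then closes.
    chunks = []
    with socket.create_connection((host, port)) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def rank_nodes(xml_text, metric="cpu_idle", n=10, highest_first=True):
    # Collect host -> metric value, then sort hosts by that value.
    root = ET.fromstring(xml_text)
    values = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                values[host.get("NAME")] = float(m.get("VAL"))
    ranked = sorted(values, key=values.get, reverse=highest_first)
    return ranked[:n]

# On the head node: rank_nodes(fetch_ganglia_xml(), n=10)
```

Using `highest_first=False` with a metric like `load_one` is the equivalent choice for "least loaded first"; for `mem_free`, the default of highest first puts the nodes with the most free memory at the top.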

Changelog
Tarball
Git: git://anomos.info/~john/ClusterSupport

-John


Push in Erlang

September 23, 2009
by John Schanck (jms07)

I’ve made the source code for my Erlang implementation of Push available at anomos.info/~john/. I’m currently calling it PushAgner. This is a terrible name (Erlang is named after the inventor of queuing theory, Agner Krarup Erlang), so please help me come up with a better one.
EDIT: I’ve renamed it to Erlush, which was another working title I had used and which Lee suggested.

If you have git installed, you can get the source by running:
$ git clone git://anomos.info/~john/erlush
After that, you can get any updates by running (from inside the erlush directory):
$ git pull

If you don’t have git, you can just grab a tarball of the latest source.

I haven’t written any documentation yet. If you really want to get this running it’s probably best to contact me directly.

– John


Meeting liveblog

September 21, 2009
by Evan Silberman (ejs07)

7:17PM: We are meeting.
7:18PM: We are talking about what we want to accomplish.
7:19PM: John is testing the use of CoDeploy on the cluster as a potential tool for deploying our code across the cluster.
7:27PM: We are discussing the feasibility of leaving Erlang interpreters idling on cluster nodes.
7:47PM: We are getting caught up doing weird things with ssh-agent.
7:51PM: We have resolved to meet with Josiah on Wednesday.