Archive for the 'Blag Toasts' Category

Status Report

Wednesday, November 18th, 2009

While John has been working on Erlush (more of a rabbit hole than he expected, I’m sure), I’ve been bouncing between a few different ideas for interacting with Erlush from other language environments and for launching Erlush programs across cluster nodes. With the end of the semester coming up, here’s a look at the current state of affairs in both areas:

  • Foreign language interfaces: I futzed around for a while with the (probably doomed-from-the-get-go) idea of using standard input/output and the cluster’s filesystem as transports, but then John told me that he had figured out how to use Erlang’s message-passing/marshaling system. He wrote a Python library for writing fitness evaluators for Erlush, and I employed Erlectricity to do something similar in Ruby (see the first sketch after this list). Both of these work on machines where everything is properly installed. (At the time of this writing that does not include the cluster, but we’ll get there.)
  • Launching jobs on the cluster: I’ve pretty much finished a fairly straightforward Ruby program to generate Tractor scripts that can be spooled to the Tractor job queue (see the second sketch after this list). It still needs to be more configurable, and I need to work with John to figure out how to collect results from the worker nodes when they finish running.
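
For the curious, the Ruby side of a fitness evaluator is pleasantly small. Here’s a minimal sketch in the style of Erlectricity’s own examples; the :eval/:fitness message shapes and the fitness function are placeholders I made up for illustration, not our actual protocol:

    require 'rubygems'
    require 'erlectricity'

    # Placeholder fitness function; the real evaluator would run the
    # evolved program and score its behavior.
    def fitness(genome)
      genome.length
    end

    receive do |f|
      # Erlang sends {eval, Genome}; we answer with {fitness, Score}.
      f.when([:eval, String]) do |genome|
        f.send!([:fitness, fitness(genome)])
        f.receive_loop
      end
    end

The Erlang side opens the script as a port (Erlectricity expects 4-byte packet framing) and exchanges terms with it like it would with any other process.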
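
And here’s the basic shape of the Tractor script generator, boiled down to its essence. The command name (run_erlush.sh), the output filename, and the job title are stand-ins for the configurable parts:

    # One Task per experiment configuration, passed on the command line.
    # run_erlush.sh and erlush.alf stand in for the configurable bits.
    configs = ARGV

    File.open("erlush.alf", "w") do |f|
      f.puts "Job -title {erlush batch} -subtasks {"
      configs.each_with_index do |conf, i|
        f.puts "  Task {run #{i}} -cmds {"
        f.puts "    RemoteCmd {run_erlush.sh #{conf}}"
        f.puts "  }"
      end
      f.puts "}"
    end

The resulting file is what gets handed to the Tractor spooler; see the October 7th post below for what the script format itself looks like.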

While it may seem like we’ve been treading water a bit here, I think we’re making real progress. As our understanding of the best way to put this infrastructure together has evolved, we’ve come a lot closer to our ultimate goal: making a run of GP experiments on the cluster a one-command operation. Or really, an edit-a-configuration-file-then-run-a-command operation. And we’re really and truly nearly there!

Looking for job queue alternatives

Monday, October 12th, 2009

I spent some time today looking into alternative job queuing solutions for running stuff on the cluster. After some unnecessarily difficult detective work, I figured out that the system apparently used most often with Rocks is Maui, which is released as open source by a commercial enterprise selling some hella complicated stuff. (All this Rocks stuff is hilariously difficult to interpret, by the way. There’s really no documentation that tells you what anything is for, or how to do anything other than install things.) It would be rad and everything to try to implement a somewhat more open/accessible system for cluster job queuing than Tractor, but this is not an area where anything is straightforward and easy. Given Hampshire’s resources, its existing support relationship with Pixar, and the institutional familiarity with Alfred and Tractor, writing a system that targets Tractor is the only thing that makes sense to me right now as a first step.

Tractor: a job queue system from Pixar

Wednesday, October 7th, 2009

Shauna and I met with Josiah Erickson on Monday, and we learned the very useful fact that the cluster does indeed have a load-balancing job queue system operating on it called Tractor. It’s one of the tools distributed with RenderMan, but the scripts one writes to spool tasks are quite generic and should be useful for a variety of applications. When I actually start writing the GP experiment manager (very soon now), it will queue jobs by producing Tractor scripts as output and spooling them; a sketch of what those scripts look like follows below. There are a couple of quirks with Tractor as it is presently configured that tend to result in permissions errors, but hopefully, by working with Josiah and being clever with scripting, it will be possible for users to employ the system without having to fiddle with permissions. Or at least, not too much.
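
To make that concrete, a Tractor script is a little Alfred-style description of a job as a tree of tasks and commands. Something like this sketch (the titles and commands are made up for illustration; the RenderMan documentation has the real dialect):

    Job -title {two sleepy tasks} -subtasks {
      Task {worker one} -cmds {
        RemoteCmd {sleep 10}
      }
      Task {worker two} -cmds {
        RemoteCmd {sleep 10}
      }
    }

Spooling a file like this hands it to the queue, and Tractor farms the individual tasks out to available nodes.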

Launching things on other nodes

Tuesday, September 29th, 2009

is not all that hard, actually, so huzzah for that. I modified John’s script slightly to launch dummy worker processes (they just sleep for ten seconds) on the compute nodes, which then write out a file to my home directory on the NFS share. It looks like running one command via ssh on each desired node will more or less do the sorts of things we intend to do (see the sketch below). Not that this is shocking; I just wanted to convince myself that I could do it.
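
In a minimal Ruby sketch, the experiment amounts to something like this (the Rocks-style node names and file paths are illustrative, and John’s actual script may look different):

    # Hypothetical node names; the real list comes from the cluster config.
    nodes = %w[compute-0-0 compute-0-1 compute-0-2]

    # Launch the dummy worker on every node at once; each one sleeps for
    # ten seconds and then leaves a marker file in the NFS-shared homedir.
    pids = nodes.map do |node|
      spawn("ssh", node, "sleep 10 && touch ~/ran-on-#{node}")
    end

    # Wait for all of the ssh sessions to finish.
    pids.each { |pid| Process.wait(pid) }

Since the home directories live on the NFS share, the marker files show up right away on the head node, which makes for an easy sanity check.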

Meeting liveblog

Monday, September 21st, 2009

7:17PM: We are meeting.
7:18PM: We are talking about what we want to accomplish.
7:19PM: John is testing CoDeploy as a potential tool for deploying our code across the cluster.
7:27PM: We are discussing the feasibility of leaving Erlang interpreters idling on cluster nodes.
7:47PM: We are getting caught up doing weird things with ssh-agent.
7:51PM: We have resolved to meet with Josiah on Wednesday.