
What's New in LAM 7.0.6

LAM 7.0 provides a number of new features designed to improve both the user experience and performance of MPI applications. As with any MPI implementation, no changes to applications should be needed to take advantage of the new features introduced in 7.0.

Checkpoint/Restart
MPI applications running under LAM/MPI can be checkpointed to disk and restarted at a later time. LAM relies on a third-party single-process checkpoint/restart toolkit to checkpoint and restart each individual MPI process; LAM itself takes care of the parallel coordination. Currently, the Berkeley Lab Checkpoint/Restart (BLCR) package (Linux only) is supported. The infrastructure allows for easy addition of new checkpoint/restart packages.

Parallel Debugging
Debugging applications is hard, and debugging parallel applications is harder still. With LAM 7.0, it is possible to use real parallel debuggers such as the Distributed Debugging Tool (DDT) from Streamline Computing and TotalView from Etnus to debug LAM/MPI applications. TotalView, for example, allows users to view the unexpected receive queue of LAM/MPI, finding those "lost" messages in an application.

Myrinet Support
Myrinet networks provide greater bandwidth and lower latency than common Ethernet networks such as Fast Ethernet or Gigabit Ethernet. With LAM 7.0, it is possible to take full advantage of the Myrinet network. Be sure to read the release notes in the User's Guide, as some tuning may be required for your application.

Run-time Tuning and RPI Selection
LAM has always supported a large number of tuning parameters. Unfortunately, most could only be set at compile time, which made application tuning painful. With LAM 7.0, almost every parameter in LAM can be altered at run time: environment variables or flags to mpirun make tuning much simpler. The addition of SSI allows the RPI (the MPI transport) to be selected at run time rather than at compile time. Rather than recompiling LAM four times to decide which transport gives the best performance for an application, all that is required is a single flag to mpirun.
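
As a rough illustration of what "a single flag to mpirun" looks like in practice, the Python sketch below only builds the command lines and environment settings; it does not run anything. It assumes the `-ssi rpi <module>` flag form and the `LAM_MPI_SSI_` environment variable prefix described in the LAM User's Guide. The program name `./my_app`, the helper names, and the candidate module list are placeholders for illustration; run `laminfo` to see which RPI modules your installation actually provides.

```python
import shlex

# Candidate RPI modules to compare. This list is an assumption for the
# example; `laminfo` reports what was really built into your LAM.
CANDIDATE_RPIS = ["tcp", "usysv", "sysv", "gm"]


def mpirun_command(rpi, program, where="C"):
    """Build an mpirun argument list that selects one RPI at run time.

    `where` uses LAM's location notation; "C" asks for one process per CPU.
    """
    return ["mpirun", "-ssi", "rpi", rpi, where, program]


def rpi_environment(rpi):
    """Equivalent selection through an environment variable."""
    return {"LAM_MPI_SSI_rpi": rpi}


if __name__ == "__main__":
    # Print one command per transport so each can be timed in turn.
    for rpi in CANDIDATE_RPIS:
        print(shlex.join(mpirun_command(rpi, "./my_app")))
```

Timing the same binary under each printed command is then enough to compare transports, with no rebuild of LAM or of the application in between.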

SMP-Aware Collectives
The use of clusters of SMP machines is a growing trend in the clustering world. With LAM 7.0, many common collective operations are optimized to take advantage of the higher communication speed between processes on the same machine. When using the SMP-aware collectives, performance increases can be seen with little or no changes in user applications. Be sure to read the LAM User's Guide for important information on exploiting the full potential of the SMP-aware collectives.
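
To make the idea concrete, here is a small Python model of why an SMP-aware broadcast helps. It is not LAM's actual algorithm, which this page does not describe; it is a deliberately simple two-phase scheme with hypothetical function names, and it only counts messages rather than sending any. The rank-to-node mapping `node_of` is made up for the example.

```python
from collections import defaultdict


def flat_broadcast(root, node_of):
    """Root sends one message straight to every other rank."""
    return [(root, rank) for rank in sorted(node_of) if rank != root]


def smp_aware_broadcast(root, node_of):
    """Two-phase broadcast: across nodes first, then within each node."""
    ranks_on = defaultdict(list)
    for rank in sorted(node_of):
        ranks_on[node_of[rank]].append(rank)

    # One leader per node: the root on its own node, the lowest rank elsewhere.
    leader = {
        node: root if node == node_of[root] else ranks[0]
        for node, ranks in ranks_on.items()
    }

    sends = []
    # Phase 1: the root sends once to the leader of every other node.
    sends += [(root, ldr) for ldr in leader.values() if ldr != root]
    # Phase 2: each leader forwards to the other ranks on its own node.
    for node, ranks in ranks_on.items():
        sends += [(leader[node], rank) for rank in ranks if rank != leader[node]]
    return sends


def inter_node_messages(sends, node_of):
    """Count the messages that have to cross the network."""
    return sum(1 for src, dst in sends if node_of[src] != node_of[dst])


if __name__ == "__main__":
    # Eight ranks spread over two four-way SMP nodes.
    node_of = {rank: rank // 4 for rank in range(8)}
    for name, plan in (("flat", flat_broadcast), ("smp-aware", smp_aware_broadcast)):
        sends = plan(0, node_of)
        print(name, len(sends), "messages,",
              inter_node_messages(sends, node_of), "inter-node")
```

Both plans deliver the data to the same seven ranks, but the two-phase plan sends only one message per remote node over the network and keeps the rest on the faster intra-node path.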

Integration with PBS
PBS (either OpenPBS or PBS Pro) provides scheduling services for many of the high performance clusters in service today. By using the PBS-specific boot mechanisms, LAM is able to provide process accounting and job cleanup to MPI applications. As an added bonus to MPI users, lamboot execution time is drastically reduced compared with booting over rsh.

Integration with BProc
The BProc distributed process space provides a single process space for an entire cluster. It also provides a number of mechanisms for starting applications that are not available on the compute nodes of a cluster. LAM's BProc support allows booting under the BProc environment even when LAM is not installed on the compute nodes; LAM will automatically migrate the required support out to the compute nodes. MPI applications must still be available on all compute nodes unless the -s option to mpirun is used, which removes this requirement.

Globus Enabled
LAM 7.0 includes beta support for execution in the Globus Grid environment. Be sure to read the release notes in the User's Guide for important restrictions on your Globus environment.

Extensible Component Architecture
LAM 7.0 is the first LAM release to include the System Services Interface (SSI), providing an extensible component architecture for LAM/MPI. Currently, "drop-in" modules are supported for booting the LAM run-time environment, MPI collectives, Checkpoint/Restart, and MPI transport (RPI). Selection of a component is a run-time decision, allowing for user selection of the modules that provide the best performance for a specific application.
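
Because every component type is chosen the same way, picking several at once is just a matter of repeating the flag. The Python sketch below assembles such a flag list; the helper `ssi_args` is hypothetical and nothing is executed. It assumes the SSI type names `boot`, `coll`, `cr`, and `rpi` and the `-ssi <type> <module>` form from the LAM User's Guide; the module names in the usage example are illustrative, so check `laminfo` for the ones available to you.

```python
# The four SSI component types named above, in a fixed output order.
SSI_KINDS = ("boot", "coll", "cr", "rpi")


def ssi_args(**selection):
    """Turn {type: module} choices into a list of `-ssi` flags.

    Boot selections are meant for lamboot; the others are meant for mpirun.
    """
    unknown = sorted(set(selection) - set(SSI_KINDS))
    if unknown:
        raise ValueError("unknown SSI type(s): " + ", ".join(unknown))
    args = []
    for kind in SSI_KINDS:
        if kind in selection:
            args += ["-ssi", kind, selection[kind]]
    return args


if __name__ == "__main__":
    # Boot through the PBS (TM) module, then run with chosen coll and rpi modules.
    print(" ".join(["lamboot"] + ssi_args(boot="tm")))
    print(" ".join(["mpirun"] + ssi_args(rpi="tcp", coll="smp") + ["C", "./my_app"]))
```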