Migol: A Reliable Grid Service Infrastructure
The Grid is dynamic by nature: nodes shut down and come back up, and the same holds for network connections. For long-running, compute-intensive applications, fault tolerance is a major concern. A benefit of the Grid is that, after a failure, an application may be migrated to another site and restarted there from a checkpoint file. But a migration framework cannot support fault-tolerant applications if it is not fault-tolerant itself.
The Migol project investigates the design and implementation of a fault-tolerant infrastructure, i.e. one without any single point of failure, that conforms to the Open Grid Services Architecture (OGSA) and supports the migration of parallel MPI applications. OGSA builds upon open Web service technologies to uniformly expose Grid resources as Grid services. The Open Grid Services Infrastructure (OGSI) is the technical specification of extensions to Web services technology that fulfill the requirements described by OGSA. An implementation of OGSI is provided by the Globus Toolkit 3 (GT3); its successor, GT4, is based on the Web Services Resource Framework (WSRF) instead.
Migol is a framework consisting of a set of Grid services and libraries that provides fault-tolerant services to Grid applications. Migol services are currently built on top of GT3. The key issues of Migol are

Migol: A Fault-Tolerant Service Framework for MPI Applications in the Grid
André Luckow, Bettina Schnor
In: Proceedings of EuroPVM/MPI 2005, Lecture Notes in Computer Science
publications
A grid application is a parallel application running on several parallel computers at different, geographically distributed sites. In cooperation with the Max Planck Institute for Gravitational Physics, dynamic load balancing and migration for grid applications are investigated.
To make better and more flexible use of computational resources, parallel codes can be run in grid environments using distributed MPI implementations. However, performing such runs can be cumbersome and demanding, and usually leads to very poor performance. This thesis develops techniques to improve this situation.
A first successful demonstration of these techniques in a large-scale, high-performance run took place in April of last year. The results of this demonstration run supported the funding of the TeraGrid.
They also led to the further development of effective techniques for distributed computing, for which the Gordon Bell Prize was awarded. The new algorithms were implemented within the Cactus code using the Globus metacomputing toolkit.
Many of the ideas arose from a highly collaborative effort between the Max Planck Institute for Gravitational Physics, Argonne National Laboratory, and the National Center for Supercomputing Applications.
Distributed Computations in a Dynamic, Heterogeneous Grid Environment
Thomas Dramlitsch
PhD thesis
Many large-scale simulations in hydrodynamics or meteorological modeling have compute-time requirements that go far beyond the queue time limits of the compute hosts. The runtime may not even be predictable at the start of the simulation.
This limitation forces the researcher into the tedious process of securing the simulation's checkpoint files and archiving them if necessary. If the simulation is to continue on another host, the checkpoint files must be transferred and the simulation resubmitted to the queuing system.
At every step, the user's manual interaction makes the process prone to failure: checkpoint files can be erased when disk quota time limits expire, manual data transfer takes a long time, and resource requirements have to be analyzed correctly. Last but not least, the researcher has to remember usernames and passwords as well as the interfaces to a wide range of different machines, architectures, queuing systems, and shell programs.
The process of resubmitting to an arbitrary host that fulfills the minimum resource requirements is a prime candidate for automation.
Such a process is illustrated in the figure below, which shows the timeline of a migrating application (shaded area) on three different hosts A, B, and C. The hosts may be of different types; here we sketch a cluster of workstations, a cluster, and a traditional supercomputer.
The simulation is started on host A, and the migration server (not shown) receives information on the application's resource consumption and location. The migration server monitors the availability of new machines that meet the application's requirements and would boost its performance.
As "better" resources become available, the simulation is informed and writes a checkpoint. The checkpoint files are transferred to host B, where the simulation is restarted. When the application runs out of compute time on B, the checkpoint data is archived in a storage facility and the simulation is resubmitted to the queue on the same machine.
Some advanced applications are able to receive the checkpoint as a socket stream instead of reading it from a file. In combination with advance reservation scheduling, this transfer mode allows for fast checkpoint transfer, as shown in the migration to machine C: the application on B is aware of the expiration of its queue time limit and requests a slot on machine C that overlaps with the compute slot on B. When the application is about to finish on B, the migration server starts an uninitialized simulation on C, which receives the simulation state through the streamed checkpoint and continues the calculation.
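The overlap condition for a streamed transfer can be sketched as follows. This is only an illustration of the idea, not the migration server's actual logic; the Slot type, the function names, and the example times are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A reserved compute slot on one host (hypothetical type).

    Times are seconds measured from a common reference point.
    """
    host: str
    start: float
    end: float


def overlap_seconds(current: Slot, target: Slot) -> float:
    """Return how long both slots are active at the same time (0 if disjoint)."""
    return max(0.0, min(current.end, target.end) - max(current.start, target.start))


def can_stream(current: Slot, target: Slot, transfer_time: float) -> bool:
    """Streaming requires both jobs to run side by side for at least
    as long as the checkpoint transfer is assumed to take."""
    return overlap_seconds(current, target) >= transfer_time


# Example: the slot on C starts five minutes before the slot on B ends.
slot_b = Slot("B", start=0, end=3600)
slot_c = Slot("C", start=3300, end=7200)
print(can_stream(slot_b, slot_c, transfer_time=120))
```

If the slots do not overlap long enough, the file-based transfer described earlier remains the fallback.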
In any case, the server must determine the possible file transfer methods for both the source and the target machine, since it will stage the executable and possibly a checkpoint to the new host. The migration server is also responsible for submitting the new job to the queuing system. Previously written output must be transferred from the old host to a user-specified destination.
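One simple way to pick a transfer method is to intersect the methods offered by both machines and take the most preferred one. The sketch below assumes this approach; the protocol names and their preference order are illustrative assumptions, not taken from the migration server.

```python
from typing import Optional, Sequence

# Assumed preference order, most preferred first (hypothetical).
PREFERENCE = ("gridftp", "scp", "ftp")


def pick_transfer_method(source: Sequence[str], target: Sequence[str]) -> Optional[str]:
    """Return the most preferred method supported by both hosts, or None."""
    common = set(source) & set(target)
    for method in PREFERENCE:
        if method in common:
            return method
    return None


print(pick_transfer_method(["scp", "gridftp"], ["scp", "ftp"]))
```

A result of None means that no direct transfer is possible between the two hosts, so that target would have to be ruled out or an intermediate host used.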
In a nomadic migration scenario, the application profiles its own performance and provides data such as disk usage or maximum memory consumption. Since the application knows approximately when its queue time will expire, it starts a checkpointing procedure in time and informs a migration server about the upcoming migration.
The migration server receives information on the checkpoint files, the executable, and the resource requirements.
It looks up an appropriate resource, stages the checkpoint and executable to the new host, and submits a new job. The appropriate communication method and queuing system language are chosen automatically, and previously written output data is transferred to a user-specified destination.
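The resource lookup step can be sketched as a filter followed by a ranking. This is a minimal sketch under stated assumptions: the Requirements and Host types, their fields, and the relative_speed score are hypothetical and only stand in for whatever the migration server actually tracks.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Requirements:
    """Minimum resources reported by the application (hypothetical fields)."""
    memory_gb: float
    disk_gb: float
    processors: int


@dataclass(frozen=True)
class Host:
    """A candidate machine as seen by the server (hypothetical fields)."""
    name: str
    memory_gb: float
    disk_gb: float
    processors: int
    relative_speed: float  # assumed benchmark score; higher is faster


def satisfies(host: Host, req: Requirements) -> bool:
    """True if the host meets every minimum requirement."""
    return (host.memory_gb >= req.memory_gb
            and host.disk_gb >= req.disk_gb
            and host.processors >= req.processors)


def find_resource(hosts: Iterable[Host], req: Requirements) -> Optional[Host]:
    """Return the fastest host that meets the requirements, or None."""
    candidates = [h for h in hosts if satisfies(h, req)]
    return max(candidates, key=lambda h: h.relative_speed, default=None)


hosts = [
    Host("A", memory_gb=64, disk_gb=500, processors=32, relative_speed=1.0),
    Host("B", memory_gb=256, disk_gb=2000, processors=128, relative_speed=2.5),
    Host("C", memory_gb=16, disk_gb=100, processors=8, relative_speed=4.0),
]
best = find_resource(hosts, Requirements(memory_gb=32, disk_gb=200, processors=16))
print(best.name if best else "no suitable host")
```

Host C is the fastest in this example but is filtered out because it does not meet the minimum memory requirement.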
By taking advantage of parallelism in the application workflow, the application is split up into a set of sub-applications. Each sub-application is executed on the resource that is "best" qualified for that type of task.
At runtime, the spawning application determines which sub-tasks in its workflow do not feed back into the main simulation flow and can therefore be spawned.
The simulation writes a reduced checkpoint that contains only those data structures that are necessary to complete the spawned tasks. For this reason, spawned sub-jobs can be executed with reduced memory requirements.
The checkpoint files and information on the executables are used to restart the sub-job on another resource.
The new resource can be chosen to best match the characteristics of the spawned sub-jobs, e.g. in terms of processor architecture and power, disk capacity, etc.
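The idea of a reduced checkpoint can be sketched as selecting only the named data structures that a spawned task needs. The sketch below is an illustration under assumptions: the function name and the variable names in the example are hypothetical, and a real checkpoint would be written to disk rather than kept in a dictionary.

```python
from typing import Any, Dict, Iterable, Mapping


def reduced_checkpoint(state: Mapping[str, Any], needed: Iterable[str]) -> Dict[str, Any]:
    """Return only the entries of the full state that a spawned task needs.

    Raises KeyError if a required data structure is not in the state.
    """
    wanted = set(needed)
    missing = wanted.difference(state)
    if missing:
        raise KeyError(f"missing data structures: {sorted(missing)}")
    # Iterate over the state to keep its original ordering.
    return {name: value for name, value in state.items() if name in wanted}


# Hypothetical simulation state; the analysis task only needs one field.
full_state = {
    "grid": [[0.0] * 4 for _ in range(4)],
    "density": [1.0, 2.0, 3.0],
    "horizon": {"radius": 2.0},
}
small = reduced_checkpoint(full_state, ["horizon"])
print(sorted(small))
```

Because the reduced checkpoint omits the large arrays, the spawned task can run on a host with less memory than the main simulation requires.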
Nomadic Migration - A Service Environment for Autonomic Computing on the Grid
Gerd Lanfermann
PhD thesis
Various Grid tools have been developed or adapted at the University of Potsdam to ease the development and deployment of Grid applications.

In the field of Grid Computing, the Department of Operating and Distributed Systems at the University of Potsdam is actively involved in the following projects and initiatives: