Grid Computing is an exciting buzzword in the computing
world today. Here we define it to mean the exploitation of a
varied set of networked computing resources, including large or
small computers, PDAs, file servers and graphics devices. The
networks could be anything from high speed ATM to wireless or
modem connections. Exploiting these connected resources could,
for example, enable large scale simulations not possible on a
single supercomputer, aid computational work of geographically
distributed collaborations, simplify remote use of machines, and
enable the new dynamic application scenarios we propose. Two
important aspects of Grid technology, which have been largely
ignored, form the basis of our GridLab Project, which
aims to build components for Grid applications (as MatLab does
for mathematics), and realistic testbeds for their development:
- Co-development of Infrastructure and Applications:
We propose a balanced program with co-development of a
range of Grid applications (based on
Cactus, the leading, widely used
Grid-enabled open source application framework, and
Triana, a dataflow framework used in
gravitational wave research) alongside infrastructure
development, working on transatlantic testbeds of varied
supercomputers and clusters. This practical approach
ensures that the developed software truly enables easy and
efficient use of Grid resources in a real
environment. We will maintain and upgrade the testbeds
through deployment of new infrastructure and large scale
application technologies as they are developed. All
deliverables will be immediately prototyped and continuously
field tested by several user
communities. Our focus on
specific application frameworks allows us to create working
Grid applications immediately, and so to gain experience
for the more generic components developed during the project.
- Dynamic Grid Computing:
We will develop capabilities for simulation and
visualization codes to be aware of the changing
Grid environment, and to be able to fully exploit dynamic
resources for fundamentally new and innovative application
scenarios. For example, the applications themselves will
possess the capability to migrate from site to site during
execution, either in whole or in part, to spawn related
tasks, and to acquire or release resources as demanded
by both the changing availability of Grid resources and
the needs of the applications themselves.
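The migration behaviour described above can be illustrated with a minimal sketch. It is not GridLab code: the `Site` record, the `choose_action` function and the 0.8 load threshold are hypothetical names and values chosen only to show how an application might decide, from monitoring data, whether to stay where it is or move to another site.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """Hypothetical snapshot of one Grid site as seen by a monitor."""
    name: str
    free_cpus: int   # processors currently available at the site
    load: float      # fraction of capacity in use, 0.0 to 1.0


def choose_action(current: Site, others: List[Site], cpus_needed: int,
                  max_load: float = 0.8) -> Tuple[str, str]:
    """Decide whether a running job should stay or migrate.

    Returns a pair (action, site name). The threshold max_load is an
    assumed value, not one taken from the project description.
    """
    # If the current site is within its load limit, nothing to do.
    if current.load <= max_load:
        return ("stay", current.name)

    # Otherwise look for sites that can host the whole job.
    candidates = [s for s in others
                  if s.free_cpus >= cpus_needed and s.load <= max_load]
    if not candidates:
        return ("stay", current.name)

    # Prefer the least loaded candidate.
    target = min(candidates, key=lambda s: s.load)
    return ("migrate", target.name)


overloaded = Site("site-a", free_cpus=0, load=0.95)
alternatives = [Site("site-b", 64, 0.5), Site("site-c", 128, 0.2)]
print(choose_action(overloaded, alternatives, cpus_needed=32))
```

A real implementation would also have to handle checkpointing, data movement and partial migration, none of which this sketch attempts.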
Figure 1: GridLab architecture
We will unify these elements in developing innovative,
practical, Grid computing technologies, which will then be
quickly and easily adopted and exploited by applications from
many different research and engineering fields, as shown in
Figure 1. Specific key objectives of our project
are to:
- Design and develop a Grid Application Toolkit (GAT), to
provide core, easy to use functionality through a carefully
constructed set of generic APIs for both simulation codes and
Grid software. The GAT will contain independent modules for handling
many different aspects of Grid programming, including simulation,
performance and grid monitoring, resource brokering and selection,
performance prediction, interaction with information servers,
security, notification, collaboration, data handling, remote
visualization, and remote application steering.
- Simultaneously enhance real applications for the
Grid, implementing new dynamic simulation scenarios using
the GAT. Both Cactus and Triana will be extended
to integrate and exploit GAT elements, making Grid Computing
easily exploitable by a wide range of applications. Our
simulation driven, compute intensive applications are
fundamentally different from the highly data driven
applications in many other Grid projects (e.g., DataGrid,
GriPhyN, EuroGrid).
- Develop and test Grid infrastructure/applications on
real testbeds, constructed by linking heterogeneous
collections of supercomputers and other resources spanning
Europe and the USA, using and extending existing testbeds.
Interoperability with different testbeds will be ensured by
also using production testbeds in the USA, driving
international high speed network connectivity. Testing will
be carried out by the project and by several large, closely
related user communities, including an EU Astrophysics
Network, and various multidisciplinary US funded
collaborations.
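The idea of a generic API backed by independent modules can be sketched as follows. This is an illustration under stated assumptions, not the GAT interface: `ToolkitSketch`, `register`, `call` and the two notifier functions are hypothetical names invented for the example.

```python
from typing import Callable, Dict


class ToolkitSketch:
    """Hypothetical stand-in for a toolkit with pluggable modules.

    Application code asks for a capability by name; which module
    serves it can change without touching the application.
    """

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[..., object]] = {}

    def register(self, capability: str,
                 handler: Callable[..., object]) -> None:
        # A later registration replaces an earlier one.
        self._modules[capability] = handler

    def call(self, capability: str, *args, **kwargs):
        if capability not in self._modules:
            raise LookupError(f"no module registered for {capability!r}")
        return self._modules[capability](*args, **kwargs)


def sms_notifier(message: str) -> str:
    # Placeholder: a real module would contact an SMS gateway.
    return f"SMS: {message}"


def log_notifier(message: str) -> str:
    # Placeholder: a real module would write to a log service.
    return f"LOG: {message}"


toolkit = ToolkitSketch()
toolkit.register("notification", sms_notifier)
first = toolkit.call("notification", "run finished")

# Swap the module; the calling code stays the same.
toolkit.register("notification", log_notifier)
second = toolkit.call("notification", "run finished")
```

The point of the design is that the simulation code depends only on the capability name, so modules for brokering, monitoring or notification could be developed and replaced independently.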
User Scenarios
The end technology developed through this project will enable
scenarios, such as the following hypothetical examples, to become
reality.
- Gravitational Wave Detection and Analysis:
The gravitational wave detector network, including GEO600 in
Germany, collects a TByte of data each day, which must be
searched using different algorithms for possible events such
as black hole or neutron star collisions, or pulsar signals.
Routine realtime analysis of gravitational wave
data from the Hanover detector identifies a burst
event, but this standard analysis reveals no
information about the burst location. To obtain the
location, desperately required by astrophysicists for
turning their telescopes to view the event before it
fades, a large series of templates must be
cross-correlated against the detector data.
An Italian astrophysicist accesses the
GEO600 portal and, using the performance tool,
finds that 3 TFlop/s is needed to analyze the 100 GB
of raw data in the required hour. Local resources
are insufficient, so using the brokering
tool, she locates the fastest available machines
around the world. She selects five suitable
machines, and with scheduling and
data management tools, data is moved,
executables created and the analysis starts.
In an Amsterdam bar twenty minutes later, an SMS
message from the portal's notification tool
informs her that one machine is overloaded, breaking
the runtime contract. She connects with her PDA
to the portal, and instructs the migration tool
to move this part of the analysis to a different
machine. Within the specified hour, a second SMS
message tells her that the analysis is finished, and that the
resulting data is now on her local machine. Using this
location data, observatories are able to find and view
an exceptionally strong gamma-ray burst,
characteristic of a collision of neutron stars.
- Numerical Relativity:
A single simulation of an astrophysical event, e.g. black
hole or neutron star collisions, ideally requires resources
beyond the TByte and TFlop scale, not yet available on a single machine.
Learning more about the detected burst requires
cross-correlating detector data with custom wave
templates from full-scale neutron star
simulations. Sufficiently accurate templates require
running large scale simulations too big to fit on any
current supercomputer.
German members of an international numerical
relativity collaboration are tasked with creating
collision templates for ten different neutron star
mass combinations. They access the web-based
Simulation Portal, selecting required code
modules and building parameter files with the
code composition tool. The
performance prediction tool estimates that
each simulation requires 1024 GigaBytes of memory and
100,000 GigaFlops, with an additional 500,000 GigaFlops
required for processing data to create signal
templates.
The brokering tool finds that no single
machine in the Simulation Testbed can supply
enough memory, but locates two machines which can be
connected to form a large enough virtual
supercomputer. The dynamic grid monitoring
tool indicates an acceptable bandwidth between
them. The scheduling tool stages the
runs to appropriate queues on the machines, and the
first simulation starts.
The time-consuming task of creating templates is
handled by spawning simulations to
smaller machines dynamically located by the
broker. At each time step, data is streamed to
a series of networked computers for analysis,
creating a simulation vector using available
machines on the grid.
Collaborators around
the world connect to the portal using networked
workstations, home PCs and modems, as well as the
latest wireless PDAs and mobile phones. They are
able to use various remote access tools to
visualize data, monitor
performance and simulation properties, and
interactively steer the simulation.
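The brokering steps in the two scenarios can be sketched in a few lines. In the first, sustaining 3 TFlop/s for one hour amounts to 3 x 10^12 x 3600 = 1.08 x 10^16 floating point operations, and the broker gathers the fastest machines until that rate is covered. In the second, it looks for two machines whose combined memory reaches the estimate and whose link is fast enough. Everything below is an assumption made for illustration: the `Machine` record, both function names, the greedy strategy, and all machine figures are hypothetical and do not describe the project's brokering tool.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class Machine:
    """Hypothetical description of one machine known to the broker."""
    name: str
    tflops: float     # sustained rate in TFlop/s
    memory_gb: int    # main memory in GBytes


def fastest_covering(machines: List[Machine],
                     required_tflops: float) -> List[Machine]:
    """Take the fastest machines first until the combined rate suffices.

    Returns an empty list if all machines together are too slow.
    """
    chosen: List[Machine] = []
    total = 0.0
    for m in sorted(machines, key=lambda m: m.tflops, reverse=True):
        if total >= required_tflops:
            break
        chosen.append(m)
        total += m.tflops
    return chosen if total >= required_tflops else []


def pair_with_memory(machines: List[Machine],
                     bandwidth: Dict[FrozenSet[str], float],
                     required_gb: int,
                     min_bandwidth: float
                     ) -> Optional[Tuple[Machine, Machine]]:
    """Return the first pair with enough joint memory and link speed.

    bandwidth maps an unordered pair of machine names to a measured
    rate; pairs with no measurement are treated as unusable.
    """
    for a, b in combinations(machines, 2):
        link = bandwidth.get(frozenset((a.name, b.name)), 0.0)
        if a.memory_gb + b.memory_gb >= required_gb and link >= min_bandwidth:
            return (a, b)
    return None


# Made-up machine figures, used only to exercise the functions.
pool = [Machine("m1", 1.0, 512), Machine("m2", 0.75, 768),
        Machine("m3", 0.5, 256), Machine("m4", 0.5, 256),
        Machine("m5", 0.25, 128), Machine("m6", 0.125, 64)]
selected = fastest_covering(pool, required_tflops=3.0)
print([m.name for m in selected])
```

A production broker would also weigh queue wait times, cost and data location; the greedy choice here is only the simplest strategy that matches the narrative.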