A Taxonomy and Survey of Grid Resource Planning and Reservation
Systems for Grid Enabled Analysis Environment
Arshad Ali3, Ashiq Anjum3, Atif Mehmood3, Richard McClatchey2,
Ian Willers2, Julian Bunn1, Harvey Newman1, Michael Thomas1,
Conrad Steenberg1
1 California Institute of Technology
Pasadena, CA 91125, USA
Email: {conrad,newman}@hep.caltech.edu, [email protected], [email protected]
2 CERN
Geneva, Switzerland
Email: [email protected]
2 University of the West of England
Bristol, UK
Email: [email protected]
3 National University of Sciences and Technology
Rawalpindi, Pakistan
Email: {arshad.ali, ashiq.anjum, atif.mehmood}@niit.edu.pk
Abstract
The concept of coupling geographically
distributed resources to solve
large-scale problems is becoming
increasingly popular, forming what is
commonly called Grid computing.
Management of resources in the Grid
environment becomes complex as the
resources are geographically distributed,
heterogeneous in nature and owned by
different individuals and organizations
each having their own resource
management policies and different
access and cost models. There have been
many projects that have designed and
implemented the resource management
systems with a variety of architectures
and services. In this paper we present
the general requirements that a
resource management system should
satisfy. We also define a taxonomy and
use it to survey the resource
management systems of several existing
Grid projects, identifying the key areas
where these systems lack the desired
functionality.
1.0 Introduction
Today, Grid users have to transform
their high-level requirements into a
workflow of jobs that can be submitted
for execution on the Grid. Each job must
specify which files contain the code to
be run. These are selected by mapping
the high-level requirements to available
application components and choosing a
physical file from the many available
replicas of the code in various locations.
The job also specifies the location (or
host) where it should be run, based on
the code requirements (e.g., code is
compiled for MPI, parallelized to run on
a tightly coupled architecture, preferably
with more than 5 nodes) and on user
access policies to computing and storage
resources. An executable workflow also
includes jobs to move input data and
application component files to the
execution location.
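The structure of such an executable workflow can be sketched in a few lines. This is only an illustration under our own assumptions: the `Job` fields, the `executable_workflow` function and the step strings are hypothetical names, not part of any Grid toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    code_file: str   # physical replica of the application code
    host: str        # execution location chosen for the job
    inputs: list = field(default_factory=list)  # (source_host, path) pairs


def executable_workflow(jobs):
    """Return the ordered list of steps, with a data-movement step
    inserted before a job for every input stored on another host."""
    steps = []
    for job in jobs:
        for source_host, path in job.inputs:
            if source_host != job.host:
                steps.append(f"transfer {path}: {source_host} -> {job.host}")
        steps.append(f"run {job.code_file} on {job.host}")
    return steps
```

Inputs that already reside on the execution host produce no transfer step, which is the part a user currently has to work out by hand.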
Current Grid management systems allow
the discovery of available resources
and data locations, but users have to
carry out all these steps manually. A
resource planning and reservation
system is therefore required to
automate the whole process of workflow
generation.
2.0 Planning and Reservation
Planning and reservation is an important
task performed by the Grid Resource
Management System. It is the process of
analyzing a job and determining the
resources required for its successful
execution. Based on this analysis,
resources are reserved transparently
for the user.
2.1 Requirements for planning
and reservation
Resource management is a complex task
involving security and fault tolerance
along with scheduling. It is the manner
in which resources are allocated,
assigned, authenticated, authorized,
assured, accounted, and audited.
Resources include traditional resources
such as compute cycles, network
bandwidth, and space on a storage
system, as well as services such as data
transfer and simulation.
A Grid RMS (Resource Management
System) must satisfy the following
requirements in order to perform
resource planning and reservation:
• A Grid RMS needs to schedule
and control the resources on any
element in the network
computing system environment.
• A Grid RMS should predict the
impact that an application’s
request will have on the overall
resource pool and on the quality of
service guarantees already given
to other applications.
• A Grid RMS should preserve site
autonomy. Traditional resource
management systems work under
the assumption that they have
complete control over the resource
and thus can implement the
mechanisms and policies needed
for effective use of that resource.
Grid resources, however, are
distributed across separate
administrative domains. This
results in resource heterogeneity
and in differences in usage,
scheduling policies and security
mechanisms.
• A Grid RMS must support
co-allocation of resources.
Co-allocation is the problem of
allocating resources at different
sites to an application
simultaneously.
• Different administrative domains
employ different local resource
management systems such as NQE
and LSF. A Grid RMS should be
able to interface and interoperate
with these local resource
management systems.
• In a Grid system resources are
added and removed dynamically.
Different types of applications
with different resource
requirements are executed on the
Grid. Resource owners set their
own resource usage policies and
costs. This creates a need for
negotiation between resource
users and resource providers, so a
Grid RMS should enable such
negotiation.
• The resource management
framework should allow new
policies to be incorporated into it
without requiring substantial
changes to the existing code.
• The Grid RMS is also
responsible for ensuring the
integrity of the underlying
resource and thus enforces the
security of resources. The
resource management system
must operate in conjunction with
a security manager.
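The co-allocation requirement above can be illustrated with a minimal sketch that looks for the earliest time at which every site is free at once. The function name `earliest_common_start` and the representation of availability as (start, end) pairs are our own assumptions for illustration; a real RMS would obtain this information from each site's local resource manager.

```python
def earliest_common_start(free_windows, duration):
    """free_windows: one list of (start, end) free intervals per site.
    Return the earliest start at which every site is free for
    `duration`, or None if no such time exists."""
    # The earliest feasible start always coincides with the start of
    # some free window, so only those times need to be checked.
    candidates = sorted(start for site in free_windows for start, _ in site)
    for t in candidates:
        if all(any(s <= t and t + duration <= e for s, e in site)
               for site in free_windows):
            return t
    return None
```

If any single site has no suitable window the function returns None, reflecting that a co-allocated request cannot be satisfied by a subset of its sites.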
3.0 Taxonomy
The taxonomy we follow is based
on the architecture of the planning and
reservation system. Using this
taxonomy we have surveyed and
classified various Grid projects.
The attributes in the taxonomy aim
to differentiate RMS implementations
according to their impact on overall Grid
system scalability and reliability. The
classification of an RMS is therefore
based on grid type, resource namespace,
resource information (discovery and
dissemination), scheduling model and
scheduling policy.
3.1 Grid Type
Grid systems are classified as Compute,
Data and Service Grids, as shown in
Figure 2. The Computational Grid
category denotes the systems that have a
higher aggregate computational capacity
available for single applications than the
capacity of any constituent machine in
the system. The major resource managed
by GRMS in compute grids is “Compute
Cycles”.
In Data Grids the resource management
system manages data distributed over
geographical locations. The Data Grid
category is for systems that provide an
infrastructure for synthesizing new
information from data repositories, such
as digital libraries or data warehouses,
that are distributed over a wide area
network. The Service Grid category is
for systems that provide services that
are not provided by any single machine.
This category is further subdivided into
On Demand, Collaborative, and
Multimedia Grid systems.
Figure 2
3.2 Resource Namespace
Resources in a grid are named and
managed by the Grid Resource
Management System. The naming of
resources affects other functions of the
GRMS, such as resource discovery and
resource dissemination, and also the
structure of the database storing resource
information. The approaches to naming
are relational, hierarchical and
graph-based.
A relational namespace divides the
resources into relations and uses the
concepts from relational databases to
indicate relationships between tuples in
different relations.
A hierarchical namespace divides the
resources in the grid into hierarchies
organized around the physical or logical
network structure of the grid, i.e. it
follows a system-of-systems approach. A
name is constructed by traversing down
a hierarchy.
In graph-based naming, resources are
linked together and a resource name is
constructed by following the links from
one object to another.
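The difference between hierarchical and graph-based naming can be sketched as follows. The example data (`hierarchy`, `links`) and both function names are hypothetical and chosen only for illustration; none of the surveyed systems is claimed to store names this way.

```python
# Hierarchical namespace: nested dicts, leaves are resources.
hierarchy = {"grid": {"siteA": {"cluster1": {"node07": "cpu"}}}}


def hierarchical_name(tree, target, prefix=""):
    """Depth-first walk; the name is the path from the root to the target."""
    for key, child in tree.items():
        path = f"{prefix}/{key}"
        if key == target:
            return path
        if isinstance(child, dict):
            found = hierarchical_name(child, target, path)
            if found:
                return found
    return None


# Graph-based namespace: each object lists the objects it links to.
links = {"portal": ["broker"], "broker": ["storage", "node07"],
         "storage": [], "node07": []}


def graph_name(graph, start, target):
    """Breadth-first search; the name is the chain of links followed."""
    queue = [[start]]
    seen = {start}
    while queue:
        path = queue.pop(0)
        if path[-1] == target:
            return "->".join(path)
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In the hierarchical case the name depends only on the position of the resource in the tree, whereas in the graph-based case it depends on the object from which the traversal starts.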
3.3 Resource Information
3.3.1 Resource Dissemination
Resource dissemination is the process of
advertising information about resources.
The protocols used for dissemination are
“periodic” and “on demand”. In periodic
resource dissemination the information
database is updated at fixed intervals,
so updates are not driven by resource
status changes; instead, all changes are
batched and written to the information
database after a specific interval. The
on-demand protocol updates the resource
information database as soon as the
status of any resource changes.
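The two dissemination protocols can be contrasted with a small sketch. The class names below are hypothetical, and the timer that would trigger `flush()` once per interval is deliberately left out.

```python
class InformationDatabase:
    def __init__(self):
        self.status = {}


class OnDemandPublisher:
    """Writes every status change to the database immediately."""

    def __init__(self, db):
        self.db = db

    def status_changed(self, resource, state):
        self.db.status[resource] = state


class PeriodicPublisher:
    """Batches changes; the database only sees them when flush() runs."""

    def __init__(self, db):
        self.db = db
        self.pending = {}

    def status_changed(self, resource, state):
        self.pending[resource] = state   # later changes overwrite earlier ones

    def flush(self):
        # Assumed to be called by a timer once per dissemination interval.
        self.db.status.update(self.pending)
        self.pending.clear()
```

The trade-off is visible in the code: the periodic publisher sends fewer updates, but the database can be stale for up to one interval.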
3.3.2 Resource Discovery
The resource management system performs
resource discovery to obtain information
about available resources. There are two
approaches to resource discovery, namely
“agent based” and “query based”. In the
agent-based approach, agents traverse
the grid system to gather information
about resource availability. In the
query-based approach, the resource
information store is queried for
resource availability.
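A minimal sketch of the two discovery approaches follows. The function names, the `neighbours` map and the use of free CPU counts as the only resource attribute are assumptions made for illustration.

```python
def query_based_discovery(information_store, min_free_cpus):
    """Ask the information store which resources qualify."""
    return sorted(name for name, free in information_store.items()
                  if free >= min_free_cpus)


def agent_based_discovery(neighbours, free_cpus, start, min_free_cpus):
    """An agent hops from site to site, inspecting each one it visits."""
    found, visited, to_visit = [], set(), [start]
    while to_visit:
        site = to_visit.pop()
        if site in visited:
            continue
        visited.add(site)
        if free_cpus[site] >= min_free_cpus:
            found.append(site)
        to_visit.extend(neighbours[site])
    return sorted(found)
```

Both functions return the same answer on a connected grid; they differ in where the work is done, at a single store or at every site the agent visits.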
3.4 Scheduling model
The scheduling model describes how the
machines involved in resource
management make scheduling decisions.
The models normally used are
centralized and decentralized. In a
centralized model all jobs are submitted
to a single machine, which is responsible
for scheduling them on available
resources. The problem with this
approach is that the single scheduler
is a single point of failure and also
limits the scalability of the grid. In a
decentralized model there is no central
scheduler; scheduling is done by the
resource requestors and owners
independently. This approach is scalable
and suits grid systems, but the individual
schedulers must cooperate with each
other in making scheduling decisions.
3.5 Scheduling Policy
Scheduling policy governs how
jobs are scheduled on the matched
resources. In a Grid environment there
can be no single global scheduling
policy. Different administrative domains
may set different resource usage
policies, so the RMS should allow
policies to be added or changed with
minimal overhead.
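One way to allow policies to be added without changing existing code is a policy registry. The sketch below is an assumption about how this could be done, not a description of any surveyed system; `POLICIES`, `policy` and `schedule` are hypothetical names.

```python
POLICIES = {}


def policy(name):
    """Decorator that registers a scheduling policy under a name."""
    def register(func):
        POLICIES[name] = func
        return func
    return register


@policy("fifo")
def fifo(jobs):
    return sorted(jobs, key=lambda j: j["submitted"])


@policy("shortest_first")
def shortest_first(jobs):
    return sorted(jobs, key=lambda j: j["runtime"])


def schedule(jobs, policy_name):
    """Order jobs using whichever policy the local domain has selected."""
    return [j["id"] for j in POLICIES[policy_name](jobs)]
```

An administrative domain adds a new policy by defining one more decorated function; `schedule` itself does not change.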
4.0 Survey
Resource management in Condor,
Globus, Legion, European Data Grid,
and Nimrod-G has been surveyed
using the taxonomy discussed above.
4.1 Planning and reservation system
in Condor
The main function of Condor [4] is to
allow utilization of machines that
would otherwise be idle, thus solving the
wait-while-idle problem. Condor uses
Classified Advertisements (ClassAds), a
resource specification language, to
specify resource requests. Through its
unique remote system call capabilities,
Condor preserves the job’s originating
machine environment on the execution
machine, even if the originating and
execution machines do not share a
common file system and/or user ID
scheme. Condor jobs with a single
process are automatically checkpointed
and migrated between workstations as
needed to ensure eventual completion.
Condor has a centralized scheduling
model. A machine (the Central Manager)
in the Condor system is dedicated to
scheduling. Each Condor workstation
submits the jobs in its local queue to the
central scheduler, which is responsible
for finding suitable resources for job
execution. The information about
suitable available resources to run the
job (execution machine information) is
returned to the job submission machine.
A shadow process is forked on the
submission machine for each job; it
is responsible for contacting and staging
the job on the execution machine and
monitoring its progress. Condor supports
preemption of running jobs: if the
execution machine decides to withdraw
its resources, Condor can preempt the
job and schedule it on another machine,
thus providing for resource owner
autonomy.
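The matchmaking performed by the central manager can be sketched as a two-sided check. This is a simplified illustration: real ClassAds are expressions in Condor's own language, whereas here the requirements are modelled as Python callables, and the `match` function and the dictionary keys are hypothetical.

```python
def match(job_ad, machine_ads):
    """Return the name of the first machine whose ad satisfies the job's
    requirements and whose own requirements accept the job, or None."""
    for machine in machine_ads:
        if job_ad["requirements"](machine) and machine["requirements"](job_ad):
            return machine["name"]
    return None
```

Checking the machine's requirements as well as the job's is what lets a resource owner refuse work, in line with the owner autonomy described above.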
4.2 Planning and reservation system
in Globus
Globus provides software infrastructure
that enables applications to view
distributed heterogeneous computing
resources as a single virtual machine.
The toolkit consists of a set of
components that implement basic
services, such as security, resource
location, resource management, data
management, resource reservation, and
communications.
The planning and reservation system of
Globus consists of resource brokers,
resource co-allocators and the resource
manager, GRAM. Resource
requests are specified in the extensible
Resource Specification Language (RSL).
Globus offers Grid information services
via an LDAP-based network directory
called the Metacomputing Directory
Service (MDS). The resource brokers
discover resources by querying the
information service (MDS) for resource
availability. MDS consists of two
components, the Grid Index Information
Service (GIIS) and the Grid Resource
Information Service (GRIS). GRIS
provides resource discovery services.
GIIS provides a global view of the
resources by pulling information from
the GRISs. Resource information on the
GRISs is updated by push dissemination.
A hierarchical namespace organization is
followed in Globus for naming resources,
and the scheduling model is
decentralized, i.e. scheduling is done by
application-level schedulers and resource
brokers. The co-allocator takes care of
multi-requests. A multi-request is a
request involving resources at multiple
sites that need to be used simultaneously.
The co-allocator passes each component of
the request to the appropriate resource
manager and then provides a means for
manipulating the resulting set of
managers as a whole. Co-allocation
of resources is done by the DUROC
component of Globus.
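The handling of a multi-request can be sketched as an all-or-nothing dispatch over per-site resource managers. This is not the DUROC API; `ResourceManager` and `co_allocate` are hypothetical names, and each site is reduced to a count of free nodes.

```python
class ResourceManager:
    """Hypothetical per-site manager with a fixed number of free nodes."""

    def __init__(self, free_nodes):
        self.free_nodes = free_nodes

    def reserve(self, nodes):
        if nodes > self.free_nodes:
            return False
        self.free_nodes -= nodes
        return True

    def release(self, nodes):
        self.free_nodes += nodes


def co_allocate(multi_request, managers):
    """multi_request maps site name -> nodes needed. Either every
    component is reserved or none is."""
    granted = []
    for site, nodes in multi_request.items():
        if managers[site].reserve(nodes):
            granted.append((site, nodes))
        else:
            for done_site, done_nodes in granted:   # roll back
                managers[done_site].release(done_nodes)
            return False
    return True
```

Rolling back the components already granted is what allows the set of managers to be treated as a whole rather than as independent requests.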
The resource manager interacts with
local resource management systems to
actually schedule and execute the jobs.
The implementation of the resource
manager in Globus is called GRAM.
GRAM authenticates the resource
requests and schedules them on the local
resource manager. Each user is
associated with a UHE (user hosting
environment) on the execution machine.
All the jobs from a user are directed to
the user’s UHE, which starts up a new
Managed Job Factory service (MJFS)
instance for every job.
The MJFS communicates with the
clients by starting up two instances of
the File Stream Factory Service (FSFS) for
standard input and output. MJFS and
FSFS are persistent services.
4.3 Planning and reservation system
in Legion
Legion [6, 9] is an operating system for
the Grid that offers the infrastructure for
Grid computing. The scheduler in Legion
has a hierarchical structure. Users or
active objects in the system invoke
scheduling to run jobs. The higher-level
scheduler schedules the job on a cluster or
resource group, while the local resource
manager for that domain schedules the
job on local resources. Scheduling in
Legion is done by placing objects on
processors. The resource namespace is
graph-based.
Information about resources in the grid
is stored in a database object called a
collection. For scalability there can be
more than one collection object, and
collections can send data to and receive
data from each other. Information is
obtained from resources by either a pull
or a push mechanism. Users or schedulers
query the collection to obtain resource
information.
Legion supports resource reservation and
object persistence. When the scheduler
object contacts a host object (processor
or local resource management system),
the host returns a reservation token to
the scheduler if the job can be executed
on its resources.
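The reservation-token exchange can be sketched as follows. The `Host` class, its methods and the rule that a job must present its token before starting are our own assumptions for illustration and do not reproduce the Legion interfaces.

```python
import uuid


class Host:
    """Hypothetical host object with a fixed number of free CPUs."""

    def __init__(self, free_cpus):
        self.free_cpus = free_cpus
        self.tokens = {}

    def make_reservation(self, cpus):
        """Return a token if the job fits, otherwise None."""
        if cpus > self.free_cpus:
            return None
        self.free_cpus -= cpus
        token = uuid.uuid4().hex
        self.tokens[token] = cpus
        return token

    def start_job(self, token):
        """Only a job presenting a token issued by this host may start."""
        return self.tokens.pop(token, None) is not None
```

Because each token is consumed when the job starts, a reservation in this sketch cannot be used twice.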
Every object is associated with a vault
object. The vault object holds the
associated object’s Object Persistent
Representation (OPR). This ensures that
even if the object fails, it can later be
reconstructed from the OPR.
Communication between any two
objects goes through the Legion protocol
stack, which involves constructing
program graphs, making method
invocations, checking authorization,
assembling or disassembling messages,
encrypting, retransmitting messages, etc.
This framework allows for implicit
security and fault tolerance.
4.5 Planning and reservation system
in European Data Grid
The EU Data Grid was designed to provide
distributed scientific communities access
to large sets of distributed computational
and data resources. The architecture of
the Data Grid is layered. The Data Grid
project develops data grid services and
depends on the Globus toolkit for core
middleware services such as security.
The Data Grid services layer consists of
workload management services, which
contain components for distributed
scheduling and resource management;
data management services, which
contain the middleware infrastructure for
coherently managing information stores;
and monitoring services, which provide
end-user and administrator access to
status information on the grid. The
workload management package consists
of a user interface, resource broker, job
submission service, and bookkeeping and
logging service. A job request from a
user is expressed in a Job Description
Language based on Condor’s ClassAds.
Given a job description, the resource
broker (RB) tries to find the best match
between the job requirements and the
available resources on the grid, also
considering the current distribution of
load on the grid. The RB interacts with
data replication and metadata
information services to obtain
information about data location. The
information service is an LDAP-based
network directory. Resource discovery is
done by queries, and periodic push is
used for dissemination. The global
namespace is hierarchical and scheduling
is decentralized, but instead of having a
resource broker for each end user, each
virtual organization is provided with a
resource broker. The system does not
support advance reservation or
co-allocation of resources. It does not
handle failures originating from jobs,
which it simply reports to the end user.
However, the state of the resource broker
queues and job submission service
queues is persistent and can be recovered
fully after a crash.
4.6 Planning and reservation system
in Nimrod-G and GRACE
Nimrod-G [7] is a grid-enabled
resource management and scheduling
system based on the concept of
computational economy. It uses the
middleware services provided by the
Globus Toolkit but can also be extended
to other middleware services.
Nimrod-G uses the MDS services for
resource discovery and GRAM APIs to
dispatch jobs over grid resources.
Users can specify the deadline by which
the results of their experiments are
needed. The Nimrod-G broker tries to find
the cheapest resources available that can
do the job and meet the deadline.
Nimrod-G uses both a static cost model
(stored in a file in the information
database) and a dynamic cost model
(negotiating the cost with the resource
owner) to trade off resource access cost
against the deadline. GRACE provides
the middleware services needed by
resource brokers to trade resource
access costs dynamically with resource
owners. It co-exists with other
middleware systems such as Globus. The
main components of the GRACE
infrastructure are the Trade Manager
(TM), trading protocols and the Trade
Server (TS). The TM is the GRACE client
in the Nimrod-G resource broker that uses
the trading protocols to interact with
trade servers and negotiate for access to
resources at low cost. The Trade Server is
the resource owner’s agent that
negotiates with resource users and sells
access to resources. The TS uses pricing
algorithms defined by the resource owner
that may be driven by demand and supply.
It also interacts with the accounting
system to record resource usage.
Nimrod-G has an extensible
application-oriented scheduling policy,
and its scheduler uses theoretical and
history-based predictive techniques for
state estimation. Scheduler organization
is decentralized and the namespace is
hierarchical.
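The broker's goal of finding the cheapest resource that still meets the deadline can be sketched with a static cost model. This is a simplified illustration and not Nimrod-G's actual algorithm; the function name and the `speed` and `price` fields are hypothetical.

```python
def cheapest_within_deadline(resources, job_length, deadline):
    """resources: list of dicts with 'name', 'speed' (work units per hour)
    and 'price' (cost per hour). Return (name, cost) of the cheapest
    resource that finishes the job by the deadline, or None."""
    best = None
    for r in resources:
        hours = job_length / r["speed"]
        if hours > deadline:
            continue                      # too slow to meet the deadline
        cost = hours * r["price"]
        if best is None or cost < best[1]:
            best = (r["name"], cost)
    return best
```

With a dynamic cost model, the `price` value would come from negotiation with the resource owner instead of a stored file; the selection step would stay the same.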
5.0 Conclusion
In this paper various issues in resource
planning and reservation have been
discussed. A taxonomy based on the
architecture of Grid resource
management systems has been described.
Based on this taxonomy, a survey of
existing planning and reservation
systems has been conducted and the
results presented.
References
1. Klaus Krauter, Rajkumar Buyya,
and Muthucumaru Maheswaran,
A Taxonomy and Survey of
Grid Resource Management
Systems for Distributed
Computing, International
Journal of Software: Practice
and Experience (SPE), ISSN:
0038-0644, Volume 32, Issue 2,
Pages: 135-164, Wiley Press,
USA, February 2002.
2. K. Czajkowski, I. Foster, N.
Karonis, C. Kesselman, S.
Martin, W. Smith, and S. Tuecke.
A resource management
architecture for
Metacomputing systems. In
Proceedings of the IPPS/SPDP
Workshop on Job Scheduling
Strategies for Parallel
Processing, pages 62–82, 1998.
3. Chaitanya Kandagatla, Survey
and Taxonomy of Grid
Resource Management Systems.
4. Condor Team. Condor Manual.
Available from
http://www.cs.wisc.edu/condor/manual,
2001.
5. Condor Team. The directed
acyclic graph manager.
http://www.cs.wisc.edu/condor/dagman,
2002.
6. H. Dail, G. Obertelli, F. Berman,
R. Wolski, and Andrew
Grimshaw, Application-Aware
Scheduling of a
Magnetohydrodynamics
Application in the Legion
Metasystem, Proceedings of the
9th Heterogeneous Computing
Workshop, May 2000.
7. R. Buyya, D. Abramson, J.
Giddy, Nimrod/G: An
Architecture for a Resource
Management and Scheduling
System in a Global
Computational Grid,
International Conference on High
Performance Computing in Asia-Pacific
Region (HPC Asia 2000),
Beijing, China. IEEE Computer
Society Press, USA, 2000.
8. W. Hoschek, J. Jaen-Martinez,
A. Samar, H. Stockinger, and K.
Stockinger, Data Management
in an International Data Grid
Project, Proceedings of the first
IEEE/ACM International
Workshop on Grid Computing,
(Springer Verlag Press,
Germany), India, 2000.
9. S. Chapin, J. Karpovich, A.
Grimshaw, The Legion
Resource Management System,
Proceedings of the 5th Workshop
on Job Scheduling Strategies for
Parallel Processing, April 1999. 
