GENIUS HARC

From RealityGrid


HARC - the Highly-Available Resource Co-allocator - is a system for reserving multiple resources in a coordinated fashion. HARC can handle multiple types of resource, and has been used to reserve time on supercomputers distributed across a nationwide testbed in the United States, together with dedicated lightpaths connecting the machines.

In the context of GENIUS, HARC will be used to co-allocate compute resources in TeraGrid, LONI, NW-Grid and NGS so that the HemeLB code can be run across these resources in a meta-computing style. Co-allocation of HPCx and HECToR with HARC might also be possible in the future. This page shows the status of the HARC deployment on these resources, and will be updated frequently. For information about the deployment of HemeLB on those resources, see HemeLB Deployment. For information about the integration of HARC into the AHE, see AHE GENIUS. More background on HARC is given at the bottom of this page, including information about obtaining the HARC Client.

Deployment Status

At a Glance

See the notes below for details on the column meanings.


Infrastructure | Machine | HARC | Pre-WS GRAM | WS GRAM | JDD Extensions | MPI-g
TeraGrid | SDSC IA-64 | Deployed | jobmanager-pbs_gcc_resid | PBS | Yes | Yes
TeraGrid | NCSA IA-64 | Deployed | jobmanager-pbs | PBS | Yes | Yes
TeraGrid | NCSA Abe | No | jobmanager-pbs | PBS | Unknown | Unknown
TeraGrid | UC/ANL IA-64 | No | Unknown | Unknown | No | Yes
TeraGrid | TACC Lonestar | No | jobmanager-lsf | LSF | Yes | Unknown
TeraGrid | TACC Ranger | Not yet in service; friendly user mode due to start in Dec '07; will not be available for SC'07.
TeraGrid | LONI QueenBee x86 | 50% of machine to become available on TeraGrid in Jan 2008; will not be available for SC'07.
LONI | "Bluedawg" IBM 575 | Deployed | jobmanager-loadleveler | LoadLeveler | Yes | Yes
LONI | "Ducky" IBM 575 | Deployed | jobmanager-loadleveler | LoadLeveler | Yes | Yes
LONI | "Zeke" IBM 575 | Deployed | jobmanager-loadleveler | No | Yes | Yes
LONI | "Neptune" IBM 575 | Machine not open to general users; not expected for SC'07 use.
LONI | "LaCumba" IBM 575 | Machine undergoing installation; not expected for SC'07 use.
LONI | "Eric" x86 Cluster | No (In Progress) | jobmanager-pbs | PBS | Yes | No
LONI | Other x86 Clusters | There are five more clusters (same spec as Eric); will not be available for SC'07.
NW-Grid | Manchester (man1) | Dummy RM | No | No | No | No
NW-Grid | Manchester (man2) | Test RM Deployed | Yes | No | No | No
NW-Grid | Lancaster | Test RM Deployed | No | No | No | No
NW-Grid | Liverpool | Dummy RM | No | No | No | No
NW-Grid | Daresbury | Dummy RM | No | No | No | No
NGS | Leeds NGS2 | Deployed | jobmanager-pbs | No | Unknown | Yes
NGS | Manchester NGS2 | Deployed | jobmanager-pbs | PBS (port 9443) | Unknown | Yes
NGS | Oxford NGS2 | Deployed | jobmanager-pbs | No | Unknown | Yes
NGS | STFC-RAL NGS2 | No | No | No | Unknown | No
HPCx | | Deployed | jobmanager-loadleveler | No | Yes | No
HECToR | | No | jobmanager-pbs | No | Yes | No


Notes on Table

The table above summarises the deployment status of HARC. The "Pre-WS GRAM" and "WS GRAM" columns indicate whether users can submit jobs to reservations via those Globus mechanisms. The "MPI-g" column indicates whether MPI-g is available on each resource.

HARC Status
The "HARC" column indicates whether HARC can be used to make reservations on that resource. To use HARC to create reservations on these resources, add the following mappings to your harc.properties file:
# LONI
harc.client.rm.simple_compute.bluedawg.loni.org=https://bluedawg.loni.org:9393/bluedawg-rm
harc.client.rm.simple_compute.ducky.loni.org=https://ducky.loni.org:9393/ducky-rm
harc.client.rm.simple_compute.zeke.loni.org=https://zeke.loni.org:9393/zeke-rm
# NWG
harc.client.rm.simple_compute.man2.nw-grid.ac.uk=https://man2.nw-grid.ac.uk:9393/man2-rm
harc.client.rm.simple_compute.lancs1.nw-grid.ac.uk=https://lancs1.nw-grid.ac.uk:9393/lancs1-rm
# TeraGrid
harc.client.rm.simple_compute.tg-login2.sdsc.edu=https://tg-login2.sdsc.edu:9393/sdsctg-rm
harc.client.rm.simple_compute.grid-hg.ncsa.teragrid.org=https://grid-hg.ncsa.teragrid.org:9393/ncsa-dtf-rm
# NGS2
harc.client.rm.simple_compute.vidar.ngs.manchester.ac.uk=https://192.84.75.129:9393/vidar-rm
harc.client.rm.simple_compute.ngs.oerc.ox.ac.uk=https://ngs.oerc.ox.ac.uk:9393/oxngs2-rm
harc.client.rm.simple_compute.ngs.leeds.ac.uk=https://ngs.leeds.ac.uk:9393/leeds-ngs2-rm
More mappings will be added here as more HARC Resource Managers are deployed.
You can get the HARC Client Bundle here.
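The mapping lines above follow a regular pattern: the key is a fixed prefix followed by the resource name, and the value is the Resource Manager endpoint. As a rough illustration only, the Python sketch below reads text in that format and looks up the endpoint for a resource. The helper name parse_rm_mappings is hypothetical and is not part of the HARC Client; the two sample entries are copied from the list above.

```python
# Minimal sketch: look up HARC Resource Manager endpoints from
# harc.properties-style text. parse_rm_mappings is a hypothetical helper
# written for this page; it is NOT part of the HARC Client.
PREFIX = "harc.client.rm.simple_compute."


def parse_rm_mappings(text):
    """Return {resource_name: endpoint_url} for simple_compute RM entries."""
    mappings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines such as "# LONI".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        # Keep only well-formed RM mapping entries.
        if sep and key.startswith(PREFIX):
            mappings[key[len(PREFIX):].strip()] = value.strip()
    return mappings


# Two entries copied from the mappings listed on this page.
SAMPLE = """\
# LONI
harc.client.rm.simple_compute.bluedawg.loni.org=https://bluedawg.loni.org:9393/bluedawg-rm
# NWG
harc.client.rm.simple_compute.man2.nw-grid.ac.uk=https://man2.nw-grid.ac.uk:9393/man2-rm
"""

rms = parse_rm_mappings(SAMPLE)
print(rms["man2.nw-grid.ac.uk"])
```

A check of this kind can help confirm that a resource name passed to the client has a corresponding mapping before a reservation is attempted.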
Pre-WS GRAM Integration
Pre-WS GRAM integration is needed to submit jobs to the reservations that HARC makes, and it must be in place before WS GRAM can work. If deployed, this column will contain the name of the jobmanager that can be used to access the reservations. In all cases the reservations are accessed by the RSL term "reservation_id".
There is a document describing how to modify Globus to support this.
WS GRAM Integration
WS GRAM is the system that the AHE will use to submit jobs to the HARC reservations. To make this work correctly, the pre-WS jobmanager has to be configured to be visible through the GT4 Managed Job Factory Service (MJFS). If deployed, this column will contain the name of the GRAM ResourceID (the jobmanager equivalent) that should be used to access the reservations. JDD Extensions support must also be available (see below).
Once in place, the HARC reservations are accessed by placing the following XML inside the job description:
<extensions>
    <reservation_id>ID from HARC</reservation_id>
</extensions>
There is a document describing how to modify Globus to support this.
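To show where the extensions fragment above sits, the following Python sketch builds a small job description containing it. This is an illustration under stated assumptions: the enclosing job, executable and count elements are assumed here from the general shape of GT4 WS GRAM job descriptions and are not taken from this page, and build_job_description is a hypothetical helper. Only the extensions and reservation_id elements come from the text above.

```python
# Minimal sketch: embed a HARC reservation ID in a WS GRAM-style job
# description. The <job>, <executable> and <count> elements are ASSUMED;
# only <extensions>/<reservation_id> are taken from this page.
import xml.etree.ElementTree as ET


def build_job_description(executable, reservation_id, count=1):
    """Return job description XML text carrying the reservation ID."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    ET.SubElement(job, "count").text = str(count)
    # The reservation is named inside <extensions>, as described above.
    extensions = ET.SubElement(job, "extensions")
    ET.SubElement(extensions, "reservation_id").text = str(reservation_id)
    return ET.tostring(job, encoding="unicode")


# The reservation ID is the example value used elsewhere on this page.
xml_text = build_job_description("/bin/date", "1189016250", count=5)
print(xml_text)
```

Generating the description programmatically keeps the reservation ID in a single place, which may be convenient when the ID is only known after a reservation has been made.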
JDD Extensions Support
Globus 4.0.5 is the first version of Globus to support the JDD Extensions required to support the specification of a HARC reservation ID in a WS GRAM job. For earlier versions of 4.0, there is a package that can be installed to add support for this.
MPI-g
MPI-g, or MPICH-G4, is not part of the Globus Toolkit. Rather it is a separate piece of software which must be installed. It is required for running the HemeLB code across multiple resources.


Notes on Testbeds

TeraGrid
Initially, HARC will be deployed on a subset of TeraGrid resources, at SDSC, NCSA and TACC. For more information on co-allocation on TeraGrid, and the support for this, see the TeraGrid Scheduling WG pages.
There is a page there which shows pre-WS GRAM and WS GRAM support for submission to reservations on TeraGrid resources.
SDSC HARC
There are two subtle differences when using HARC/Globus at SDSC:
1. In addition to the RM specification, you MUST specify your project ID in the harc.properties file:
harc.client.project.simple_compute.tg-login2.sdsc.edu=PROJECT_NAME
2. The HARC service runs on a different node from the Globus installation, so reservations are made against tg-login2.sdsc.edu while jobs are submitted to tg-login.sdsc.teragrid.org:
harc-reserve -c tg-login2.sdsc.edu/4 -s 12:00 -d 1:00
globus-job-submit tg-login.sdsc.teragrid.org/jobmanager-pbs_gcc_resid -x "(reservation_id=1189016250)" -m 5 /bin/date
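The two commands above differ in the host they address, and the second carries the reservation ID as an RSL clause. The Python sketch below assembles both argument lists so that the two host names and the reservation ID are kept as separate parameters. It only builds and prints the command lines and does not run them. The helper names are hypothetical, the options used are limited to those that appear in the commands above, and their values are copied from that example without assuming what each one means.

```python
# Minimal sketch: assemble the harc-reserve and globus-job-submit command
# lines from the SDSC example. Helper names are hypothetical; only options
# shown in the example on this page are used, with the same values.
import shlex


def reservation_rsl(reservation_id):
    """RSL clause selecting a HARC reservation (the term is from this page)."""
    return "(reservation_id={})".format(reservation_id)


def reserve_command(harc_host, count, start, duration):
    """Argument list for harc-reserve, addressed to the HARC service node."""
    return ["harc-reserve", "-c", "{}/{}".format(harc_host, count),
            "-s", start, "-d", duration]


def submit_command(globus_host, jobmanager, reservation_id, m_value, executable):
    """Argument list for globus-job-submit, addressed to the Globus node."""
    contact = "{}/{}".format(globus_host, jobmanager)
    return ["globus-job-submit", contact,
            "-x", reservation_rsl(reservation_id),
            "-m", str(m_value), executable]


reserve = reserve_command("tg-login2.sdsc.edu", 4, "12:00", "1:00")
submit = submit_command("tg-login.sdsc.teragrid.org", "jobmanager-pbs_gcc_resid",
                        "1189016250", 5, "/bin/date")

# Quote each argument so the printed lines can be pasted into a shell.
for argv in (reserve, submit):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

In practice the reservation ID would come from the output of the reservation step rather than being hard-coded; the value here is the example ID from the command above.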
TeraGrid MPI-g
The MPI-g installation on the NCSA and SDSC TeraGrid resources is not system wide...
LONI
HARC will be deployed on all publicly useable nodes in the LONI Infrastructure. Currently, there is a trial deployment underway on bluedawg, ducky and zeke. This will shortly be extended to cover the new x86 cluster "Eric", and then "QueenBee", the most powerful LONI supercomputer (#23 on the June 2007 Top 500 List).
The LONI Documentation Site has more information on the infrastructure as a whole, and specific information about the IBM 575 Clusters and the 64 Bit Linux x86 Clusters. Also see the GENIUS LONI Page.
NW-Grid
HARC has already been deployed on the NW-Grid, a regional Grid in the North-West of England. At the moment, most of the machines run Sun GridEngine 6.0 as their scheduler, which does not support user-settable Advance Reservations, and so only dummy HARC Resource Managers are deployed there. It is anticipated that each NW-Grid node will support AR in the near future. The man2 installation is a 2-node deployment of Torque/Maui, which is available for testing purposes.
NGS
NGS is the UK National Grid Service. HARC is currently being evaluated by the ETF, which is a prerequisite to a full deployment. A trial deployment is underway on the new NGS2 resources at Manchester, Oxford and RAL.
The installation of HARC on Manchester's NGS2 node has been documented here.
HPCx
The outgoing UK National Service machine, HPCx, runs LoadLeveler. HPCx is now running a limited HARC service in test mode. MPI-g has been compiled for this system but needs to be tested against the port forwarding infrastructure used to give access to the back-end nodes.
HECToR
HECToR is the new UK National Service machine. It is currently being integrated with the NGS. HECToR is a Cray XT4 running CNL and will need additional mechanisms to allow network access to the back-end nodes.

Background on HARC

HARC was developed by Jon MacLaren at the Center for Computation & Technology at LSU. The main place for information, documentation and software relating to HARC is http://www.cct.lsu.edu/~maclaren/HARC including the HARC Client Bundle. The current client version is V1.9.3, which includes better support for the new variable RM timeouts; see the current CHANGES file.
