GENIUS HARC
From RealityGrid
HARC - the Highly-Available Resource Co-allocator - is a system for reserving multiple resources in a coordinated fashion. HARC can handle multiple types of resource, and has been used to reserve time on supercomputers distributed across a nationwide testbed in the United States, together with dedicated lightpaths connecting the machines.
In the context of GENIUS, HARC will be used to provide co-allocation of compute resources in TeraGrid, LONI, NW-Grid and NGS so that the HemeLB code can be run across these resources in a meta-computing style. Co-allocation of HPCx and HECToR with HARC may also be possible in the future. This page shows the status of the HARC deployment on these resources and will be updated frequently. For information about the deployment of HemeLB on those resources, see HemeLB Deployment. For information about the integration of HARC into the AHE, see AHE GENIUS. More background on HARC, including information about obtaining the HARC Client, is given at the bottom of this page.
Deployment Status
At a Glance
See the notes below for details on the column meanings.
| Infrastructure | Machine | HARC Status | Pre-WS GRAM | WS GRAM | JDD Extensions | MPI-g |
|---|---|---|---|---|---|---|
| TeraGrid | SDSC IA-64 | Deployed | jobmanager-pbs_gcc_resid | PBS | Yes | Yes |
| | NCSA IA-64 | Deployed | jobmanager-pbs | PBS | Yes | Yes |
| | NCSA Abe | No | jobmanager-pbs | PBS | Unknown | Unknown |
| | UC/ANL IA-64 | No | Unknown | Unknown | No | Yes |
| | TACC Lonestar | No | jobmanager-lsf | LSF | Yes | Unknown |
| | TACC Ranger | Not yet in service; friendly user mode due to start in Dec '07; will not be available for SC'07. | | | | |
| | LONI QueenBee x86 | 50% of machine to become available on TeraGrid in Jan 2008; will not be available for SC'07. | | | | |
| LONI | "Bluedawg" IBM 575 | Deployed | jobmanager-loadleveler | Loadleveler | Yes | Yes |
| | "Ducky" IBM 575 | Deployed | jobmanager-loadleveler | Loadleveler | Yes | Yes |
| | "Zeke" IBM 575 | Deployed | jobmanager-loadleveler | No | Yes | Yes |
| | "Neptune" IBM 575 | Machine not open to general users; not expected for SC'07 use. | | | | |
| | "LaCumba" IBM 575 | Machine undergoing installation; not expected for SC'07 use. | | | | |
| | "Eric" x86 Cluster | No (In Progress) | jobmanager-pbs | PBS | Yes | No |
| | Other x86 Clusters | There are five more clusters (same spec as Eric); will not be available for SC'07. | | | | |
| NW-Grid | Manchester (man1) | Dummy RM | No | No | No | No |
| | Manchester (man2) | Test RM Deployed | Yes | No | No | No |
| | Lancaster | Test RM Deployed | No | No | No | No |
| | Liverpool | Dummy RM | No | No | No | No |
| | Daresbury | Dummy RM | No | No | No | No |
| NGS | Leeds NGS2 | Deployed | jobmanager-pbs | No | Unknown | Yes |
| | Manchester NGS2 | Deployed | jobmanager-pbs | PBS (port 9443) | Unknown | Yes |
| | Oxford NGS2 | Deployed | jobmanager-pbs | No | Unknown | Yes |
| | STFC-RAL NGS2 | No | No | No | Unknown | No |
| HPCx | HPCx | Deployed | jobmanager-loadleveler | No | Yes | No |
| HECToR | HECToR | No | jobmanager-pbs | No | Yes | No |
Notes on Table
The table above summarises the deployment status of HARC on each resource. The "Pre-WS GRAM" and "WS GRAM" columns indicate whether users can submit jobs to reservations via those Globus mechanisms, and the "MPI-g" column indicates whether MPI-g is available on each resource.
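To illustrate how the columns combine, the sketch below encodes a few rows of the table and flags the resources on which every prerequisite for an AHE-driven, cross-site HemeLB run is confirmed: HARC deployed, pre-WS and WS GRAM integration, JDD Extensions support and MPI-g. This readiness rule is an interpretation of the notes on this page, not an official criterion; the `Resource` class and `ready_for_ahe_hemelb` function are hypothetical, and "Unknown" entries are conservatively treated as not ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    harc: str         # "HARC Status" column
    pre_ws_gram: str  # jobmanager name, or "No"/"Unknown"
    ws_gram: str      # WS GRAM ResourceID, or "No"/"Unknown"
    jdd_ext: str      # "Yes", "No" or "Unknown"
    mpig: str         # "Yes", "No" or "Unknown"


def ready_for_ahe_hemelb(r: Resource) -> bool:
    """True only when every prerequisite is confirmed; "Unknown" counts as not ready."""
    return (
        r.harc == "Deployed"
        and r.pre_ws_gram not in ("No", "Unknown")
        and r.ws_gram not in ("No", "Unknown")
        and r.jdd_ext == "Yes"
        and r.mpig == "Yes"
    )


# A subset of rows copied from the table above.
ROWS = [
    Resource("SDSC IA-64", "Deployed", "jobmanager-pbs_gcc_resid", "PBS", "Yes", "Yes"),
    Resource("NCSA IA-64", "Deployed", "jobmanager-pbs", "PBS", "Yes", "Yes"),
    Resource('"Bluedawg" IBM 575', "Deployed", "jobmanager-loadleveler", "Loadleveler", "Yes", "Yes"),
    Resource('"Zeke" IBM 575', "Deployed", "jobmanager-loadleveler", "No", "Yes", "Yes"),
    Resource('"Eric" x86 Cluster', "No (In Progress)", "jobmanager-pbs", "PBS", "Yes", "No"),
    Resource("Manchester NGS2", "Deployed", "jobmanager-pbs", "PBS (port 9443)", "Unknown", "Yes"),
]

if __name__ == "__main__":
    for r in ROWS:
        print(f"{r.name}: {'ready' if ready_for_ahe_hemelb(r) else 'not ready'}")
```

With the rows included here, only SDSC IA-64, NCSA IA-64 and Bluedawg pass: Zeke lacks WS GRAM, Eric lacks both HARC and MPI-g, and Manchester NGS2 has unknown JDD Extensions support.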
HARC Status
- The "HARC Status" column indicates whether HARC can be used to make reservations on that resource. To create reservations on these resources with HARC, add the following mappings to your harc.properties file:
# LONI
harc.client.rm.simple_compute.bluedawg.loni.org=https://bluedawg.loni.org:9393/bluedawg-rm
harc.client.rm.simple_compute.ducky.loni.org=https://ducky.loni.org:9393/ducky-rm
harc.client.rm.simple_compute.zeke.loni.org=https://zeke.loni.org:9393/zeke-rm
# NWG
harc.client.rm.simple_compute.man2.nw-grid.ac.uk=https://man2.nw-grid.ac.uk:9393/man2-rm
harc.client.rm.simple_compute.lancs1.nw-grid.ac.uk=https://lancs1.nw-grid.ac.uk:9393/lancs1-rm
# TeraGrid
harc.client.rm.simple_compute.tg-login2.sdsc.edu=https://tg-login2.sdsc.edu:9393/sdsctg-rm
harc.client.rm.simple_compute.grid-hg.ncsa.teragrid.org=https://grid-hg.ncsa.teragrid.org:9393/ncsa-dtf-rm
# NGS2
harc.client.rm.simple_compute.vidar.ngs.manchester.ac.uk=https://192.84.75.129:9393/vidar-rm
harc.client.rm.simple_compute.ngs.oerc.ox.ac.uk=https://ngs.oerc.ox.ac.uk:9393/oxngs2-rm
harc.client.rm.simple_compute.ngs.leeds.ac.uk=https://ngs.leeds.ac.uk:9393/leeds-ngs2-rm
- More mappings will be added here as more HARC Resource Managers are deployed.
- You can get the HARC Client Bundle from the HARC website, http://www.cct.lsu.edu/~maclaren/HARC.
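The mappings above all follow the pattern `harc.client.rm.simple_compute.<host>=<RM URL>`. The following minimal Python sketch lists which RMs a harc.properties file points at. `parse_rm_mappings` is a hypothetical helper, and it assumes simple `key=value` lines rather than handling every properties-file feature (escapes, continuation lines, `:` separators).

```python
PREFIX = "harc.client.rm.simple_compute."


def parse_rm_mappings(text: str) -> dict:
    """Map each resource host name to its HARC RM URL.

    Blank lines and '#' comments are skipped; only keys carrying the
    simple_compute RM prefix are kept (e.g. project properties are ignored).
    """
    mappings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith(PREFIX):
            mappings[key[len(PREFIX):]] = value.strip()
    return mappings


SAMPLE = """\
# LONI
harc.client.rm.simple_compute.bluedawg.loni.org=https://bluedawg.loni.org:9393/bluedawg-rm
# NGS2
harc.client.rm.simple_compute.vidar.ngs.manchester.ac.uk=https://192.84.75.129:9393/vidar-rm
"""

if __name__ == "__main__":
    for host, url in parse_rm_mappings(SAMPLE).items():
        print(host, "->", url)
```

A listing like this makes it easy to spot entries such as the Manchester NGS2 mapping, whose RM URL uses an IP address rather than a host name.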
Pre-WS GRAM Integration
- This integration is needed to submit jobs to the reservations that HARC makes, and Pre-WS GRAM support is a prerequisite for WS GRAM. Where deployed, this column contains the name of the jobmanager to use for accessing the reservations. In all cases, the reservations are accessed via the RSL term "reservation_id".
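As a sketch of how the "reservation_id" RSL term reaches pre-WS GRAM, the Python below assembles a `globus-job-submit` argument list in the same shape as the SDSC example later on this page, using `-x` to append the extra RSL relation. The function names are hypothetical, reservation IDs are assumed to be integers like the one in that example, and other options (such as the `-m 5` used there) are left out.

```python
import shlex
from typing import List


def reservation_rsl(reservation_id: int) -> str:
    """RSL relation naming a HARC reservation, e.g. (reservation_id=1189016250)."""
    return f"(reservation_id={int(reservation_id)})"


def pre_ws_submit_argv(contact: str, jobmanager: str,
                       reservation_id: int, executable: str) -> List[str]:
    """Argument list for globus-job-submit; -x carries the extra RSL relation."""
    return [
        "globus-job-submit",
        f"{contact}/{jobmanager}",  # jobmanager name as listed in the table
        "-x", reservation_rsl(reservation_id),
        executable,
    ]


if __name__ == "__main__":
    argv = pre_ws_submit_argv("tg-login.sdsc.teragrid.org",
                              "jobmanager-pbs_gcc_resid", 1189016250, "/bin/date")
    print(shlex.join(argv))  # quoted so the parentheses survive a shell
```

Building an argument list rather than a single string means it can be passed straight to `subprocess.run` without any shell quoting concerns.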
WS GRAM Integration
- WS GRAM is the mechanism the AHE will use to submit jobs to HARC reservations. For this to work, the pre-WS jobmanager must be configured to be visible through the GT4 Managed Job Factory Service (MJFS). Where deployed, this column contains the name of the GRAM ResourceID (the jobmanager equivalent) to use for accessing the reservations. JDD Extensions support must also be available; see below.
- Once in place, the HARC reservations are accessed by placing the following XML inside the job description:
<extensions>
<reservation_id>ID from HARC</reservation_id>
</extensions>
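A minimal sketch of generating such a job description with Python's standard `xml.etree.ElementTree` follows. Only the `<extensions>`/`<reservation_id>` part comes from this page; the `<job>` root and `<executable>` element are assumptions about the GT4 WS GRAM job description format, and `job_description` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def job_description(executable: str, reservation_id: str) -> str:
    """Build a minimal WS GRAM job description carrying a HARC reservation ID."""
    job = ET.Element("job")  # assumed root element of the job description
    ET.SubElement(job, "executable").text = executable  # assumed element name
    extensions = ET.SubElement(job, "extensions")  # the part documented above
    ET.SubElement(extensions, "reservation_id").text = reservation_id
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    print(job_description("/bin/date", "1189016250"))
```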
JDD Extensions Support
- Globus Toolkit 4.0.5 is the first release to support the JDD Extensions needed to specify a HARC reservation ID in a WS GRAM job. For earlier 4.0.x releases, a package can be installed to add this support.
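The version rule above can be expressed as a small check. This sketch assumes plain numeric version strings such as "4.0.3" and assumes that releases after 4.0.5 keep native support; versions before 4.0 are reported as "unknown" because this page does not cover them. `jdd_extension_support` is a hypothetical name.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "4.0.5" into (4, 0, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def jdd_extension_support(gt_version: str) -> str:
    """Classify a Globus Toolkit version by how it gets JDD Extensions support."""
    v = parse_version(gt_version)
    if v >= (4, 0, 5):
        return "native"                # 4.0.5 is the first release with built-in support
    if v[:2] == (4, 0):
        return "needs add-on package"  # earlier 4.0.x releases
    return "unknown"                   # pre-4.0 versions are not covered on this page


if __name__ == "__main__":
    for version in ("4.0.3", "4.0.5"):
        print(version, jdd_extension_support(version))
```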
MPI-g
- MPI-g, or MPICH-G4, is not part of the Globus Toolkit but a separate piece of software that must be installed. It is required to run the HemeLB code across multiple resources.
Notes on Testbeds
TeraGrid
- Initially, HARC will be deployed on a subset of TeraGrid resources, at SDSC, NCSA and TACC. For more information on co-allocation on TeraGrid, and the support for this, see the TeraGrid Scheduling WG pages.
- One of those pages shows pre-WS GRAM and WS GRAM support for submission to reservations on TeraGrid resources.
SDSC HARC
- There are two subtle differences when using HARC/Globus at SDSC:
- 1. In addition to the RM mapping, you MUST specify your project ID in the harc.properties file:
harc.client.project.simple_compute.tg-login2.sdsc.edu=PROJECT_NAME
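- 2. The HARC service runs on a different node from the Globus installation, so reservations are made on tg-login2.sdsc.edu while jobs are submitted to tg-login.sdsc.teragrid.org, for example: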
harc-reserve -c tg-login2.sdsc.edu/4 -s 12:00 -d 1:00
globus-job-submit tg-login.sdsc.teragrid.org/jobmanager-pbs_gcc_resid -x "(reservation_id=1189016250)" -m 5 /bin/date
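The two SDSC differences can be captured in a small lookup, sketched below in Python. The host names and the project property key are taken from this section; the dictionaries and functions are hypothetical, and the fallback of submitting to the HARC host itself when no mapping is listed is an assumption, not something this page states for other sites.

```python
from typing import List, Optional

# Hosts taken from the SDSC example above; the tables themselves are hypothetical.
GLOBUS_CONTACT_FOR = {
    "tg-login2.sdsc.edu": "tg-login.sdsc.teragrid.org",  # HARC node -> Globus node
}
NEEDS_PROJECT = {"tg-login2.sdsc.edu"}


def extra_properties(harc_host: str, project: Optional[str]) -> List[str]:
    """Extra harc.properties lines a site needs beyond its RM mapping."""
    if harc_host not in NEEDS_PROJECT:
        return []
    if not project:
        raise ValueError(f"{harc_host} requires a project ID in harc.properties")
    return [f"harc.client.project.simple_compute.{harc_host}={project}"]


def globus_contact(harc_host: str, jobmanager: str) -> str:
    """Contact string for submitting to a reservation made via harc_host.

    Assumption: with no mapping listed, Globus runs on the HARC host itself.
    """
    gatekeeper = GLOBUS_CONTACT_FOR.get(harc_host, harc_host)
    return f"{gatekeeper}/{jobmanager}"


if __name__ == "__main__":
    print(extra_properties("tg-login2.sdsc.edu", "PROJECT_NAME"))
    print(globus_contact("tg-login2.sdsc.edu", "jobmanager-pbs_gcc_resid"))
```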
TeraGrid MPI-g
- The MPI-g installation on the NCSA and SDSC TeraGrid resources is not system-wide...
LONI
- HARC will be deployed on all publicly usable nodes in the LONI infrastructure. A trial deployment is currently underway on Bluedawg, Ducky and Zeke. This will shortly be extended to the new x86 cluster "Eric", and then to "QueenBee", the most powerful LONI supercomputer (#23 on the June 2007 Top 500 list).
- The LONI Documentation Site has more information on the infrastructure as a whole, and specific information about the IBM 575 Clusters and the 64 Bit Linux x86 Clusters. Also see the GENIUS LONI Page.
NW-Grid
- HARC has already been deployed on NW-Grid, a regional Grid in the North-West of England. At present, most of its machines run Sun Grid Engine 6.0 as their scheduler, which does not support user-settable advance reservations, so only dummy HARC Resource Managers are deployed there. Each NW-Grid node is expected to support advance reservation in the near future. The man2 installation is a 2-node Torque/Maui deployment available for testing purposes.
NGS
- NGS is the UK National Grid Service. HARC is currently being evaluated by the ETF, which is a prerequisite for a full deployment. A trial deployment is underway on the new NGS2 resources at Manchester, Oxford and RAL.
- The installation of HARC on Manchester's NGS2 node has been documented here.
HPCx
- HPCx, the outgoing UK National Service machine, runs LoadLeveler. HPCx is now running a limited HARC service in test mode. MPI-g has been compiled for this system but still needs to be tested against the port-forwarding infrastructure used to give access to the back-end nodes.
HECToR
- HECToR is the new UK National Service machine and is currently being integrated with the NGS. HECToR is a Cray XT4 running CNL (Compute Node Linux) and will need additional mechanisms to allow network access to the back-end nodes.
Background on HARC
HARC was developed by Jon MacLaren at the Center for Computation & Technology at LSU. The main source of information, documentation and software for HARC, including the HARC Client Bundle, is http://www.cct.lsu.edu/~maclaren/HARC. The current client version is V1.9.3, which includes better support for the new variable RM timeouts; see the current CHANGES file.
