Maintenance Outage: July 11th, 5pm until July 12th, noon
On July 11th and 12th, IT will perform several maintenance tasks (network switches, storage, and RStudio certificate renewal) and an upgrade of the Scientific Compute Cluster. This work will affect systems at the Institute.
This work will start at 5pm, Friday July 11th, and should be completed by noon on Saturday, July 12th.
1. Network:
Network switch code updates and reboots will take place starting Friday evening.
2. Storage:
The solexa_scratch share will be decommissioned.
If you have any data stored at /lab/solexa_scratch that you want to preserve, please move it to your current lab share or request that it be archived at the following URL:
https://ithelp.wi.mit.edu/support/catalog/items/6
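For those who prefer to script the move, the following is a minimal Python sketch of copying a directory tree and confirming that every file arrived with the same size. It is an illustration only, not an IT-provided tool. The function name copy_tree_verified is hypothetical, and you would supply your own source and destination paths.

```python
import shutil
from pathlib import Path


def copy_tree_verified(src: str, dst: str) -> int:
    """Copy the directory src into dst and return the number of files copied.

    Raises RuntimeError if any source file is missing at the destination
    or has a different size there after the copy.
    """
    src_path, dst_path = Path(src), Path(dst)
    # dirs_exist_ok (Python 3.8+) allows copying into an existing directory.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)

    copied = 0
    for f in src_path.rglob("*"):
        if f.is_file():
            target = dst_path / f.relative_to(src_path)
            if not target.is_file() or target.stat().st_size != f.stat().st_size:
                raise RuntimeError(f"verification failed for {target}")
            copied += 1
    return copied
```

Call it with a directory under /lab/solexa_scratch as the first argument and a directory in your own lab share as the second. The size check is only a basic sanity test. Compare checksums if you need stronger assurance before the share is removed.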
3. Compute Cluster:
Slurm Upgrade and Queue Changes
On the morning of July 12, the cluster scheduler software (Slurm) will be upgraded to the most recent version.
All jobs will be terminated ahead of this upgrade.
Please note the upgrade may take several hours to complete due to our job database size.
Once the upgrade is complete, IT will make a change to the cluster that affects the run time of individual jobs submitted to general-purpose queues. This change will help ensure that stale or inactive jobs do not tie up CPU and memory resources that others need.
Currently, most cluster jobs are submitted without a run time set. After the change, any job submitted without a run time will receive the default run time listed in the table below. When you submit a job, you can use the "--time" parameter to request a different run time, up to the maximum number of days shown in the table.
Affected Slurm Queues
+------------------+-------------------------+---------------------+
| Queue            | Default Run Time (days) | Max Run Time (days) |
+------------------+-------------------------+---------------------+
| 20               | 3                       | 21                  |
| 20x              | 3                       | 21                  |
| 18               | 3                       | 21                  |
| nvidia-t4-20     | 3                       | 21                  |
| nvidia-A4000-20  | 3                       | 21                  |
| nvidia-2080ti-20 | 3                       | 21                  |
| nvidia-A6000-20  | 3                       | 7                   |
| nvidia-L40S-20   | 3                       | 7                   |
| nvidia-A100-20   | 3                       | 7                   |
+------------------+-------------------------+---------------------+
Information on how to use "--time" can be found here: https://clusterguide.wi.mit.edu/using-the-slurm-cluster/#submitting-slurm-jobs
Time limits will be provided here after the change is complete:
http://slurmstatus.wi.mit.edu/limits.html
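As an illustration of how the limits in the table interact with "--time", here is a minimal Python sketch that builds an sbatch command line and rejects a request above the queue maximum. The helper name sbatch_args is hypothetical. The sketch assumes the queue names in the table are passed to sbatch as Slurm partitions via "--partition". Check the cluster guide linked above for the exact submission syntax used here.

```python
from typing import List, Optional

# Run-time limits in days, copied from the table in this notice:
# queue name -> (default, maximum).
QUEUE_LIMITS_DAYS = {
    "20": (3, 21),
    "20x": (3, 21),
    "18": (3, 21),
    "nvidia-t4-20": (3, 21),
    "nvidia-A4000-20": (3, 21),
    "nvidia-2080ti-20": (3, 21),
    "nvidia-A6000-20": (3, 7),
    "nvidia-L40S-20": (3, 7),
    "nvidia-A100-20": (3, 7),
}


def sbatch_args(queue: str, script: str, days: Optional[int] = None) -> List[str]:
    """Return an sbatch argument list; the command is built, not executed."""
    if queue not in QUEUE_LIMITS_DAYS:
        raise ValueError(f"unknown queue: {queue}")
    _default_days, max_days = QUEUE_LIMITS_DAYS[queue]
    # Assumption: queues are selected with Slurm's --partition option.
    args = ["sbatch", f"--partition={queue}"]
    if days is not None:
        if not 1 <= days <= max_days:
            raise ValueError(f"{queue} allows 1 to {max_days} days, got {days}")
        # Slurm accepts run times written as days-hours:minutes:seconds.
        args.append(f"--time={days}-00:00:00")
    # With no --time argument, the queue's default run time applies.
    args.append(script)
    return args
```

For example, sbatch_args("20", "my_job.sh", days=5) returns ["sbatch", "--partition=20", "--time=5-00:00:00", "my_job.sh"], which corresponds to a five-day request on queue 20. Leaving out days produces a command with no "--time" argument, so the job would receive the 3-day default.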
4. Maintenance
Rstudio.wi.mit.edu
To install a new certificate, the RStudio Pro server will be restarted during this maintenance window.
Any running RStudio sessions will be terminated.
If you have any questions or concerns, please email us at ithelp@wi.mit.edu.
Thank you for your patience.
Sincerely,
Craig Andrew
Assistant Director of ISO