2002-12-23 at 20:51
Informational; there is a reconfiguration of the routing, external to PDC but nevertheless between PDC and the rest of the world, to start at 2003-01-09, 20:00 UTC. The reconfiguration window is set to nine hours.
2002-12-20 at 12:28
Informational for pdc-users with data in cell Maintenance work on the AFS servers at Nada will be done on 8 January starting at 6 pm. Most UNIX computers at Nada will be inoperable during this time. Other services, like E-mail and WWW servers, will also be affected.
2002-12-18 at 13:15 [xxx (SBC / CBR)]
The SBC cluster login node has crashed again. Since this was the second time within a rather short time interval, the node will be temporarily closed for login during investigations. SBC cluster users are recommended to use or as login nodes until 20030106.
2002-12-18 at 13:15 [xxx (SBC / CBR)]
The SBC cluster login node scratchy crashed at 12:45. It has been restarted.
2002-12-12 at 17:39 [xxx (strindberg)]
Strindberg/Nighthawk: parts of /gpfs/scratch/ became unavailable around 1700 but have now been recovered.
2002-12-03 at 21:00
Christmas holidays are approaching
The PDC helpdesk will not be staffed beginning Dec 21, 2002. During the holidays there will be no phone support and e-mail to the helpdesk will have increased response times. PDC staff will still watch the incoming e-mail and act upon incoming questions as time permits.
The helpdesk will be fully staffed again beginning Jan 7, 2003.
2002-11-28 at 18:00
The KTH backbone and KTH border routers will be upgraded over a period of approximately 3 hours. PDC has a backup connection to SUNET, but bandwidth may be degraded during this period.
2002-12-02 at 18:00 [xxx (SBC / CBR)]
Due to network upgrades the connectivity to the SBC cluster will be unreliable. The upgrade should take at most 4h (and probably less).
2002-11-14 at 12:03 [xxx (strindberg)]
Nighthawk (new SP) log-in and interactive nodes will be serviced and rebooted during 2002-11-20, probably around 1200.
2002-11-12 at 10:23 [xxx (strindberg)]
Strindberg; maintenance work for the old SP system to take place between 1200 and 1400.
2002-11-11 at 19:25 [xxx (strindberg)]
/gpfs/projects (on the old system) is unmounted during rebalancing.
2002-11-06 at 21:59 [xxx (SBC / CBR)]
The fileserver is now up and running again.
2002-11-06 at 18:42 [xxx (SBC / CBR)]
Unfortunately, the AFS server restart might take longer than the scheduled hour due to unexpected hardware problems. We hope to have the server up and running by tomorrow morning at the latest. The job queue will be stopped until the server functions reliably.
2002-11-01 at 09:39 [xxx (strindberg)]
Strindberg: workaround activated, allocation is resumed.
2002-11-01 at 09:17 [xxx (strindberg)]
Strindberg: node allocation paused while solving problem with allocation of nodes.
2002-10-30 at 16:15 [xxx (SBC / CBR)]
20021106 18:00: AFS for SBC will have approximately 1h of downtime due to server restart.
2002-10-21 at 14:45 [xxx (SBC / CBR)]
The AFS-server should be OK now.
2002-10-21 at 11:12 [xxx (SBC / CBR)]
We are having some problems with gills again (AFS-server). Problem determination in progress.
2002-10-18 at 15:13 [xxx (SBC / CBR)]
The AFS-server should be OK now.
2002-10-18 at 09:03
We are currently experiencing some problems with one of our AFS servers (gills). Investigation in progress.
2002-10-14 at 10:02
Nighthawk; one node serving gpfs/scratch has been restarted.
2002-10-16 at 09:00
Network reconfiguration will take place during Wednesday, 2002-10-16. Short network interruptions may happen during the whole day and the batch queues will be stopped. You may not be able to access all of PDC's services.
2002-09-25 at 13:19
Nighthawk; /gpfs/kallsup filesystem fully operational again.
2002-10-05 at 16:42
Nighthawk - one node has a hardware fault. As this node also serves parts of the /gpfs/kallsup/ filesystem, the availability of that filesystem is reduced until the hardware has been repaired.
2002-10-01 at 23:04 [xxx (strindberg)]
Intermittent problems with GPFS on the older SP system. Work is in progress, but jobs using GPFS could be affected by this. Please report any disturbances to your jobs to and we will credit your CAC.
2002-09-25 at 13:19 [xxx (strindberg)]
/gpfs/{scratch,projects} will be unmounted during installation of new disks.
2002-09-18 at 17:16 [xxx (strindberg)]
hardware maintenance completed.
2002-09-17 at 10:44 [xxx (strindberg)]
It might be worth pointing out that the gpfs of the Nighthawk SP system is not affected.
2002-09-17 at 07:59 [xxx (strindberg)]
/gpfs/ hardware replacements to take place tomorrow, 2002-09-18, starting at 1300. The availability of /gpfs/{projects,scratch} is reduced until then.
2002-09-14 at 01:39 [xxx (strindberg)]
/gpfs/projects; one node serving /gpfs/projects has lost most of its disks. Fault search and possible recovery to take place during the weekend.
2002-09-13 at 18:20 [xxx (SBC / CBR)]
SBC: to avoid bringing down the one afs-server again we will schedule SBC jobs by hand, i.e. please submit long jobs and we will start them with an eye on the server.
2002-09-13 at 10:11 [xxx (SBC / CBR)]
SBC job queue halted again due to file server problems.
2002-09-13 at 10:05 [xxx (SBC / CBR)]
SBC batch lines resumed as afs-server salvage is complete.
2002-09-13 at 09:31
One afs-server is having problems and is being salvaged.
2002-09-12 at 13:00 [xxx (selma)]
Periodic maintenance will be performed this afternoon. Selma will be unavailable 13 - 17. Sorry for the short notice / haba.
2002-09-06 at 17:54 [xxx (HSM)]
The HSM system will continue to be unavailable over the weekend due to unforeseen difficulties with the upgrade.
2002-09-06 at 11:54 [xxx (SBC / CBR)]
AFS file server problems, probably SBC related. SBC job queue stopped until cause known. No problem fix or workaround yet.
2002-09-06 at 10:02
File server down. All systems affected. Work in progress...
2002-08-26 at 13:39 [xxx (HSM)]
The HSM system will be unavailable during Wednesday 4/9 and Thursday 5/9 due to a system upgrade.
2002-08-26 at 09:36 [xxx (linux lab)]
More problems on roxette; IBM will replace hardware today at 12.00.
2002-08-22 at 10:57 [xxx (linux lab)]
The login node on the lxl cluster (roxette) has hardware problems, investigation in progress.
2002-08-21 at 10:00 [xxx (SBC / CBR)]
At 10:00 on Wednesday 21/8 the login nodes of the SBC cluster,, and will be taken down for software upgrades. The upgrade should take approximately 20 minutes.
2002-08-16 at 10:47 [xxx (SBC / CBR)]
The sbc-cluster login node will be restarted immediately due to as yet undiagnosed failures.
2002-08-07 at 14:41
Due to an urgent software upgrade, the AFS fileservers will be restarted. This should only take a short while for each server, if everything goes according to plan.
2002-07-22 at 15:55 [xxx (strindberg)]
Nighthawk: fortran compiler fix installed, level changed from to Please report any related problems.
2002-07-13 at 16:28 [xxx (strindberg)]
Nighthawk: temporary replacement for gpfs/projects and scratch in place. Files as of 2002-07-08 at 12:00.
2002-07-12 at 18:18 [xxx (strindberg)]
Nighthawk: /gpfs/projects (and scratch); hardware still faulty. Note: no data is lost; only the mechanism between the disk and the machine is faulty. As the time to repair seems to be unpredictable, we have initiated a restore of all data from backup into another, equivalent, location.
2002-07-11 at 15:18 [xxx (strindberg)]
Nighthawk: further hardware service required for node serving gpfs/projects. The next item will be replaced tomorrow morning, 2002-07-12.
2002-07-08 at 16:42 [xxx (strindberg)]
Nighthawk - gpfs/projects and gpfs/scratch unavailable until hardware repair complete. gpfs/kallsup is still available. K (kallsup) nodes available for kallsup users, other nighthawk nodes for interactive use.
2002-07-08 at 16:42 [xxx (strindberg)]
Nighthawk: node serving /gpfs/projects has a power supply failure. A service representative has been called in.
2002-07-08 at 16:42
afs: severe problems with one afs-server.
2002-07-08 at 16:42
afs: the availability of some afs-services has been shaky for the past couple of hours. We are keeping an eye on it.
2002-07-08 at 11:15 [xxx (SBC / CBR)]
Itchy, scratchy and krusty on the SBC-cluster will be taken down for 10 minutes of maintenance at 10:00 15/7, 14:00 15/7 and 10:00 16/7 respectively.
2002-07-08 at 11:03
Informational for pdc-users with data in cell Maintenance work on the AFS servers at Nada will be done on 9 July starting at 6 pm. Most UNIX computers at Nada will be inoperable during this time. Other services, like E-mail and WWW servers, will also be affected.
2002-07-03 at 23:13 [xxx (strindberg)]
GPFS should be back online on the old SP system (Strindberg, pwr2 nodes).
2002-07-03 at 11:45 [xxx (strindberg)]
Problems with gpfs on the old SP system (Strindberg, pwr2 nodes). Work is in progress to fix this.
2002-06-26 at 19:17
ssh is back online again
2002-06-26 at 10:09 [xxx (SBC / CBR)]
AFS on the SBC-cluster should be OK now.
2002-06-26 at 10:00
Due to a security problem ssh has been temporarily switched off.
2002-06-26 at 09:13 [xxx (SBC / CBR)]
Connection with one AFS file server has been lost from the SBC cluster. Work is in progress to fix this.
2002-06-14 at 16:25 [xxx (SBC / CBR)]
is back.
2002-06-14 at 14:52 [xxx (SBC / CBR)]
Due to disk failure the login node will be taken down for service. The node will probably be back within 2h. In the meantime, please use or instead.
2002-06-09 at 10:17
TSM storage manager not responding. Impact: Odin and Genpat data not available.
2002-06-05 at 16:35 [xxx (strindberg)]
Nighthawk - new default versions of fortran, C and C++ compilers.
2002-05-29 at 17:39 [xxx (strindberg)]
Strindberg/Nighthawk - system work complete.
2002-05-29 at 11:50 [xxx (strindberg)]
We will do system work on strindberg and the nighthawk systems during the afternoon and early evening. Batch lines will be held during work.
2002-05-17 at 12:50 [xxx (SBC / CBR)]
in service again.
2002-05-17 at 10:40 [xxx (SBC / CBR)]
Due to a malfunctioning disk the login node will be taken down for service. It will probably be up again in the afternoon.
2002-05-06 at 17:58
PDC helpdesk closes Wednesday May 8 at 17:00 for the holiday. The helpdesk re-opens again Monday May 13 at 08:00.
2002-05-06 at 07:49 [xxx (strindberg)]
Scheduling problem found and resolved.
2002-05-06 at 07:49 [xxx (strindberg)]
Problem investigation in progress.
2002-03-26 at 09:53
PDC helpdesk closes 12.00 on April 30. We reopen on May 2 at 08.00.
2002-04-27 at 19:16
One AFS server was doing weird things and was restarted.
2002-04-25 at 16:16 [xxx (SBC / CBR)]
The login node s02n01 aka krusty has been restarted due to problems with the AFS-client.
2002-04-18 at 17:36 [xxx (strindberg)]
Strindberg: Reboot of the login node `strindberg' due to excessive usage of system resources.
2002-04-16 at 19:56 [xxx (linux lab)]
lxl06 and lxl07 back in service. These nodes run a newer kernel and newer OpenAFS release as well as having perfctr/PAPI support.
2002-04-16 at 15:00
General: AFS, salvaging complete.
2002-04-16 at 15:00
General: AFS, salvaging one fileserver.
2002-04-15 at 17:38 [xxx (strindberg)]
Strindberg: gpfs/bins is unmounted during rearrangement of gpfs disks.
2002-04-10 at 16:28 [xxx (SBC / CBR)]
Scheduler running again.
2002-04-10 at 15:16 [xxx (SBC / CBR)]
Due to problems with the AFS-clients on some of the login-nodes on the SBC cluster, the scheduler has been stopped until the problem has been resolved. Please note that the jobs in the queue will be run when the scheduler is started again, i.e. you don't have to resubmit your jobs.
2002-04-02 at 14:37 [xxx (strindberg)]
Strindberg/Nighthawk: /gpfs/projects and /gpfs/scratch unavailable during disk-server reboot.
2002-03-26 at 09:53
PDC helpdesk is closed for Easter Holidays from 12:00, 2002-03-28. We reopen at 08:00, 2002-04-02.
2002-03-21 at 20:21
License server is back in service
2002-03-21 at 19:20
One license server is down
2002-03-11 at 13:39 [xxx (strindberg)]
Nighthawk - HA (high availability) subsystem causes gpfs to unmount.
2002-03-10 at 14:07 [xxx (SBC / CBR)]
Itchy (aka sbc-12) will be restarted Monday 11/3 at 10:00 due to AFS client problems.
2002-03-08 at 11:39 [xxx (selma)]
Due to excessive disk usage, we need to move some file systems around. Queues will be stopped during that time.
2002-03-13 at 14:00 [xxx (HSM)]
Some reconfiguration will occur on the HSM and backup servers. These services will be down for a few hours after 14.00 on Wednesday, 2002-03-13.
2002-03-06 at 10:31 [xxx (strindberg)]
Strindberg - restore of gpfs/projects to the state at 2002-02-19 complete. Please contact pdc-staff if you prefer a restore to another date.
2002-03-04 at 18:46 [xxx (strindberg)]
Strindberg: scheduling paused while struggling with gpfs/projects.
2002-03-01 at 17:04 [xxx (strindberg)]
Strindberg: the gpfs/projects file system is being reformatted, and a restore to the state at 2002-02-19 will follow.
2002-03-01 at 13:11 [xxx (strindberg)]
Strindberg: gpfs/projects - will be unmounted to perform an fsck (file system consistency check).
2002-02-27 at 15:45 [xxx (strindberg)]
Strindberg: /gpfs/scratch has been reinitialized and reformatted.
2002-02-27 at 14:25 [xxx (strindberg)]
Strindberg: gpfs will be restarted during the service window. /gpfs/scratch/ will eventually be reinitialized.
2002-02-27 at 11:29 [xxx (strindberg)]
The two K-nodes and some of the N-nodes on Strindberg will be taken into service on Thursday (2002-02-28). They will be used as webservers for Vasaloppet. They will be back in production early next week. During this period we supply access to a 32 processor IBM pwr4 (141 GFlop/s peak, 32 GByte) node at CSC. The affected users have been or will be contacted.
2002-02-26 at 13:41 [xxx (SBC / CBR)]
SBC-17 (Scratchy) had a load of ~70 due to lots of hanging processes, so it was rebooted.
2002-02-26 at 13:20 [xxx (SBC / CBR)]
itchy and sbc-m1 problems. Investigation in progress.
2002-02-04 at 18:25 [xxx (strindberg)]
Network problems with Strindberg.
2002-02-01 at 18:22
General: Fileserver problem. Salvage in progress.
2002-02-06 at 14:00 [xxx (HSM)]
The HSM software will be upgraded and the HSM service will therefore be unavailable during the afternoon on Wednesday, starting at 14.00, for up to 5 hours.
2002-02-01 at 09:43 [xxx (SBC / CBR)]
Due to cooling problems (the water we get is too warm), some SBC-nodes will be shut down.
2002-01-30 at 16:39
General [All systems]: Recovering faulty afs-server; schedulers are paused (no new jobs will be started.)
2002-01-30 at 14:52
General [afs]: lost one afs-fileserver.
2002-01-23 at 21:43 [xxx (strindberg)]
nighthawk/kallsup2/nf01n01: Some jobs have had job start problems after the network reorganisation, as a consequence of using a new network route to obtain more tickets for a job. (This was outside user control.)
2002-01-23 at 21:43 [xxx (strindberg)]
r03n05: users with jobs running with r03n05 as one of the nodes have experienced job start problems. This has been solved.
2002-01-21 at 10:21 [xxx (strindberg)]
Switch problems on Strindberg are currently causing a complete production stop on that system. Problem determination in progress.
2002-01-20 at 22:52 [xxx (strindberg)]
Nighthawk: gpfs on the nighthawk systems has been restarted.
2002-01-18 at 15:45
Network and file server problems. Investigation in progress. Duration difficult to predict, best guess is up again some time tonight. Don't hold your breath.
2002-01-23 at 13:00
In order to upgrade the PDC network backbone, central switches will be replaced and reconfigured. This will result in interruptions for all interactive use of PDC's computers, such as logins and file access. Scheduled downtime is 5 hours.
2002-01-15 at 13:19
We are currently experiencing some network problems with Strindberg. Problem determination in progress.
2002-01-14 at 13:24 [xxx (SBC / CBR)]
The login node sbc-11 aka itchy will be rebooted at 14:20 due to AFS cache problems.
2002-01-09 at 09:48 [xxx (SBC / CBR)]
sbc-m1 is retired as scheduler, thus the scheduling on the SBC cluster is not affected.
2002-01-09 at 08:50 [xxx (SBC / CBR)]
is down. This affects all scheduling on the SBC cluster. sbc-m1 will be rebooted later today.
2002-01-02 at 18:38
File system problem resolved.
2002-01-02 at 17:57
Some sort of problem with the file system. Investigation in progress.
2002-01-02 at 15:46 [xxx (linux lab)]
Linux lab cluster down because of room cooling failure. Compressor failure. Estimated fix around 2002-01-07.
