1999-12-20 at 16:20
License server: The PDC license server was rebooted after patching. It should not have affected any running programs.
1999-12-15 at 16:35 [xxx (strindberg)]
Strindberg: Switch fault recovered, resuming batch.
1999-12-14 at 16:20 [xxx (strindberg)]
Strindberg: Switch fault.
1999-12-14 at 12:20
Selma: A major failure on the system disk cost us a day, but no user data was lost. Please let us know if you find otherwise.
1999-12-10 at 15:14 [xxx (strindberg)]
Strindberg: Switch/GPFS back.
1999-12-10 at 13:55 [xxx (strindberg)]
Strindberg: Switch/GPFS problem.
1999-11-30 at 15:46 [xxx (strindberg)]
Strindberg: job allocation restarted. GPFS was unavailable on a number of nodes. Now resolved.
1999-11-30 at 13:27 [xxx (strindberg)]
Strindberg: job allocation stopped. Investigating possible problems.
1999-11-26 at 09:00
Kallsup: The machine will be brought down at 14.00 on Monday 29/11 to replace a broken disk.
1999-11-24 at 13:00 [xxx (strindberg)]
Strindberg: /gpfs/projects is unavailable during reconfiguration.
1999-11-24 at 10:00
There were intermittent nameserver problems during the night.
1999-11-23 at 23:00
Things should now be back to normal.
1999-11-23 at 22:00
We're having some nameserver trouble.
1999-11-10 at 15:00
The PDC license server will be brought down at 16.00 to change a faulty memory module.
1999-11-10 at 14:00
All PDC Systems will be unavailable from 1999-11-13 09:00 to 1999-11-14 due to power maintenance work (UPS).
1999-11-10 at 13:00
Kallsup&HSM: HSM is now working normally.
1999-11-09 at 11:00
Kallsup&HSM: HSM unreachable, service in progress
1999-11-02 at 01:00
License server: about to be restarted.
1999-11-01 at 20:00
All production resumed.
1999-11-01 at 15:00
Further problems with the UPS caused a major power outage. All production was stopped. Problem hopefully (again) circumvented for now.
1999-11-01 at 02:00
All production is resumed.
1999-11-01 at 01:00 [xxx (strindberg)]
All systems are currently up except GPFS on Strindberg/August, where we are running file system consistency checks (fsck). These are expected to take quite some time to complete; allocation is paused during fsck. We expect to run further tests on most systems later today, 1999-11-01, during daytime. The UPS is bypassed for the time being.
1999-10-31 at 20:00
We hopefully have power now. Don't expect anything to work for a while, though. The problem may be due to a faulty UPS.
1999-10-31 at 17:45
We're currently having a major power outage. To be continued...
1999-10-28 at 15:00
HSM: currently disabled because of hardware trouble. We're waiting for a service technician.
1999-10-22 at 12:00
AFS: stability problems with one VLDB (volume location database) server.
1999-10-18 at 14:00 [xxx (strindberg)]
Strindberg: Unfence operation causes gpfs-unmount.
1999-10-10 at 22:30
Kallsup&HSM: probable cause - LD cache error caused panic.
1999-10-10 at 20:40
Kallsup&HSM: Kallsup unreachable, fault search to begin.
1999-10-05 at 21:00 [xxx (strindberg)]
Strindberg: During tomorrow's Wednesday service window, 1999-10-06, users might not be able to access any GPFS files.
1999-09-21 at 10:00
ADSM/HSM/DSM: tape robot offline due to maintenance/expansion.
1999-09-20 at 09:00 [xxx (strindberg)]
Strindberg: possible filesystem hang on the login node.
1999-09-13 at 17:15 [xxx (strindberg)]
Strindberg: Switch adapter replaced.
1999-09-09 at 20:00 [xxx (strindberg)]
Strindberg: global switch fault.
1999-09-09 at 19:30
Sprat: Back online after the OS upgrade.
1999-09-09 at 15:00 [xxx (strindberg)]
Strindberg: switch adapter fault probably caused problems for a few running jobs.
1999-09-09 at 09:00
Sprat: The machine will be taken offline for an OS upgrade during the day. It should be back up by 12.00 on Friday (10/9).
1999-09-07 at 14:00 [xxx (strindberg)]
Strindberg: switch adapter fault probably caused problems for a few running jobs.
1999-09-02 at 11:20
Selma: Selma is having trouble.
1999-08-25 at 12:30
Mail: the PDC mail server has disk problems.
1999-08-23 at 11:04 [xxx (strindberg)]
Strindberg: Recovery completed.
1999-08-23 at 08:30 [xxx (strindberg)]
Strindberg: Job scheduling paused. Control Workstation / switch problems. Recovery in progress.
1999-08-18 at 11:00 [xxx (strindberg)]
Strindberg: Job scheduling resumed.
1999-08-18 at 08:50 [xxx (strindberg)]
Strindberg: Job scheduling stopped due to an unstable fileserver.
1999-08-17 at 18:10 [xxx (strindberg)]
Strindberg: Job scheduling resumed.
1999-08-17 at 16:20 [xxx (strindberg)]
Strindberg: Job scheduling stopped due to a fileserver fault. Some users and programs (g98) affected.
1999-08-10 at 10:00 [xxx (strindberg)]
Strindberg: we will resume batch runs shortly.
1999-08-10 at 09:00 [xxx (strindberg)]
Strindberg: global switch fault. All running jobs affected.
1999-08-03 at 17:00 [xxx (strindberg)]
Strindberg: The login node strindberg will be rebooted before 18:00 on 1999-08-03.
1999-08-03 at 15:00 [xxx (strindberg)]
Strindberg: Login node strindberg is stuck; fault search in progress. Please use the login node august if you are not CPU dependent.
1999-07-03 at 23:00 [xxx (strindberg)]
Strindberg: Problems with LoadLeveler caused the scheduling system to be down most of the night.
1999-07-02 at 10:08 [xxx (strindberg)]
Strindberg: Gaussian98 defaults changed by creating a Default.Route file containing -M- 140MB and -#- MaxDisk=2048MB.
1999-06-29 at 15:40
General: There seems to be a routing problem at KTH that may cause users to have problems locating DNS servers and/or hosts in the KTH domain. The problem is outside PDC's control but is being worked on. If you have experienced problems, they are hopefully resolved by now.
1999-06-17 at 09:30 [xxx (strindberg)]
Strindberg: unfence operation might have caused switch related problems for running jobs.
1999-06-08 at 09:50
Kallsup&HSM: Kallsup is back online. However, due to security problems rxtelnet does not work for now; regular kerberized telnet should work fine.
1999-06-08 at 09:20
Kallsup&HSM: Machine hung. Dump and reboot in progress.
1999-06-04 at 12:00
Hardware maintenance on Boye
1999-06-02 at 12:30
License server problems resolved.
1999-06-02 at 12:00
Hardware problems with license server.
1999-05-29 at 10:00
Kallsup: a full usr/spool filesystem might have caused problems for some NQS jobs.
1999-05-27 at 16:00 [xxx (strindberg)]
Strindberg: Login node problems. The login node will be available again shortly.
1999-05-21 at 10:00
Boye: Hardware problems, some parts will be replaced during the day.
1999-05-17 at 12:00
GPFS: rebalancing the filesystem. The rebalance will reduce filesystem performance during the next two hours.
1999-05-10 at 14:30
Mail: maintenance service on one PDC mail-server.
1999-05-05 at 22:30 [xxx (strindberg)]
Strindberg: running with the latest software.
1999-05-05 at 13:30 [xxx (strindberg)]
Strindberg: software (PTFs) upgrade started.
1999-04-21 at 19:25
Kallsup&HSM: Reboot due to problems with the tape subsystem.
1999-04-20 at 09:45 [xxx (strindberg)]
Strindberg: systems are back up.
1999-04-20 at 08:30 [xxx (strindberg)]
Strindberg: several systems down.
1999-04-07 at 21:20 [xxx (strindberg)]
Strindberg: the login node `strindberg' is repeatedly panicking. The strindberg login has once again been moved back to the G-node. It might take a while to move all submitted jobs; until then node allocation is paused.
1999-04-06 at 22:30 [xxx (strindberg)]
Strindberg: node allocation resumed.
1999-03-28 at 12:00
General preventive: If you have switched to daylight saving time and have problems authenticating, please see the information about setting the proper time in `guided tours'.
1999-03-25 at 17:00 [xxx (strindberg)]
Strindberg/gpfs: the gpfs-filesystem will be restarted.
1999-03-24 at 13:30 [xxx (strindberg)]
Strindberg: today's hardware maintenance complete.
1999-03-24 at 10:00
Kallsup: Kallsup will be brought down for hardware maintenance at 08.00 on 24/3.
1999-03-23 at 17:00 [xxx (strindberg)]
Strindberg: hardware maintenance on the SMP log-in (august), the batch-system node, and several batch-nodes. Starting 1999-03-24 at 10:00.
1999-03-22 at 13:00 [xxx (strindberg)]
All systems/AFS: Problems with one AFS server have made some directories invisible from some file system clients. The problem should be fixed around 18:00 today. Try using another computer (for example, strindberg) to access your files during that period.
1999-03-22 at 12:00 [xxx (strindberg)]
Strindberg/gpfs: gpfs salvage complete. All files, except those written with input/output error status during adapter fault, should be correct.
1999-03-20 at 22:30 [xxx (strindberg)]
Strindberg/gpfs: a broken disk/adapter has probably caused the loss of files created prior to late Friday night, 1999-03-19, or possibly during Saturday, 1999-03-20.
1999-03-19 at 00:00 [xxx (strindberg)]
Strindberg: batch lines released until the next maintenance window, at 17:00 later today.
1999-03-18 at 13:00 [xxx (strindberg)]
Strindberg: gpfs hang on one of the login nodes. For the time being, we have allocated a maintenance window of at least one hour each weekday, starting at 17:00.
1999-03-17 at 15:30 [xxx (strindberg)]
Strindberg: gpfs hang on the login node. The login node will be moved. Please log in again after the reboot.
1999-03-17 at 10:00
Selma: OS upgrade in progress. See news page.
1999-03-16 at 21:00 [xxx (strindberg)]
Strindberg: reformat of /gpfs/scratch filesystem is imminent.
1999-03-16 at 15:30 [xxx (strindberg)]
Strindberg: has been running since approximately 14:00.
1999-03-16 at 13:30 [xxx (strindberg)]
Strindberg: You can log in, but we are applying a few last-minute changes, so batch is currently on hold. See "Getting Restarted after the Upgrade".
1999-03-13 at 17:30
Kallsup: Back up after dump/reboot.
1999-03-13 at 16:30
Kallsup: Uncorrectable error in kernel; dump, analysis and restart in progress.
1999-03-11 at 15:00 [xxx (strindberg)]
Strindberg: We estimate that the upgrade will be complete and Strindberg back in production Tuesday, 1999-03-16 at 13:00. As of now only test-users have permission to run.
1999-03-10 at 21:00 [xxx (strindberg)]
Strindberg: Further information about the upgrade process will follow tomorrow, 1999-03-11.
1999-03-10 at 14:00
Kallsup&HSM: The machine is now open for users again.
1999-03-06 at 18:00
Kallsup&HSM: Due to the delayed delivery of the replacement for the failed disk drive, Kallsup and the HSM will not be back up until Tuesday 9/3 at the earliest.
1999-03-06 at 09:00 [xxx (strindberg)]
Strindberg: system software and hardware upgrade in progress. Please monitor this news page for further information as the upgrade proceeds.
1999-03-05 at 09:00
Kallsup&HSM: We have had one disk failure and problems installing a new version of DMF. We are currently awaiting delivery of a replacement disk and a new copy of DMF from Cray.
1999-03-01 at 09:00
Kallsup&HSM: Kallsup is undergoing an OS upgrade. Both Kallsup and the HSM will be unavailable during this period and will be back up Monday 8/3.
1999-02-24 at 12:00 [xxx (strindberg)]
Strindberg: Switch problems. Scheduler stopped. Once running jobs have finished, the switch will be restarted at approx. 16:00 and scheduling resumed.
1999-02-18 at 22:00
Kallsup&HSM: Hung, reset and restarted. Cause yet unknown.
1999-02-18 at 20:48
Kallsup&HSM: Problems, investigation in progress.
1999-02-08 at 22:55 [xxx (strindberg)]
Fileserver back on-line and EASY allocation on strindberg resumed.
1999-02-08 at 22:02 [xxx (strindberg)]
EASY on strindberg: allocation of new jobs paused.
1999-02-08 at 21:52
AFS: one fileserver is not responding. Investigation in progress.
1999-02-06 at 14:12 [xxx (strindberg)]
Strindberg: Allocation resumed.
1999-02-06 at 13:00 [xxx (strindberg)]
Strindberg: PIOFS down due to checkstop (hardware fault) in one server node. Running jobs using PIOFS affected.
1999-02-06 at 12:30 [xxx (strindberg)]
Strindberg: PIOFS problems, the queues will be stopped.
1999-02-05 at 23:00
Kallsup/HSM: Recovery of the failed disks took a long time but should be complete now. The machine is up and the queues have been restarted.
1999-02-05 at 12:00
Kallsup/HSM: The change of power supply has been delayed a few hours.
1999-02-04 at 15:00
Kallsup/HSM: The power supply will be changed tomorrow morning. Hopefully the machine will be back up by 12:00 noon.
1999-02-04 at 14:00
Kallsup/HSM: Disk failures probably caused by a broken power supply.
1999-02-04 at 09:00
Kallsup/HSM: Kallsup shut down while investigating disk failures.
1999-01-27 at 11:00
Network: One broken router card caused network dropout. We now route affected networks through a backup router.
1999-01-24 at 19:30 [xxx (strindberg)]
Strindberg: Job allocation resumed.
1999-01-24 at 12:00 [xxx (strindberg)]
Strindberg: SDR and job-manager out of sync. Job allocation held until all running jobs have completed.
1999-01-15 at 10:00
Network: there was a network dropout a few minutes ago.
1999-01-05 at 15:30
Licenses: one license server broken. You might experience problems executing certain licensed software.
