Events:

2016-12-20 at 15:41 [beskow]
planned preventive maintenance finished. System on-line again.
2016-12-18 at 10:45 [beskow]
one cabinet (c1-1) experienced a dip in power distribution early this morning. 70+ nodes shut down, and jobs on those nodes certainly stopped. Other jobs might have been affected as well during the high-speed network recovery.
2016-12-13 at 11:16 [beskow]
planned preventive maintenance to take place Tuesday 2016-12-20. The system will go off-line around 09:00 in the morning.
2016-11-20 at 21:07 [beskow]
Some pieces disabled for the time being. Investigation far from conclusive, but the system is running jobs again. Expect further stops to address the issue.
2016-11-20 at 18:39 [beskow]
the system is being brought down to investigate strange interconnect behaviour. User access might not be restored until tomorrow, and there may be no further notice before then.
2016-11-19 at 19:58 [beskow]
Since earlier today, access to the klemming file system has been problematic/sluggish. At the time of writing it is unclear whether this is related to the internal beskow network or to the connection between beskow and klemming. No further jobs will be allowed to start.
2016-10-26 at 15:08 [beskow]
Kernel patches applied. The system has been running jobs for a while, and user logins are allowed again.
2016-10-25 at 13:09
We are currently experiencing some internal network problems at PDC affecting some services/systems. Investigation is in progress.
2016-10-24 at 22:01 [milner]
during the audit of CVE-2016-5195, further logins are prohibited. Existing sessions as well as batch jobs will work as usual. At the time of writing it is unclear whether this will take on the order of a day or on the order of several days.
2016-10-24 at 22:00 [beskow]
during the audit of CVE-2016-5195, further logins are prohibited. Existing sessions as well as batch jobs will work as usual. At the time of writing it is unclear whether this will take on the order of a day or on the order of several days.
2016-10-20 at 10:07 [tegner]
Due to an as yet unknown (probably software) problem, tegner-login-1 was unreachable between 10:07:49 and 10:51:15.
2016-10-11 at 10:36
SUNET seems to be experiencing major connectivity problems to other parts of the Internet. http://www.nunoc.org/nunocweb/ticket.php?key=SUNETTICKET-3893
2016-09-06 at 13:55 [klemming]
/cfs/klemming was blocked (frozen) for several minutes a while ago during an unexpected fail-over.
2016-08-18 at 07:14 [beskow]
Planned power outage is completed and Beskow is running jobs again.
2016-08-16 at 11:51 [beskow]
Planned power outage at KTH Campus 2016-08-18 01:30-04:00. Beskow will be shut down for the duration of the power outage.
2016-05-26 at 09:40
The scheduler node of Tegner has been restarted due to a software issue. Until it is back up it will not be possible to submit or view jobs. Running and queued jobs should be unaffected however.
2016-05-04 at 13:02
The primary login node on Tegner crashed and has now been restarted. Sorry for any inconvenience.
2016-04-22 at 11:05 [klemming]
Due to jobs overloading the metadata server of Klemming, access to the file system stopped for a short time (around 5 minutes) while it failed over to the secondary server. Apart from the pause, this should not affect running jobs. If you still notice problems, let us know.
2016-03-23 at 18:56 [milner]
the milner login node ran out of system resources and has been restarted.
2016-03-21 at 16:40 [beskow]
the maintenance stop is complete and a cabinet power supply rack has been replaced. Jobs are running again.
2016-03-15 at 15:40 [klemming]
On Monday, the 21st, we will do some reconfigurations of the Klemming file system to enable new features. At that time Beskow will be down for hardware maintenance, as previously announced, and no jobs will be allowed on Tegner. The file system is expected to be back some time in the afternoon.
2016-03-15 at 13:29 [beskow]
a second maintenance stop is scheduled for the coming Monday, 2016-03-21, starting at 07:30. We will work on, and eventually replace, a cabinet power supply rack.
2016-03-07 at 09:05 [beskow]
there was a slurm (batch system) hiccup on beskow early this morning. Slurm has been restarted. The cause is not yet known.
2016-02-25 at 10:17
Due to a security alert, the license servers at PDC (primarily license-1.pdc.kth.se and tetra.pdc.kth.se) will be patched today. All programs that use licenses checked out from these servers will likely have intermittent problems today.
2016-02-22 at 10:58 [beskow]
the login node will be restarted during lunch today (12:00) to free up blocked/locked node resources.
2016-02-17 at 22:14
Tegner is now available again. Unfortunately, all running jobs were interrupted; our apologies for any inconvenience this has caused.
2016-02-17 at 11:02
Tegner is currently unavailable due to unexpected side-effects of a critical software update. Updates will follow.
2016-02-11 at 20:29 [beskow]
the first maintenance stop is completed and the system runs jobs again, now using all cabinets. Several pieces of power/rectifier hardware have been ruled out as being the cause of the unexpected stop last Sunday.
2016-02-10 at 17:44 [milner]
the slurm batch scheduler was in a stopped state. It has been restarted, but the cause has not yet been investigated.
2016-02-09 at 14:30 [beskow]
A first maintenance stop is scheduled to start on Thursday afternoon, 2016-02-11 at 15:00. The primary objective of this stop is to identify the parts where the power/rectifier distribution is not working optimally.
2016-02-07 at 22:43 [beskow]
8 out of 9 cabinets in the system are running user jobs again. As a few important pieces of system software execute in the cabinet with unclear power/rectifier distribution functionality, we will very likely have to arrange one or several system maintenance stops in the near future.
2016-02-07 at 11:25 [beskow]
Large parts of the system are unavailable. Investigation into whether this is related to a power outage, the network, or another cause has just started. Do not expect the system to be back on-line shortly.
2016-01-29 at 11:55 [klemming]
On Wednesday next week, the 3rd of February, at 10:00, we will apply an update to the metadata servers of the Klemming file system to improve stability. While this should only be noticeable as a pause in metadata operations of a few minutes, no jobs will start on Beskow/Tegner during the operation, to reduce the risk of bad side-effects.