Break reports

All breaks in HIIT's IT services are listed here.

Break in file services at 2010-10-20 14:30 - 16:25

Description: 

Schedule:

2010-10-20 14:30 - 16:25

Duration:

1:55 h

Affected services:

File services and some www-sites.

Reason:

Problems with a disk array system.

Update at 16:00: Web services are partially functional. All sites that rely on the file service (fs) are still unavailable.

Update at 16:25: The disk array system's functionality has been restored, and file services and all web sites are up and running. We are currently investigating the cause of the problem with the hardware manufacturer to prevent this from happening again. The break is over.

Break in Radius authentication at 2010-10-18 11:06-11:32

Description: 

Schedule:

2010-10-18 11:06 - 11:32

Duration:

0:26 h

Affected services:

eduroam

Reason:

The Radius proxy's certificate expired, which stopped the Radius authentication used by, e.g., eduroam.

Update at 11:33: New certificate was installed. The break is over.
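
An expiry like this can be caught in advance by checking a certificate's end date regularly. The following is a minimal sketch, not the check used at HIIT: the function names, the 30-day warning threshold and the example timestamps are all hypothetical. It assumes the end date is available in the "Oct 18 00:00:00 2010 GMT" form that `openssl x509 -enddate` prints after `notAfter=`.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return the whole days left before a certificate's notAfter time.

    not_after is a string such as "Oct 18 00:00:00 2010 GMT".
    now is seconds since the epoch; it defaults to the current time.
    A negative result means the certificate has already expired.
    """
    if now is None:
        now = time.time()
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - now) // SECONDS_PER_DAY)


def needs_renewal(not_after, warn_days=30, now=None):
    """Return True when fewer than warn_days days remain (hypothetical threshold)."""
    return days_until_expiry(not_after, now) < warn_days


if __name__ == "__main__":
    # Hypothetical end date, for illustration only.
    end_date = "Oct 18 00:00:00 2010 GMT"
    print(days_until_expiry(end_date), needs_renewal(end_date))
```

Run from a daily scheduled job, a check of this kind would give a warning some weeks before the certificate lapses.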

Breaks in a few servers at 2010-10-08 20:50-21:17

Description: 

Schedule:

2010-10-08 20:50 - 21:17

Duration:

0:27 h

Affected services:

VCS, wiki, WWW, file services.

Reason:

The following servers were rebooted due to SAN configuration changes:
- finglas.it.hiit.fi
- frodo.it.hiit.fi
- bilbo.it.hiit.fi
- universe.hiit.fi
- terae.hiit.fi

We changed SAN zoning from hard (port based) to soft (WWN based). The change on fabric 2 went fine, with no problems reported by hosts. When fabric 1 was changed, one path to one of our disk arrays never recovered, and one host started to lose its connection to that disk array. To avoid further problems, the affected servers were rebooted.

The break to any individual service was no longer than five minutes.

Break in file services (fs) at 2010-09-26 11:00 - 11:15

Description: 

Schedule:

2010-09-26 11:00 - 11:15

Duration:

0:15 h

Affected services:

File service's (frodo, fs) group directories and some www-sites.

Reason:

Configuration changes in LUNs from SAN-attached disk array systems.

The following web sites were also briefly (for a few minutes) affected because they depend on the file service:

  • www.futureinternet.fi
  • betelgeuse.hiit.fi
  • cgi.hiit.fi
  • cosco.hiit.fi
  • packages.hiit.fi
  • pgm2010.hiit.fi
  • www.mdl-research.org

Breaks in several servers at 2010-09-18 08:00-08:27

Description: 

Schedule:

2010-09-18 08:00 - 08:27

Duration:

0:27 h

Affected services:

DHCP, file services, VCS, VPN, wiki, WWW.

Reason:

The following servers will be rebooted due to a kernel upgrade.
- finglas.it.hiit.fi
- frodo.it.hiit.fi
- openvpn01.fe.hiit.fi
- bilbo.it.hiit.fi
- eowyn.it.hiit.fi
- peregrin.it.hiit.fi
- pippin.it.hiit.fi
- stat.fe.hiit.fi
- universe.hiit.fi
- terae.hiit.fi

Pending updates will be installed as well.

The break to any individual service should not be longer than five minutes.

Update at 08:27: All servers have been updated. The break is over.
