SAN Monitor Procedures

Generic Services
PING     Description:
Critical message. The host or controller is not responding to ping requests. It may have failed, be down, or be otherwise non-operational.

Action:
Contact on-call person.



Disk Services
Failed_Drive     Description:
Non-critical message. A drive in a RAID5 disk set has failed. It will automatically be replaced by a hot spare.

Action:
If necessary, silence any audible alarms.
Disk_Ctlr_Failed     Description:
Critical message. A disk controller has failed. I/O may be restricted or unavailable.

Action:
Contact on-call person.
Disk_Ctlr_State_Change     Description:
Critical message. A disk controller is no longer operating in optimal mode.

Action:
Contact on-call person.
Disk_Volume_Degraded     Description:
Warning message. A LUN or RAID5 volume is in a failover state (no hot spares available).

Action:
If necessary, silence any audible alarms.



Switch Services
Switch_Ctlr_Failed     Description:
Warning message. An alternate or failover CP has failed or rebooted. The high-availability code will keep I/O running.

Action:
Contact on-call person.
Switch_Kernel_Panic     Description:
Critical message. The switch or controller has rebooted. I/O may be affected.

Action:
Contact on-call person.
Switch_Port_Failed     Description:
Critical message. A hardware problem has occurred on a specific port. I/O will be unavailable for the attached devices.

Action:
Contact on-call person.
Switch_Trunk_Failed     Description:
Critical message. An inter-switch link has failed. I/O bandwidth will be reduced or unavailable. The fabric may be segmented.

Action:
Contact on-call person.



Errata / Notes

If a service problem name is not defined above, no action is required. You may acknowledge it to signal to other TMG shifts that the problem has been noticed and needs no action.
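
For quick reference, the procedures in this document can be summarized as a lookup table. The following is only an illustrative sketch, not part of any monitoring tool: the names PROCEDURES and triage, the short action labels, and the "Undefined" severity label are assumptions made for this example. The service names and severities are taken from the entries above.

```python
# Hypothetical triage table summarizing this document.
# Action labels are shorthand invented for this sketch:
#   contact_on_call -> "Contact on-call person."
#   silence_alarms  -> silence any alarms if necessary
#   acknowledge     -> acknowledge only, no further action
PROCEDURES = {
    # Generic Services
    "PING": ("Critical", "contact_on_call"),
    # Disk Services
    "Failed_Drive": ("Non-critical", "silence_alarms"),
    "Disk_Ctlr_Failed": ("Critical", "contact_on_call"),
    "Disk_Ctlr_State_Change": ("Critical", "contact_on_call"),
    "Disk_Volume_Degraded": ("Warning", "silence_alarms"),
    # Switch Services
    "Switch_Ctlr_Failed": ("Warning", "contact_on_call"),
    "Switch_Kernel_Panic": ("Critical", "contact_on_call"),
    "Switch_Port_Failed": ("Critical", "contact_on_call"),
    "Switch_Trunk_Failed": ("Critical", "contact_on_call"),
}


def triage(service_name):
    """Return (severity, action) for a service problem name."""
    # Names not defined in this document fall back to acknowledge-only,
    # following the rule in the Errata / Notes section.
    return PROCEDURES.get(service_name, ("Undefined", "acknowledge"))
```

For example, triage("Switch_Trunk_Failed") returns the on-call action, while a name that is not listed here returns the acknowledge-only default.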