
Outage Procedures

Audience: A&E staff, service owners, and on-call engineers

Purpose: Guide for posting IT Alerts and coordinating outage communications


Quick Reference

| Contact Method | Details |
|---|---|
| Email | outage@tamu.edu |
| Teams | IT Operations Center team |
| Phone | Contact IOC dispatcher |

Important

Always coordinate with the IT Operations Center (IOC) for service outages. Do not post IT Alerts independently—use the official process below.


IT Alert Process

Overview

IT Alerts notify the campus community about service disruptions, scheduled maintenance, and restored services. The IT Operations Center (IOC) manages IT Alert postings on behalf of service teams.


Submitting an IT Alert Request

Step 1: Gather Required Information

Before contacting the IOC, collect the following details:

| Field | Description | Example |
|---|---|---|
| Service Name | Official service or system name | "AggieCloud", "Exchange Online" |
| Impact Description | What users are experiencing | "Users unable to access email via Outlook" |
| Affected Population | Who is impacted | "All faculty and staff", "Engineering college" |
| Start Time | When the issue began | "2024-01-15 14:30 CST" |
| Estimated Resolution | Expected fix time (if known) | "Within 2 hours" or "Unknown" |
| Workaround | Alternative access method (if available) | "Use Outlook Web Access at outlook.office.com" |
| Contact Person | Engineer handling the issue | "John Smith, x5-1234" |
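
A quick completeness check before contacting the IOC can be sketched as follows. This is a minimal illustration, not an official tool: the key names (`service`, `impact`, and so on) are hypothetical, and treating Estimated Resolution and Workaround as optional is an assumption based on the "if known" and "if available" wording in the table.

```python
# Hypothetical key names; the procedure only defines the human-readable labels.
REQUIRED_FIELDS = ("service", "impact", "affected", "start_time", "contact")
# Assumption: optional because the table says "if known" / "if available".
OPTIONAL_FIELDS = ("est_resolution", "workaround")


def missing_fields(request: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank, in table order."""
    return [name for name in REQUIRED_FIELDS if not request.get(name, "").strip()]


draft = {
    "service": "Exchange Online",
    "impact": "Users unable to access email via Outlook",
    "affected": "All faculty and staff",
    "start_time": "2024-01-15 14:30 CST",
}
print(missing_fields(draft))  # ['contact']
```

An empty result means the request has everything listed as mandatory under these assumptions.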

Step 2: Contact the IOC

Choose the appropriate contact method based on urgency:

Email (Standard Priority)

Send an email to outage@tamu.edu with:

Subject: [IT ALERT REQUEST] Service Name - Brief Description

Body:

Service: [Service Name]
Impact: [Description of user impact]
Affected: [Population affected]
Start Time: [When issue began]
Est. Resolution: [Expected fix time]
Workaround: [Alternative if available]
Contact: [Your name and phone]

Additional Details:
[Any other relevant information]
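
The template above could be filled in programmatically, for example from a monitoring script. The sketch below only builds the message using Python's standard `email.message.EmailMessage`; it does not send anything. The `AlertRequest` class, its field names, and the "Unknown" / "None" defaults are illustrative assumptions, not part of the official process.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class AlertRequest:
    """Hypothetical container for the Step 1 fields."""
    service: str
    summary: str                     # brief description for the subject line
    impact: str
    affected: str
    start_time: str
    contact: str
    est_resolution: str = "Unknown"  # assumed default when no estimate exists
    workaround: str = "None"         # assumed default when no workaround exists
    details: str = ""


def build_alert_email(req: AlertRequest) -> EmailMessage:
    """Build (but do not send) an IT Alert request in the template's layout."""
    msg = EmailMessage()
    msg["To"] = "outage@tamu.edu"
    msg["Subject"] = f"[IT ALERT REQUEST] {req.service} - {req.summary}"
    msg.set_content("\n".join([
        f"Service: {req.service}",
        f"Impact: {req.impact}",
        f"Affected: {req.affected}",
        f"Start Time: {req.start_time}",
        f"Est. Resolution: {req.est_resolution}",
        f"Workaround: {req.workaround}",
        f"Contact: {req.contact}",
        "",
        "Additional Details:",
        req.details or "None",
    ]))
    return msg


request = AlertRequest(
    service="Exchange Online",
    summary="Outlook sign-in failures",
    impact="Users unable to access email via Outlook",
    affected="All faculty and staff",
    start_time="2024-01-15 14:30 CST",
    contact="John Smith, x5-1234",
)
message = build_alert_email(request)
print(message["Subject"])
```

How the message is delivered (mail client, SMTP relay, ticketing integration) is left out on purpose, since the procedure does not specify it.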

Microsoft Teams (Urgent)

For faster response on urgent issues:

  1. Navigate to the IT Operations Center team in Microsoft Teams
  2. Post in the appropriate channel. In the post:
    • @mention the IOC dispatcher if the issue is critical
    • Include all required information from Step 1
    • Indicate the urgency level

Phone (Critical/P1)

For critical outages affecting large populations:

  1. Call the IOC dispatcher directly
  2. Provide a verbal summary of the outage
  3. Follow up with email documentation

After-Hours

For after-hours emergencies, use the on-call escalation process through the IOC.
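
The choice of contact method by urgency can be summarized as a lookup. The following is a small sketch that mirrors the three channels described in Step 2; the `Urgency` enum and the function name are hypothetical, and after-hours handling is not modeled because the on-call escalation process is not detailed here.

```python
from enum import Enum


class Urgency(Enum):
    """Hypothetical labels for the three urgency tiers in Step 2."""
    STANDARD = "standard"
    URGENT = "urgent"
    CRITICAL = "critical"  # P1


# Steps are taken from the procedure text; only the data structure is new.
CONTACT_STEPS = {
    Urgency.STANDARD: [
        "Email outage@tamu.edu using the IT Alert request template",
    ],
    Urgency.URGENT: [
        "Open the IT Operations Center team in Microsoft Teams",
        "Post in the appropriate channel with all required information",
    ],
    Urgency.CRITICAL: [
        "Call the IOC dispatcher directly",
        "Provide a verbal summary of the outage",
        "Follow up with email documentation",
    ],
}


def contact_steps(urgency: Urgency) -> list[str]:
    """Return the ordered contact steps for the given urgency tier."""
    return CONTACT_STEPS[urgency]


for step in contact_steps(Urgency.CRITICAL):
    print(step)
```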

Step 3: Monitor and Update

During the outage:

  1. Provide Status Updates — Send progress updates to the IOC every 30-60 minutes
  2. Notify of Changes — Alert IOC if scope, impact, or timeline changes
  3. Report Resolution — Immediately notify IOC when service is restored
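
To keep the 30-60 minute update cadence visible during an incident, a simple timer check can help. This sketch assumes the window is measured from the last update sent to the IOC; the function names are illustrative.

```python
from datetime import datetime, timedelta

# Cadence from the procedure: an update every 30-60 minutes.
UPDATE_MIN = timedelta(minutes=30)
UPDATE_MAX = timedelta(minutes=60)


def next_update_window(last_update: datetime) -> tuple[datetime, datetime]:
    """Return the earliest and latest time the next update should be sent."""
    return last_update + UPDATE_MIN, last_update + UPDATE_MAX


def update_overdue(last_update: datetime, now: datetime) -> bool:
    """True once more than 60 minutes have passed since the last update."""
    return now - last_update > UPDATE_MAX


last = datetime(2024, 1, 15, 14, 30)
earliest, latest = next_update_window(last)
print(earliest.strftime("%H:%M"), latest.strftime("%H:%M"))  # 15:00 15:30
```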

IT Alert Types

| Alert Type | Description | When to Use |
|---|---|---|
| Outage | Unplanned service disruption | Service is down or degraded |
| Maintenance | Scheduled service window | Planned maintenance with expected impact |
| Security | Security-related notification | Phishing, compromise, or security incident |
| Update | Status update on existing alert | Progress report or scope change |
| Resolved | Service restoration notice | Issue has been fixed |

Escalation Procedures

Severity Levels

| Level | Definition | Response Time | Escalation |
|---|---|---|---|
| P1 - Critical | Major service down, large population | Immediate | Director/CIO notification |
| P2 - High | Significant degradation, medium impact | 15 minutes | Team lead notification |
| P3 - Medium | Partial impact, workaround available | 30 minutes | Standard process |
| P4 - Low | Minor issue, limited impact | 1 hour | Standard process |
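
The severity table maps naturally onto a lookup that yields a response deadline and an escalation target. The sketch below is illustrative only: the names `SeverityPolicy` and `respond_by` are hypothetical, and "Immediate" is modeled as a zero-minute allowance, which is an assumption.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class SeverityPolicy(NamedTuple):
    response_time: timedelta
    escalation: str


# Values copied from the severity table; "Immediate" is assumed to mean zero delay.
SEVERITY = {
    "P1": SeverityPolicy(timedelta(0), "Director/CIO notification"),
    "P2": SeverityPolicy(timedelta(minutes=15), "Team lead notification"),
    "P3": SeverityPolicy(timedelta(minutes=30), "Standard process"),
    "P4": SeverityPolicy(timedelta(hours=1), "Standard process"),
}


def respond_by(level: str, reported_at: datetime) -> datetime:
    """Return the latest time a response is due for the given severity level."""
    return reported_at + SEVERITY[level].response_time


reported = datetime(2024, 1, 15, 14, 30)
print(respond_by("P2", reported).strftime("%H:%M"))  # 14:45
print(SEVERITY["P1"].escalation)                     # Director/CIO notification
```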

Escalation Path


Post-Incident Activities

After resolution:

  1. Confirm Resolution — Verify service is fully restored
  2. Request Resolution Alert — Ask IOC to post "Resolved" update
  3. Document Timeline — Record incident details for post-mortem
  4. Root Cause Analysis — Complete RCA for P1/P2 incidents
  5. Lessons Learned — Share findings with team

Post-Mortem Template

For significant outages, complete a post-mortem including:

  • Incident timeline
  • Root cause analysis
  • Impact assessment (duration, users affected)
  • What went well
  • What could be improved
  • Action items with owners and due dates
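
A post-mortem document can be started from a generated skeleton so that no section is forgotten. The sketch below is one possible approach, not a required format: the function name, the plain-text layout, and the "TODO" placeholders are assumptions. It fills in only the outage duration, which follows directly from the start and restoration times.

```python
from datetime import datetime

# Section names follow the post-mortem list in this procedure.
SECTIONS = [
    "Incident timeline",
    "Root cause analysis",
    "Impact assessment",
    "What went well",
    "What could be improved",
    "Action items (owner, due date)",
]


def postmortem_skeleton(service: str, started: datetime,
                        restored: datetime, users_affected: str) -> str:
    """Return a plain-text post-mortem outline with the impact figures filled in."""
    minutes = int((restored - started).total_seconds() // 60)
    lines = [f"Post-Mortem: {service}", ""]
    for section in SECTIONS:
        lines.append(section)
        if section == "Impact assessment":
            lines.append(f"  Duration: {minutes} minutes")
            lines.append(f"  Users affected: {users_affected}")
        else:
            lines.append("  TODO")
        lines.append("")
    return "\n".join(lines)


outline = postmortem_skeleton(
    "Exchange Online",
    started=datetime(2024, 1, 15, 14, 30),
    restored=datetime(2024, 1, 15, 16, 15),
    users_affected="All faculty and staff",
)
print(outline)
```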