Alerts

Purpose

The Alerts page provides system-generated risk signals for infrastructure health, capacity, and operational failures.

When to Use

Use this page when you need to identify current risk, prioritize response, and verify whether issues recur.

Step: Start Alerts Workflow

When to Use:

Use this at the start of active-risk triage.

Purpose:

Prioritize actionable alerts and drive a consistent response sequence.

Steps:

  1. Open Control Center -> Observability -> Alerts.

  2. Filter by type or search term, then prioritize repeated or service-impacting failures.

  3. Map each active alert to a triage action and assign an owner and timestamp.

  4. Re-check alert state after remediation changes.

Expected Outcome:

  • You can prioritize active risks and execute a repeatable first-response workflow.

If this fails:

  1. Clear filters and reload newest alerts.

  2. Continue with the detailed steps in the sections that follow.
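The workflow steps above can be sketched in code. This is a minimal Python illustration, not the product's API: the `TriageRecord` class, the `count_repeats` helper, and the sample data are all hypothetical, and it assumes alert rows have been copied out of the Alerts table as (type, name) pairs.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TriageRecord:
    """Hypothetical record tying one alert to an action, owner, and timestamp."""
    alert_name: str
    alert_type: str
    action: str
    owner: str
    assigned_at: datetime


def count_repeats(alerts):
    """Count how often each (type, name) pair occurs, most frequent first."""
    return Counter(alerts).most_common()


# Sample (type, name) pairs; illustrative values only.
alerts = [
    ("Host Down", "host-01"),
    ("CPU", "host-02"),
    ("Host Down", "host-01"),
]

# The most frequently repeated alert is triaged first.
repeats = count_repeats(alerts)
(top_type, top_name), top_count = repeats[0]

record = TriageRecord(
    alert_name=top_name,
    alert_type=top_type,
    action="Check host connectivity",  # hypothetical triage action
    owner="on-call operator",          # hypothetical owner
    assigned_at=datetime.now(timezone.utc),
)
```

Counting repeats before assigning an owner keeps the record focused on the alert that recurs most, which matches the "repeated or service-impacting" priority in step 2.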

When to Use Alerts

Use Alerts when you need system-generated notifications for infrastructure events, thresholds, and operational issues.

Step: Review Alerts

When to Use:

Use this first when triaging active platform risk conditions.

Purpose:

Identify the highest-impact alerts and establish a response order.

Steps:

  1. Open Control Center -> Observability -> Alerts.

  2. Sort by the Sent column so that the most recent alerts appear first.

  3. Prioritize critical alert categories first (for example, Host Down and VM Error).

  4. Use filters and search to isolate related alerts before deep triage.

Figure: Alerts dashboard.

Tip

Sort by Sent first to assess current incident impact before reviewing older alerts.

Expected Outcome:

  • High-risk alerts are identified and ranked.

  • A clear action path exists for the highest-priority alert categories.

If this fails:

  1. Clear filters and retry from newest rows.

  2. Re-apply one filter at a time.

Step: Open Alerts Help

When to Use:

Use this when alert category meaning or operator action is unclear.

Purpose:

Confirm alert semantics and triage guidance before remediation decisions.

Steps:

  1. On the Alerts page, click the help icon.

  2. Review alert category guidance and triage context.

  3. Return to the table and continue with type/search filters.

Figure: Alerts help panel.

Expected Outcome:

  • You can map alert categories to appropriate response actions.

If this fails:

  1. Refresh and reopen help.

  2. Continue with alert type definitions from this guide.

What the Alerts Table Shows

Column       Description
Name         Short identifier/name of the alert.
Type         Alert category (for example, Capacity, Memory, CPU, Storage, Host Down).
Description  Detailed trigger/context message for the alert.
Subject      Alert message summary.
Sent         Timestamp when the alert was generated/sent.
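If alert rows are handled outside the page, the five columns map naturally onto a small record type. The following Python sketch is illustrative only: `AlertRow` and `row_from_dict` are hypothetical names, the dict keys simply mirror the column names, and the ISO 8601 format for Sent is an assumption, since the actual display format is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlertRow:
    """One row of the Alerts table, one field per column."""
    name: str
    type: str
    description: str
    subject: str
    sent: datetime


def row_from_dict(raw):
    """Build an AlertRow from a dict keyed by column name (assumed shape)."""
    return AlertRow(
        name=raw["Name"],
        type=raw["Type"],
        description=raw["Description"],
        subject=raw["Subject"],
        # Assumption: Sent is an ISO 8601 string; the real format may differ.
        sent=datetime.fromisoformat(raw["Sent"]),
    )


# Illustrative values only.
row = row_from_dict({
    "Name": "host-01 unreachable",
    "Type": "Host Down",
    "Description": "Host host-01 is not responding",
    "Subject": "Host down: host-01",
    "Sent": "2024-05-01T10:15:00",
})
```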

Alert Type Reference

Type                       Description
Capacity                   Overall resource capacity thresholds reached.
Memory                     Physical/virtual memory usage exceeded limits.
CPU                        Processor utilization is critically high.
Storage                    Storage pool usage exceeded thresholds.
Storage Allocated          Allocated storage space is nearing limits.
Public IP                  Public IP pool is nearing exhaustion.
Private IP                 Private IP range is running low.
Host                       Host node reported an issue.
VM Error                   Virtual machine encountered an error.
Domain Router              Domain virtual router has an issue.
Console Proxy              Console proxy service is not responding.
Routing                    Network routing issue detected.
Storage Misc               Miscellaneous storage subsystem issue.
Usage Server               Usage data collection server issue.
Management Node            Management server reported a problem.
Host Down                  Host node is unresponsive/offline.
VM Migration               VM live migration encountered an error.
Secondary Storage          Image Storage reported an issue. This alert type still uses the legacy backend name Secondary Storage.
Usage Server Result        Usage server returned unexpected results.
Storage Delete             Storage deletion operation error.
Update Resource Count      Resource count update failed.
Usage Sanity Result        Usage sanity check found inconsistencies.
Direct Attached Public IP  Direct attached public IP issue detected.
Local Storage              Local (non-shared) storage issue.
Resource Limit             Account/domain resource limits are being approached.
Sync                       Synchronization between components failed.
Upload Failed              Boot image/template upload operation failed.
VM Snapshot                VM snapshot operation failed.

Step: Filter and Search Alerts

When to Use:

Use this when alert volume is high and triage requires focused filtering.

Purpose:

Reduce alert noise to the affected service/resource and incident window.

Steps:

  1. Use the All Types filter to select a specific alert category.

  2. Use search to filter by alert name/content (host, VM, storage pool, or other keywords).

  3. Use Sent ordering to focus on the newest alerts first.

  4. Narrow to a time window when investigating an incident.

Expected Outcome:

  • Filtered results contain only actionable alerts for the current incident path.

If this fails:

  1. Keep only one Type filter active.

  2. Add one keyword for the affected resource.

  3. Reduce to the exact incident window.
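The same narrowing logic can be applied to alert data held outside the page. The sketch below is a hedged Python illustration, not a product feature: `filter_alerts` is a hypothetical helper, and it assumes each alert is a dict with `name`, `type`, `description`, and `sent` (a `datetime`) keys.

```python
from datetime import datetime


def filter_alerts(alerts, alert_type=None, keyword=None, start=None, end=None):
    """Return alerts matching every supplied filter, newest first."""
    matches = []
    for alert in alerts:
        # Type filter: exact match on the alert category.
        if alert_type is not None and alert["type"] != alert_type:
            continue
        # Keyword search: case-insensitive match on name and description.
        if keyword is not None:
            text = (alert["name"] + " " + alert["description"]).lower()
            if keyword.lower() not in text:
                continue
        # Time window: inclusive bounds on the Sent timestamp.
        if start is not None and alert["sent"] < start:
            continue
        if end is not None and alert["sent"] > end:
            continue
        matches.append(alert)
    return sorted(matches, key=lambda a: a["sent"], reverse=True)


# Illustrative values only.
alerts = [
    {"name": "host-01 unreachable", "type": "Host Down",
     "description": "Host host-01 is not responding",
     "sent": datetime(2024, 5, 1, 10, 0)},
    {"name": "pool-a usage", "type": "Storage",
     "description": "Storage pool pool-a above threshold",
     "sent": datetime(2024, 5, 1, 9, 0)},
    {"name": "host-01 unreachable", "type": "Host Down",
     "description": "Host host-01 is not responding",
     "sent": datetime(2024, 5, 1, 11, 30)},
    {"name": "host-02 unreachable", "type": "Host Down",
     "description": "Host host-02 is not responding",
     "sent": datetime(2024, 4, 30, 8, 0)},
]

recent = filter_alerts(
    alerts,
    alert_type="Host Down",
    keyword="host-01",
    start=datetime(2024, 5, 1, 0, 0),
    end=datetime(2024, 5, 2, 0, 0),
)
```

Applying one filter at a time, as the fallback steps suggest, corresponds to passing only one argument and adding the others once results look right.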

Step: Prioritize and Recheck Alerts

When to Use:

Use this after filtering alerts to decide response order and verify recurrence.

Purpose:

Apply a consistent triage sequence and confirm remediation effectiveness.

Steps:

  1. Prioritize Host Down and VM Error first (highest service-impact risk).

  2. Prioritize Capacity and Storage next (resource planning and risk mitigation).

  3. Prioritize Memory and CPU for workload balancing and scaling actions.

  4. Track Resource Limit for proactive quota expansion before operations are blocked.

Warning

Alerts are not cleared automatically as part of this workflow; address the root cause, then verify that the alert has stopped recurring.

Expected Outcome:

  • Alert triage order is clear and active-risk items are worked first.

If this fails:

  1. Confirm new Sent timestamps are still appearing.

  2. Re-check dependencies and recent changes.

  3. Escalate with alert IDs and timestamps.
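The triage order and the recurrence check above can be expressed as a short Python sketch. The tier numbers follow the four prioritization steps; everything else is an assumption made for illustration: `triage_order` and `has_recurred` are hypothetical helpers, unlisted alert types are placed last, and each alert is a dict with `name`, `type`, and `sent` (a `datetime`) keys.

```python
from datetime import datetime

# Tier order taken from the steps above; a lower number is worked earlier.
PRIORITY = {
    "Host Down": 0, "VM Error": 0,
    "Capacity": 1, "Storage": 1,
    "Memory": 2, "CPU": 2,
    "Resource Limit": 3,
}
DEFAULT_TIER = 4  # Assumption: unlisted types rank after the named ones.


def triage_order(alerts):
    """Sort by tier, keeping newest Sent first within each tier."""
    newest_first = sorted(alerts, key=lambda a: a["sent"], reverse=True)
    # sorted() is stable, so the newest-first order survives inside a tier.
    return sorted(newest_first, key=lambda a: PRIORITY.get(a["type"], DEFAULT_TIER))


def has_recurred(alerts, alert_type, name, remediated_at):
    """True if a matching alert was sent after the remediation time."""
    return any(
        a["type"] == alert_type and a["name"] == name and a["sent"] > remediated_at
        for a in alerts
    )


# Illustrative values only.
alerts = [
    {"name": "acct-quota", "type": "Resource Limit", "sent": datetime(2024, 5, 1, 12, 0)},
    {"name": "host-01", "type": "Host Down", "sent": datetime(2024, 5, 1, 9, 0)},
    {"name": "cluster-cpu", "type": "CPU", "sent": datetime(2024, 5, 1, 11, 0)},
    {"name": "vm-17", "type": "VM Error", "sent": datetime(2024, 5, 1, 10, 0)},
    {"name": "pool-a", "type": "Storage", "sent": datetime(2024, 5, 1, 8, 0)},
]

ordered = triage_order(alerts)
still_firing = has_recurred(alerts, "Host Down", "host-01", datetime(2024, 5, 1, 8, 30))
```

A new Sent timestamp after the remediation time is treated as recurrence, which mirrors the first fallback step: confirm whether new Sent timestamps are still appearing.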

Expected Outcome

  • You can identify which alert categories are highest risk.

  • You can filter quickly to the affected resources and time window.

  • You can apply a consistent triage process and verify that resolved alerts no longer recur.