Alerts

Purpose

The Alerts page shows system-generated risk signals for infrastructure health, capacity, and operational failures.

When to Use

Use this page when you need to identify current risk, prioritize response, and verify whether issues recur.

Step: Start Alerts Workflow

When to Use:

Use this at the start of active-risk triage.

Purpose:

Prioritize actionable alerts and drive a consistent response sequence.

Steps:

Open Control Center -> Observability -> Alerts.
Filter by type or search term, and prioritize repeated or service-impacting failures.
Map each active alert to a triage action, and record an owner and timestamp.
Re-check alert state after remediation changes.
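
The mapping step above can also be tracked outside the UI. The following is a minimal Python sketch, assuming alerts have been copied out as dictionaries keyed by the Alerts table column names. `TriageRecord`, `assign`, and all sample values are hypothetical and not part of the product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TriageRecord:
    """One active alert mapped to a triage action (hypothetical structure)."""
    alert_name: str
    alert_type: str
    action: str
    owner: str
    # Timestamp is recorded in UTC at assignment time.
    assigned_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Set to True once the alert state has been re-checked after remediation.
    rechecked: bool = False


def assign(alert: dict, action: str, owner: str) -> TriageRecord:
    # "Name" and "Type" mirror the column names of the Alerts table.
    return TriageRecord(alert["Name"], alert["Type"], action, owner)


record = assign(
    {"Name": "host-07 unreachable", "Type": "Host Down"},  # sample data
    action="Check host power and management network",
    owner="oncall-infra",
)
```

Keeping `rechecked` on the record makes the final re-check step visible for every alert that was assigned.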

Expected Outcome:

You can prioritize active risks and execute a repeatable first-response workflow.

If this fails:

Clear filters and reload the newest alerts.
Continue with the detailed steps in this guide.

When to Use Alerts

Use Alerts when you need system-generated notifications for infrastructure events, thresholds, and operational issues.

Step: Review Alerts

When to Use:

Use this first when triaging active platform risk conditions.

Purpose:

Identify the highest-impact alerts and establish a response order.

Steps:

Open Control Center -> Observability -> Alerts.
Sort by Sent to start with the most recent alerts.
Prioritize critical alert categories first (for example, Host Down and VM Error).
Use filters and search to isolate related alerts before deep triage.

Tip

Sort by Sent first to assess current incident impact before reviewing older alerts.
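
The review order described above (newest first, critical categories on top) can be sketched in Python for alerts handled outside the UI. This is an illustration only: the row dictionaries, `CRITICAL_TYPES`, and `review_order` are assumptions, and the critical set contains only the two example categories named in the steps.

```python
from datetime import datetime

# Example critical categories taken from the steps above; extend as needed.
CRITICAL_TYPES = {"Host Down", "VM Error"}

# Sample rows keyed by Alerts table column names (hypothetical data).
alerts = [
    {"Name": "a1", "Type": "CPU", "Sent": datetime(2024, 1, 1, 10, 0)},
    {"Name": "a2", "Type": "Host Down", "Sent": datetime(2024, 1, 1, 9, 0)},
    {"Name": "a3", "Type": "VM Error", "Sent": datetime(2024, 1, 1, 11, 0)},
]


def review_order(rows):
    # First sort newest to oldest by Sent.
    newest_first = sorted(rows, key=lambda r: r["Sent"], reverse=True)
    # sorted() is stable, so critical rows move to the top while
    # keeping their newest-first order within each group.
    return sorted(newest_first, key=lambda r: r["Type"] not in CRITICAL_TYPES)


ordered = [r["Name"] for r in review_order(alerts)]  # ['a3', 'a2', 'a1']
```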

Expected Outcome:

High-risk alerts are identified and ranked.
A clear action path exists for the highest-priority alert categories.

If this fails:

Clear filters and retry from newest rows.
Re-apply one filter at a time.

Step: Open Alerts Help

When to Use:

Use this when alert category meaning or operator action is unclear.

Purpose:

Confirm alert semantics and triage guidance before remediation decisions.

Steps:

On the Alerts page, click the help icon.
Review alert category guidance and triage context.
Return to the table and continue with type/search filters.

Figure: Alerts dashboard help panel.

Expected Outcome:

You can map alert categories to appropriate response actions.

If this fails:

Refresh and reopen help.
Continue with alert type definitions from this guide.

What the Alerts Table Shows

Column	Description
`Name`	Short identifier/name of the alert.
`Type`	Alert category (for example Capacity, Memory, CPU, Storage, Host Down).
`Description`	Detailed trigger/context message for the alert.
`Subject`	Alert message summary.
`Sent`	Timestamp when the alert was generated/sent.
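
If alert rows are processed in scripts, the five columns map naturally onto a small typed record. The sketch below is a hypothetical Python structure, not a product API. It assumes `Sent` is available as an ISO 8601 string, which may not match the format shown in the UI.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlertRow:
    name: str         # Name: short identifier of the alert
    type: str         # Type: alert category
    description: str  # Description: detailed trigger/context message
    subject: str      # Subject: alert message summary
    sent: datetime    # Sent: when the alert was generated/sent


def from_mapping(row: dict) -> AlertRow:
    # Assumption: "Sent" is an ISO 8601 string; adjust parsing if it differs.
    return AlertRow(
        row["Name"],
        row["Type"],
        row["Description"],
        row["Subject"],
        datetime.fromisoformat(row["Sent"]),
    )


example = from_mapping({
    "Name": "alert-1",  # sample values, not real alert data
    "Type": "Storage",
    "Description": "Pool usage above threshold",
    "Subject": "Storage threshold",
    "Sent": "2024-01-01T10:00:00",
})
```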

Alert Type Reference

Type	Description
`Capacity`	Overall resource capacity thresholds reached.
`Memory`	Physical/virtual memory usage exceeded limits.
`CPU`	Processor utilization is critically high.
`Storage`	Storage pool usage exceeded thresholds.
`Storage Allocated`	Allocated storage space is nearing limits.
`Public IP`	Public IP pool is nearing exhaustion.
`Private IP`	Private IP range is running low.
`Host`	Host node reported an issue.
`VM Error`	Virtual machine encountered an error.
`Domain Router`	Domain virtual router has an issue.
`Console Proxy`	Console proxy service is not responding.
`Routing`	Network routing issue detected.
`Storage Misc`	Miscellaneous storage subsystem issue.
`Usage Server`	Usage data collection server issue.
`Management Node`	Management server reported a problem.
`Host Down`	Host node is unresponsive/offline.
`VM Migration`	VM live migration encountered an error.
`Secondary Storage`	Image Storage reported an issue. This alert type still uses the legacy backend name `Secondary Storage`.
`Usage Server Result`	Usage server returned unexpected results.
`Storage Delete`	Storage deletion operation error.
`Update Resource Count`	Resource count update failed.
`Usage Sanity Result`	Usage sanity check found inconsistencies.
`Direct Attached Public IP`	Direct attached public IP issue detected.
`Local Storage`	Local (non-shared) storage issue.
`Resource Limit`	Account/domain resource limits are being approached.
`Sync`	Synchronization between components failed.
`Upload Failed`	Boot image/template upload operation failed.
`VM Snapshot`	VM snapshot operation failed.

Step: Filter and Search Alerts

When to Use:

Use this when alert volume is high and triage requires focused filtering.

Purpose:

Reduce alert noise to the affected service/resource and incident window.

Steps:

Use the All Types filter to select a specific alert category.
Use search to filter by alert name/content (host, VM, storage pool, or other keywords).
Use Sent ordering to focus on the newest alerts first.
Narrow to a time window when investigating an incident.
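
The same narrowing logic (type, keyword, time window, newest first) can be expressed as a short Python sketch for alert rows handled outside the UI. `filter_alerts` and the sample rows are hypothetical; the dictionaries are assumed to use the Alerts table column names.

```python
from datetime import datetime


def filter_alerts(rows, alert_type=None, keyword=None, start=None, end=None):
    """Return rows matching every filter that is set (None means no filter)."""
    result = []
    for row in rows:
        if alert_type and row["Type"] != alert_type:
            continue
        # Search covers alert name and content, case-insensitively.
        text = f'{row["Name"]} {row["Description"]}'.lower()
        if keyword and keyword.lower() not in text:
            continue
        # Keep only rows inside the incident window (inclusive bounds).
        if start and row["Sent"] < start:
            continue
        if end and row["Sent"] > end:
            continue
        result.append(row)
    # Newest first, matching the Sent ordering step.
    return sorted(result, key=lambda r: r["Sent"], reverse=True)


rows = [
    {"Name": "n1", "Type": "Storage", "Description": "pool-a usage high", "Sent": datetime(2024, 1, 1, 8)},
    {"Name": "n2", "Type": "Storage", "Description": "pool-b usage high", "Sent": datetime(2024, 1, 1, 9)},
    {"Name": "n3", "Type": "CPU", "Description": "host-3 cpu high", "Sent": datetime(2024, 1, 1, 9, 30)},
    {"Name": "n4", "Type": "Storage", "Description": "pool-a usage high", "Sent": datetime(2024, 1, 1, 10)},
]

matches = filter_alerts(
    rows,
    alert_type="Storage",
    keyword="pool-a",
    start=datetime(2024, 1, 1, 9),
    end=datetime(2024, 1, 1, 11),
)
```

Here only `n4` matches: `n1` falls before the window, `n2` names a different pool, and `n3` is a different type.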

Expected Outcome:

Filtered results contain only actionable alerts for the current incident path.

If this fails:

Keep only one Type filter active.
Add one keyword for the affected resource.
Reduce to the exact incident window.

Step: Prioritize and Recheck Alerts

When to Use:

Use this after filtering alerts to decide response order and verify recurrence.

Purpose:

Apply a consistent triage sequence and confirm remediation effectiveness.

Steps:

Prioritize Host Down and VM Error first (highest service-impact risk).
Prioritize Capacity and Storage next (resource planning and risk mitigation).
Prioritize Memory and CPU for workload balancing and scaling actions.
Track Resource Limit for proactive quota expansion before operations are blocked.

Warning

Alerts are not automatically marked as resolved in this workflow; address the root cause, then verify that the alert has stopped recurring.
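
The triage sequence in the steps above can be encoded as a simple tier lookup. This Python sketch is illustrative only: the tier numbers follow the order of the steps, and placing every other alert type in a last tier is an assumption, not documented product behavior.

```python
# Tier numbers follow the order of the steps above; lower means handle sooner.
PRIORITY_TIER = {
    "Host Down": 1, "VM Error": 1,   # highest service-impact risk
    "Capacity": 2, "Storage": 2,     # resource planning and risk mitigation
    "Memory": 3, "CPU": 3,           # workload balancing and scaling
    "Resource Limit": 4,             # proactive quota expansion
}
DEFAULT_TIER = 5  # assumption: types not named in the steps are reviewed last


def triage_order(alert_types):
    # sorted() is stable, so alerts within a tier keep their incoming order.
    return sorted(alert_types, key=lambda t: PRIORITY_TIER.get(t, DEFAULT_TIER))


queue = triage_order(
    ["CPU", "Sync", "Storage", "Host Down", "Resource Limit", "VM Error"]
)
# queue == ['Host Down', 'VM Error', 'Storage', 'CPU', 'Resource Limit', 'Sync']
```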

Expected Outcome:

Alert triage order is clear and active-risk items are worked first.

If this fails:

Confirm whether alerts with new Sent timestamps are still appearing.
Re-check dependencies and recent changes.
Escalate with alert IDs and timestamps.
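
Checking for recurrence amounts to asking whether any matching alert has a Sent timestamp later than the remediation time. A minimal Python sketch of that check follows. `recurred_after`, the row layout, and the sample data are hypothetical.

```python
from datetime import datetime


def recurred_after(rows, alert_type, keyword, remediated_at):
    """Return alerts of the given type that mention keyword and were sent after remediation."""
    return [
        r for r in rows
        if r["Type"] == alert_type
        and keyword.lower() in r["Description"].lower()
        and r["Sent"] > remediated_at
    ]


# Sample alert history (hypothetical data).
history = [
    {"Type": "Host Down", "Description": "host-07 unreachable", "Sent": datetime(2024, 1, 1, 9, 0)},
    {"Type": "Host Down", "Description": "host-07 unreachable", "Sent": datetime(2024, 1, 1, 9, 45)},
    {"Type": "Host Down", "Description": "host-12 unreachable", "Sent": datetime(2024, 1, 1, 10, 5)},
]
fix_time = datetime(2024, 1, 1, 9, 30)

# One host-07 alert was sent after the fix, so the issue is still recurring.
still_firing = recurred_after(history, "Host Down", "host-07", fix_time)
```

An empty result suggests the remediation held for that resource; a non-empty result gives the timestamps to include when escalating.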

Expected Outcome

You can identify which alert categories are highest risk.
You can filter quickly to the affected resources and time window.
You can apply a consistent triage process and verify that alerts have stopped recurring.