Appendices

A. Troubleshooting Prep Checklist

Before opening a support request, collect:

  • exact error message (copy text, do not paraphrase)

  • screenshot showing the full page/state

  • timestamp and timezone

  • section + step where the issue occurred

  • impact summary (what is blocked)

B. Credential Recovery Workflow (Admin Access)

Use this workflow when the primary admin cannot authenticate.

Guardrails:

  • Perform recovery only with approved change control and ownership verification.

  • Record who performed recovery, when, and why.

  • Never send passwords in plain text over chat/email.

Recovery steps:

  1. Confirm failure scope (single user vs all admin users).

  2. Validate endpoint reachability and time sync before reset attempts.
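The two pre-checks above can be scripted before any reset is attempted. The following is a minimal sketch, assuming the management endpoint accepts TCP connections on port 443 and returns a standard HTTP Date header. The function names and the 300-second tolerance are illustrative, not Karios defaults, and comparing a Date header with the local clock is only a rough skew indicator, not a substitute for checking your NTP service.

```python
import socket
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Absolute difference between a server's HTTP Date header and local time."""
    server_time = parsedate_to_datetime(date_header)
    return abs((local_now - server_time).total_seconds())


# Hypothetical tolerance; choose a value that fits your environment
# (for example, the acceptance window of a time-based 2FA code).
MAX_SKEW_SECONDS = 300

# Example values only, not captured from a real system.
header = "Tue, 04 Mar 2025 10:00:00 GMT"
now = datetime(2025, 3, 4, 10, 0, 42, tzinfo=timezone.utc)
skew = clock_skew_seconds(header, now)
print(f"skew={skew:.0f}s within_tolerance={skew <= MAX_SKEW_SECONDS}")
```

In live use, pass `datetime.now(timezone.utc)` as `local_now` and call `is_reachable("<server-ip>")` first; a failed reachability check points to a network problem rather than a credential problem.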

Path A (preferred): reset from another active admin account

  1. Log in with a secondary admin account.

  2. Open User Management.

  3. Locate the target admin user and run the password reset/edit workflow.

  4. Set a temporary credential and require immediate rotation after first login.

  5. Revalidate role/2FA settings and log the recovery event.

Path B (break-glass): local console recovery when no admin login is available

  1. Open physical console or BMC/IPMI remote console for the management node.

  2. Sign in with an approved local recovery account.

  3. Run a controlled password reset command for the target account:

sudo passwd <admin-username>

  4. Confirm reset success, sign out from console, then test Control Center login.

  5. Rotate from temporary credential to permanent credential immediately after access is restored.

  6. Re-validate role assignment and 2FA/approval settings after access returns.

  7. Document incident timeline and close with post-recovery review.

If platform-level recovery is blocked, escalate through Karios Support with:

  • environment ownership evidence

  • affected account/username

  • exact error and timestamp/timezone

  • actions already attempted

C. Operational Quick Checks

Infrastructure checks:

  • zone/pod/cluster/host states are healthy

  • required storage is available and Up

  • required network ranges are configured and non-overlapping
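The non-overlap check on network ranges is easy to automate with the Python standard library. This is a minimal sketch; the range names and CIDR values are hypothetical and should be replaced with the ranges configured in your environment.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of range names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical ranges for illustration only.
ranges = {
    "management": "10.0.0.0/24",
    "storage": "10.0.1.0/24",
    "guest": "10.0.0.128/25",  # sits inside the management range
}

print(find_overlaps(ranges))
```

An empty list means no two ranges collide; any returned pair should be resolved before workloads are placed on those networks.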

VM checks:

  • required instance profiles exist

  • VM template or boot image is Ready

  • VM dashboard and lifecycle actions are accessible

D. Support Escalation Matrix

Use Zoho Desk (standard path):

  • normal incidents and service requests

  • status tracking and closure audit

Use support@karios.com (urgent path):

  • portal access unavailable

  • critical blocking incident requiring immediate triage

E. Documentation Usage Tips

  • Follow the section order from the main index for first-time setup.

  • Check the Expected Outcome and Success Checkpoint blocks before moving to the next section.

  • If UI labels differ slightly by environment/version, follow the equivalent navigation path.

F. Incident Report Template (Copy/Paste)

Use this template when opening a support ticket:

Subject:
[Component] Short problem summary

Severity:
(critical/high/medium/low)

Environment:
(prod/staging/dev) + zone/cluster/host if known

Step and Doc Reference:
(Section + step number where issue happened)

Error Message:
(Exact text copied from UI/log)

Timestamp:
(Date, time, timezone)

Business Impact:
(What is blocked right now)

Actions Already Tried:
1.
2.
3.

Attachments:
- screenshot(s)
- relevant logs
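
Teams that open tickets from scripts can fill the same template programmatically. The sketch below is illustrative only: the function name and all sample values are made up, and attachments are still added by hand. Its one deliberate design choice is to reject timestamps without a timezone, since the template asks for date, time, and timezone.

```python
from datetime import datetime, timezone

# Field labels follow the template above; attachments are added by hand.
TEMPLATE = """Subject:
[{component}] {summary}

Severity:
{severity}

Environment:
{environment}

Step and Doc Reference:
{step_ref}

Error Message:
{error}

Timestamp:
{timestamp}

Business Impact:
{impact}

Actions Already Tried:
{actions}
"""


def render_report(when: datetime, actions: list[str], **fields: str) -> str:
    """Fill the template; reject timestamps that carry no timezone."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    numbered = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return TEMPLATE.format(
        timestamp=when.isoformat(timespec="seconds"), actions=numbered, **fields
    )


# All values below are made up for illustration.
report = render_report(
    when=datetime(2025, 3, 4, 10, 0, tzinfo=timezone.utc),
    actions=["Retried after 5 minutes", "Checked host state"],
    component="Storage",
    summary="Instance Storage shows Down",
    severity="high",
    environment="prod, zone-1",
    step_ref="Section 4, step 3",
    error="(exact text copied from UI/log)",
    impact="New VM creation is blocked",
)
print(report)
```

For a live report, `datetime.now().astimezone()` yields the local time with its UTC offset attached.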

G. Supported Versions and Compatibility

This matrix is a release-time compatibility snapshot. Review and update it during every Karios release documentation pass and after any approved image-catalog, browser-support, or automation-provider change.

Kubernetes versions:

| Version | Support Status | Notes |
|---------|----------------|-------|
| v1.31.x | Recommended | Default target for new cluster creation. |
| v1.30.x | Supported | Stable option where v1.31.x adoption is pending. |
| v1.29.x | Supported | Use for compatibility-bound workloads. |
| v1.28.x | Legacy | Upstream EOL reached (October 2024); use only for migration windows. |

Operating system templates (typical availability):

| OS Template | Status | Notes |
|-------------|--------|-------|
| Ubuntu 24.04 LTS | Recommended | Preferred default for new Linux workloads. |
| Ubuntu 22.04 LTS | Supported | Common long-term production baseline. |
| Debian 12 (Bookworm) | Supported | Stable Debian baseline. |
| Debian 11 (Bullseye) | Legacy | Keep for compatibility only; plan upgrade. |
| Rocky Linux 9 | Supported | Enterprise Linux stream. |
| Rocky Linux 8 | Legacy | Supported for transition scenarios. |
| Windows Server 2022 | Supported | Preferred current Windows template. |
| Windows Server 2019 | Supported | Use where workload compatibility requires it. |

Hypervisor and virtualization stack:

| Component | Support Status | Notes |
|-----------|----------------|-------|
| KVM | Primary | Standard hypervisor path for Karios deployments. |
| QEMU 8.x+ | Supported baseline | Validate exact package versions against your deployed package bundle. |

Storage backends:

| Backend | Support Status | Notes |
|---------|----------------|-------|
| ZFS | Default | Built-in default path for Instance Storage. |
| NFS | Supported | Common for Image Storage and some Instance Storage use cases. |
| iSCSI | Supported | Validate latency and multipath design before production. |
| Ceph RBD | Conditional | Supported when environment validation criteria are met. |

Network isolation modes:

| Mode | Support Status | Notes |
|------|----------------|-------|
| VLAN (802.1Q) | Fully supported | Preferred for straightforward L2 segmentation. |
| VXLAN | Fully supported | Preferred for large-scale overlay segmentation. |
| GRE | Conditional | Confirm availability in your installed environment configuration. |

Browser compatibility baseline:

| Browser | Baseline | Notes |
|---------|----------|-------|
| Chrome | Current stable or previous major release | Recommended for full UI compatibility. |
| Firefox | Current stable or previous major release | Supported for daily operations. |
| Edge | Current stable or previous major release | Supported on Windows admin endpoints. |
| Safari | Current major release on a supported macOS version | Supported on macOS admin endpoints. |

Note

Validate exact compatibility against your deployed Karios release documentation before production rollout.

H. API and Automation Examples

Terraform provider example:

terraform {
  required_providers {
    karios = {
      source = "karios/karios"
    }
  }
}

provider "karios" {
  api_url    = "https://<server-ip>/api/v1"
  api_key    = var.karios_api_key
  secret_key = var.karios_secret_key
}

Provider/source validation guidance:

  • Pin the provider version to the currently validated release in your organization-approved registry.

  • Confirm provider availability and supported resources in your environment before production automation.

API reference discovery guidance:

  • Check common spec paths:

      • https://<server-ip>/api/v1/openapi.json

      • https://<server-ip>/swagger

      • https://<server-ip>/swagger-ui

  • If no spec endpoint is exposed, request your environment’s API reference from support.
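
The discovery steps above can be sketched as a small probe. This is an illustration under assumptions: the paths are the ones listed above and may not exist in your deployment, the function names are hypothetical, and the HTTP call is injected as a callable so that TLS and authentication policy stay under your control. The address 203.0.113.10 is a documentation-only placeholder.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# Paths taken from the list above; your deployment may expose none of them.
SPEC_PATHS = ["/api/v1/openapi.json", "/swagger", "/swagger-ui"]


def candidate_spec_urls(base_url: str) -> list[str]:
    """Build the full URLs to probe for a given server base URL."""
    root = base_url.rstrip("/") + "/"
    return [urljoin(root, path.lstrip("/")) for path in SPEC_PATHS]


def first_available(urls: list[str], status_of: Callable[[str], int]) -> Optional[str]:
    """Return the first URL whose probe reports HTTP 200, else None.

    status_of is supplied by the caller and should return an HTTP status code.
    """
    for url in urls:
        if status_of(url) == 200:
            return url
    return None


# Stand-in probe for illustration; replace with a real HTTP client call.
fake_status = {"https://203.0.113.10/swagger": 200}
urls = candidate_spec_urls("https://203.0.113.10")
print(first_available(urls, lambda u: fake_status.get(u, 404)))
```

A `None` result corresponds to the last bullet above: request the API reference from support.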

VM resource example:

resource "karios_virtual_machine" "web_server" {
  name             = "web-server-01"
  zone_id          = data.karios_zone.primary.id
  service_offering = "medium"
  template_id      = data.karios_template.ubuntu2404.id
  network_id       = data.karios_network.app_network.id
}

Direct API example:

# List VMs
curl -X GET "https://<server-ip>/api/v1/virtual-machines" \
  -H "X-API-Key: ${API_KEY}" \
  -H "X-Secret-Key: ${SECRET_KEY}"

# Create VM
curl -X POST "https://<server-ip>/api/v1/virtual-machines" \
  -H "X-API-Key: ${API_KEY}" \
  -H "X-Secret-Key: ${SECRET_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "api-vm-01",
    "zone_id": "abc-123",
    "service_offering_id": "def-456",
    "template_id": "ghi-789"
  }'
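
The same calls can be made from Python with only the standard library. This is a sketch, not a Karios client: it reuses the endpoint path, header names, and placeholder IDs from the curl example above, the helper name is hypothetical, and it only builds the request so that nothing is sent until you choose to. Confirm the real request schema against your environment's API reference.

```python
import json
import urllib.request


def build_request(base_url, path, api_key, secret_key, payload=None):
    """Build (but do not send) a request using the header names shown above.

    A payload turns the call into a JSON POST; without one it is a GET.
    """
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method="POST" if payload is not None else "GET",
    )
    req.add_header("X-API-Key", api_key)
    req.add_header("X-Secret-Key", secret_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Placeholder values copied from the curl example; read real keys from the
# environment or a secret store rather than hard-coding them.
req = build_request(
    "https://203.0.113.10/api/v1",
    "/virtual-machines",
    api_key="example-key",
    secret_key="example-secret",
    payload={
        "name": "api-vm-01",
        "zone_id": "abc-123",
        "service_offering_id": "def-456",
        "template_id": "ghi-789",
    },
)
print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=10)` and decode the response body with `json.load`.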

I. Troubleshooting Decision Tree

VM will not start:

  • Check logs for insufficient memory -> use a smaller instance profile or add host RAM.

  • Check logs for no suitable host -> verify host state/capacity.

  • Check logs for network not found -> verify network exists and is operational.

  • If no clear error, capture VM ID + timestamp and escalate.
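
The branch above maps naturally to a lookup over log text. The following sketch assumes the log contains the phrases used in the list; actual Karios log wording is not documented here and may differ, so treat the substrings as placeholders to adjust.

```python
# Substrings mirror the phrases in the list above; real log wording may differ.
RULES = [
    ("insufficient memory", "Use a smaller instance profile or add host RAM."),
    ("no suitable host", "Verify host state and capacity."),
    ("network not found", "Verify the network exists and is operational."),
]
FALLBACK = "Capture VM ID and timestamp, then escalate."


def next_action(log_text: str) -> str:
    """Return the first matching remediation hint for a VM start failure."""
    lowered = log_text.lower()
    for needle, action in RULES:
        if needle in lowered:
            return action
    return FALLBACK


# Example log line, invented for illustration.
print(next_action("2025-03-04 10:00:01 ERROR No suitable host found for vm-42"))
```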

Network issues (VM has no IP):

  • Shared network: verify IP pool capacity and external DHCP/gateway conditions.

  • Isolated network: verify state is operational and virtual router is healthy.

  • Validate DHCP path from VM console if needed.

Storage nearly full or full:

  • Identify largest volumes and stale snapshots.

  • Apply immediate cleanup where safe.

  • Add/expand Instance Storage capacity for sustained recovery.
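
A simple threshold check helps decide when cleanup is urgent. This sketch uses hypothetical warning and critical thresholds (80% and 90%) and reads usage for a mounted path with the standard library; for pool-level figures on backends such as ZFS, the backend's own tooling is the better source.

```python
import shutil

# Hypothetical thresholds; align them with your own alerting policy.
WARN_PCT = 80.0
CRIT_PCT = 90.0


def usage_status(used_bytes: int, total_bytes: int) -> tuple[float, str]:
    """Return (percent used, status) where status is ok, warning, or critical."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    pct = 100.0 * used_bytes / total_bytes
    if pct >= CRIT_PCT:
        status = "critical"
    elif pct >= WARN_PCT:
        status = "warning"
    else:
        status = "ok"
    return round(pct, 1), status


def check_path(path: str) -> tuple[float, str]:
    """Classify the filesystem that holds the given mounted path."""
    usage = shutil.disk_usage(path)
    return usage_status(usage.used, usage.total)


print(usage_status(850, 1000))
```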

Control plane unreachable:

  • Verify management endpoint reachability and certificate validity.

  • Check host/service health from infrastructure and diagnostics views.

  • Correlate with recent updates/restarts and events timeline.

  • Escalate with endpoint, timestamp, and dependency evidence.
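
Certificate validity on the management endpoint can be checked with the Python standard library. This is a sketch under assumptions: the endpoint serves TLS on port 443 and the function names are illustrative. Because default verification is used, an expired or untrusted certificate raises an error during the handshake, which is itself useful evidence for escalation.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan 31 00:00:00 2030 GMT'.

    Raises ssl.SSLCertVerificationError if the certificate fails verification.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now_epoch: Optional[float] = None) -> float:
    """Days between now (or a supplied epoch time) and the notAfter date."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now_epoch is None else now_epoch
    return (expires - now) / 86400


# Fixed example dates so the result does not depend on today's date.
reference_now = ssl.cert_time_to_seconds("Jan  1 00:00:00 2030 GMT")
print(days_until_expiry("Jan 31 00:00:00 2030 GMT", reference_now))
```

In live use, call `days_until_expiry(fetch_not_after("<server-ip>"))` and include the result in the escalation evidence.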

Kubernetes cluster stuck in error:

  • Open Kubernetes and verify cluster state, VM/node state, and access tab behavior.

  • Correlate with events/alerts and recent lifecycle actions.

  • Avoid repeated create/delete loops without log capture.

  • Escalate with cluster ID, state transitions, and log snippets.

K-Shield scan fails to run:

  • Validate VM/node state and scan prerequisites.

  • Retry once after confirming no concurrent maintenance action.

  • Capture failure details and rerun evidence from the Security section.

  • Escalate persistent failures with scan timestamp and target IDs.

Host goes offline:

  • Confirm physical power/network/BMC reachability first.

  • Check cluster/HA impact and affected workload placement.

  • Recover host per infrastructure runbook before workload rebalance changes.

  • Record outage duration and impacted resources for the incident report.

J. Control Plane HA Planning

Use this planning baseline for resilient management-plane operations:

  • define failure domains (power, rack, network paths)

  • avoid single points of failure for management endpoint dependencies

  • validate quorum/consensus design where multi-node control-plane is used

  • test failover behavior before production cutover
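
For the quorum point above, the arithmetic is worth making explicit. The sketch assumes a majority-quorum consensus model (as used by Raft-style systems such as etcd); whether your control plane uses that model is something to confirm with your platform team.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("members must be at least 1")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority remains available."""
    return members - quorum_size(members)


for n in (1, 2, 3, 4, 5):
    print(f"members={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

Under this model an even member count adds no failure tolerance over the odd count below it (four members tolerate one failure, the same as three), which is why odd-sized control planes are the usual choice.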

Warning

HA design is architecture-specific. Validate exact topology and supportability with your platform team before implementation.

K. Platform Backup and Restore Baseline

Minimum backup scope for production:

  • management-plane configuration and secrets

  • storage metadata and critical policy/config objects

  • workload data protection path (snapshots/backups) aligned with RPO/RTO

Restore readiness checklist:

  1. Define and document RPO/RTO per workload class.

  2. Run periodic restore tests (not backup-only checks).

  3. Keep offline/exported recovery evidence and runbook versions.

  4. Verify post-restore access, networking, and role controls.
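
An RPO target becomes actionable once it is compared with the age of the newest restorable backup. This is a minimal sketch; the workload class names and RPO values are hypothetical, and the timestamps are fixed examples.

```python
from datetime import datetime, timedelta, timezone


def rpo_breached(last_backup: datetime, rpo: timedelta, now: datetime) -> bool:
    """True if the newest restorable backup is older than the RPO allows."""
    return (now - last_backup) > rpo


# Hypothetical workload classes and targets.
RPO_BY_CLASS = {"tier-1": timedelta(hours=1), "tier-2": timedelta(hours=24)}

# Fixed example timestamps: the last backup is 2.5 hours old.
now = datetime(2025, 3, 4, 12, 0, tzinfo=timezone.utc)
last = datetime(2025, 3, 4, 9, 30, tzinfo=timezone.utc)

for workload_class, rpo in RPO_BY_CLASS.items():
    state = "BREACHED" if rpo_breached(last, rpo, now) else "ok"
    print(workload_class, state)
```

A check like this only proves that a recent backup exists; the periodic restore tests in the checklist remain the evidence that it can actually be restored within the RTO.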

L. Platform Update and Patching Workflow

Use a controlled update workflow:

  1. Read the release notes/change summary for the target version.

  2. Validate compatibility (hypervisor, storage, network plugins, automation).

  3. Back up critical configuration and confirm a rollback path.

  4. Test the patch/update in non-production first.

  5. Apply in an approved maintenance window.

  6. Validate core functions (login, infra health, VM/K8s paths, alerts).

  7. Document results and residual risk.

Release notes and changelog requirement:

  • Do not patch without reviewing the version change notes and known issues.

  • Preferred source: official release communication sent by Karios support/account team.

  • Alternate source: your Zoho Desk release/update thread (if enabled for your account).

  • If neither source is available, request the target version release-notes package from support@karios.com before proceeding.

M. Windows VM Operational Notes

For Windows templates (for example Server 2019/2022):

  • validate activation workflow and access controls before production rollout

  • confirm RDP exposure policy and firewall controls

  • standardize sysprep/generalization process for reusable templates

  • define WinRM/agent posture for automation and configuration management

  • verify time sync and domain-join dependencies where applicable

N. External Monitoring and SIEM Integration

Minimum integration blueprint:

  • send operational alerts to team notification/on-call pipeline

  • forward critical security/ops events to SIEM/log analytics workflow

  • define ownership for alert tuning, runbook updates, and false-positive review

Use Observability and Diagnostics as baseline signal sources.

O. Performance Tuning Baseline

Use controlled tuning cycles instead of broad one-time changes:

  1. Capture baseline metrics first (CPU, memory, storage latency, network errors).

  2. Tune one layer at a time (compute, storage, network, then workload settings).

  3. Re-measure after each change and keep rollback-ready checkpoints.

  4. Document tuned values, observed impact, and owner approval.
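
The measure, change, re-measure cycle above can be recorded as a percent change per metric. In this sketch the metric names and numbers are made up; substitute the values captured from your own baseline and post-change measurements.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change from baseline to current, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (current - baseline) / baseline


def compare(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Percent change for every metric present in both measurements."""
    return {
        name: round(percent_change(baseline[name], current[name]), 1)
        for name in baseline
        if name in current
    }


# Made-up measurements taken before and after a single tuning change.
baseline = {"cpu_util_pct": 60.0, "storage_latency_ms": 4.0}
after = {"cpu_util_pct": 57.0, "storage_latency_ms": 5.0}

print(compare(baseline, after))
```

In this invented example CPU utilization improves while storage latency regresses, which is the kind of trade-off that should be recorded with the tuned value before the next layer is touched.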

Common focus areas:

  • host overcommit and workload placement strategy

  • storage performance characteristics and snapshot overhead

  • network throughput/latency bottlenecks and segmentation policy