Appendices

A. Troubleshooting Prep Checklist

Before opening a support request, collect:

exact error message (copy text, do not paraphrase)
screenshot showing the full page/state
timestamp and timezone
section + step where the issue occurred
impact summary (what is blocked)

B. Credential Recovery Workflow (Admin Access)

Use this workflow when the primary admin cannot authenticate.

Guardrails:

Perform recovery only with approved change control and ownership verification.
Record who performed recovery, when, and why.
Never send passwords in plain text over chat/email.

Recovery steps:

Confirm failure scope (single user vs all admin users).
Validate endpoint reachability and time sync before reset attempts.

Path A (preferred): reset from another active admin account

Log in with a secondary admin account.
Open User Management.
Locate target admin user and run the password reset/edit workflow.
Set a temporary credential and require immediate rotation after first login.
Revalidate role/2FA settings and log the recovery event.

Path B (break-glass): local console recovery when no admin login is available

Open physical console or BMC/IPMI remote console for the management node.
Sign in with an approved local recovery account.
Run a controlled password reset command for the target account:

sudo passwd <admin-username>

Confirm reset success, sign out from console, then test Control Center login.
Rotate from temporary credential to permanent credential immediately after access is restored.
Re-validate role assignment and 2FA/approval settings after access returns.
Document incident timeline and close with post-recovery review.

If platform-level recovery is blocked, escalate through Karios Support with:

environment ownership evidence
affected account/username
exact error and timestamp/timezone
actions already attempted

C. Operational Quick Checks

Infrastructure checks:

zone/pod/cluster/host states are healthy
required storage is available and Up
required network ranges are configured and non-overlapping

VM checks:

required instance profiles exist
VM template or boot image is Ready
VM dashboard and lifecycle actions are accessible

D. Support Escalation Matrix

Use Zoho Desk (standard path):

normal incidents and service requests
status tracking and closure audit

Use support@karios.com (urgent path):

portal access unavailable
critical blocking incident requiring immediate triage

E. Documentation Usage Tips

Follow the section order from the main index for first-time setup.
Use Expected Outcome and Success Checkpoint blocks before moving to next section.
If UI labels differ slightly by environment/version, follow the equivalent navigation path.

F. Incident Report Template (Copy/Paste)

Use this template when opening a support ticket:

Subject:
[Component] Short problem summary

Severity:
(critical/high/medium/low)

Environment:
(prod/staging/dev) + zone/cluster/host if known

Step and Doc Reference:
(Section + step number where issue happened)

Error Message:
(Exact text copied from UI/log)

Timestamp:
(Date, time, timezone)

Business Impact:
(What is blocked right now)

Actions Already Tried:
1.
2.
3.

Attachments:
- screenshot(s)
- relevant logs

G. Supported Versions and Compatibility

This matrix is a release-time compatibility snapshot. Review and update it during every Karios release documentation pass and after any approved image-catalog, browser-support, or automation-provider change.

Kubernetes versions:

Version	Support Status	Notes
`v1.31.x`	Recommended	Default target for new cluster creation.
`v1.30.x`	Supported	Stable option where `v1.31.x` adoption is pending.
`v1.29.x`	Supported	Use for compatibility-bound workloads.
`v1.28.x`	Legacy	Upstream EOL reached (June 2025); use only for migration windows.

Operating system templates (typical availability):

OS Template	Status	Notes
Ubuntu `24.04 LTS`	Recommended	Preferred default for new Linux workloads.
Ubuntu `22.04 LTS`	Supported	Common long-term production baseline.
Debian `12` (Bookworm)	Supported	Stable Debian baseline.
Debian `11` (Bullseye)	Legacy	Keep for compatibility only; plan upgrade.
Rocky Linux `9`	Supported	Enterprise Linux stream.
Rocky Linux `8`	Legacy	Supported for transition scenarios.
Windows Server `2022`	Supported	Preferred current Windows template.
Windows Server `2019`	Supported	Use where workload compatibility requires it.

Hypervisor and virtualization stack:

Component	Support Status	Notes
KVM	Primary	Standard hypervisor path for Karios deployments.
QEMU `8.x+`	Supported baseline	Validate exact package versions against your deployed package bundle.

Storage backends:

Backend	Support Status	Notes
ZFS	Default	Built-in default path for Instance Storage.
NFS	Supported	Common for Image Storage and some Instance Storage use cases.
iSCSI	Supported	Validate latency and multipath design before production.
Ceph RBD	Conditional	Supported when environment validation criteria are met.

Network isolation modes:

Mode	Support Status	Notes
VLAN (802.1Q)	Fully supported	Preferred for straightforward L2 segmentation.
VXLAN	Fully supported	Preferred for large-scale overlay segmentation.
GRE	Conditional	Confirm availability in your installed environment configuration.

Browser compatibility baseline:

Browser	Baseline	Notes
Chrome	Current stable or previous major release	Recommended for full UI compatibility.
Firefox	Current stable or previous major release	Supported for daily operations.
Edge	Current stable or previous major release	Supported on Windows admin endpoints.
Safari	Current major release on a supported macOS version	Supported on macOS admin endpoints.

Note

Validate exact compatibility against your deployed Karios release documentation before production rollout.

H. API and Automation Examples

Terraform provider example:

terraform {
required_providers {
karios = {
source = "karios/karios"
}
}
}

provider "karios" {
api_url = "https://<server-ip>/api/v1"
api_key = var.karios_api_key
secret_key = var.karios_secret_key
}

Provider/source validation guidance:

Pin the provider version to the currently validated release in your organization-approved registry.
Confirm provider availability and supported resources in your environment before production automation.

API reference discovery guidance:

Check common spec paths:

https://<server-ip>/api/v1/openapi.json

https://<server-ip>/swagger

https://<server-ip>/swagger-ui

If no spec endpoint is exposed, request your environment’s API reference from support.

VM resource example:

resource "karios_virtual_machine" "web_server" {
name = "web-server-01"
zone_id = data.karios_zone.primary.id
service_offering = "medium"
template_id = data.karios_template.ubuntu2404.id
network_id = data.karios_network.app_network.id
}

Direct API example:

# List VMs
curl -X GET "https://<server-ip>/api/v1/virtual-machines" \
-H "X-API-Key: ${API_KEY}" \
-H "X-Secret-Key: ${SECRET_KEY}"

# Create VM
curl -X POST "https://<server-ip>/api/v1/virtual-machines" \
-H "X-API-Key: ${API_KEY}" \
-H "X-Secret-Key: ${SECRET_KEY}" \
-H "Content-Type: application/json" \
-d '{
"name": "api-vm-01",
"zone_id": "abc-123",
"service_offering_id": "def-456",
"template_id": "ghi-789"
}'

I. Troubleshooting Decision Tree

VM will not start:

Check logs for insufficient memory -> use a smaller instance profile or add host RAM.
Check logs for no suitable host -> verify host state/capacity.
Check logs for network not found -> verify network exists and is operational.
If no clear error, capture VM ID + timestamp and escalate.

Network issues (VM has no IP):

Shared network: verify IP pool capacity and external DHCP/gateway conditions.
Isolated network: verify state is operational and virtual router is healthy.
Validate DHCP path from VM console if needed.

Storage near/full:

Identify largest volumes and stale snapshots.
Apply immediate cleanup where safe.
Add/expand Instance Storage capacity for sustained recovery.

Control plane unreachable:

Verify management endpoint reachability and certificate validity.
Check host/service health from infrastructure and diagnostics views.
Correlate with recent updates/restarts and events timeline.
Escalate with endpoint, timestamp, and dependency evidence.

Kubernetes cluster stuck in error:

Open Kubernetes and verify cluster state, VM/node state, and access tab behavior.
Correlate with events/alerts and recent lifecycle actions.
Avoid repeated create/delete loops without log capture.
Escalate with cluster ID, state transitions, and log snippets.

K-Shield scan fails to run:

Validate VM/node state and scan prerequisites.
Retry once after confirming no concurrent maintenance action.
Capture failure details and rerun evidence from Security section.
Escalate persistent failures with scan timestamp and target IDs.

Host goes offline:

Confirm physical power/network/BMC reachability first.
Check cluster/HA impact and affected workload placement.
Recover host per infrastructure runbook before workload rebalance changes.
Record outage duration and impacted resources for incident report.

J. Control Plane HA Planning

Use this planning baseline for resilient management-plane operations:

define failure domains (power, rack, network paths)
avoid single points of failure for management endpoint dependencies
validate quorum/consensus design where multi-node control-plane is used
test failover behavior before production cutover

Warning

HA design is architecture-specific. Validate exact topology and supportability with your platform team before implementation.

K. Platform Backup and Restore Baseline

Minimum backup scope for production:

management-plane configuration and secrets
storage metadata and critical policy/config objects
workload data protection path (snapshots/backups) aligned with RPO/RTO

Restore readiness checklist:

Define and document RPO/RTO per workload class.
Run periodic restore tests (not backup-only checks).
Keep offline/exported recovery evidence and runbook versions.
Verify post-restore access, networking, and role controls.

L. Platform Update and Patching Workflow

Use a controlled update workflow:

Read release notes/change summary for target version.
Validate compatibility (hypervisor, storage, network plugins, automation).
Backup critical config and ensure rollback path.
Test patch/update in non-production first.
Apply in approved maintenance window.
Validate core functions (login, infra health, VM/K8s paths, alerts).
Document results and residual risk.

Release notes and changelog requirement:

Do not patch without version change notes and known-issue review.
Preferred source: official release communication sent by Karios support/account team.
Alternate source: your Zoho Desk release/update thread (if enabled for your account).
If neither source is available, request the target version release-notes package from support@karios.com before proceeding.

M. Windows VM Operational Notes

For Windows templates (for example Server 2019/2022):

validate activation workflow and access controls before production rollout
confirm RDP exposure policy and firewall controls
standardize sysprep/generalization process for reusable templates
define WinRM/agent posture for automation and configuration management
verify time sync and domain-join dependencies where applicable

N. External Monitoring and SIEM Integration

Minimum integration blueprint:

send operational alerts to team notification/on-call pipeline
forward critical security/ops events to SIEM/log analytics workflow
define ownership for alert tuning, runbook updates, and false-positive review

Use Observability and Diagnostics as baseline signal sources, and K-Trace for the built-in security operations console (detections, incident cases, and threat intelligence).

O. Performance Tuning Baseline

Use controlled tuning cycles instead of broad one-time changes:

Capture baseline metrics first (CPU, memory, storage latency, network errors).
Tune one layer at a time (compute, storage, network, then workload settings).
Re-measure after each change and keep rollback-ready checkpoints.
Document tuned values, observed impact, and owner approval.

Common focus areas:

host overcommit and workload placement strategy
storage performance characteristics and snapshot overhead
network throughput/latency bottlenecks and segmentation policy