Appendices
A. Troubleshooting Prep Checklist
Before opening a support request, collect:
exact error message (copy text, do not paraphrase)
screenshot showing the full page/state
timestamp and timezone
section + step where the issue occurred
impact summary (what is blocked)
B. Credential Recovery Workflow (Admin Access)
Use this workflow when the primary admin cannot authenticate.
Guardrails:
Perform recovery only with approved change control and ownership verification.
Record who performed recovery, when, and why.
Never send passwords in plain text over chat/email.
Recovery steps:
Confirm failure scope (single user vs all admin users).
Validate endpoint reachability and time sync before reset attempts.
Path A (preferred): reset from another active admin account
Log in with a secondary admin account.
Open User Management.
Locate target admin user and run the password reset/edit workflow.
Set a temporary credential and require immediate rotation after first login.
Revalidate role/2FA settings and log the recovery event.
Path B (break-glass): local console recovery when no admin login is available
Open physical console or BMC/IPMI remote console for the management node.
Sign in with an approved local recovery account.
Run a controlled password reset command for the target account:
sudo passwd <admin-username>
Confirm reset success, sign out from console, then test Control Center login.
Rotate from temporary credential to permanent credential immediately after access is restored.
Re-validate role assignment and 2FA/approval settings after access returns.
Document incident timeline and close with post-recovery review.
If platform-level recovery is blocked, escalate through Karios Support with:
environment ownership evidence
affected account/username
exact error and timestamp/timezone
actions already attempted
C. Operational Quick Checks
Infrastructure checks:
zone/pod/cluster/host states are healthy
required storage is available and
Uprequired network ranges are configured and non-overlapping
VM checks:
required instance profiles exist
VM template or boot image is
ReadyVM dashboard and lifecycle actions are accessible
D. Support Escalation Matrix
Use Zoho Desk (standard path):
normal incidents and service requests
status tracking and closure audit
Use support@karios.com (urgent path):
portal access unavailable
critical blocking incident requiring immediate triage
E. Documentation Usage Tips
Follow the section order from the main index for first-time setup.
Use
Expected OutcomeandSuccess Checkpointblocks before moving to next section.If UI labels differ slightly by environment/version, follow the equivalent navigation path.
F. Incident Report Template (Copy/Paste)
Use this template when opening a support ticket:
Subject:
[Component] Short problem summary
Severity:
(critical/high/medium/low)
Environment:
(prod/staging/dev) + zone/cluster/host if known
Step and Doc Reference:
(Section + step number where issue happened)
Error Message:
(Exact text copied from UI/log)
Timestamp:
(Date, time, timezone)
Business Impact:
(What is blocked right now)
Actions Already Tried:
1.
2.
3.
Attachments:
- screenshot(s)
- relevant logs
G. Supported Versions and Compatibility
This matrix is a release-time compatibility snapshot. Review and update it during every Karios release documentation pass and after any approved image-catalog, browser-support, or automation-provider change.
Kubernetes versions:
Version |
Support Status |
Notes |
|---|---|---|
|
Recommended |
Default target for new cluster creation. |
|
Supported |
Stable option where |
|
Supported |
Use for compatibility-bound workloads. |
|
Legacy |
Upstream EOL reached (June 2025); use only for migration windows. |
Operating system templates (typical availability):
OS Template |
Status |
Notes |
|---|---|---|
Ubuntu |
Recommended |
Preferred default for new Linux workloads. |
Ubuntu |
Supported |
Common long-term production baseline. |
Debian |
Supported |
Stable Debian baseline. |
Debian |
Legacy |
Keep for compatibility only; plan upgrade. |
Rocky Linux |
Supported |
Enterprise Linux stream. |
Rocky Linux |
Legacy |
Supported for transition scenarios. |
Windows Server |
Supported |
Preferred current Windows template. |
Windows Server |
Supported |
Use where workload compatibility requires it. |
Hypervisor and virtualization stack:
Component |
Support Status |
Notes |
|---|---|---|
KVM |
Primary |
Standard hypervisor path for Karios deployments. |
QEMU |
Supported baseline |
Validate exact package versions against your deployed package bundle. |
Storage backends:
Backend |
Support Status |
Notes |
|---|---|---|
ZFS |
Default |
Built-in default path for Instance Storage. |
NFS |
Supported |
Common for Image Storage and some Instance Storage use cases. |
iSCSI |
Supported |
Validate latency and multipath design before production. |
Ceph RBD |
Conditional |
Supported when environment validation criteria are met. |
Network isolation modes:
Mode |
Support Status |
Notes |
|---|---|---|
VLAN (802.1Q) |
Fully supported |
Preferred for straightforward L2 segmentation. |
VXLAN |
Fully supported |
Preferred for large-scale overlay segmentation. |
GRE |
Conditional |
Confirm availability in your installed environment configuration. |
Browser compatibility baseline:
Browser |
Baseline |
Notes |
|---|---|---|
Chrome |
Current stable or previous major release |
Recommended for full UI compatibility. |
Firefox |
Current stable or previous major release |
Supported for daily operations. |
Edge |
Current stable or previous major release |
Supported on Windows admin endpoints. |
Safari |
Current major release on a supported macOS version |
Supported on macOS admin endpoints. |
Note
Validate exact compatibility against your deployed Karios release documentation before production rollout.
H. API and Automation Examples
Terraform provider example:
terraform {
required_providers {
karios = {
source = "karios/karios"
}
}
}
provider "karios" {
api_url = "https://<server-ip>/api/v1"
api_key = var.karios_api_key
secret_key = var.karios_secret_key
}
Provider/source validation guidance:
Pin the provider version to the currently validated release in your organization-approved registry.
Confirm provider availability and supported resources in your environment before production automation.
API reference discovery guidance:
Check common spec paths:
https://<server-ip>/api/v1/openapi.json
https://<server-ip>/swagger
https://<server-ip>/swagger-ui
If no spec endpoint is exposed, request your environment’s API reference from support.
VM resource example:
resource "karios_virtual_machine" "web_server" {
name = "web-server-01"
zone_id = data.karios_zone.primary.id
service_offering = "medium"
template_id = data.karios_template.ubuntu2404.id
network_id = data.karios_network.app_network.id
}
Direct API example:
# List VMs
curl -X GET "https://<server-ip>/api/v1/virtual-machines" \
-H "X-API-Key: ${API_KEY}" \
-H "X-Secret-Key: ${SECRET_KEY}"
# Create VM
curl -X POST "https://<server-ip>/api/v1/virtual-machines" \
-H "X-API-Key: ${API_KEY}" \
-H "X-Secret-Key: ${SECRET_KEY}" \
-H "Content-Type: application/json" \
-d '{
"name": "api-vm-01",
"zone_id": "abc-123",
"service_offering_id": "def-456",
"template_id": "ghi-789"
}'
I. Troubleshooting Decision Tree
VM will not start:
Check logs for
insufficient memory-> use a smaller instance profile or add host RAM.Check logs for
no suitable host-> verify host state/capacity.Check logs for
network not found-> verify network exists and is operational.If no clear error, capture VM ID + timestamp and escalate.
Network issues (VM has no IP):
Shared network: verify IP pool capacity and external DHCP/gateway conditions.
Isolated network: verify state is operational and virtual router is healthy.
Validate DHCP path from VM console if needed.
Storage near/full:
Identify largest volumes and stale snapshots.
Apply immediate cleanup where safe.
Add/expand Instance Storage capacity for sustained recovery.
Control plane unreachable:
Verify management endpoint reachability and certificate validity.
Check host/service health from infrastructure and diagnostics views.
Correlate with recent updates/restarts and events timeline.
Escalate with endpoint, timestamp, and dependency evidence.
Kubernetes cluster stuck in error:
Open Kubernetes and verify cluster state, VM/node state, and access tab behavior.
Correlate with events/alerts and recent lifecycle actions.
Avoid repeated create/delete loops without log capture.
Escalate with cluster ID, state transitions, and log snippets.
K-Shield scan fails to run:
Validate VM/node state and scan prerequisites.
Retry once after confirming no concurrent maintenance action.
Capture failure details and rerun evidence from Security section.
Escalate persistent failures with scan timestamp and target IDs.
Host goes offline:
Confirm physical power/network/BMC reachability first.
Check cluster/HA impact and affected workload placement.
Recover host per infrastructure runbook before workload rebalance changes.
Record outage duration and impacted resources for incident report.
J. Control Plane HA Planning
Use this planning baseline for resilient management-plane operations:
define failure domains (power, rack, network paths)
avoid single points of failure for management endpoint dependencies
validate quorum/consensus design where multi-node control-plane is used
test failover behavior before production cutover
Warning
HA design is architecture-specific. Validate exact topology and supportability with your platform team before implementation.
K. Platform Backup and Restore Baseline
Minimum backup scope for production:
management-plane configuration and secrets
storage metadata and critical policy/config objects
workload data protection path (snapshots/backups) aligned with RPO/RTO
Restore readiness checklist:
Define and document RPO/RTO per workload class.
Run periodic restore tests (not backup-only checks).
Keep offline/exported recovery evidence and runbook versions.
Verify post-restore access, networking, and role controls.
L. Platform Update and Patching Workflow
Use a controlled update workflow:
Read release notes/change summary for target version.
Validate compatibility (hypervisor, storage, network plugins, automation).
Backup critical config and ensure rollback path.
Test patch/update in non-production first.
Apply in approved maintenance window.
Validate core functions (login, infra health, VM/K8s paths, alerts).
Document results and residual risk.
Release notes and changelog requirement:
Do not patch without version change notes and known-issue review.
Preferred source: official release communication sent by Karios support/account team.
Alternate source: your Zoho Desk release/update thread (if enabled for your account).
If neither source is available, request the target version release-notes package from
support@karios.combefore proceeding.
M. Windows VM Operational Notes
For Windows templates (for example Server 2019/2022):
validate activation workflow and access controls before production rollout
confirm RDP exposure policy and firewall controls
standardize sysprep/generalization process for reusable templates
define WinRM/agent posture for automation and configuration management
verify time sync and domain-join dependencies where applicable
N. External Monitoring and SIEM Integration
Minimum integration blueprint:
send operational alerts to team notification/on-call pipeline
forward critical security/ops events to SIEM/log analytics workflow
define ownership for alert tuning, runbook updates, and false-positive review
Use Observability and Diagnostics as baseline signal sources.
O. Performance Tuning Baseline
Use controlled tuning cycles instead of broad one-time changes:
Capture baseline metrics first (CPU, memory, storage latency, network errors).
Tune one layer at a time (compute, storage, network, then workload settings).
Re-measure after each change and keep rollback-ready checkpoints.
Document tuned values, observed impact, and owner approval.
Common focus areas:
host overcommit and workload placement strategy
storage performance characteristics and snapshot overhead
network throughput/latency bottlenecks and segmentation policy