
jhf-loom Operations


Operations

Tool / Contract Summary

This page describes the current operational posture of the live Loom runtime. It focuses on service health, readiness rules, monitoring expectations, and operator boundaries.

Current Verified State

The repo-owned runtime exists and is live on <internal-runtime-redacted>.

Live stack members:

  • jhf-loom-db
  • jhf-loom-activemq
  • jhf-loom-transform
  • jhf-loom-search
  • jhf-loom-repo
  • jhf-loom-share

Canonical public path:

  • https://<internal-runtime-redacted>/

Available Now

Operational assets committed in-repo:

  • compose.yml
  • compose.low-cpu.yml
  • compose.health-fast.yml
  • config/runtime/runtime-contract.json
  • config/runtime/alfresco-runtime-profile.json
  • config/runtime/alfresco-component-family.json

The canonical Alfresco runtime profile is documented in:

  • ALFRESCO_RUNTIME_PROFILE.md (docs/ALFRESCO_RUNTIME_PROFILE.md)

Operational verify scripts:

  • validate_runtime_contract.py
  • validate_alfresco_runtime_profile.py
  • validate_alfresco_component_family.py
  • validate_runtime_materialization_drift.py
  • validate_readme_raw_truth.py
  • validate_live_proxy_sso_smoke.py
  • validate_live_runtime_resilience.py
  • validate_live_cold_start_recovery.py
  • validate_live_low_idle_runtime_policy.py

Operational control scripts:

  • loom-runtime-control.sh preflight
  • loom-runtime-control.sh phased-start
  • loom-runtime-control.sh apply
  • loom-runtime-control.sh pause
  • loom-runtime-control.sh resume
  • loom-runtime-control.sh postcheck
  • loom-runtime-control.sh diagnostics
  • loom-runtime-control.sh memory-guard

Upgrade-family inventory surfaces:

  • maintenance/pull_stack_oss_inventory.py
  • config/runtime/alfresco-component-family.json

Readiness / Drift / Monitoring

Hard runtime rules:

  • search is mandatory for green readiness
  • no partial-green state is allowed when search is red
  • accepted failure posture is degraded or halted, never green
  • canonical Loom readiness gate is:
    • bash scripts/loom-runtime-control.sh readiness
  • the repository /probes/-ready- endpoint is a repo-local signal only; on its own it does not establish full Loom green readiness
  • ActiveMQ is recoverable supporting infrastructure, not business truth
  • shared-host startup must use the shared-host-low-idle phased bring-up path
  • default healthcheck cadence is shared-host-slow for shared-host baseline load control
  • dedicated-fast healthchecks are opt-in and require JHF_LOOM_HEALTHCHECK_PROFILE=dedicated-fast to be set explicitly
  • heavy-profile startup on undersized hosts is blocked by preflight unless JHF_LOOM_ALLOW_UNDERSIZED_HEAVY_PROFILE=1 is explicitly set
  • default phased polling is low-pressure (15s); tight watchdog loops are not a supported default
  • phased startup cooldown spacing (45s) is enabled by default to reduce warmup overlap for transform/repo/share
  • apply skips services that are already healthy to avoid unnecessary restart pressure on shared hosts
  • compose/env drift is checksum-gated; changed inputs force rollout on the next apply so config updates are not silently skipped
  • repo-owned control runs also hydrate missing non-secret runtime defaults from .env.example into the live stack .env before compose evaluation, and they update drifted non-secret budget keys when the repo-owned truth changes
  • runtime materialization drift must also be checked across repo files, live stack files, container labels/env/mounts/ports/networks, and app readback:
    • python scripts/validate_runtime_materialization_drift.py --host <internal-runtime-redacted> --stack-dir /home/administrator/jhf-loom-live-stack --insecure
  • shared-host memory reclaim must be guarded before host-owner swap surgery:
    • bash scripts/loom-runtime-control.sh memory-guard
    • runbook: SHARED_HOST_MEMORY_RECLAIM.md (docs/SHARED_HOST_MEMORY_RECLAIM.md)
  • no-repeat lock prevents concurrent runtime-control mutation paths:
    • .loom-runtime-control/runtime-control.lock
  • diagnostics must stay bounded (timeout + --since + --tail)
  • restart recovery is bounded with backoff and optional jitter
  • incident response may pause only non-critical services; jhf-loom-db remains the single kept-running service in pause mode
  • post-deploy cleanup runs after phased start/resume to remove stale stopped containers and rotate old diagnostic captures
  • transform/repo/share run with Docker init reaping enabled (init: true) to prevent zombie child accumulation during restart cycles
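The readiness rules at the top of this list can be sketched as a small classifier. This is an illustration only, not the logic of loom-runtime-control.sh readiness. The function name classify_readiness and the HALTING_MEMBERS set are hypothetical. Two assumptions go beyond what this page states: every stack member is required for green, and a database outage is what counts as halted.

```python
from typing import Mapping

# Stack members as listed on this page.
STACK_MEMBERS = (
    "jhf-loom-db",
    "jhf-loom-activemq",
    "jhf-loom-transform",
    "jhf-loom-search",
    "jhf-loom-repo",
    "jhf-loom-share",
)

# ASSUMPTION: this page does not define which failures mean "halted".
# Treating a database outage as halted is purely illustrative.
HALTING_MEMBERS = frozenset({"jhf-loom-db"})


def classify_readiness(healthy: Mapping[str, bool]) -> str:
    """Return 'green', 'degraded', or 'halted' for a per-service health map.

    A service that is missing from the map counts as unhealthy, so an
    incomplete report can never produce 'green'.
    """
    down = [name for name in STACK_MEMBERS if not healthy.get(name, False)]
    if not down:
        return "green"
    if any(name in HALTING_MEMBERS for name in down):
        return "halted"
    # Any other failure, including a red search service, is degraded.
    return "degraded"
```

With this shape, a red jhf-loom-search always yields degraded, which matches the rule that no partial-green state is allowed when search is red.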

Current drift to track:

  • public functional probes are green
  • Docker can still report jhf-loom-search as unhealthy due to a stale health-task NotFound error on the host
  • while this drift persists, the runtime must not be documented as operationally perfect
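One way to keep this drift visible is to compare the two signals explicitly instead of reporting only one of them. The sketch below assumes the caller already has a public-path probe result and the Docker health status string for a container (Docker reports "starting", "healthy", or "unhealthy"). The function name describe_health_drift is hypothetical and is not part of the repo scripts.

```python
def describe_health_drift(public_probe_ok: bool, docker_health: str) -> str:
    """Compare a public functional probe with Docker's reported health.

    Returns 'consistent' when both signals agree, otherwise a short
    description of the mismatch that can be copied into a status note.
    """
    docker_ok = docker_health == "healthy"
    if public_probe_ok and not docker_ok:
        return f"drift: public path is functional but Docker reports {docker_health}"
    if docker_ok and not public_probe_ok:
        return "drift: Docker reports healthy but the public probe failed"
    return "consistent"
```

For the case described above, describe_health_drift(True, "unhealthy") returns a drift message rather than a plain pass, so the mismatch is called out instead of hidden.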

Latest soak evidence:

  • ALFRESCO_RUNTIME_SOAK_EVIDENCE_2026-04-21.md (docs/ALFRESCO_RUNTIME_SOAK_EVIDENCE_2026-04-21.md)
  • ALFRESCO_HEALTHCHECK_PROFILE_EVIDENCE_2026-04-21.md (docs/ALFRESCO_HEALTHCHECK_PROFILE_EVIDENCE_2026-04-21.md)
  • RUNTIME_GUARDRAILS_V1_EVIDENCE_2026-04-23.md (docs/RUNTIME_GUARDRAILS_V1_EVIDENCE_2026-04-23.md)

Deployment / Verify

Repo:

python scripts/validate_repo_baseline.py
python scripts/validate_runtime_contract.py
python scripts/validate_alfresco_runtime_profile.py
python scripts/validate_alfresco_component_family.py
python scripts/validate_runtime_materialization_drift.py
python scripts/validate_readme_raw_truth.py
python scripts/validate_runtime_guardrails_v1.py
python scripts/validate_host_capacity_budget.py
python maintenance/pull_stack_oss_inventory.py --output test-results/stack-oss-inventory.workspace.json
python -m unittest discover -s tests -p "test_*.py"

Live:

python scripts/validate_runtime_contract.py --host <internal-runtime-redacted>
python scripts/validate_alfresco_runtime_profile.py --host <internal-runtime-redacted>
python scripts/validate_alfresco_component_family.py --host <internal-runtime-redacted>
python scripts/validate_runtime_materialization_drift.py --host <internal-runtime-redacted> --stack-dir /home/administrator/jhf-loom-live-stack --insecure
python maintenance/pull_stack_oss_inventory.py --host <internal-runtime-redacted> --output test-results/stack-oss-inventory.workspace.json
python scripts/validate_live_low_idle_runtime_policy.py --insecure --idle-window-seconds 1800 --moderate-traffic-seconds 90
python scripts/validate_live_runtime_resilience.py --password <host-managed-secret> --insecure
python scripts/validate_live_cold_start_recovery.py --password <host-managed-secret> --insecure
bash scripts/loom-runtime-control.sh memory-guard
bash scripts/loom-runtime-control.sh diagnostics
bash scripts/loom-runtime-control.sh postcheck
bash scripts/loom-runtime-control.sh readiness
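
The closing control commands above could be chained so that a failure stops the sequence before the readiness gate runs. This is a minimal sketch under two assumptions: the scripts signal failure with a non-zero exit code, and stopping at the first failure is the desired behavior. The names LIVE_CONTROL_CHECKS, run_subprocess, and first_failure are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence

Command = Sequence[str]

# The last four live commands from the list above, in the same order.
LIVE_CONTROL_CHECKS = (
    ("bash", "scripts/loom-runtime-control.sh", "memory-guard"),
    ("bash", "scripts/loom-runtime-control.sh", "diagnostics"),
    ("bash", "scripts/loom-runtime-control.sh", "postcheck"),
    ("bash", "scripts/loom-runtime-control.sh", "readiness"),
)


def run_subprocess(command: Command) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command), check=False).returncode


def first_failure(
    commands: Sequence[Command],
    run: Callable[[Command], int] = run_subprocess,
) -> Optional[Command]:
    """Run commands in order and stop at the first non-zero exit code.

    Returns the failing command, or None when every command succeeded.
    ASSUMPTION: a non-zero exit code means the check failed.
    """
    for command in commands:
        if run(command) != 0:
            return command
    return None
```

Calling first_failure(LIVE_CONTROL_CHECKS) from the repository root would run the four commands in order. The run parameter exists so the ordering logic can be exercised without touching a live stack.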

Known Limits

  • ingress, DNS, and TLS are not operated from this repo
  • direct raw service ports are not canonical public paths
  • host-level Docker health drift can diverge from the public-path functional posture and must be called out explicitly

License: AGPLv3.

Helpifyr: https://helpifyr.com