jhf-loom Operations
Channel: stable
Source repo: JaddaHelpifyr/jhf-loom
Tool / Contract Summary
This page describes the current operational posture of the live Loom runtime. It focuses on service health, readiness rules, monitoring expectations, and operator boundaries.
Current Verified State
Repo-owned runtime exists and is live on <internal-runtime-redacted>.
Live stack members:
- jhf-loom-db
- jhf-loom-activemq
- jhf-loom-transform
- jhf-loom-search
- jhf-loom-repo
- jhf-loom-share
Canonical public path:
https://<internal-runtime-redacted>/
Available Now
Operational assets committed in-repo:
- compose.yml
- compose.low-cpu.yml
- compose.health-fast.yml
- config/runtime/runtime-contract.json
- config/runtime/alfresco-runtime-profile.json
- config/runtime/alfresco-component-family.json
The canonical Alfresco runtime profile is documented in:
- ALFRESCO_RUNTIME_PROFILE.md (docs/ALFRESCO_RUNTIME_PROFILE.md)
Operational verify scripts:
- validate_runtime_contract.py
- validate_alfresco_runtime_profile.py
- validate_alfresco_component_family.py
- validate_runtime_materialization_drift.py
- validate_readme_raw_truth.py
- validate_live_proxy_sso_smoke.py
- validate_live_runtime_resilience.py
- validate_live_cold_start_recovery.py
- validate_live_low_idle_runtime_policy.py
Operational control scripts:
- loom-runtime-control.sh preflight
- loom-runtime-control.sh phased-start
- loom-runtime-control.sh apply
- loom-runtime-control.sh pause
- loom-runtime-control.sh resume
- loom-runtime-control.sh postcheck
- loom-runtime-control.sh diagnostics
- loom-runtime-control.sh memory-guard
Upgrade-family inventory surfaces:
- maintenance/pull_stack_oss_inventory.py
- config/runtime/alfresco-component-family.json
Readiness / Drift / Monitoring
Hard runtime rules:
- search is mandatory for green readiness
- no partial-green state is allowed when search is red
- accepted failure posture is degraded or halted, never green
- canonical Loom readiness gate is:
bash scripts/loom-runtime-control.sh readiness
- repository /probes/-ready- is treated as repo-local signal only, not full Loom green readiness on its own
- ActiveMQ is recoverable supporting infrastructure, not business truth
- shared-host startup must use the shared-host-low-idle phased bring-up path
- default healthcheck cadence is shared-host-slow for shared-host baseline load control
- dedicated-fast healthchecks are optional only and require explicit JHF_LOOM_HEALTHCHECK_PROFILE=dedicated-fast
- heavy-profile startup on undersized hosts is blocked by preflight unless JHF_LOOM_ALLOW_UNDERSIZED_HEAVY_PROFILE=1 is explicitly set
- default phased polling is low-pressure (15s); tight watchdog loops are not a supported default
- phased startup cooldown spacing (45s) is enabled by default to reduce warmup overlap for transform/repo/share
- apply skips services that are already healthy to avoid unnecessary restart pressure on shared hosts
- compose/env drift is checksum-gated; changed inputs force rollout on the next apply so config updates are not silently skipped
- repo-owned control runs also hydrate missing non-secret runtime defaults from .env.example into the live stack .env before compose evaluation and update drifted non-secret budget keys when the repo-owned truth changes
- runtime materialization drift must also be checked across repo files, live stack files, container labels/env/mounts/ports/networks, and app readback:
python scripts/validate_runtime_materialization_drift.py --host <internal-runtime-redacted> --stack-dir /home/administrator/jhf-loom-live-stack --insecure
- shared-host memory reclaim must be guarded before host-owner swap surgery:
bash scripts/loom-runtime-control.sh memory-guard
- runbook: SHARED_HOST_MEMORY_RECLAIM.md (docs/SHARED_HOST_MEMORY_RECLAIM.md)
- no-repeat lock prevents concurrent runtime-control mutation paths:
.loom-runtime-control/runtime-control.lock
- diagnostics must stay bounded (timeout + --since + --tail)
- restart recovery is bounded with backoff and optional jitter
- incident response may pause only non-critical services; jhf-loom-db remains the single kept-running service in pause mode
- post-deploy cleanup runs after phased start/resume to remove stale stopped containers and rotate old diagnostic captures
- transform/repo/share run with Docker init reaping enabled (init: true) to prevent zombie child accumulation during restart cycles
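The readiness rules at the top of this list (search is mandatory for green, no partial green, failures resolve to degraded or halted) can be sketched as a small classifier. This is an illustration only, not the logic inside loom-runtime-control.sh. The function name classify_posture, the boolean health map, the requirement that every stack member be healthy for green, and the choice of jhf-loom-db as the only halting member are all assumptions; the page does not define where degraded ends and halted begins.

```python
from typing import Mapping

# Stack members as listed on this page.
STACK_MEMBERS = (
    "jhf-loom-db",
    "jhf-loom-activemq",
    "jhf-loom-transform",
    "jhf-loom-search",
    "jhf-loom-repo",
    "jhf-loom-share",
)

# ASSUMPTION: only a database outage halts the stack. The page does not state
# the exact boundary between "degraded" and "halted".
HALTING_MEMBERS = frozenset({"jhf-loom-db"})


def classify_posture(healthy: Mapping[str, bool]) -> str:
    """Return "green", "degraded", or "halted" for a per-service health map.

    Services missing from the map count as unhealthy, so an incomplete
    report can never produce green.
    """
    down = {name for name in STACK_MEMBERS if not healthy.get(name, False)}
    if not down:
        return "green"
    if down & HALTING_MEMBERS:
        return "halted"
    # Everything else, including search being red, is degraded and never green.
    return "degraded"
```

With this sketch, a report where only jhf-loom-search is unhealthy classifies as degraded, which matches the rule that search being red rules out any green state.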
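The checksum-gated drift rule above can be illustrated with a digest over the compose and env inputs. This is a minimal sketch under stated assumptions: SHA-256 as the hash, and a hypothetical state file that stores the digest from the last rollout. The real control script may hash different inputs or keep its state elsewhere.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def inputs_digest(paths: Iterable[Path]) -> str:
    """Hash file names and contents in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def rollout_required(paths: Iterable[Path], state_file: Path) -> bool:
    """Return True when the inputs changed since the digest in state_file.

    state_file is a hypothetical location for the last applied digest. It is
    rewritten whenever a change is detected, so the next call with unchanged
    inputs returns False.
    """
    current = inputs_digest(paths)
    previous = state_file.read_text().strip() if state_file.exists() else ""
    if current == previous:
        return False
    state_file.write_text(current + "\n")
    return True
```

Gating on a content digest rather than on file timestamps means that touching a file without changing it does not force a rollout, while any real change to a compose or env input does.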
Current drift to track:
- public functional probes are green
- Docker can still report jhf-loom-search as unhealthy due to a stale health-task NotFound error on the host
- this drift must not be documented as full operational perfection
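One way to keep this drift visible is to compare the Docker-reported health state with the public-path probe result and report any disagreement explicitly. The sketch below is illustrative: describe_health_drift and its inputs are hypothetical names, not part of the repo. The Docker status strings are the standard container health states (healthy, unhealthy, starting).

```python
from typing import Optional


def describe_health_drift(
    service: str, docker_status: str, public_probe_ok: bool
) -> Optional[str]:
    """Return a drift note when Docker health and the public probe disagree.

    docker_status is the container health state as reported by Docker;
    public_probe_ok is the outcome of a functional probe through the
    canonical public path.
    """
    docker_ok = docker_status == "healthy"
    if docker_ok == public_probe_ok:
        return None  # Both views agree, so there is no drift to call out.
    if public_probe_ok:
        return (
            f"{service}: public probe green, Docker reports "
            f"{docker_status} (host-level drift)"
        )
    return (
        f"{service}: Docker reports {docker_status}, "
        f"public probe red (functional failure)"
    )
```

For the case described above, describe_health_drift("jhf-loom-search", "unhealthy", True) returns a host-level drift note instead of silently treating the stack as fully healthy.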
Latest soak evidence:
- ALFRESCO_RUNTIME_SOAK_EVIDENCE_2026-04-21.md (docs/ALFRESCO_RUNTIME_SOAK_EVIDENCE_2026-04-21.md)
- ALFRESCO_HEALTHCHECK_PROFILE_EVIDENCE_2026-04-21.md (docs/ALFRESCO_HEALTHCHECK_PROFILE_EVIDENCE_2026-04-21.md)
- RUNTIME_GUARDRAILS_V1_EVIDENCE_2026-04-23.md (docs/RUNTIME_GUARDRAILS_V1_EVIDENCE_2026-04-23.md)
Deployment / Verify
Repo:
python scripts/validate_repo_baseline.py
python scripts/validate_runtime_contract.py
python scripts/validate_alfresco_runtime_profile.py
python scripts/validate_alfresco_component_family.py
python scripts/validate_runtime_materialization_drift.py
python scripts/validate_readme_raw_truth.py
python scripts/validate_runtime_guardrails_v1.py
python scripts/validate_host_capacity_budget.py
python maintenance/pull_stack_oss_inventory.py --output test-results/stack-oss-inventory.workspace.json
python -m unittest discover -s tests -p "test_*.py"
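The repo checks above can also be chained so that the first failure stops the run. A hedged sketch: run_checks is a hypothetical helper, not a script shipped in the repo, and REPO_CHECKS is an abbreviated subset of the commands listed above.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """Run each command in order; return the first failing command, else None."""
    for command in commands:
        result = subprocess.run(list(command), check=False)
        if result.returncode != 0:
            return list(command)  # Stop at the first non-zero exit code.
    return None


# Hypothetical subset of the repo-level checks, run with the current interpreter.
REPO_CHECKS = [
    [sys.executable, "scripts/validate_repo_baseline.py"],
    [sys.executable, "scripts/validate_runtime_contract.py"],
    [sys.executable, "scripts/validate_runtime_guardrails_v1.py"],
]
```

Calling run_checks(REPO_CHECKS) from the repository root returns None when every script exits with status 0, and otherwise returns the argument list of the first failing check so it can be reported or rerun on its own.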
Live:
python scripts/validate_runtime_contract.py --host <internal-runtime-redacted>
python scripts/validate_alfresco_runtime_profile.py --host <internal-runtime-redacted>
python scripts/validate_alfresco_component_family.py --host <internal-runtime-redacted>
python scripts/validate_runtime_materialization_drift.py --host <internal-runtime-redacted> --stack-dir /home/administrator/jhf-loom-live-stack --insecure
python maintenance/pull_stack_oss_inventory.py --host <internal-runtime-redacted> --output test-results/stack-oss-inventory.workspace.json
python scripts/validate_live_low_idle_runtime_policy.py --insecure --idle-window-seconds 1800 --moderate-traffic-seconds 90
python scripts/validate_live_runtime_resilience.py --password <host-managed-secret> --insecure
python scripts/validate_live_cold_start_recovery.py --password <host-managed-secret> --insecure
bash scripts/loom-runtime-control.sh memory-guard
bash scripts/loom-runtime-control.sh diagnostics
bash scripts/loom-runtime-control.sh postcheck
bash scripts/loom-runtime-control.sh readiness
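The diagnostics step in the list above is subject to the bounded-diagnostics rule (timeout + --since + --tail). The sketch below shows one way such a bound can look when collecting container logs from Python. It is an assumption-level illustration, not the implementation behind the diagnostics subcommand: the helper names and the default window, line count, and timeout values are hypothetical, and the timeout is enforced through the subprocess timeout rather than the timeout command. The --since and --tail flags are standard docker logs options.

```python
import subprocess
from typing import List


def bounded_logs_command(container: str, since: str = "15m", tail: int = 200) -> List[str]:
    """Build a docker logs call limited by both time window and line count."""
    return ["docker", "logs", "--since", since, "--tail", str(tail), container]


def capture_bounded_logs(container: str, timeout_seconds: float = 20.0) -> str:
    """Capture bounded logs; return a marker instead of hanging on timeout."""
    try:
        result = subprocess.run(
            bounded_logs_command(container),
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # Hard upper bound on wall-clock time.
            check=False,
        )
    except subprocess.TimeoutExpired:
        return f"[diagnostics timed out after {timeout_seconds}s for {container}]"
    return result.stdout + result.stderr
```

Bounding all three dimensions (wall-clock time, log age, and line count) keeps a diagnostics capture from adding load to a shared host that is already under pressure.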
Known Limits
- ingress, DNS, and TLS are not operated from this repo
- direct raw service ports are not canonical public paths
- host-level Docker health drift can diverge from the public-path functional posture and must be called out explicitly
License: AGPLv3.
Helpifyr: https://helpifyr.com