Operations
Channel: stable
Source repo: JaddaHelpifyr/jhf-shuttle
Start / Run / Deploy
Primary modes:
- local/operator CLI via `python -m n8n_expert ...`
- packaged CLI via `jhf-shuttle ...`
- optional API runtime via `jhf-shuttle serve`
- optional OCI image from the root `Dockerfile`
- optional on-prem compose bundle via `docker-compose.onprem-messaging.yml`
- optional OCI package path via `scripts/oci_image.sh` (explicit version and sha tags)
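A minimal launch sketch covering the modes above, assuming the repo root as working directory; the subcommands come from this contract, while any extra flags (`--help`, the image tag) are illustrative assumptions:

```bash
# Local/operator CLI (module form)
python -m n8n_expert --help            # --help flag is an assumption

# Packaged CLI
jhf-shuttle --help                     # --help flag is an assumption

# Optional API runtime
jhf-shuttle serve

# Optional OCI image from the root Dockerfile (tag is illustrative)
docker build -t jhf-shuttle:local .

# Optional on-prem messaging bundle
docker compose -f docker-compose.onprem-messaging.yml up -d

# Optional OCI package path (emits explicit version and sha tags)
bash scripts/oci_image.sh
```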
Canonical runtime contract source: `docs/RUNTIME_STACK_CONTRACT.md`
Healthchecks
- API health: `GET /api/v1/health`
- API status/readiness-adjacent surface: `GET /api/v1/status`
- mailbox adapter health: `GET /healthz`
- host-facing mailbox adapter health default: `GET http://<host>:58805/healthz` (`MAILBOX_ADAPTER_HEALTH_HOST_PORT`)
- mailbox publish payload guardrail: `MAILBOX_ADAPTER_MAX_PAYLOAD_BYTES` (default `131072`; returns `413 Payload Too Large` when exceeded)
- there is no separate committed `/readyz` endpoint today
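A minimal probe sketch for the committed health surfaces; the API base URL is an assumption (the contract above fixes only the paths and the mailbox adapter host port):

```bash
API_BASE="${API_BASE:-http://localhost:8000}"   # assumed bind address, adjust per deployment

curl -fsS "$API_BASE/api/v1/health"             # API health
curl -fsS "$API_BASE/api/v1/status"             # status/readiness-adjacent surface

# Host-facing mailbox adapter health; 58805 is the documented default,
# overridable via MAILBOX_ADAPTER_HEALTH_HOST_PORT
curl -fsS "http://localhost:${MAILBOX_ADAPTER_HEALTH_HOST_PORT:-58805}/healthz"
```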
On-prem messaging compose health policy:
- `jhf-shuttle-nats` and `jhf-shuttle-mailbox-adapter` are the only healthchecked services in the default stack
- default interval: `30s`, timeout: `3s`, retries: `5`, start period: `30s`
- low-CPU override (`docker-compose.onprem-messaging.lowcpu.yml`) raises intervals to `90s`
- `jhf-shuttle-restart-recovery` and `openclaw-lane-wait-observer` intentionally run without healthchecks to avoid unnecessary probe churn
- low-CPU evidence runbook: `docs/LOW_CPU_24H_EVIDENCE_PLAN.md`
- low-CPU run status snapshot: `docs/LOW_CPU_24H_EVIDENCE_STATUS_2026-04-02.md`
- overlap guardrail preflight: `py scripts\check_shuttle_runtime_overlap.py` must return `0` before cutover/deploy (mailbox/NATS/restart-recovery alias groups)
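A spot-check sketch for this health policy, assuming the container names match the compose service names (compose may add a project prefix in your setup):

```bash
# Only these two services should report a healthcheck in the default stack;
# expect "healthy" once the 30s start period has elapsed.
for svc in jhf-shuttle-nats jhf-shuttle-mailbox-adapter; do
  docker inspect --format '{{.Name}}: {{.State.Health.Status}}' "$svc"
done
```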
Lane-wait observer guardrails:
- only one active lane-wait observer may poll logs for the same gateway target at a time
- `OPENCLAW_LANE_WAIT_PRIMARY_OBSERVER_NAME` defines deterministic primary ownership when multiple stacks are present
- secondary observers enter standby mode and must not continuously poll Docker logs
- default polling is hardened for low host overhead (see the env sketch after this list):
  - steady-state poll: `OPENCLAW_FLOW_CONTROL_POLL_SECONDS=45`
  - hard floor: `OPENCLAW_FLOW_CONTROL_MIN_POLL_SECONDS=30`
  - idle poll: `OPENCLAW_FLOW_CONTROL_IDLE_POLL_SECONDS=120`
  - bounded log window: `OPENCLAW_FLOW_CONTROL_LOG_TAIL_LINES=300`
  - bounded backoff and jitter: `OPENCLAW_FLOW_CONTROL_MAX_BACKOFF_SECONDS`, `OPENCLAW_FLOW_CONTROL_POLL_JITTER_SECONDS`
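The same defaults expressed as an env override block, a sketch only; the backoff and jitter values are illustrative because the contract names those knobs without fixing defaults here:

```bash
export OPENCLAW_FLOW_CONTROL_POLL_SECONDS=45          # steady-state poll
export OPENCLAW_FLOW_CONTROL_MIN_POLL_SECONDS=30      # hard floor
export OPENCLAW_FLOW_CONTROL_IDLE_POLL_SECONDS=120    # idle poll
export OPENCLAW_FLOW_CONTROL_LOG_TAIL_LINES=300       # bounded log window
export OPENCLAW_FLOW_CONTROL_MAX_BACKOFF_SECONDS=300  # illustrative value
export OPENCLAW_FLOW_CONTROL_POLL_JITTER_SECONDS=3    # illustrative value
```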
Restart-recovery guardrails:
- restart poll defaults are low-pressure and bounded:
  - `OPENCLAW_RESTART_RECOVERY_MIN_POLL_SECONDS=30`
  - `OPENCLAW_RESTART_RECOVERY_POLL_SECONDS=45`
  - `OPENCLAW_RESTART_RECOVERY_MAX_BACKOFF_SECONDS=300`
  - `OPENCLAW_RESTART_RECOVERY_POLL_JITTER_SECONDS=3`
- errors must not trigger tight-loop retries; restart-recovery uses bounded backoff + jitter
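A bash sketch of the required retry shape, bounded backoff plus jitter with no tight loop; the probe body is a placeholder wired to the documented mailbox health endpoint, not the actual recovery check:

```bash
poll="${OPENCLAW_RESTART_RECOVERY_POLL_SECONDS:-45}"
min="${OPENCLAW_RESTART_RECOVERY_MIN_POLL_SECONDS:-30}"
max="${OPENCLAW_RESTART_RECOVERY_MAX_BACKOFF_SECONDS:-300}"
jitter="${OPENCLAW_RESTART_RECOVERY_POLL_JITTER_SECONDS:-3}"

do_recovery_check() {  # placeholder probe; substitute the real check
  curl -fsS "http://localhost:${MAILBOX_ADAPTER_HEALTH_HOST_PORT:-58805}/healthz" >/dev/null
}

delay="$poll"
while true; do
  if do_recovery_check; then
    delay="$poll"                      # success: reset to normal cadence
  else
    delay=$(( delay * 2 ))             # error: back off, never tight-loop
    [ "$delay" -gt "$max" ] && delay="$max"
  fi
  [ "$delay" -lt "$min" ] && delay="$min"
  sleep $(( delay + RANDOM % (jitter + 1) ))   # bounded delay plus jitter
done
```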
Safe Docker log policy for observer diagnostics:
- use bounded log reads only (`timeout` + `--since` + `--tail`)
- do not run unbounded or follow-mode reads on live hosts (`docker logs -f`, unlimited `--tail`)
- observer diagnostics must always stay within bounded read limits and bounded poll intervals
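A compliant read under this policy, with illustrative window sizes (the 300-line tail matches the observer's documented log window):

```bash
# Wall-clock cap + time window + tail limit = bounded on all three axes
timeout 10s docker logs --since 5m --tail 300 openclaw-lane-wait-observer
```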
Runtime port-policy verify/readiness path:
- `python3 scripts/check_host_port_contract.py`
- `python3 scripts/verify_runtime_port_contract.py --json`
- `python3 scripts/verify_runtime_port_contract.py --json --ssh-target <internal-runtime-redacted>`
- `<internal-runtime-redacted>`
- `python3 scripts/verify_cpu_safe_runtime_guardrails.py --json`
- `bash scripts/post_deploy_runtime_cleanup.sh`
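The local verify chain as a fail-fast sketch; the ordering and `set -euo pipefail` are assumptions, the commands themselves are the documented path (the `--ssh-target` variant is omitted because its target is redacted above):

```bash
set -euo pipefail
python3 scripts/check_host_port_contract.py
python3 scripts/verify_runtime_port_contract.py --json
python3 scripts/verify_cpu_safe_runtime_guardrails.py --json
bash scripts/post_deploy_runtime_cleanup.sh
```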
Cutover-only overlap guardrail:
`py scripts\check_shuttle_runtime_overlap.py` must return `0` before enforcing a single mailbox/NATS/restart-recovery path
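An exit-code gate sketch in POSIX shell form (the contract line above shows the Windows `py` launcher variant of the same script):

```bash
if python3 scripts/check_shuttle_runtime_overlap.py; then
  echo "overlap guardrail returned 0; safe to enforce the single path"
else
  echo "runtime overlap detected; aborting cutover" >&2
  exit 1
fi
```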
Rules:
- static-required: undeclared live port drift is a failure
- dynamic-allowed-with-discovery: discovery source + consumer-safe publish path are mandatory
- internal-only: host port publishes are failures
- shared-host-exception: allowed only with explicit contract entry and discovery path
Logs And Artifacts
- `logs/events.jsonl`
- `logs/contexts/*.json`
- `logs/upgrade-impact/*.json`
- `logs/upgrade-automation/latest.json`
- `logs/catalog-refresh/*.json`
- `logs/release-hardening/latest.json`
- `logs/end-to-end-regression/latest.json`
- `dist/package-metadata.json` (generated packaging contract output)
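A quick inspection sketch, assuming `jq` is available and that `events.jsonl` is JSON Lines (one event object per line, as the extension suggests):

```bash
tail -n 20 logs/events.jsonl | jq .       # most recent events, pretty-printed
jq . logs/upgrade-automation/latest.json  # latest upgrade automation summary
jq . dist/package-metadata.json           # generated packaging contract output
```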
Monitoring-Relevant States
- n8n reachability
- instance version and version gap
- catalog freshness and baseline refresh truth
- latest upgrade summary, alerts, and backlog
- mailbox adapter health
- NATS/JetStream configuration truth
- webhook/callback contract visibility
Dashboard Fields
Grafana should prioritize:
- latest upstream version
- versions behind
- catalog freshness status
- baseline coverage ratio
- top upgrade alerts
- mailbox adapter health and pressure level
Gitea dashboards should prioritize:
- current version
- latest successful verification
- lifecycle stage
- open residual risks
- registered capabilities
- critical dependencies
Operational Gaps
- no unified `/metrics` endpoint
- no single committed readiness-only endpoint
- long-running operational evidence still depends on scripts and artifacts rather than a persistent control plane
AGPLv3. Learn more at helpifyr.com.