Operations

Start / Run / Deploy

Primary modes:

  • local/operator CLI via python -m n8n_expert ...
  • packaged CLI via jhf-shuttle ...
  • optional API runtime via jhf-shuttle serve
  • optional OCI image from the root Dockerfile
  • optional on-prem compose bundle via docker-compose.onprem-messaging.yml
  • optional OCI package path via scripts/oci_image.sh (explicit version and sha tags)

Canonical runtime contract source:

  • docs/RUNTIME_STACK_CONTRACT.md

Healthchecks

  • API health: GET /api/v1/health
  • API status (readiness-adjacent): GET /api/v1/status
  • mailbox adapter health: GET /healthz
  • host-facing mailbox adapter health default: GET http://<host>:58805/healthz (MAILBOX_ADAPTER_HEALTH_HOST_PORT)
  • mailbox publish payload guardrail: MAILBOX_ADAPTER_MAX_PAYLOAD_BYTES (default 131072 bytes; returns 413 Payload Too Large when exceeded)
  • there is no separate committed /readyz endpoint today
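
The endpoints above can be checked with any bounded HTTP client. A minimal sketch using only the Python standard library (the helper names are hypothetical; the 131072-byte default mirrors MAILBOX_ADAPTER_MAX_PAYLOAD_BYTES, everything else is an assumption):

```python
import urllib.request

MAX_PAYLOAD_BYTES = 131072  # MAILBOX_ADAPTER_MAX_PAYLOAD_BYTES default

def probe(url: str, timeout: float = 3.0) -> bool:
    """Bounded GET against a health endpoint; True only on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

def publish_status(payload: bytes, limit: int = MAX_PAYLOAD_BYTES) -> int:
    """Status the payload guardrail implies: 413 over the limit, 200 otherwise."""
    return 413 if len(payload) > limit else 200
```

probe() would target GET /api/v1/health or GET /healthz; publish_status() mirrors the adapter's 413 behavior at the exact byte boundary.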

On-prem messaging compose health policy:

  • jhf-shuttle-nats and jhf-shuttle-mailbox-adapter are the only healthchecked services in the default stack
  • default interval: 30s, timeout: 3s, retries: 5, start period: 30s
  • low-CPU override (docker-compose.onprem-messaging.lowcpu.yml) raises intervals to 90s
  • jhf-shuttle-restart-recovery and openclaw-lane-wait-observer intentionally run without healthchecks to avoid unnecessary probe churn
  • low-CPU evidence runbook: docs/LOW_CPU_24H_EVIDENCE_PLAN.md
  • low-CPU run status snapshot: docs/LOW_CPU_24H_EVIDENCE_STATUS_2026-04-02.md
  • overlap guardrail preflight: py scripts\check_shuttle_runtime_overlap.py must return 0 before cutover/deploy (mailbox/NATS/restart-recovery alias groups)
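
As a sketch, the documented probe policy maps onto a compose healthcheck block like the following. This is an illustrative fragment only: the container-internal port and the curl-based test command are assumptions, not the committed docker-compose.onprem-messaging.yml; only the four timing values are documented above.

```yaml
services:
  jhf-shuttle-mailbox-adapter:
    healthcheck:
      # hypothetical probe command and internal port
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 30s      # low-CPU override raises this to 90s
      timeout: 3s
      retries: 5
      start_period: 30s
```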

Lane-wait observer guardrails:

  • only one active lane-wait observer may poll logs for the same gateway target at a time
  • OPENCLAW_LANE_WAIT_PRIMARY_OBSERVER_NAME defines deterministic primary ownership when multiple stacks are present
  • secondary observers enter standby mode and must not continuously poll Docker logs
  • default polling is hardened for low host overhead:
    • OPENCLAW_FLOW_CONTROL_POLL_SECONDS=45
    • hard floor OPENCLAW_FLOW_CONTROL_MIN_POLL_SECONDS=30
    • idle poll OPENCLAW_FLOW_CONTROL_IDLE_POLL_SECONDS=120
    • bounded log window OPENCLAW_FLOW_CONTROL_LOG_TAIL_LINES=300
    • bounded backoff and jitter (OPENCLAW_FLOW_CONTROL_MAX_BACKOFF_SECONDS, OPENCLAW_FLOW_CONTROL_POLL_JITTER_SECONDS)
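
A minimal sketch of the interval selection under the documented defaults; next_poll_delay is a hypothetical helper, not the shipped observer, and the jitter default is an assumption (the variable is named above without a value):

```python
import random

POLL_SECONDS = 45        # OPENCLAW_FLOW_CONTROL_POLL_SECONDS
MIN_POLL_SECONDS = 30    # OPENCLAW_FLOW_CONTROL_MIN_POLL_SECONDS (hard floor)
IDLE_POLL_SECONDS = 120  # OPENCLAW_FLOW_CONTROL_IDLE_POLL_SECONDS
POLL_JITTER_SECONDS = 3  # assumed value for OPENCLAW_FLOW_CONTROL_POLL_JITTER_SECONDS

def next_poll_delay(idle: bool) -> float:
    """Idle observers stretch to the idle interval, active ones use the base
    interval, neither drops below the hard floor, and jitter de-synchronizes
    observers that share a host."""
    base = IDLE_POLL_SECONDS if idle else POLL_SECONDS
    return max(base, MIN_POLL_SECONDS) + random.uniform(0.0, POLL_JITTER_SECONDS)
```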

Restart-recovery guardrails:

  • restart poll defaults are low-pressure and bounded:
    • OPENCLAW_RESTART_RECOVERY_MIN_POLL_SECONDS=30
    • OPENCLAW_RESTART_RECOVERY_POLL_SECONDS=45
    • OPENCLAW_RESTART_RECOVERY_MAX_BACKOFF_SECONDS=300
    • OPENCLAW_RESTART_RECOVERY_POLL_JITTER_SECONDS=3
  • errors must not trigger tight-loop retries; restart-recovery uses bounded backoff + jitter
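
The bounded-backoff rule can be sketched with the documented defaults; retry_delay is a hypothetical helper showing why a persistently failing dependency can never produce a tight retry loop:

```python
import random

MIN_POLL = 30      # OPENCLAW_RESTART_RECOVERY_MIN_POLL_SECONDS
POLL = 45          # OPENCLAW_RESTART_RECOVERY_POLL_SECONDS
MAX_BACKOFF = 300  # OPENCLAW_RESTART_RECOVERY_MAX_BACKOFF_SECONDS
JITTER = 3         # OPENCLAW_RESTART_RECOVERY_POLL_JITTER_SECONDS

def retry_delay(consecutive_errors: int) -> float:
    """Double the base poll per consecutive error, cap at MAX_BACKOFF, never
    drop below MIN_POLL, and add jitter so restarts do not synchronize."""
    backoff = min(POLL * (2 ** consecutive_errors), MAX_BACKOFF)
    return max(backoff, MIN_POLL) + random.uniform(0.0, JITTER)
```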

Safe Docker log policy for observer diagnostics:

  • use bounded log reads only (timeout + --since + --tail)
  • do not run unbounded or follow-mode reads on live hosts (docker logs -f, unlimited --tail)
  • observer use-case must always stay within bounded read limits and bounded poll intervals
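
The policy reduces to "always pass a timeout, a time window, and a tail cap; never follow". A sketch that builds such a command line (the helper name and defaults are assumptions; `timeout` is the coreutils wrapper):

```python
def bounded_logs_cmd(container: str, since: str = "5m",
                     tail: int = 300, timeout_s: int = 10) -> list[str]:
    """argv for a policy-compliant log read: wrapped in coreutils `timeout`,
    windowed with --since, capped with --tail, and never follow-mode."""
    return ["timeout", str(timeout_s), "docker", "logs",
            "--since", since, "--tail", str(tail), container]
```

For example, `bounded_logs_cmd("jhf-shuttle-nats")` (container name hypothetical) can be handed directly to subprocess.run.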

Runtime port-policy verify/readiness path:

  1. python3 scripts/check_host_port_contract.py
  2. python3 scripts/verify_runtime_port_contract.py --json
  3. python3 scripts/verify_runtime_port_contract.py --json --ssh-target <internal-runtime-redacted>
  4. python3 scripts/verify_cpu_safe_runtime_guardrails.py --json
  5. bash scripts/post_deploy_runtime_cleanup.sh
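
A hypothetical wrapper that encodes the fail-fast ordering of the steps above (the scripts themselves are the contract; the --ssh-target variant is omitted here because its target is redacted):

```python
import subprocess

STEPS = [
    ["python3", "scripts/check_host_port_contract.py"],
    ["python3", "scripts/verify_runtime_port_contract.py", "--json"],
    ["python3", "scripts/verify_cpu_safe_runtime_guardrails.py", "--json"],
    ["bash", "scripts/post_deploy_runtime_cleanup.sh"],
]

def run_gate(steps=STEPS) -> int:
    """Run the verification steps in order, stopping at the first nonzero
    exit code; 0 means every step passed."""
    for cmd in steps:
        rc = subprocess.run(cmd).returncode
        if rc != 0:
            return rc
    return 0
```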

Cutover-only overlap guardrail:

  • py scripts\check_shuttle_runtime_overlap.py must return 0 before enforcing a single mailbox/NATS/restart-recovery path

Port-policy rules:

  • static-required: undeclared live port drift is a failure
  • dynamic-allowed-with-discovery: discovery source + consumer-safe publish path are mandatory
  • internal-only: host port publishes are failures
  • shared-host-exception: allowed only with explicit contract entry and discovery path
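
The four rules read as a decision table. A sketch of that classification (function and parameter names are hypothetical; the real verifier is scripts/verify_runtime_port_contract.py):

```python
def port_verdict(mode: str, declared: bool, published: bool,
                 has_discovery: bool = False, contract_entry: bool = False) -> str:
    """Classify one observed live port against the port-policy rules."""
    if mode == "static-required":
        return "pass" if declared else "fail"       # undeclared drift fails
    if mode == "dynamic-allowed-with-discovery":
        return "pass" if has_discovery else "fail"  # discovery path is mandatory
    if mode == "internal-only":
        return "fail" if published else "pass"      # any host publish fails
    if mode == "shared-host-exception":
        return "pass" if contract_entry and has_discovery else "fail"
    return "fail"                                   # unknown modes fail closed
```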

Logs And Artifacts

  • logs/events.jsonl
  • logs/contexts/*.json
  • logs/upgrade-impact/*.json
  • logs/upgrade-automation/latest.json
  • logs/catalog-refresh/*.json
  • logs/release-hardening/latest.json
  • logs/end-to-end-regression/latest.json
  • dist/package-metadata.json (generated packaging contract output)

Monitoring-Relevant States

  • n8n reachability
  • instance version and version gap
  • catalog freshness and baseline refresh truth
  • latest upgrade summary, alerts, and backlog
  • mailbox adapter health
  • NATS/JetStream configuration truth
  • webhook/callback contract visibility

Dashboard Fields

Grafana should prioritize:

  • latest upstream version
  • versions behind
  • catalog freshness status
  • baseline coverage ratio
  • top upgrade alerts
  • mailbox adapter health and pressure level

Gitea dashboards should prioritize:

  • current version
  • latest successful verification
  • lifecycle stage
  • open residual risks
  • registered capabilities
  • critical dependencies

Operational Gaps

  • no unified /metrics endpoint
  • no single committed readiness-only endpoint
  • long-running operational evidence still depends on scripts and artifacts rather than a persistent control plane

AGPLv3. Learn more at helpifyr.com.