jhf-spool Operations

Version: 2026-04-01

Start / Run / Deploy

  • entrypoint: compose.dev.yaml
  • low-cpu override: compose.lowcpu.yaml
  • scripts:
    • scripts/dev-up.sh
    • scripts/dev-down.sh
    • scripts/ops/deploy_news_memory_main_stack.sh
    • docs/STACK_CONTRACT.md (canonical runtime contract)

Shared-host n8n port defaults:

  • default host port: NEWS_MEMORY_N8N_PORT=25678
  • reserved/forbidden by default: 15678 (shared-host global n8n runtime)
  • startup preflight in both start scripts fails early when the target n8n port is busy
  • optional override for reserved ports: NEWS_MEMORY_ALLOW_RESERVED_N8N_PORTS=1
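
The preflight behavior above can be sketched as follows. This is a minimal illustration, not the actual logic in the start scripts: the env var names come from this doc, but the reserved-port set and the TCP-connect check are assumptions; consult scripts/dev-up.sh for the real implementation.

```python
# Sketch of the n8n port preflight (assumed logic; the real check lives
# in scripts/dev-up.sh and may differ).
import os
import socket

RESERVED_N8N_PORTS = {15678}  # shared-host global n8n runtime

def port_is_busy(port: int, host: str = "127.0.0.1") -> bool:
    """Return True when something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

def preflight_n8n_port(port: int, allow_reserved: bool) -> tuple[bool, str]:
    """Decide whether startup may proceed for the requested n8n host port."""
    if port in RESERVED_N8N_PORTS and not allow_reserved:
        return False, (f"port {port} is reserved; set "
                       "NEWS_MEMORY_ALLOW_RESERVED_N8N_PORTS=1 to override")
    if port_is_busy(port):
        return False, f"port {port} is already in use"
    return True, "ok"

if __name__ == "__main__":
    port = int(os.environ.get("NEWS_MEMORY_N8N_PORT", "25678"))
    allow = os.environ.get("NEWS_MEMORY_ALLOW_RESERVED_N8N_PORTS") == "1"
    ok, reason = preflight_n8n_port(port, allow)
    print(("PASS" if ok else "FAIL") + ": " + reason)
```

Note the ordering: the reserved-port check fires before any socket probe, so a forbidden port fails fast even when nothing is listening on it.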

Health and Readiness

  • /v1/health/live
  • /v1/health/ready
  • /v1/health/info
  • /v1/fabric/metadata
  • /metrics
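
A minimal readiness probe over the surfaces above can look like this. The base URL is a placeholder, the status-code classification is an assumption (not part of any documented contract), and TLS handling for the internal CA is omitted for brevity.

```python
# Minimal probe of the health surfaces listed above (base URL and verdict
# mapping are illustrative assumptions).
import json
import urllib.error
import urllib.request

HEALTH_ENDPOINTS = ["/v1/health/live", "/v1/health/ready", "/v1/health/info"]

def classify(status: int) -> str:
    """Map an HTTP status to a coarse operator verdict."""
    if 200 <= status < 300:
        return "healthy"
    if status in (502, 503, 504):
        return "upstream-or-readiness-failure"
    return "unexpected"

def probe(base_url: str) -> dict:
    """Probe each health endpoint and return path -> verdict."""
    results = {}
    for path in HEALTH_ENDPOINTS:
        try:
            with urllib.request.urlopen(base_url + path, timeout=5) as resp:
                results[path] = classify(resp.status)
        except urllib.error.HTTPError as exc:
            results[path] = classify(exc.code)
        except OSError:
            results[path] = "unreachable"
    return results

if __name__ == "__main__":
    print(json.dumps(probe("https://example.internal"), indent=2))
```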

Additional operator surfaces:

  • /v1/research/operational-slo-gates
  • /v1/research/security-compliance-gates
  • /v1/research/incident-readiness-gates
  • /v1/research/tls-proxy-readiness
  • /v1/research/secret-readiness
  • /v1/research/paddle-readiness

Current healthcheck cadence policy (maintained stack):

  • postgres, minio, redis: 120s interval, 5s timeout, 3 retries, 60s start period
  • api: 60s interval, 3s timeout, 3 retries, 30s start period
  • n8n: 60s interval, 5s timeout, 3 retries, 45s start period
  • reverse-proxy and observability: host-managed in integrated mode, outside this compose stack

Monitoring

Current stack includes:

  • OpenTelemetry collector
  • host-managed Prometheus/Grafana in integrated deployments
  • runtime and readiness helper scripts

Logs and Diagnostics

Primary diagnostic paths:

  • API container logs from the maintained Compose stack
  • host reverse-proxy logs when TLS or upstream routing fails
  • n8n container logs when scheduled orchestration degrades
  • host observability surfaces for health, readiness, and freshness signals

Useful scripts:

  • scripts/ops/run_host_release_check.py
  • scripts/ops/run_operational_release_checks.py
  • scripts/ops/run_live_platform_journey.py
  • scripts/ops/evaluate_n8n_live_readiness.py
  • scripts/ops/verify_runtime_materialization_drift.py
  • scripts/ops/evaluate_secret_readiness.py
  • scripts/ops/evaluate_fabric_combination_consumer.py
  • scripts/ops/run_lowcpu_soak_probe.py
  • scripts/ops/query_gitea_actions_runs.py

Gitea Actions run API compatibility:

  • in this environment /api/v1/repos/{owner}/{repo}/actions/runs can return 404
  • maintained run-status collection must use scripts/ops/query_gitea_actions_runs.py
  • the helper uses /actions/runs first and automatically falls back to /actions/tasks on 404
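
The fallback behavior above can be expressed as a small selection function. The fetch callable is injected so the logic stays testable; this is a sketch of the documented behavior, and scripts/ops/query_gitea_actions_runs.py remains the maintained tool.

```python
# Sketch of the documented 404 fallback: prefer /actions/runs, fall back
# to /actions/tasks. The fetch function is injected for testability.
from typing import Callable, Tuple

RUNS_SUFFIX = "/actions/runs"
TASKS_SUFFIX = "/actions/tasks"

def fetch_runs(repo_api_base: str,
               fetch: Callable[[str], Tuple[int, dict]]) -> Tuple[str, dict]:
    """Return (endpoint_used, payload), preferring /actions/runs."""
    status, payload = fetch(repo_api_base + RUNS_SUFFIX)
    if status == 404:
        status, payload = fetch(repo_api_base + TASKS_SUFFIX)
        if status == 404:
            raise RuntimeError("neither runs nor tasks endpoint is available")
        return TASKS_SUFFIX, payload
    return RUNS_SUFFIX, payload
```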

Release-check telemetry snapshot:

  • run_operational_release_checks.py now emits healthcheck_load in report JSON
  • block includes:
    • host CPU sample (usage_percent_1s, best effort)
    • exec_create sample count over a short time window
  • fallback behavior:
    • if CPU sampling fails, cpu_sample.available=false and error text is preserved
  • if the timeout command is unavailable on the host, exec_create_sample.available=false with a fallback error marker
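
A defensive reader for that block might look like the sketch below. Field names cpu_sample, exec_create_sample, usage_percent_1s, and available come from this doc; the count and error keys are assumptions, so treat the actual report JSON as authoritative.

```python
# Defensive reader for the healthcheck_load block (field names partly
# assumed; the real report schema is authoritative).
def summarize_healthcheck_load(report: dict) -> dict:
    load = report.get("healthcheck_load", {})
    cpu = load.get("cpu_sample", {})
    execs = load.get("exec_create_sample", {})
    return {
        # None signals "sample unavailable", mirroring available=false.
        "cpu_percent": cpu.get("usage_percent_1s") if cpu.get("available") else None,
        "cpu_error": None if cpu.get("available") else cpu.get("error"),
        "exec_creates": execs.get("count") if execs.get("available") else None,
    }
```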

Example:

python scripts/ops/run_host_release_check.py --output-dir reports/operational-release-checks

Runtime materialization drift verify:

python scripts/ops/verify_runtime_materialization_drift.py \
--host <internal-runtime-redacted><internal-runtime-redacted> \
--repo-path-on-host /home/administrator/jhf-spool-main \
--base-url https://<internal-runtime-redacted> \
--insecure \
--output reports/runtime-materialization/latest.json

The verifier compares:

  • repo-owned runtime contract and compose truth
  • active host compose materialization
  • running container env/labels/mounts/networks
  • app readback from /v1/health/info

It fails on missing keys, undocumented non-interpolated overrides, container/app readback mismatch, and externally visible ingress drift.

Standard Restart Order

When the maintained stack needs a bounded restart:

  1. verify database and storage dependencies first
  2. start or recover PostgreSQL, MinIO, Qdrant, and Redis
  3. start the API service
  4. verify host-managed proxy/TLS edge
  5. verify n8n only after the API is healthy
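
The ordering above can be encoded as a command plan. The Compose service names and the readiness URL are assumptions for illustration; check compose.dev.yaml for the real service names before using anything like this.

```python
# Ordered restart plan matching the documented sequence (service names
# and base URL are illustrative assumptions).
COMPOSE = "docker compose -f compose.dev.yaml"

def restart_plan() -> list:
    """Return shell commands in the documented bounded-restart order."""
    return [
        f"{COMPOSE} up -d postgres minio qdrant redis",  # steps 1-2: data deps
        f"{COMPOSE} up -d api",                          # step 3: API after deps
        "curl -fsS https://<base-url>/v1/health/ready",  # step 4: verify edge/readiness
        f"{COMPOSE} up -d n8n",                          # step 5: n8n last
    ]
```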

Compose Core Stack Recovery

If host TLS is up but the product is unavailable:

  1. inspect whether the core jhf-spool services still exist
  2. recover missing core containers before debugging host proxy or TLS
  3. verify /v1/health/live and /v1/health/ready
  4. verify docs and OpenAPI surfaces
  5. re-run the operational release check if the outage affected orchestration or gates

Known Failure Modes

  • core stack partly absent while host proxy remains up
  • TLS trust issues mistaken for service outages
  • external source/provider drift
  • inactive n8n workflows causing stale automation

Typical 502 Cases

  • host proxy upstream points to stale backend target
  • upstream API container stopped or absent
  • DNS lookup mismatch between host proxy config and active backend target

Typical TLS Cases

  • local trust failure against the internal Caddy CA
  • HTTP/HTTPS mismatch during manual checks
  • wrong public base URL or reverse-proxy configuration

Typical n8n Cases

  • workflows deployed but inactive
  • stale NEWS_MEMORY_API_BASE_URL
  • invalid or missing shared API key
  • workflow host path still pointing at an old domain/base URL

Runtime Dependencies

Hard:

  • PostgreSQL
  • MinIO
  • Qdrant
  • Redis

Optional:

  • n8n
  • Paddle
  • NewsAPI
  • external source providers

Weak Host Mode

For low-resource hosts, run the maintained stack with:

docker compose -f compose.dev.yaml -f compose.lowcpu.yaml up -d --build

This keeps the same healthcheck surfaces but stretches intervals to reduce healthcheck exec load.

For lightweight release telemetry sampling on weak hosts:

  • keep telemetry windows short (30s default)
  • avoid full monitoring suites for routine verification
  • use release report snapshots as regression evidence between rollouts

For 24h low-cpu soak evidence collection:

python scripts/ops/run_lowcpu_soak_probe.py \
--host <internal-runtime-redacted><internal-runtime-redacted> \
--stack-prefix jhf-spool- \
--samples 24 \
--sample-interval-seconds 3600 \
--telemetry-window-seconds 30

Host-target note:

  • when running the collector on the same host as the stack, use --host <internal-runtime-redacted>
  • user-prefixed self-targets like <internal-runtime-redacted><internal-runtime-redacted> are normalized to local execution by the collector to avoid SSH self-auth failures
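
The self-target normalization described above amounts to stripping an optional user@ prefix and comparing against the local host's names. This is a sketch of the documented behavior; the real normalization lives in run_lowcpu_soak_probe.py.

```python
# Sketch of the collector's self-target normalization: user-prefixed
# targets that resolve to the local host execute locally, not over SSH.
def normalize_host(host_arg: str, local_hostnames: set) -> str:
    """Return "local" for self-targets, else the SSH target unchanged."""
    target = host_arg.rsplit("@", 1)[-1]  # strip an optional user@ prefix
    if target in local_hostnames:
        return "local"
    return host_arg
```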

Artifacts are written to reports/healthcheck-soak/ as:

  • per-sample JSONL telemetry stream
  • summarized metrics JSON

Credential Drift Verify (Spool)

For jhf-spool auth-resilience checks (valid key, invalid key, rotated key) against /v1/search/semantic:

python scripts/ops/verify_spool_auth_rotation.py \
--base-url https://<internal-runtime-redacted> \
--valid-key "$VALID_SPOOL_KEY" \
--invalid-key "$INTENTIONALLY_INVALID_KEY" \
--rotated-key "$ROTATED_SPOOL_KEY" \
--insecure \
--strict \
--output reports/auth-rotation/latest.json

The output is machine-readable and separates:

  • auth drift (401 on invalid key while valid key path is healthy)
  • rotation recovery (rotated key succeeds again)
  • potential platform/network outage (valid path not healthy)
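
One way to read that separation is as a priority-ordered classification over the three probe results. The status-code mapping below is an assumption about what "healthy", "drift", and "outage" mean in HTTP terms; the machine-readable report from verify_spool_auth_rotation.py is the authoritative source.

```python
# Assumed classification over the three key probes against
# /v1/search/semantic (the script's actual report schema governs).
def classify_rotation(valid_status: int, invalid_status: int,
                      rotated_status: int) -> str:
    if valid_status != 200:
        return "platform-or-network-outage"  # valid path not healthy
    if invalid_status != 401:
        return "auth-drift"                  # invalid key not rejected with 401
    if rotated_status != 200:
        return "rotation-not-recovered"      # rotated key still failing
    return "healthy"
```

The outage check comes first on purpose: when the valid-key path is down, the invalid- and rotated-key results say nothing about auth behavior.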

Canonical consumer contract:

  • docs/CREDENTIAL_ROTATION_CONTRACT.md (spool-auth-rotation-v1)

License: AGPLv3
Learn more: https://helpifyr.com