Operations
Channel: stable
Source repo: JaddaHelpifyr/helpifyr-fabric
Tool / Contract Summary
This page documents how the repository is deployed, verified, restarted, and observed. It is the runtime companion to the contract and API documentation.
Business Value
- gives operators a deterministic run path for local, CI, and live-host verification
- keeps runtime evidence aligned with Fabric contract truth
- prevents host or container mutation from happening without an auditable verify path
Current Verified State
- reference live host: <internal-runtime-redacted>
- host verification posture: read-first over SSH before assuming runtime failure
- main stack and platform-plane compose files live in deploy/compose/
- runtime evidence can be cross-checked through /api/v1/tools/host-snapshot, /api/v1/tools/runtime-status, /api/v1/tools/runtime-evidence, /api/v1/tools/runtime-contracts, and /api/v1/tools/runtime-observations
Available Now
Runtime paths
- main stack: deploy/compose/jhf-fabric.stack.yml
- low-CPU main stack: deploy/compose/jhf-fabric.stack.low-cpu.yml
- platform plane: deploy/compose/jhf-fabric.platform-plane.yml
- low-CPU platform plane: deploy/compose/jhf-fabric.platform-plane.low-cpu.yml
- ephemeral suite: deploy/compose/docker-compose.test.yml
- low-CPU ephemeral suite: deploy/compose/docker-compose.test.low-cpu.yml
Operational scripts
- scripts/resolve-runtime-env.sh
- scripts/build-runtime-tool-env.sh
- scripts/ensure-fabric-docker-resources.sh
- scripts/redeploy-host-stack.sh
- scripts/redeploy-platform-plane.sh
- scripts/prepare-platform-plane-assets.sh
- scripts/test-up.sh
- scripts/test-run.sh
- scripts/test-down.sh
- scripts/bootstrap_wikijs_docs.py
- scripts/safe_docker_logs.sh
- scripts/post-deploy-guardrails.sh
- scripts/verify-runtime-guardrails.sh
- scripts/verify_runtime_materialization.py
Wiki.js and platform-plane assets
- deploy/compose/platform-plane/wiki/HELPIFYR_WIKI_HOME.md
- deploy/compose/platform-plane/wiki/favicon.svg
- deploy/compose/platform-plane/wiki/jadda_helpifyr_logo.svg
- deploy/compose/platform-plane/wiki/helpifyr-wiki-theme.css
- docs/operations/WIKIJS_PLATFORM_PLANE.md
Optional / Extended
- low-CPU deployment variants
- platform-plane services such as Wiki.js, Prometheus, Grafana, and OpenTelemetry Collector
- optional consumers such as internal docs portals and downstream runtime dashboards
Planned / Not In Current Scope
- any host mutation path that is not represented by a real script or guarded preview surface
- undocumented write flows against providers or downstream tools
Public Surfaces
Operator runtime and evidence routes:
- GET /health
- GET /api/v1/platform/services
- GET /api/v1/tools/host-snapshot
- GET /api/v1/tools/runtime-status
- GET /api/v1/tools/runtime-evidence
- GET /api/v1/tools/runtime-contracts
- GET /api/v1/tools/runtime-observations
- GET /api/v1/observability/readiness
- GET /api/v1/security/readiness
- GET /api/v1/recovery/readiness
- GET /api/v1/signoff/readiness
Contract Families
Operations interact directly with:
- runtime port contracts
- provider instance registry
- drift reports
- docs and wiki governance contracts
- shared topology and shared service baseline contracts
Producer / Consumer Mapping
- producer: Fabric publishes runtime observations and contract-shaped operational evidence
- consumer: operators, CI, Wiki.js, and downstream repos
- boundary rule: host observations are consumed into Fabric, but they do not override contract truth
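The boundary rule above can be illustrated with a minimal sketch. It assumes contracts and host observations can both be reduced to flat key/value mappings; the function name `reconcile`, the key names, and the port values are hypothetical and are not taken from the Fabric codebase. The point is only that the effective state is always the contract, while disagreeing observations are recorded as drift instead of being applied.

```python
from typing import Any, Dict, List, Tuple


def reconcile(
    contract: Dict[str, Any], observed: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return (effective, drift); effective always equals the contract values."""
    # Contract truth is copied unchanged: observations never overwrite it.
    effective = dict(contract)
    drift: List[Dict[str, Any]] = []

    # Contracted keys whose observed value disagrees are reported as drift.
    for key, expected in contract.items():
        if key in observed and observed[key] != expected:
            drift.append({"key": key, "contract": expected, "observed": observed[key]})

    # Observed keys with no contract are also reported, never adopted.
    for key in sorted(set(observed) - set(contract)):
        drift.append({"key": key, "contract": None, "observed": observed[key]})

    return effective, drift
```

A caller would feed the drift list into whatever report format the installation uses; this sketch deliberately stops before that step.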
Compatibility Window
- live-host posture is Linux and POSIX/bash first
- compose and redeploy scripts are the canonical operational interface
- direct ad-hoc mutation is not considered a compatible operator path
Lifecycle Status
- active deployment and verification posture
- host and platform-plane scripts are maintained alongside the API and contract layers
Readiness / Drift / Monitoring
Recommended health and readiness order:
- GET /health
- GET /api/v1/platform/services
- GET /api/v1/observability/readiness
- GET /api/v1/security/readiness
- GET /api/v1/recovery/readiness
- GET /api/v1/signoff/readiness
- subsystem-specific readiness for persistence, Dapr, events, tooling, identity, or providers as needed
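The recommended order can be walked programmatically. The following is a minimal sketch that probes the six documented routes in sequence and reports the first one that does not answer with HTTP 200. Treating 200 as "ready" is an assumption, since the response payloads are not described on this page; the helper names `http_status` and `first_failing_route` are illustrative only.

```python
from typing import Callable, Optional
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Routes in the recommended order from this page.
READINESS_ORDER = [
    "/health",
    "/api/v1/platform/services",
    "/api/v1/observability/readiness",
    "/api/v1/security/readiness",
    "/api/v1/recovery/readiness",
    "/api/v1/signoff/readiness",
]


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if no response arrived."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as exc:
        # 4xx/5xx responses still carry a status code.
        return exc.code
    except (URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return 0


def first_failing_route(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> Optional[str]:
    """Probe routes in order; return the first route that is not 200, else None."""
    root = base_url.rstrip("/")
    for route in READINESS_ORDER:
        if fetch(root + route) != 200:
            return route
    return None
```

Stopping at the first failure keeps the probe low-pressure and points the operator at the earliest layer that needs attention. Against a local API started with uvicorn, the base URL would typically be `http://127.0.0.1:8000`, which is uvicorn's default bind address and not a value stated by this page.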
Monitoring stack:
- Prometheus for metrics collection
- Grafana for dashboards
- OpenTelemetry Collector for telemetry export
- /api/v1/monitoring/metrics for tool and policy metrics
Deployment / Verify
Validate compose
bash ./scripts/resolve-runtime-env.sh /tmp/jhf-fabric-resolved.env
docker compose --env-file /tmp/jhf-fabric-resolved.env -f deploy/compose/jhf-fabric.stack.yml config
docker compose -f deploy/compose/jhf-fabric.platform-plane.yml config
docker compose --env-file deploy/compose/platform-plane/wiki/.env -f deploy/compose/jhf-fabric.platform-plane.yml config
docker compose -f deploy/compose/docker-compose.test.yml config
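For CI use, the four `docker compose ... config` checks above could be wrapped so that every file is rendered and all failures are reported together. This is a sketch under assumptions: it expects `scripts/resolve-runtime-env.sh` to have already written `/tmp/jhf-fabric-resolved.env`, and the names `COMPOSE_CHECKS`, `config_command`, and `validate_all` are hypothetical helpers, not part of the repository.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# (env file or None, compose file) pairs taken from the commands above.
COMPOSE_CHECKS: List[Tuple[Optional[str], str]] = [
    ("/tmp/jhf-fabric-resolved.env", "deploy/compose/jhf-fabric.stack.yml"),
    (None, "deploy/compose/jhf-fabric.platform-plane.yml"),
    ("deploy/compose/platform-plane/wiki/.env", "deploy/compose/jhf-fabric.platform-plane.yml"),
    (None, "deploy/compose/docker-compose.test.yml"),
]


def config_command(env_file: Optional[str], compose_file: str) -> List[str]:
    """Build the argv for one `docker compose ... config` validation."""
    cmd = ["docker", "compose"]
    if env_file:
        cmd += ["--env-file", env_file]
    cmd += ["-f", compose_file, "config"]
    return cmd


def validate_all(run: Callable = subprocess.run) -> List[str]:
    """Render every compose file; return the files whose rendering failed."""
    failed: List[str] = []
    for env_file, compose_file in COMPOSE_CHECKS:
        result = run(config_command(env_file, compose_file), capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(compose_file)
    return failed
```

Passing argv lists instead of shell strings avoids quoting problems with paths, and the injectable `run` parameter lets the wrapper be exercised without Docker installed.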
Deploy or redeploy
bash ./scripts/redeploy-host-stack.sh
bash ./scripts/redeploy-platform-plane.sh
bash ./scripts/prepare-platform-plane-assets.sh
bash ./scripts/ensure-fabric-docker-resources.sh
bash ./scripts/verify-runtime-guardrails.sh
python ./scripts/verify_runtime_materialization.py --check
python ./scripts/verify_runtime_materialization.py --check --live-via-ssh <internal-runtime-redacted><internal-runtime-redacted>
Local API start
uvicorn helpifyr_fabric.api.app:app --reload
Ephemeral verification
bash ./scripts/test-up.sh
bash ./scripts/test-run.sh
bash ./scripts/test-down.sh
Docs bootstrap
python scripts/bootstrap_wikijs_docs.py --wiki-url http://<internal-runtime-redacted>:33001 --site-host https://docs.helpifyr.com
- add --dry-run to preview output without writing pages
python scripts/docs/materialize_public_docs_site.py
npm run build --prefix docs-site
npm run deploy:cloudflare --prefix docs-site
Wiki.js bootstrap remains internal/operator-only. The public docs.helpifyr.com surface is the Docusaurus site materialized from Fabric docs-platform truth.
Known Limits
- direct host/port verification remains canonical when a service has not yet published its own hostname contract
- platform-plane components are optional and may not be present on every installation
- runtime observations can be delayed or degraded when provider dependencies are unavailable
Exceptions / Waivers
- legacy *.<internal-runtime-redacted> hostnames are redirect surfaces only and must not be treated as primary Fabric-owned truth
- some runtime probes are intentionally lightweight and policy-limited to avoid pressure on host services
Logs
For live-host log inspection, use the bounded snapshot policy in operations/HOST_DOCKER_LOG_GUARDRAILS.md (docs/operations/HOST_DOCKER_LOG_GUARDRAILS.md).
Main stack
timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 api
timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 daprd
timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 postgres
timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 nats
bash ./scripts/post-deploy-guardrails.sh jhf-fabric
Platform plane
timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 api
timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 prometheus
timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 grafana
timeout 15s docker compose --env-file deploy/compose/platform-plane/wiki/.env -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 wikijs
timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 otel-collector
bash ./scripts/post-deploy-guardrails.sh jhf-fabric
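All of the log commands above share one bounded shape: a wall-clock limit (`timeout 15s`), a time window (`--since 10m`), and a line cap (`--tail 80`). The sketch below builds that shape for any service so the three bounds cannot be forgotten. The function names are hypothetical, and this is not a description of what `scripts/safe_docker_logs.sh` does internally.

```python
import subprocess
from typing import List, Optional, Tuple


def bounded_logs_command(
    compose_file: str,
    service: str,
    env_file: Optional[str] = None,
    since: str = "10m",
    tail: int = 80,
    limit: str = "15s",
) -> List[str]:
    """Build a time-, window-, and line-bounded `docker compose logs` argv."""
    cmd = ["timeout", limit, "docker", "compose"]
    if env_file:
        cmd += ["--env-file", env_file]
    cmd += ["-f", compose_file, "logs", "--since", since, "--tail", str(tail), service]
    return cmd


def timed_out(returncode: int) -> bool:
    """GNU `timeout` exits with status 124 when the time limit was reached."""
    return returncode == 124


def log_snapshot(compose_file: str, service: str, **bounds) -> Tuple[int, str]:
    """Run one bounded snapshot and return (exit status, captured stdout)."""
    result = subprocess.run(
        bounded_logs_command(compose_file, service, **bounds),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Because the snapshot never follows the log stream, it leaves no long-running diagnostic process behind for the post-deploy guardrail to flag.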
CPU-Safe Runtime Guardrails
- Standard verify and redeploy paths must stay bounded and low-pressure.
- Use bash ./scripts/verify-runtime-guardrails.sh before release-oriented changes to confirm repo-owned stack truth, bounded diagnostics, and low-pressure defaults.
- Use python ./scripts/verify_runtime_materialization.py --check --live-via-ssh <internal-runtime-redacted><internal-runtime-redacted> when runtime/config changes are involved to prove repo truth, active compose labels, container env/mounts/networks, and app readback stayed aligned.
- Post-deploy cleanup is mandatory through bash ./scripts/post-deploy-guardrails.sh jhf-fabric; the script fails closed if stale repo-owned docker logs, docker compose ... logs, or docker exec diagnostics remain beyond the configured minimum age.
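The fail-closed behaviour described in the last item can be pictured with a small model. This sketch is not the logic of `scripts/post-deploy-guardrails.sh`: it assumes process command lines and ages are already collected, it ignores the "repo-owned" filter, and every name in it (`ProcessInfo`, `is_diagnostic`, `stale_diagnostics`, `guardrail_exit_code`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProcessInfo:
    command: str        # full command line of a running process
    age_seconds: float  # how long the process has been running


def is_diagnostic(command: str) -> bool:
    """True for `docker logs`, `docker exec`, or `docker compose ... logs`."""
    words = command.split()
    if "docker" not in words:
        return False
    rest = words[words.index("docker") + 1:]
    if not rest:
        return False
    if rest[0] in ("logs", "exec"):
        return True
    return rest[0] == "compose" and "logs" in rest[1:]


def stale_diagnostics(
    processes: Iterable[ProcessInfo], min_age_seconds: float
) -> List[ProcessInfo]:
    """Diagnostics that have been running longer than the minimum age."""
    return [
        p for p in processes
        if is_diagnostic(p.command) and p.age_seconds > min_age_seconds
    ]


def guardrail_exit_code(
    processes: Iterable[ProcessInfo], min_age_seconds: float
) -> int:
    """Fail closed: any stale diagnostic yields a non-zero exit status."""
    return 1 if stale_diagnostics(processes, min_age_seconds) else 0
```

The minimum-age threshold keeps a diagnostic that an operator started moments ago from failing the check, while anything left running well past the deploy does.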
Typical Failure Modes
- Dapr sidecar unavailable
- PostgreSQL bootstrap or migration incomplete
- NATS or event publication readiness drift
- stale repository manifests or tool profiles
- Grafana active-dashboard provisioning drift
- provider runtime evidence delay or DNS/network mismatch on the host
Diagnosis Order
- validate compose config
- inspect docker compose ... ps
- inspect bounded Fabric API log snapshots
- inspect bounded dependency service log snapshots
- run readiness endpoints in the order above
- use subsystem runbooks before mutating state
Restart And Recovery
- restart only the affected Fabric-owned service when possible
- prefer additive rebuild or restart over manual state edits
- use STACK_RECOVERY_RUNBOOK (docs/operations/STACK_RECOVERY_RUNBOOK.md) before direct persistence mutation
- use recovery and signoff readiness surfaces to confirm post-restart state
Runtime Dependencies
- PostgreSQL
- NATS JetStream
- Dapr sidecar
- Gitea for repository contract intake
- optional platform-plane observability services
- optional internal docs consumer through Wiki.js
Related Issues
- operational history and remaining backlog items live under docs/issues/ and docs/AUTONOMOUS_BACKLOG.md
License
- License: AGPLv3
- Project: https://helpifyr.com