Operations

Tool / Contract Summary

This page documents how the repository is deployed, verified, restarted, and observed. It is the runtime companion to the contract and API documentation.

Business Value

  • gives operators a deterministic run path for local, CI, and live-host verification
  • keeps runtime evidence aligned with Fabric contract truth
  • prevents host or container mutation from happening without an auditable verify path

Current Verified State

  • reference live host: <internal-runtime-redacted>
  • host verification posture: check read-only over SSH first, before assuming a runtime failure
  • main stack and platform-plane compose files live in deploy/compose/
  • runtime evidence can be cross-checked through /api/v1/tools/host-snapshot, /api/v1/tools/runtime-status, /api/v1/tools/runtime-evidence, /api/v1/tools/runtime-contracts, and /api/v1/tools/runtime-observations
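A minimal read-only sketch of that cross-check in Python. It assumes the API is reachable at a base URL you supply (for a local uvicorn start this would typically be http://127.0.0.1:8000, uvicorn's default) and that each route returns a JSON body. The response schemas are not described on this page, so the sketch only collects the raw payloads. collect_evidence and fetch_json are hypothetical helpers, not repository code.

```python
import json
import urllib.request
from typing import Any, Callable

# Evidence routes listed on this page; all are read-only GETs.
EVIDENCE_ROUTES = [
    "/api/v1/tools/host-snapshot",
    "/api/v1/tools/runtime-status",
    "/api/v1/tools/runtime-evidence",
    "/api/v1/tools/runtime-contracts",
    "/api/v1/tools/runtime-observations",
]


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Read-only GET that decodes a JSON response body (assumed format)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect_evidence(base_url: str,
                     fetch: Callable[[str], Any] = fetch_json) -> dict[str, Any]:
    """Fetch every evidence route, keyed by the last path segment."""
    base = base_url.rstrip("/")
    return {route.rsplit("/", 1)[-1]: fetch(base + route) for route in EVIDENCE_ROUTES}
```

The fetch parameter is injectable so the route handling can be checked without a running API, for example collect_evidence("http://127.0.0.1:8000", lambda url: url).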

Available Now

Runtime paths

  • main stack: deploy/compose/jhf-fabric.stack.yml
  • low-CPU main stack: deploy/compose/jhf-fabric.stack.low-cpu.yml
  • platform plane: deploy/compose/jhf-fabric.platform-plane.yml
  • low-CPU platform plane: deploy/compose/jhf-fabric.platform-plane.low-cpu.yml
  • ephemeral suite: deploy/compose/docker-compose.test.yml
  • low-CPU ephemeral suite: deploy/compose/docker-compose.test.low-cpu.yml
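The low-CPU variants follow a consistent naming pattern, so picking a compose file can be sketched as a small lookup. This assumes the pattern holds for every plane (the base name with .low-cpu inserted before .yml); compose_file and the plane keys "main", "platform", and "test" are hypothetical names, not part of the repository scripts.

```python
from pathlib import Path

COMPOSE_DIR = Path("deploy/compose")

# Base compose file per plane, as listed under "Runtime paths".
BASE_FILES = {
    "main": "jhf-fabric.stack.yml",
    "platform": "jhf-fabric.platform-plane.yml",
    "test": "docker-compose.test.yml",
}


def compose_file(plane: str, low_cpu: bool = False) -> Path:
    """Return the compose file for a plane, optionally its low-CPU variant."""
    try:
        name = BASE_FILES[plane]
    except KeyError:
        raise ValueError(f"unknown plane: {plane!r}") from None
    if low_cpu:
        # Assumed convention: "<base>.yml" -> "<base>.low-cpu.yml"
        name = name.removesuffix(".yml") + ".low-cpu.yml"
    return COMPOSE_DIR / name
```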

Operational scripts

  • scripts/resolve-runtime-env.sh
  • scripts/build-runtime-tool-env.sh
  • scripts/ensure-fabric-docker-resources.sh
  • scripts/redeploy-host-stack.sh
  • scripts/redeploy-platform-plane.sh
  • scripts/prepare-platform-plane-assets.sh
  • scripts/test-up.sh
  • scripts/test-run.sh
  • scripts/test-down.sh
  • scripts/bootstrap_wikijs_docs.py
  • scripts/safe_docker_logs.sh
  • scripts/post-deploy-guardrails.sh
  • scripts/verify-runtime-guardrails.sh
  • scripts/verify_runtime_materialization.py

Wiki.js and platform-plane assets

  • deploy/compose/platform-plane/wiki/HELPIFYR_WIKI_HOME.md
  • deploy/compose/platform-plane/wiki/favicon.svg
  • deploy/compose/platform-plane/wiki/jadda_helpifyr_logo.svg
  • deploy/compose/platform-plane/wiki/helpifyr-wiki-theme.css
  • docs/operations/WIKIJS_PLATFORM_PLANE.md

Optional / Extended

  • low-CPU deployment variants
  • platform-plane services such as Wiki.js, Prometheus, Grafana, and OpenTelemetry Collector
  • optional consumers such as internal docs portals and downstream runtime dashboards

Planned / Not In Current Scope

  • any host mutation path that is not represented by a real script or guarded preview surface
  • undocumented write flows against providers or downstream tools

Public Surfaces

Operator runtime and evidence routes:

  • GET /health
  • GET /api/v1/platform/services
  • GET /api/v1/tools/host-snapshot
  • GET /api/v1/tools/runtime-status
  • GET /api/v1/tools/runtime-evidence
  • GET /api/v1/tools/runtime-contracts
  • GET /api/v1/tools/runtime-observations
  • GET /api/v1/observability/readiness
  • GET /api/v1/security/readiness
  • GET /api/v1/recovery/readiness
  • GET /api/v1/signoff/readiness

Contract Families

Operational workflows interact directly with:

  • runtime port contracts
  • provider instance registry
  • drift reports
  • docs and wiki governance contracts
  • shared topology and shared service baseline contracts

Producer / Consumer Mapping

  • producer: Fabric publishes runtime observations and contract-shaped operational evidence
  • consumer: operators, CI, Wiki.js, and downstream repos
  • boundary rule: host observations are consumed into Fabric, but they do not override contract truth
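The boundary rule can be illustrated with a small reconciliation sketch. Observations are merged in, contract-declared values always win, and any disagreement is surfaced as drift instead of silently replacing contract truth. The flat key/value shape, the reconcile function, and the example keys are assumptions for illustration; Fabric's actual contract and drift-report formats are not described here.

```python
from dataclasses import dataclass, field


@dataclass
class ReconciledView:
    """Effective runtime view plus drift between contract and observation."""
    effective: dict[str, object]
    drift: dict[str, tuple[object, object]] = field(default_factory=dict)


def reconcile(contract: dict[str, object],
              observed: dict[str, object]) -> ReconciledView:
    """Consume host observations without letting them override contract truth.

    Keys declared in the contract keep their contract value; a differing
    observation is recorded as drift as (contract_value, observed_value).
    Observed keys the contract does not declare are kept as extra evidence.
    """
    effective = dict(observed)   # start from the observations ...
    effective.update(contract)   # ... then let the contract win on overlap
    drift = {
        key: (contract[key], observed[key])
        for key in contract.keys() & observed.keys()
        if contract[key] != observed[key]
    }
    return ReconciledView(effective=effective, drift=drift)
```

With hypothetical keys, reconcile({"api_port": 8000}, {"api_port": 8080, "uptime_s": 12}) keeps api_port at 8000 in the effective view and reports ("api_port": (8000, 8080)) as drift.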

Compatibility Window

  • live-host posture is Linux and POSIX/bash first
  • compose and redeploy scripts are the canonical operational interface
  • direct ad-hoc mutation is not considered a compatible operator path

Lifecycle Status

  • active deployment and verification posture
  • host and platform-plane scripts are maintained alongside the API and contract layers

Readiness / Drift / Monitoring

Recommended health and readiness order:

  1. GET /health
  2. GET /api/v1/platform/services
  3. GET /api/v1/observability/readiness
  4. GET /api/v1/security/readiness
  5. GET /api/v1/recovery/readiness
  6. GET /api/v1/signoff/readiness
  7. subsystem-specific readiness for persistence, Dapr, events, tooling, identity, or providers as needed
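Walking the readiness order can be sketched as a fail-fast loop: probe each route in sequence and stop at the first response outside 2xx, so later checks are not read against an unhealthy base. This assumes readiness is signalled by the HTTP status code; if the routes instead report readiness in the JSON body, the check would need to inspect that field. Connection errors are not caught and propagate to the caller. walk_readiness and http_status are hypothetical helpers, and step 7 (subsystem-specific readiness) is left out.

```python
import urllib.error
import urllib.request
from typing import Callable

# Steps 1-6 of the recommended order.
READINESS_ORDER = [
    "/health",
    "/api/v1/platform/services",
    "/api/v1/observability/readiness",
    "/api/v1/security/readiness",
    "/api/v1/recovery/readiness",
    "/api/v1/signoff/readiness",
]


def http_status(url: str, timeout: float = 5.0) -> int:
    """GET a URL and return its HTTP status code (read-only probe)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as HTTPError; report their status code.
        return exc.code


def walk_readiness(base_url: str,
                   get_status: Callable[[str], int] = http_status) -> list[tuple[str, int]]:
    """Probe the readiness routes in order and stop at the first non-2xx."""
    results = []
    for path in READINESS_ORDER:
        status = get_status(base_url.rstrip("/") + path)
        results.append((path, status))
        if not 200 <= status < 300:
            break
    return results
```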

Monitoring stack:

  • Prometheus for metrics collection
  • Grafana for dashboards
  • OpenTelemetry Collector for telemetry export
  • /api/v1/monitoring/metrics for tool and policy metrics

Deployment / Verify

Validate compose

  • bash ./scripts/resolve-runtime-env.sh /tmp/jhf-fabric-resolved.env
  • docker compose --env-file /tmp/jhf-fabric-resolved.env -f deploy/compose/jhf-fabric.stack.yml config
  • docker compose -f deploy/compose/jhf-fabric.platform-plane.yml config
  • docker compose --env-file deploy/compose/platform-plane/wiki/.env -f deploy/compose/jhf-fabric.platform-plane.yml config
  • docker compose -f deploy/compose/docker-compose.test.yml config
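The validation steps can be scripted so the env file is resolved once and every compose file is checked, collecting failures instead of stopping at the first. This sketch mirrors the commands above; the 60-second timeout and the function names are assumptions, and run is injectable so the command list can be inspected without Docker.

```python
import subprocess
from typing import Optional

RESOLVED_ENV = "/tmp/jhf-fabric-resolved.env"


def compose_config_cmd(compose_file: str, env_file: Optional[str] = None) -> list[str]:
    """Build a `docker compose ... config` command in the same shape as the list above."""
    cmd = ["docker", "compose"]
    if env_file:
        cmd += ["--env-file", env_file]
    return cmd + ["-f", compose_file, "config"]


VALIDATIONS = [
    compose_config_cmd("deploy/compose/jhf-fabric.stack.yml", RESOLVED_ENV),
    compose_config_cmd("deploy/compose/jhf-fabric.platform-plane.yml"),
    compose_config_cmd("deploy/compose/jhf-fabric.platform-plane.yml",
                       "deploy/compose/platform-plane/wiki/.env"),
    compose_config_cmd("deploy/compose/docker-compose.test.yml"),
]


def validate_all(run=subprocess.run) -> list[str]:
    """Resolve the env file, then validate each compose file.

    Returns the failing commands as strings; an empty list means all passed.
    """
    # With the real subprocess.run, check=True raises if env resolution fails.
    run(["bash", "./scripts/resolve-runtime-env.sh", RESOLVED_ENV], check=True)
    failures = []
    for cmd in VALIDATIONS:
        result = run(cmd, capture_output=True, text=True, timeout=60)
        if result.returncode != 0:
            failures.append(" ".join(cmd))
    return failures
```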

Deploy or redeploy

  • bash ./scripts/redeploy-host-stack.sh
  • bash ./scripts/redeploy-platform-plane.sh
  • bash ./scripts/prepare-platform-plane-assets.sh
  • bash ./scripts/ensure-fabric-docker-resources.sh
  • bash ./scripts/verify-runtime-guardrails.sh
  • python ./scripts/verify_runtime_materialization.py --check
  • python ./scripts/verify_runtime_materialization.py --check --live-via-ssh <internal-runtime-redacted><internal-runtime-redacted>

Local API start

  • uvicorn helpifyr_fabric.api.app:app --reload

Ephemeral verification

  • bash ./scripts/test-up.sh
  • bash ./scripts/test-run.sh
  • bash ./scripts/test-down.sh
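Because test-up.sh starts containers, a wrapper should run test-down.sh even when startup or the test run fails. A minimal sketch, assuming each script signals failure through a non-zero exit code; run_ephemeral_suite is a hypothetical name.

```python
import subprocess


def run_ephemeral_suite(run=subprocess.run) -> int:
    """Bring the ephemeral suite up, run it, and always tear it down.

    Returns the exit code of test-run.sh, or of test-up.sh if startup failed.
    """
    try:
        up = run(["bash", "./scripts/test-up.sh"])
        if up.returncode != 0:
            return up.returncode
        return run(["bash", "./scripts/test-run.sh"]).returncode
    finally:
        # Tear down even when startup or the test run fails.
        run(["bash", "./scripts/test-down.sh"])
```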

Docs bootstrap

  • python scripts/bootstrap_wikijs_docs.py --wiki-url http://<internal-runtime-redacted>:33001 --site-host https://docs.helpifyr.com
  • append --dry-run to the bootstrap_wikijs_docs.py command to preview output without writing pages
  • python scripts/docs/materialize_public_docs_site.py
  • npm run build --prefix docs-site
  • npm run deploy:cloudflare --prefix docs-site

Wiki.js bootstrap remains internal/operator-only. The public docs.helpifyr.com surface is the Docusaurus site materialized from Fabric docs-platform truth.

Known Limits

  • direct host/port verification remains canonical when a service has not yet published its own hostname contract
  • platform-plane components are optional and may not be present on every installation
  • runtime observations can be delayed or degraded when provider dependencies are unavailable

Exceptions / Waivers

  • legacy *.<internal-runtime-redacted> hostnames are redirect surfaces only and must not be treated as primary Fabric-owned truth
  • some runtime probes are intentionally lightweight and policy-limited to avoid pressure on host services

Logs

For live-host log inspection, follow the bounded snapshot policy in docs/operations/HOST_DOCKER_LOG_GUARDRAILS.md.

Main stack

  • timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 api
  • timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 daprd
  • timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 postgres
  • timeout 15s docker compose -f deploy/compose/jhf-fabric.stack.yml logs --since 10m --tail 80 nats
  • bash ./scripts/post-deploy-guardrails.sh jhf-fabric

Platform plane

  • timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 api
  • timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 prometheus
  • timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 grafana
  • timeout 15s docker compose --env-file deploy/compose/platform-plane/wiki/.env -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 wikijs
  • timeout 15s docker compose -f deploy/compose/jhf-fabric.platform-plane.yml logs --since 10m --tail 80 otel-collector
  • bash ./scripts/post-deploy-guardrails.sh jhf-fabric
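The bounded snapshot pattern used in both lists can be expressed as a command builder plus a runner with a hard wall-clock limit. In this sketch subprocess.run(timeout=...) stands in for the timeout 15s wrapper: on expiry, Python kills the child process and raises TimeoutExpired. The helper names are hypothetical, and the defaults simply copy the --since 10m --tail 80 values shown above.

```python
import subprocess
from typing import Optional


def log_snapshot_cmd(compose_file: str, service: str, since: str = "10m",
                     tail: int = 80, env_file: Optional[str] = None) -> list[str]:
    """Build a bounded `docker compose logs` command (never --follow)."""
    cmd = ["docker", "compose"]
    if env_file:
        cmd += ["--env-file", env_file]
    cmd += ["-f", compose_file, "logs", "--since", since, "--tail", str(tail), service]
    return cmd


def log_snapshot(cmd: list[str], timeout_s: float = 15.0) -> str:
    """Run one snapshot with a hard wall-clock limit, like `timeout 15s`."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child has already been killed by subprocess.run at this point.
        return f"[snapshot timed out after {timeout_s}s]"
    return result.stdout + result.stderr
```

For example, log_snapshot(log_snapshot_cmd("deploy/compose/jhf-fabric.stack.yml", "api")) corresponds to the first main-stack command.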

CPU-Safe Runtime Guardrails

  • Standard verify and redeploy paths must stay bounded and low-pressure.
  • Use bash ./scripts/verify-runtime-guardrails.sh before release-oriented changes to confirm repo-owned stack truth, bounded diagnostics, and low-pressure defaults.
  • Use python ./scripts/verify_runtime_materialization.py --check --live-via-ssh <internal-runtime-redacted><internal-runtime-redacted> when runtime/config changes are involved to prove repo truth, active compose labels, container env/mounts/networks, and app readback stayed aligned.
  • Post-deploy cleanup is mandatory through bash ./scripts/post-deploy-guardrails.sh jhf-fabric; the script fails closed if stale repo-owned docker logs, docker compose ... logs, or docker exec diagnostics remain beyond the configured minimum age.
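The fail-closed cleanup check can be sketched as a filter over host processes: anything matching one of the named diagnostic commands and older than the minimum age makes the check fail. This sketch assumes process ages and command lines are collected with something like ps -eo etimes=,args= on a Linux host. It does not reproduce the repo-ownership filtering or the actual logic of post-deploy-guardrails.sh, and the names are hypothetical.

```python
import re
from typing import Iterable

# Patterns for the diagnostic commands named above. "docker compose ... logs"
# allows any arguments between "compose" and "logs".
DIAGNOSTIC_PATTERNS = [
    re.compile(r"\bdocker\s+logs\b"),
    re.compile(r"\bdocker\s+compose\b.*\slogs\b"),
    re.compile(r"\bdocker\s+exec\b"),
]


def stale_diagnostics(processes: Iterable[tuple[int, str]], min_age_s: int) -> list[str]:
    """Return command lines of diagnostics older than min_age_s.

    processes is a sequence of (age_in_seconds, command_line) pairs.
    """
    return [
        cmd for age, cmd in processes
        if age > min_age_s and any(p.search(cmd) for p in DIAGNOSTIC_PATTERNS)
    ]


def guardrail_exit_code(processes: Iterable[tuple[int, str]], min_age_s: int) -> int:
    """Fail closed: any stale diagnostic yields a non-zero exit code."""
    return 1 if stale_diagnostics(processes, min_age_s) else 0
```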

Typical Failure Modes

  • Dapr sidecar unavailable
  • PostgreSQL bootstrap or migration incomplete
  • NATS or event publication readiness drift
  • stale repository manifests or tool profiles
  • Grafana active-dashboard provisioning drift
  • provider runtime evidence delay or DNS/network mismatch on the host

Diagnosis Order

  1. validate compose config
  2. inspect docker compose ... ps
  3. inspect bounded Fabric API log snapshots
  4. inspect bounded dependency service log snapshots
  5. run readiness endpoints in the order above
  6. use subsystem runbooks before mutating state

Restart And Recovery

  • restart only the affected Fabric-owned service when possible
  • prefer additive rebuild or restart over manual state edits
  • use STACK_RECOVERY_RUNBOOK (docs/operations/STACK_RECOVERY_RUNBOOK.md) before direct persistence mutation
  • use recovery and signoff readiness surfaces to confirm post-restart state
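Restricting restarts to the affected Fabric-owned service can be enforced with a guard before building the docker compose restart command. The allowlist below is hypothetical, taken from the main-stack services named in the Logs section; the real ownership boundary should come from the relevant contracts.

```python
from typing import AbstractSet

# Hypothetical allowlist: main-stack services named in the Logs section.
FABRIC_OWNED = frozenset({"api", "daprd", "postgres", "nats"})


def restart_cmd(compose_file: str, service: str,
                owned: AbstractSet[str] = FABRIC_OWNED) -> list[str]:
    """Build a single-service restart command; refuse services outside the allowlist."""
    if service not in owned:
        raise ValueError(f"{service!r} is not Fabric-owned; use its own runbook instead")
    # Restart one service only, never the whole stack.
    return ["docker", "compose", "-f", compose_file, "restart", service]
```

After the restart, the recovery and signoff readiness routes confirm the post-restart state as described above.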

Runtime Dependencies

  • PostgreSQL
  • NATS JetStream
  • Dapr sidecar
  • Gitea for repository contract intake
  • optional platform-plane observability services
  • optional internal docs consumer through Wiki.js
  • operational history and remaining backlog items live under docs/issues/ and docs/AUTONOMOUS_BACKLOG.md

License