Operate and Monitor

Use this page when the stack is already running and you need a daily operator path for health, readiness, drift visibility, and bounded recovery decisions.

When to use this page

  • You need the standard health and readiness readback loop.
  • You need to monitor whether repo truth, runtime truth, and user-visible behavior still align.
  • You need to decide whether to observe, escalate, restart, or recover.

Prerequisites

  • You can access the Fabric health and readiness surfaces.
  • You know whether you are verifying local repo truth, deployed runtime truth, or both.
  • You can use bounded repo-owned verification commands before host mutation.

Operating model

Healthy operations on Helpifyr are read-first. Inspect before you change anything, and compare these layers against each other:

  • docs truth
  • contract truth
  • runtime health
  • observability readiness
  • security readiness
  • recovery posture

Step-by-step procedure

1. Start with the standard readback sequence

Use the same order documented in the Fabric operations lane:

GET /health
GET /api/v1/platform/services
GET /api/v1/observability/readiness
GET /api/v1/security/readiness
GET /api/v1/recovery/readiness
GET /api/v1/signoff/readiness

Reading the surfaces in this order keeps the following concerns separate:

  • coarse service health
  • subsystem readiness
  • security posture
  • recovery posture
  • signoff readiness
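
The sequence above can be scripted so that every surface is read in the documented order on each pass. The following is a minimal sketch that assumes only that each endpoint answers an HTTP GET. The `read_sequence` and `http_status` helpers and the injectable `fetch` parameter are illustrative names, not part of Fabric.

```python
import urllib.error
import urllib.request

# Endpoint order as documented in the readback sequence above.
READBACK_PATHS = [
    "/health",
    "/api/v1/platform/services",
    "/api/v1/observability/readiness",
    "/api/v1/security/readiness",
    "/api/v1/recovery/readiness",
    "/api/v1/signoff/readiness",
]


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def read_sequence(base_url, fetch=http_status):
    """Read every surface in order without stopping early,
    so that all layers stay visible even when one of them fails."""
    base = base_url.rstrip("/")
    return [(path, fetch(base + path)) for path in READBACK_PATHS]
```

Calling `read_sequence("<fabric-base-url>")` with the real base URL returns a list of `(path, status)` pairs. Collecting all six results before judging any of them keeps one failing surface from hiding the others.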

2. Check runtime evidence when signals disagree

If one surface looks stale or contradictory, use:

python ./scripts/verify_runtime_materialization.py --check

Use it when:

  • docs and runtime disagree
  • readiness looks worse than health
  • a service exists but its deployment shape is suspect
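
The "readiness looks worse than health" case can be detected mechanically. The sketch below assumes that a 2xx status code means a surface is fine. The real Fabric responses may carry a richer payload that this ignores, and `needs_materialization_check` is a hypothetical helper name.

```python
def needs_materialization_check(statuses):
    """Flag the case where /health is fine but a readiness surface is not.

    `statuses` maps an endpoint path to its HTTP status code (or None if
    the endpoint was unreachable). Treating 2xx as "fine" is an assumption.
    """
    def ok(code):
        return code is not None and 200 <= code < 300

    health_ok = ok(statuses.get("/health"))
    degraded = sorted(
        path for path, code in statuses.items()
        if path.endswith("/readiness") and not ok(code)
    )
    # Health passes while readiness fails: the signals disagree.
    return health_ok and bool(degraded), degraded
```

When the first return value is true, running `python ./scripts/verify_runtime_materialization.py --check` is the documented next step. The second value lists the readiness surfaces that triggered it.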

3. Keep bounded guardrails before mutation

Before restart or redeploy work, use:

bash ./scripts/verify-runtime-guardrails.sh

This keeps diagnostics and operator behavior inside the repo-owned low-pressure safety lane.
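
One way to enforce this ordering is to gate every mutating action behind the guardrail script. The sketch below assumes that the script signals failure with a non-zero exit code, which this page does not confirm. The `guarded` wrapper is illustrative only.

```python
import subprocess

GUARDRAIL_CMD = ["bash", "./scripts/verify-runtime-guardrails.sh"]


def guarded(action, run=subprocess.run):
    """Run the guardrail script first; call `action` only if it exits with 0."""
    result = run(GUARDRAIL_CMD, check=False)
    if result.returncode != 0:
        # Assumption: a non-zero exit code means the guardrails did not pass.
        return False
    action()
    return True
```

The `run` parameter exists so the gate can be exercised without touching a host. In normal use it stays at its default, and `action` is whatever restart or redeploy step the matching runbook prescribes.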

4. Use the matching operations runbook

Choose the narrowest runbook that matches the failing signal before attempting a broader recovery.

5. Verify behavior, not just endpoints

After any recovery or operator action:

  • repeat the same readiness sequence
  • repeat the materialization check when relevant
  • confirm one representative user, workflow, or consumer path behaves correctly again
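
Repeating the readiness sequence is most useful when the new readings are compared with the ones taken before the action. The following sketch assumes both readings are mappings from endpoint path to HTTP status code and that 2xx means fine. `compare_readbacks` is a hypothetical helper.

```python
def compare_readbacks(before, after):
    """Compare two readbacks taken before and after an operator action.

    Returns (still_failing, regressed):
      still_failing - paths that are not 2xx in the `after` reading
      regressed     - paths that were 2xx before and are not 2xx now
    """
    def ok(code):
        return code is not None and 200 <= code < 300

    still_failing = sorted(p for p, c in after.items() if not ok(c))
    regressed = sorted(
        p for p in after if ok(before.get(p)) and not ok(after[p])
    )
    return still_failing, regressed
```

Two empty lists only show that the endpoints recovered. The representative user, workflow, or consumer path still has to be confirmed separately.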

Example operator loop

curl -s <fabric-base-url>/health
curl -s <fabric-base-url>/api/v1/platform/services
curl -s <fabric-base-url>/api/v1/observability/readiness
curl -s <fabric-base-url>/api/v1/security/readiness
curl -s <fabric-base-url>/api/v1/recovery/readiness
curl -s <fabric-base-url>/api/v1/signoff/readiness
python ./scripts/verify_runtime_materialization.py --check
bash ./scripts/verify-runtime-guardrails.sh

Verification

Operations posture is considered healthy enough to continue when:

  1. the standard health and readiness sequence is readable
  2. the signals do not contradict each other in an unresolved way
  3. repo/runtime drift is absent or bounded
  4. no required owner runbook escalation remains open
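
The four criteria above can be recorded as an explicit checklist so that the reason for holding back is always named. This is a sketch only. The `PostureCheck` type and its field names are illustrative, and each flag is still set by operator judgment rather than derived automatically.

```python
from dataclasses import dataclass


@dataclass
class PostureCheck:
    sequence_readable: bool        # criterion 1
    contradictions_resolved: bool  # criterion 2
    drift_absent_or_bounded: bool  # criterion 3
    no_open_escalation: bool       # criterion 4

    def blockers(self):
        """Names of the criteria that are not yet met, in declaration order."""
        return [name for name, met in vars(self).items() if not met]

    def healthy_enough(self):
        """True only when all four criteria are met."""
        return not self.blockers()
```

For example, `PostureCheck(True, False, True, True).blockers()` names `contradictions_resolved` as the single item to resolve before continuing.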

Common failure modes

Monitoring only /health

Problem:

  • you miss deeper readiness and security regressions.

Better path:

  • run the full readback sequence, including observability, security, recovery, and signoff readiness

Recovering without comparing to repo truth

Problem:

  • a runtime drift issue is treated as a generic outage.

Better path:

  • run python ./scripts/verify_runtime_materialization.py --check

Turning every red signal into a restart

Problem:

  • operator actions become noisy and non-diagnostic.

Better path:

  • pick the narrowest matching runbook first

Owner Handoff

  • operations truth owner: JaddaHelpifyr/helpifyr-fabric
  • environment and deployment recovery support: JaddaHelpifyr/jhf-openclaw-env, JaddaHelpifyr/jhf-deployment
