Operate and Monitor

Use this page when the stack is already running and you need a daily operator routine for checking health and readiness, spotting drift, and making bounded recovery decisions.

When to use this page

You need the standard health and readiness readback loop.
You need to monitor whether repo truth, runtime truth, and user-visible behavior still align.
You need to decide whether to observe, escalate, restart, or recover.

Prerequisites

You can access the Fabric health and readiness surfaces.
You know whether you are verifying local repo truth, deployed runtime truth, or both.
You can run the bounded, repo-owned verification commands before mutating any host.

Operating model

Healthy Helpifyr operations are read-first: before acting, compare these layers:

docs truth
contract truth
runtime health
observability readiness
security readiness
recovery posture

Architecture / Flow

Step-by-step procedure

1. Start with the standard readback sequence

Use the same order documented in the Fabric operations lane:

GET /health
GET /api/v1/platform/services
GET /api/v1/observability/readiness
GET /api/v1/security/readiness
GET /api/v1/recovery/readiness
GET /api/v1/signoff/readiness

Reading the surfaces in this order separates:

coarse service health
subsystem readiness
security posture
recovery posture
signoff readiness
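The readback sequence can be scripted so it always runs in the same order. The sketch below is a minimal Python version using only the standard library. It assumes that an HTTP 2xx response means a surface is readable and passing, because this page does not document the response bodies; `read_back` and `http_status` are hypothetical helper names, and `fetch` is injectable so the loop can be exercised without a live Fabric instance.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# Order copied from the standard readback sequence on this page.
READBACK_SEQUENCE: List[str] = [
    "/health",
    "/api/v1/platform/services",
    "/api/v1/observability/readiness",
    "/api/v1/security/readiness",
    "/api/v1/recovery/readiness",
    "/api/v1/signoff/readiness",
]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET, or 0 if no response was received at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code worth recording.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        return 0


def read_back(base_url: str, fetch: Callable[[str], int] = http_status) -> List[Tuple[str, int]]:
    """Query every surface in order without stopping early; return (path, status) pairs.

    Assumption: the caller interprets 2xx as 'ok'; the JSON bodies are not parsed here.
    """
    base = base_url.rstrip("/")
    return [(path, fetch(base + path)) for path in READBACK_SEQUENCE]
```

Querying every surface even after one fails keeps the full picture visible, which is the point of separating coarse health from deeper readiness.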

2. Check runtime evidence when signals disagree

If one surface looks stale or contradicts another, run:

python ./scripts/verify_runtime_materialization.py --check

Use it when:

docs and runtime disagree
readiness looks worse than health
a service exists but its deployment shape is suspect
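The three triggers above can be expressed as a small decision helper. This is a hypothetical sketch, not part of the repo tooling: it reuses the assumption that 2xx means "ok", treats "health fine but a deeper surface failing" (or the reverse) as a disagreement, and leaves the docs-versus-runtime and deployment-shape judgments as flags the operator sets.

```python
from typing import List, Tuple


def signals_disagree(results: List[Tuple[str, int]]) -> bool:
    """True when /health and the deeper surfaces tell different stories.

    Assumption: any 2xx status counts as 'ok'. A uniform outage (everything
    failing) is not a disagreement; it is a consistent signal.
    """
    ok = {path: 200 <= status < 300 for path, status in results}
    health_ok = ok.get("/health", False)
    deeper = [value for path, value in ok.items() if path != "/health"]
    if health_ok:
        # Readiness looks worse than health.
        return not all(deeper)
    # Health is failing while some deeper surface still reports ok.
    return any(deeper)


def should_run_materialization_check(
    results: List[Tuple[str, int]],
    docs_runtime_mismatch: bool = False,
    deployment_shape_suspect: bool = False,
) -> bool:
    """Mirror the three 'use it when' conditions listed on this page."""
    return docs_runtime_mismatch or deployment_shape_suspect or signals_disagree(results)
```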

3. Keep bounded guardrails before mutation

Before any restart or redeploy, run:

bash ./scripts/verify-runtime-guardrails.sh

This keeps diagnostics and operator behavior inside the repo-owned low-pressure safety lane.
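One way to make the guardrail step hard to skip is to gate the mutation on it. The sketch below is hypothetical: `run_guarded` is an illustrative name, and it assumes that `verify-runtime-guardrails.sh` exits non-zero when the operator should not proceed, which this page does not state explicitly.

```python
import subprocess
from typing import Callable, Sequence

GUARDRAIL_CMD = ["bash", "./scripts/verify-runtime-guardrails.sh"]


def run_guarded(mutate: Callable[[], None], guardrail_cmd: Sequence[str] = GUARDRAIL_CMD) -> bool:
    """Run the guardrail check first and call mutate() only if it exits 0.

    Returns True if the mutation ran. Assumption: a non-zero exit code from the
    guardrail script means 'do not restart or redeploy'.
    """
    result = subprocess.run(list(guardrail_cmd), check=False)
    if result.returncode != 0:
        print(f"guardrails failed (exit {result.returncode}); not mutating")
        return False
    mutate()
    return True
```

The mutation is passed in as a callable so the gate stays independent of whether the next step is a restart, a redeploy, or a runbook action.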

4. Use the matching operations runbook

Typical next choices:

5. Verify behavior, not just endpoints

After any recovery or operator action:

repeat the same readiness sequence
repeat the materialization check when relevant
confirm one representative user, workflow, or consumer path behaves correctly again
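The post-action checks above can be combined into a single pass/fail gate. In this hypothetical sketch, `read_back` is any callable returning `(path, status)` pairs, `representative_probe` stands in for whatever user, workflow, or consumer check fits the incident, and `materialization_check` is optional to reflect "when relevant". The 2xx-means-ok rule is again an assumption.

```python
from typing import Callable, List, Optional, Tuple


def verify_after_action(
    read_back: Callable[[], List[Tuple[str, int]]],
    representative_probe: Callable[[], bool],
    materialization_check: Optional[Callable[[], bool]] = None,
) -> bool:
    """Pass only if every surface is ok AND real behavior is confirmed."""
    failing = [path for path, status in read_back() if not 200 <= status < 300]
    if failing:
        print("still failing:", ", ".join(failing))
        return False
    if materialization_check is not None and not materialization_check():
        print("repo/runtime drift still present")
        return False
    if not representative_probe():
        # Green endpoints are not enough; the user-visible path must work too.
        print("endpoints ok, but the representative path is still broken")
        return False
    return True
```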

Example operator loop

curl -s <fabric-base-url>/health
curl -s <fabric-base-url>/api/v1/platform/services
curl -s <fabric-base-url>/api/v1/observability/readiness
curl -s <fabric-base-url>/api/v1/security/readiness
curl -s <fabric-base-url>/api/v1/recovery/readiness
curl -s <fabric-base-url>/api/v1/signoff/readiness
python ./scripts/verify_runtime_materialization.py --check
bash ./scripts/verify-runtime-guardrails.sh

Verification

Operations posture is considered healthy enough to continue when:

the standard health and readiness sequence is readable
no unresolved contradictions remain between the signals
repo/runtime drift is absent or bounded
no required owner runbook escalation remains open
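The four criteria above can be recorded as an explicit checklist. This is a hypothetical sketch: `OperationsPosture` and its drift labels (`"absent"`, `"bounded"`, `"unbounded"`) are illustrative names, not fields exposed by Fabric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OperationsPosture:
    readback_readable: bool        # the full health/readiness sequence was readable
    unresolved_contradictions: int # count of signal conflicts not yet explained
    drift: str                     # hypothetical labels: "absent", "bounded", "unbounded"
    open_escalations: List[str]    # owner runbook escalations still open


def healthy_enough_to_continue(p: OperationsPosture) -> bool:
    """All four verification criteria from this page must hold."""
    return (
        p.readback_readable
        and p.unresolved_contradictions == 0
        and p.drift in ("absent", "bounded")
        and not p.open_escalations
    )
```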

Common failure modes

Monitoring only `/health`

Problem:

you miss deeper readiness and security regressions.

Better path:

include observability, security, recovery, and signoff readiness

Recovering without comparing to repo truth

Problem:

a runtime drift issue is treated as a generic outage.

Better path:

run python ./scripts/verify_runtime_materialization.py --check

Turning every red signal into a restart

Problem:

operator actions become noisy and non-diagnostic.

Better path:

pick the narrowest matching runbook first

Owner Handoff

operations truth owner: JaddaHelpifyr/helpifyr-fabric
environment and deployment recovery support: JaddaHelpifyr/jhf-openclaw-env, JaddaHelpifyr/jhf-deployment
