
Check stack health

Use this page when the stack looks degraded, its state is uncertain, or its signals contradict each other, and you need a clean health and readiness baseline before taking any riskier action.

When to use this page

  • A user-visible path is failing and you need the fastest trusted readback.
  • Dashboards, docs, or runtime signals disagree.
  • You need to decide whether to observe, troubleshoot further, restart, or escalate.

Prerequisites

  • You can access the Fabric health and readiness surfaces.
  • You can capture timestamped evidence before mutation.

Architecture / Flow

Step-by-step procedure

1. Capture the coarse health signal first

Start with:

GET /health

A healthy response tells you the stack is broadly alive, but not whether its subsystems are ready, so it is not enough by itself.
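A minimal Python sketch of this first step, assuming the Fabric API is reachable over plain HTTP at a base URL you supply. The `BASE_URL` value and the `http_get` and `capture_health` helpers are hypothetical. The sketch records the raw status and body with a UTC timestamp and deliberately draws no conclusion beyond "broadly alive":

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Hypothetical base URL; replace with the address of your Fabric host.
BASE_URL = "http://localhost:8080"


def http_get(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Return (status_code, body) for a GET request; status 0 means unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # Non-2xx responses still carry a status and body worth recording.
        return exc.code, exc.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, DNS failure, or timeout: nothing answered.
        return 0, ""


def capture_health(base_url: str = BASE_URL, get=http_get) -> dict:
    """Capture one timestamped /health reading without interpreting subsystems."""
    status, body = get(f"{base_url}/health")
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "endpoint": "/health",
        "status": status,
        "broadly_alive": 200 <= status < 300,
        "body": body,
    }
```

The `get` parameter lets you substitute a stub when testing the capture logic without a live stack.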

2. Read subsystem-aware readiness

Continue with:

GET /api/v1/platform/services
GET /api/v1/observability/readiness
GET /api/v1/security/readiness

These help separate:

  • coarse service health
  • subsystem readiness
  • security posture issues
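The readiness pass can be sketched the same way. Only the three paths come from this page; the helper names are hypothetical, and the sketch assumes a non-2xx status means "not ready". If these endpoints report readiness inside the JSON body instead, parse the body before setting `ok`:

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone

# The three readiness surfaces listed in step 2.
READINESS_ENDPOINTS = {
    "services": "/api/v1/platform/services",
    "observability": "/api/v1/observability/readiness",
    "security": "/api/v1/security/readiness",
}


def http_get(url, timeout=5.0):
    """Return (status_code, body); status 0 means the endpoint was unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        return exc.code, exc.read().decode("utf-8", errors="replace")
    except OSError:
        return 0, ""


def capture_readiness(base_url, get=http_get):
    """Read every readiness surface once and keep the results side by side."""
    captured_at = datetime.now(timezone.utc).isoformat()
    results = {}
    for name, path in READINESS_ENDPOINTS.items():
        status, body = get(base_url + path)
        # Assumption: 2xx means ready; adjust if readiness is reported in the body.
        results[name] = {"path": path, "status": status, "ok": 200 <= status < 300, "body": body}
    return {"captured_at": captured_at, "results": results}


def not_ready(snapshot):
    """Names of surfaces that did not return 2xx, in endpoint order."""
    return [name for name, r in snapshot["results"].items() if not r["ok"]]
```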

3. Check repo truth versus running truth when signals disagree

If readiness looks stale or contradictory, run:

python ./scripts/verify_runtime_materialization.py --check
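When you want the check's output kept alongside the API readback, a thin wrapper can run it and store the exit code and output. This is a sketch under two assumptions to confirm against the script itself: you run it from the repository root (or pass that path as `repo_root`), and a zero exit code means no drift was found. The `run_materialization_check` helper is hypothetical; it uses `sys.executable` so the check runs under the same interpreter:

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_materialization_check(repo_root=".", timeout=300):
    """Run the repo's materialization check once and record what it reported."""
    cmd = [sys.executable, "./scripts/verify_runtime_materialization.py", "--check"]
    started_at = datetime.now(timezone.utc).isoformat()
    # cwd=repo_root makes the relative script path resolve inside the repository.
    proc = subprocess.run(cmd, cwd=repo_root, capture_output=True, text=True, timeout=timeout)
    return {
        "started_at": started_at,
        "command": " ".join(cmd),
        "exit_code": proc.returncode,
        # Assumption: exit code 0 means repo truth and running truth agree.
        "clean": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```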

4. Record evidence before choosing the next action

Capture:

  • one timestamped pass of the health and readiness responses
  • the first concrete user-visible symptom
  • any dashboard or surface that disagrees with the API-level readback
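One way to keep this evidence together is a single timestamped JSON record per pass. The field names, the `evidence` directory, and the file naming below are illustrative assumptions, not a format this runbook prescribes. Opening the file in exclusive-create mode (`"x"`) means an earlier capture is never silently overwritten:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def build_evidence(health, readiness, first_symptom, disagreeing_surfaces):
    """Bundle the first-pass readback into one timestamped record."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "health": health,
        "readiness": readiness,
        "first_user_visible_symptom": first_symptom,
        "surfaces_disagreeing_with_api": list(disagreeing_surfaces),
    }


def save_evidence(record, directory="evidence"):
    """Write the record to a new file named after its timestamp; never overwrite."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.fromisoformat(record["recorded_at"]).strftime("%Y%m%dT%H%M%S%fZ")
    path = out_dir / f"health-readback-{stamp}.json"
    # "x" raises FileExistsError instead of clobbering an existing capture.
    with path.open("x", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return path
```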

5. Choose the narrowest next runbook

Typical follow-ups:

Verification

This runbook is being used correctly when:

  1. health and readiness were captured before mutation
  2. the next action is chosen from evidence, not from habit
  3. the chosen follow-up runbook is narrower than “restart everything”
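The first criterion can be checked mechanically if you keep timestamps; the other two remain judgment calls. This sketch assumes timezone-aware ISO-8601 strings (for example with `+00:00` offsets) for both the readback captures and the first mutating action such as a restart; the `captured_before_mutation` helper is hypothetical:

```python
from datetime import datetime


def captured_before_mutation(readback_times, first_mutation_at):
    """True if every readback timestamp precedes the first mutation.

    readback_times: ISO-8601 strings for the /health and readiness captures.
    first_mutation_at: ISO-8601 string for the first restart or config change,
    or None if nothing has been mutated yet.
    """
    if not readback_times:
        return False  # nothing was captured, so criterion 1 is not met
    if first_mutation_at is None:
        return True
    cutoff = datetime.fromisoformat(first_mutation_at)
    # Assumes all timestamps are timezone-aware; mixing naive and aware raises TypeError.
    return all(datetime.fromisoformat(t) < cutoff for t in readback_times)
```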

Common failure modes

Stopping after /health

Problem:

  • the stack looks alive, but the real readiness failure stays hidden.

Better path:

  • include subsystem and security readiness in the first pass
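A small classifier makes the hidden case explicit by evaluating "alive" (from /health) and "ready" (from the readiness endpoints) separately. The function and its status labels are hypothetical:

```python
def classify_first_pass(health_ok, readiness_ok):
    """Summarise a first pass that includes readiness, not just /health.

    health_ok: whether GET /health returned a healthy response.
    readiness_ok: mapping of surface name -> bool for the readiness endpoints.
    Returns (label, sorted list of failing readiness surfaces).
    """
    failing = sorted(name for name, ok in readiness_ok.items() if not ok)
    if not health_ok:
        return "down", failing
    if failing:
        # The case this failure mode warns about: /health alone would look fine.
        return "alive-but-not-ready", failing
    return "healthy-and-ready", failing
```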

Restarting before comparing repo and runtime truth

Problem:

  • drift and materialization issues are treated like generic outages.

Better path:

  • use verify_runtime_materialization.py --check when signals disagree
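"Signals disagree" can be made concrete by comparing what each source reports per subsystem. The input shape, the source names, and the state strings are assumptions for illustration; a non-empty result is the cue to run the materialization check before restarting anything:

```python
def disagreeing_subsystems(readings):
    """Return the subsystems whose sources report different states.

    readings: mapping of subsystem -> {source name: reported state}, for example
    the readiness API versus a dashboard for the same subsystem.
    """
    return {
        subsystem: sources
        for subsystem, sources in readings.items()
        # More than one distinct state means the sources contradict each other.
        if len(set(sources.values())) > 1
    }
```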

Source Truth

  • contracts/docs/public_manufacturer_docs_post_v1_runbooks_v1.json
  • JaddaHelpifyr/helpifyr-fabric:docs/operations/HOST_STACK_VERIFICATION.md
  • JaddaHelpifyr/helpifyr-fabric:docs/operations/OBSERVABILITY_BASELINE_RUNBOOK.md
  • JaddaHelpifyr/jhf-lantern:docs/OVERVIEW.md

Next paths