Troubleshooting

Channel: stable
Source repo: JaddaHelpifyr/jhf-bobbin
Run This Check
Fast repo-local checks:
python3 scripts/check_docs_contract.py
python3 scripts/check_module_features_contract.py
python3 scripts/check_runtime_materialization_drift.py
python3 scripts/check_live_runtime_contract.py
Host/runtime checks:
bash scripts/plugin_smoke_test.sh
bash scripts/probe_live_host_readonly.sh
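The repo-local checks are meant to run in order and stop at the first failure. A minimal runner sketch is below; the script names come from the list above, while the `runner` parameter is a hypothetical injection point added here so the loop can be exercised without a full checkout:

```python
# Sketch: run the repo-local contract checks in order, stopping at the
# first failure. Not a script shipped by this kit.
import subprocess
import sys

CHECKS = [
    "scripts/check_docs_contract.py",
    "scripts/check_module_features_contract.py",
    "scripts/check_runtime_materialization_drift.py",
    "scripts/check_live_runtime_contract.py",
]

def run_checks(checks, runner=None):
    """Return the first failing check path, or None if all pass.

    `runner` maps a check path to an exit code; by default each check is
    executed with the current Python interpreter.
    """
    if runner is None:
        runner = lambda check: subprocess.run([sys.executable, check]).returncode
    for check in checks:
        if runner(check) != 0:
            return check
    return None
```

Usage from a checkout root would be `failed = run_checks(CHECKS)`, treating a non-`None` result as the first broken contract.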
LocalAI starts but downloads too much
The tested aio-cpu image is convenient but heavy.
For leaner production packaging, swap the image later, but keep the OpenAI-compatible embedding endpoint contract stable.
Mem0 recall/capture returns Bad Request
Check in this order:

- LocalAI `/v1/embeddings` returns 384 values
- the Qdrant collection uses 384 + Cosine
- the LocalAI compatibility patch is present in `vendor/mem0-oss.mjs`
- the Qdrant collection is plain, not native-tenant
- the Qdrant client in the fork is new enough for your server version
OpenAI SDK embeddings from LocalAI look wrong
In the validated stack, raw HTTP to LocalAI returned correct 384-d vectors while the SDK path returned an invalid 96-value zero vector. That is why this kit patches the fork to use raw HTTP for LocalAI embeddings.
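The raw-HTTP approach can be sketched as below. The endpoint shape is the standard OpenAI-compatible contract; the base URL, port, and model name are this kit's defaults and may differ in your deployment. This is an illustration, not the fork's actual patch:

```python
# Sketch: fetch an embedding over raw HTTP, bypassing the OpenAI SDK path
# that produced invalid 96-value zero vectors in the validated stack.
import json
from urllib import request

def parse_embedding(payload: dict, expected_dim: int = 384) -> list:
    """Extract the first embedding and fail closed on a wrong dimension."""
    vector = payload["data"][0]["embedding"]
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
    return vector

def fetch_embedding(base_url: str, model: str, text: str, timeout: float = 10.0) -> list:
    """POST to the OpenAI-compatible /v1/embeddings endpoint with plain urllib."""
    body = json.dumps({"model": model, "input": text}).encode()
    req = request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return parse_embedding(json.load(resp))

# Usage against a live host, e.g.:
# vec = fetch_embedding("http://HOST:8088", "text-embedding-ada-002", "semantic memory check")
```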
AIO LocalAI returns 500 for /v1/embeddings
Symptom:

- `POST /v1/embeddings` fails with `500`
- LocalAI logs mention a broken model path under `/models/huggingface:/...`
- `text-embedding-ada-002` appears present, but embeddings still fail

Cause:

- the stock `aio-cpu` image can fall back to an internal `huggingface://...` embedding reference
- that fallback can drift away from the working `all-MiniLM-L6-v2` definition used by this kit

Fix in this repository:

- force `MODELS` to include `/models/text-embedding-ada-002.yaml`
- bind-mount the known-good `localai-model.text-embedding-ada-002.yaml` to `/models/text-embedding-ada-002.yaml`
- restart the LocalAI container
Verify:
curl -s http://HOST:8088/v1/models
curl -s http://HOST:8088/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{"model":"text-embedding-ada-002","input":"semantic memory check"}'
Expected:
- `/v1/models` lists `text-embedding-ada-002`
- `/v1/embeddings` returns a vector with 384 values
Operator guidance:
- for production stability, `slim` remains the safer LocalAI profile
- `aio` is fine when you want convenience, but only with the explicit model override now built into this kit
LocalAI says backend not found: transformers or sentencetransformers
Symptom:
- `/v1/embeddings` fails even though the model file exists
- LocalAI logs mention `backend not found: transformers` or `backend not found: sentencetransformers`
- this often appears on `latest-aio-cpu` when operators expect `all-MiniLM-L6-v2` to work through the sentence-transformer backend
Cause:
- the runtime image does not provide the sentence-transformer backend for that deployment
- the previous fallback path expected a GGUF artifact that was not guaranteed to exist
Deterministic fix path:
- keep the canonical YAML contract: `configs/localai-model.text-embedding-ada-002.yaml`
- ensure the canonical GGUF artifact exists: `all-minilm-l6-v2_f16.gguf`
- ensure installer vars are set:
  - `LOCALAI_EMBEDDING_MODEL_FILE=all-minilm-l6-v2_f16.gguf`
  - `LOCALAI_EMBEDDING_MODEL_URL=https://huggingface.co/LLukas22/all-MiniLM-L6-v2-GGUF/resolve/main/all-minilm-l6-v2_f16.gguf`
- recreate LocalAI and verify `POST /v1/embeddings` returns length 384
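The fix path above can be expressed as a compose override sketch. The service name, the placement of the two `LOCALAI_EMBEDDING_*` installer variables in container environment, and the host path of the YAML are assumptions for illustration; the installer in this kit may wire these differently:

```yaml
# docker-compose.override.yml — sketch only, adapt to the kit's actual compose file
services:
  jhf-bobbin-localai:
    environment:
      - MODELS=/models/text-embedding-ada-002.yaml
      - LOCALAI_EMBEDDING_MODEL_FILE=all-minilm-l6-v2_f16.gguf
      - LOCALAI_EMBEDDING_MODEL_URL=https://huggingface.co/LLukas22/all-MiniLM-L6-v2-GGUF/resolve/main/all-minilm-l6-v2_f16.gguf
    volumes:
      # bind-mount the known-good embedding definition over the image default
      - ./configs/localai-model.text-embedding-ada-002.yaml:/models/text-embedding-ada-002.yaml:ro
```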
Why this matters:
- this path removes backend ambiguity and artifact-missing drift from the critical embedding runtime
- the smoke path now fails closed when embeddings are not functional
AIO LocalAI keeps downloading unrelated models and Mem0 stays down too long
Symptom:
- `jhf-bobbin-localai` remains in `health: starting`
- logs show downloads for speech, image, vision, or other side models
- OpenClaw logs show transient `fetch failed` during that time
Fix:
- reduce `MODELS` to `/models/text-embedding-ada-002.yaml` during recovery
- let the stack expose only the embedding path needed by Mem0 first
- reintroduce extra AIO models later only if you actually need them
Validated effect:
- this removed the long startup drag on the live host and allowed Mem0 recall/capture to come back promptly after recreate
OpenClaw update breaks semantic memory
Run:
bash scripts/reapply_after_openclaw_update.sh
If still broken, roll back to memory-core:
python3 scripts/activate_memory_core.py
cd /root/openclaw
docker compose up -d --force-recreate openclaw-gateway
apply_patch warning still appears
That warning is unrelated to the semantic-memory stack. It comes from the OpenClaw runtime/profile tool allowlist and does not by itself mean Mem0 is broken.
Qdrant external host vs local host
If you already have a Qdrant server elsewhere:
- set `USE_LOCAL_QDRANT=0`
- set `QDRANT_URL=http://HOST:6333`
- rerun the installer
Mistral 7B embeddings feel worse than MiniLM
That is consistent with the live validation behind this kit: `mistral:7b-instruct-v0.3-q4_K_M` can technically generate embeddings, but its retrieval quality for Mem0-style operational recall was usually worse than `all-MiniLM-L6-v2`.
Recommendation:
- keep `all-MiniLM-L6-v2` / `text-embedding-ada-002` as the primary Mem0 embedding path
- use local `Mistral 7B` for compaction, heartbeat, watchdog, and log-triage tasks instead
Want to disable Mem0 but keep the stack ready
Run:
python3 scripts/activate_memory_core.py
cd /root/openclaw
docker compose up -d --force-recreate openclaw-gateway
LocalAI and Qdrant can remain running for later re-enable.
Agents hit session file locked and suddenly switch provider/model
Symptom:
- logs show `session file locked (timeout 10000ms)`
- the same agent then shows model fallback decisions such as `deepseek/deepseek-chat -> kimi -> ...`
- `cicd-ops`, `main`, or `hocksie` are affected most often
Important diagnosis:
- this is usually not a Mem0 recall bug
- the provider switch is often a downstream effect of the lock/timeout
- in the validated live stack, the biggest trigger was the older agent-native `post-compaction-recovery` cron jobs before they were converted to the lightweight file-only variant
Why this happens:
- those recovery jobs run with isolated cron session keys
- but they still inspect the agent's recent state
- that can contend with the hot interactive `agent:*:main` session for the same agent
- once the session path stalls, the model request times out and OpenClaw falls through its fallback chain
- a separate validated failure mode was using host-only paths like `/root/openclaw/...` inside the gateway container; the lightweight jobs must read the mounted container paths under `/home/node/.openclaw/...`
Preferred fix in this kit:
- convert the older recovery jobs to the lightweight `local-mistral` / file-only form
- disable the overlapping watchdog jobs for the same three workspaces so only one lightweight maintenance path remains
Helper:
bash scripts/migrate_recovery_crons_to_lightweight.sh
The helper converts these recovery jobs:
- `d8201a2f-f816-42bc-acb0-287df0a0050e`
- `e9dc7e0c-bbbf-4108-af72-245d3a52c23c`
- `1ca99d11-caa8-4e84-bfaf-8cb3707ad505`
Verify:
openclaw cron list --json
timeout 20s docker logs --since 30m --tail 400 openclaw-gateway 2>&1 | grep -E 'session file locked|model fallback decision'
Expected:
- the three recovery jobs above show `agentId: local-mistral`
- their session keys start with `agent:local-mistral:post-compaction-recovery-`
- the older overlapping watchdog jobs are disabled
- lock spikes and surprise provider switches drop sharply
Log-read safety note:
- on live hosts, use bounded log probes only (`timeout` + `--since` + `--tail`)
- avoid unbounded `docker logs` calls in operational verification paths
Repeated LocalAI timeout bursts and recreate churn
Symptom:
- frequent timeout/grpc-style errors around LocalAI probes
- repeated recreate attempts increase host CPU pressure
Guarded recovery in this repo:
- use `scripts/localai_probe_guard.py` for `/readyz` and `/v1/models` probes
- allow the guard to enter temporary degraded mode when timeout bursts cross the threshold
- do not run tight recreate loops while degraded mode is active
Useful checks:
python3 scripts/localai_probe_guard.py --endpoint models --base-url http://<internal-runtime-redacted>:8088/v1
cat /tmp/jhf-bobbin-localai-guard.prom
Tuning knobs (installer env):
- `LOCALAI_PROBE_DEGRADE_THRESHOLD`
- `LOCALAI_PROBE_TIMEOUT_WINDOW_SECONDS`
- `LOCALAI_PROBE_DEGRADE_COOLDOWN_SECONDS`
- `LOCALAI_MODELS_MIN_INTERVAL_SECONDS`
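The threshold/window/cooldown knobs can be pictured as a sliding-window burst detector. This is a minimal sketch of that logic, not the actual `scripts/localai_probe_guard.py`; the injectable `clock` parameter is an addition here so the behavior can be tested without waiting:

```python
# Sketch: timeout-burst detection with a degraded-mode cooldown,
# mirroring the DEGRADE_THRESHOLD / TIMEOUT_WINDOW / DEGRADE_COOLDOWN knobs.
import time
from collections import deque

class ProbeGuard:
    def __init__(self, degrade_threshold=3, window_seconds=60,
                 cooldown_seconds=300, clock=time.monotonic):
        self.degrade_threshold = degrade_threshold
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.timeouts = deque()   # timestamps of recent probe timeouts
        self.degraded_until = 0.0

    def record_timeout(self):
        """Note one probe timeout; enter degraded mode if the burst crosses the threshold."""
        now = self.clock()
        self.timeouts.append(now)
        # drop timeouts that have aged out of the window
        while self.timeouts and now - self.timeouts[0] > self.window_seconds:
            self.timeouts.popleft()
        if len(self.timeouts) >= self.degrade_threshold:
            self.degraded_until = now + self.cooldown_seconds

    def is_degraded(self):
        """While True, callers should skip recreate attempts entirely."""
        return self.clock() < self.degraded_until
```

The key design point matches the guidance above: once degraded, the guard stays degraded for the full cooldown, so recreate loops cannot pile CPU pressure onto an already struggling host.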
AGPLv3. See ../LICENSE.
Learn more at helpifyr.com.