Troubleshooting

Run These Checks

Fast repo-local checks:

python3 scripts/check_docs_contract.py
python3 scripts/check_module_features_contract.py
python3 scripts/check_runtime_materialization_drift.py
python3 scripts/check_live_runtime_contract.py

Host/runtime checks:

bash scripts/plugin_smoke_test.sh
bash scripts/probe_live_host_readonly.sh
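
To get a single pass/fail signal, a minimal wrapper along these lines works, assuming each script signals problems with a nonzero exit code:

#!/usr/bin/env bash
# Hedged sketch: run every check above and stop at the first failure.
set -euo pipefail
python3 scripts/check_docs_contract.py
python3 scripts/check_module_features_contract.py
python3 scripts/check_runtime_materialization_drift.py
python3 scripts/check_live_runtime_contract.py
bash scripts/plugin_smoke_test.sh
bash scripts/probe_live_host_readonly.sh
echo "all checks passed"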

LocalAI starts but downloads too much

The tested aio-cpu image is convenient but heavy. For leaner production packaging, swap the image later, but keep the OpenAI-compatible embedding endpoint contract stable.

Mem0 recall/capture returns Bad Request

Check in this order (a verification sketch for the first two items follows the list):

  1. LocalAI /v1/embeddings returns 384 values
  2. Qdrant collection uses 384 + Cosine
  3. The LocalAI compatibility patch is present in vendor/mem0-oss.mjs
  4. Qdrant collection is plain, not native-tenant
  5. Qdrant client in the fork is new enough for your server version
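
A quick way to cover checks 1 and 2, assuming jq is installed and HOST plus the collection name match your deployment:

curl -s http://HOST:8088/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{"model":"text-embedding-ada-002","input":"dimension check"}' \
| jq '.data[0].embedding | length'

curl -s http://HOST:6333/collections/COLLECTION \
| jq '.result.config.params.vectors | {size, distance}'

The first call should print 384; the second should show size 384 with distance Cosine.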

OpenAI SDK embeddings from LocalAI look wrong

In the validated stack, raw HTTP to LocalAI returned correct 384-d vectors while the SDK path returned an invalid 96-value zero vector. That is why this kit patches the fork to use raw HTTP for LocalAI embeddings.
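
A hedged probe for that symptom over the known-good raw-HTTP path (jq assumed):

curl -s http://HOST:8088/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{"model":"text-embedding-ada-002","input":"zero-vector probe"}' \
| jq '.data[0].embedding | {len: length, all_zero: (map(select(. != 0)) | length == 0)}'

A healthy response prints len 384 and all_zero false; if an SDK call in your own code shows anything else while this raw call is healthy, you are hitting the same SDK issue.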

AIO LocalAI returns 500 for /v1/embeddings

Symptom:

  • POST /v1/embeddings fails with 500
  • LocalAI logs mention a broken model path under /models/huggingface:/...
  • text-embedding-ada-002 appears present, but embeddings still fail

Cause:

  • the stock aio-cpu image can fall back to an internal huggingface://... embedding reference
  • that fallback can drift away from the working all-MiniLM-L6-v2 definition used by this kit

Fix in this repository (sketched below):

  1. force MODELS to include /models/text-embedding-ada-002.yaml
  2. bind-mount the known-good localai-model.text-embedding-ada-002.yaml to /models/text-embedding-ada-002.yaml
  3. restart the LocalAI container
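
One hedged way to express steps 1 and 2 is as docker run flags; the kit's installer applies the same override through its compose file, and the image tag and port mapping here are assumptions to adapt:

docker run -d --name jhf-bobbin-localai \
-e MODELS=/models/text-embedding-ada-002.yaml \
-v "$PWD/configs/localai-model.text-embedding-ada-002.yaml:/models/text-embedding-ada-002.yaml:ro" \
-p 8088:8080 \
localai/localai:latest-aio-cpu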

Verify:

curl -s http://HOST:8088/v1/models
curl -s http://HOST:8088/v1/embeddings \
-H 'Content-Type: application/json' \
-d '{"model":"text-embedding-ada-002","input":"semantic memory check"}'

Expected:

  • /v1/models lists text-embedding-ada-002
  • /v1/embeddings returns a vector with 384 values

Operator guidance:

  • for production stability, slim remains the safer LocalAI profile
  • aio is fine when you want convenience, but only with the explicit model override now built into this kit

LocalAI says backend not found: transformers or sentencetransformers

Symptom:

  • /v1/embeddings fails even though the model file exists
  • LocalAI logs mention backend not found: transformers or backend not found: sentencetransformers
  • this often appears on latest-aio-cpu when operators expect all-MiniLM-L6-v2 to work through the sentence-transformer backend

Cause:

  • the runtime image does not provide the sentence-transformer backend for that deployment
  • the previous fallback path expected a GGUF artifact that was not guaranteed to exist

Deterministic fix path (a pre-flight sketch follows the steps):

  1. keep the canonical YAML contract:
    • localai-model.text-embedding-ada-002.yaml (configs/localai-model.text-embedding-ada-002.yaml)
  2. ensure the canonical GGUF artifact exists:
    • all-minilm-l6-v2_f16.gguf
  3. ensure installer vars are set:
    • LOCALAI_EMBEDDING_MODEL_FILE=all-minilm-l6-v2_f16.gguf
    • LOCALAI_EMBEDDING_MODEL_URL=https://huggingface.co/LLukas22/all-MiniLM-L6-v2-GGUF/resolve/main/all-minilm-l6-v2_f16.gguf
  4. recreate LocalAI and verify POST /v1/embeddings returns length 384
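
A pre-flight sketch for steps 2 and 3, assuming the installer reads a .env at the repo root and model artifacts live under ./models (adjust both paths to your layout):

grep -E '^LOCALAI_EMBEDDING_MODEL_(FILE|URL)=' .env
test -s models/all-minilm-l6-v2_f16.gguf \
|| curl -fL -o models/all-minilm-l6-v2_f16.gguf \
'https://huggingface.co/LLukas22/all-MiniLM-L6-v2-GGUF/resolve/main/all-minilm-l6-v2_f16.gguf'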

Why this matters:

  • this path removes backend ambiguity and artifact-missing drift from the critical embedding runtime
  • the smoke path now fails closed when embeddings are not functional

AIO LocalAI keeps downloading unrelated models and Mem0 stays down too long

Symptom:

  • jhf-bobbin-localai remains in health: starting
  • logs show downloads for speech, image, vision, or other side models
  • OpenClaw logs show transient fetch failed during that time

Fix (see the sketch after this list):

  • reduce MODELS to /models/text-embedding-ada-002.yaml during recovery
  • let the stack expose only the embedding path needed by Mem0 first
  • reintroduce extra AIO models later only if you actually need them
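
One hedged way to apply the first step, assuming the installer's .env feeds MODELS into the compose file and the service is named jhf-bobbin-localai:

sed -i 's|^MODELS=.*|MODELS=/models/text-embedding-ada-002.yaml|' .env
docker compose up -d --force-recreate jhf-bobbin-localai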

Validated effect:

  • this removed the long startup drag on the live host and allowed Mem0 recall/capture to come back promptly after recreate

OpenClaw update breaks semantic memory

Run:

bash scripts/reapply_after_openclaw_update.sh

If still broken, roll back to memory-core:

python3 scripts/activate_memory_core.py
cd /root/openclaw
docker compose up -d --force-recreate openclaw-gateway

apply_patch warning still appears

That warning is unrelated to the semantic-memory stack. It comes from the OpenClaw runtime/profile tool allowlist and does not by itself mean Mem0 is broken.

Qdrant external host vs local host

If you already have a Qdrant server elsewhere:

  • set USE_LOCAL_QDRANT=0
  • set QDRANT_URL=http://HOST:6333
  • rerun the installer (example below)
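
For example (the installer entry point named here is hypothetical; substitute the kit's actual installer command):

# scripts/install.sh is a placeholder name, not a confirmed path in this kit
USE_LOCAL_QDRANT=0 QDRANT_URL=http://HOST:6333 bash scripts/install.sh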

Mistral 7B embeddings feel worse than MiniLM

That is consistent with the live validation behind this kit.

mistral:7b-instruct-v0.3-q4_K_M can technically generate embeddings, but its retrieval quality for Mem0-style operational recall was usually worse than with all-MiniLM-L6-v2.

Recommendation:

  • keep all-MiniLM-L6-v2 / text-embedding-ada-002 as the primary Mem0 embedding path
  • use local Mistral 7B for compaction, heartbeat, watchdog, and log-triage tasks instead

Want to disable Mem0 but keep the stack ready

Run:

python3 scripts/activate_memory_core.py
cd /root/openclaw
docker compose up -d --force-recreate openclaw-gateway

LocalAI and Qdrant can remain running so Mem0 can be re-enabled later.

Agents hit session file locked and suddenly switch provider/model

Symptom:

  • logs show session file locked (timeout 10000ms)
  • the same agent then shows model fallback decisions such as deepseek/deepseek-chat -> kimi -> ...
  • cicd-ops, main, or hocksie are affected most often

Important diagnosis:

  • this is usually not a Mem0 recall bug
  • the provider switch is often a downstream effect of the lock/timeout
  • in the validated live stack, the biggest trigger was the older agent-native post-compaction-recovery cron jobs before they were converted to the lightweight file-only variant

Why this happens:

  • those recovery jobs run with isolated cron session keys
  • but they still inspect the agent's recent state
  • that can contend with the hot interactive agent:*:main session for the same agent
  • once the session path stalls, the model request times out and OpenClaw falls through its fallback chain
  • a separate validated failure mode was using host-only paths like /root/openclaw/... inside the gateway container; the lightweight jobs must read the mounted container paths under /home/node/.openclaw/...
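
To confirm the mounted path is visible inside the gateway container, a bounded probe like this works:

timeout 10s docker exec openclaw-gateway ls -d /home/node/.openclaw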

Preferred fix in this kit:

  1. convert the older recovery jobs to the lightweight local-mistral / file-only form
  2. disable the overlapping watchdog jobs for the same three workspaces so only one lightweight maintenance path remains

Helper:

bash scripts/migrate_recovery_crons_to_lightweight.sh

The helper converts these recovery jobs:

  • d8201a2f-f816-42bc-acb0-287df0a0050e
  • e9dc7e0c-bbbf-4108-af72-245d3a52c23c
  • 1ca99d11-caa8-4e84-bfaf-8cb3707ad505

Verify:

openclaw cron list --json
timeout 20s docker logs --since 30m --tail 400 openclaw-gateway 2>&1 | grep -E 'session file locked|model fallback decision'

Expected:

  • the three recovery jobs above show agentId: local-mistral
  • their session keys start with agent:local-mistral:post-compaction-recovery-
  • the older overlapping watchdog jobs are disabled
  • lock spikes and surprise provider switches drop sharply
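
To check the first two expectations programmatically (jq assumed; the JSON field names here are assumptions, so adapt the filter to the real cron schema):

openclaw cron list --json \
| jq -r '.. | objects | select(has("id")) | "\(.id) \(.agentId // "?") \(.sessionKey // "?")"' \
| grep -E 'd8201a2f|e9dc7e0c|1ca99d11'

Each matched line should show local-mistral and a session key starting with agent:local-mistral:post-compaction-recovery-.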

Log-read safety note:

  • on live hosts, use bounded log probes only (timeout + --since + --tail)
  • avoid unbounded docker logs calls in operational verification paths

Repeated LocalAI timeout bursts and recreate churn

Symptom:

  • frequent timeout/grpc-style errors around LocalAI probes
  • repeated recreate attempts increase host CPU pressure

Guarded recovery in this repo (sketched after the steps):

  1. use scripts/localai_probe_guard.py for /readyz and /v1/models probes
  2. allow the guard to enter temporary degraded mode when timeout bursts cross threshold
  3. do not run tight recreate loops while degraded mode is active
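
A guard-gated recreate sketch; the nonzero-exit-when-degraded convention and the compose service name are assumptions:

# Hedged sketch: only recreate when the guard reports a healthy probe.
if python3 scripts/localai_probe_guard.py --endpoint models \
--base-url http://<internal-runtime-redacted>:8088/v1; then
  docker compose up -d --force-recreate jhf-bobbin-localai
else
  echo "probe guard degraded; skipping recreate" >&2
fi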

Useful checks:

python3 scripts/localai_probe_guard.py --endpoint models --base-url http://<internal-runtime-redacted>:8088/v1
cat /tmp/jhf-bobbin-localai-guard.prom

Tuning knobs (installer env; illustrative values below):

  • LOCALAI_PROBE_DEGRADE_THRESHOLD
  • LOCALAI_PROBE_TIMEOUT_WINDOW_SECONDS
  • LOCALAI_PROBE_DEGRADE_COOLDOWN_SECONDS
  • LOCALAI_MODELS_MIN_INTERVAL_SECONDS
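
Illustrative .env values only; these are not validated defaults, and the right magnitudes depend on your host:

LOCALAI_PROBE_DEGRADE_THRESHOLD=3
LOCALAI_PROBE_TIMEOUT_WINDOW_SECONDS=120
LOCALAI_PROBE_DEGRADE_COOLDOWN_SECONDS=300
LOCALAI_MODELS_MIN_INTERVAL_SECONDS=30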

AGPLv3. See ../LICENSE.

Learn more at helpifyr.com.