ELKA-0 documentation

Spec first. Runtime ready.

Reference Guide

Memory-Native Reasoning

ELKA-0 Open Cognitive Engine API

A model-engine layer for memory-native, multimodal, safety-scored reasoning.

This page documents ELKA-0 itself: how to run it, call it, integrate it, and evaluate it. ELKA-0 can power multiple runtimes, including Kael, but this documentation focuses on ELKA contracts and behavior.


Language: Python
Service Version: 0.2.0
Runtime Endpoints: 4 public routes
Status: Open specification

Get Started

Quickstart in 60 Seconds

Start the service locally, verify health, then call inference. These commands reflect the current repo runtime layout.

1. Install: Create your environment and install `requirements.txt`.
2. Launch: Start ELKA-0 with the documented uvicorn factory command.
3. Verify: Call `GET /health` before sending inference traffic.
4. Iterate: Use `/infer`, `/chat`, or `/chat/stream` based on your integration path.

pip install -r requirements.txt
uvicorn infrastructure.serving.app:create_service_app --factory --host 0.0.0.0 --port 8080
curl http://127.0.0.1:8080/health
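
The verify step can also be scripted. The sketch below polls `GET /health` until it answers, assuming the service listens on `127.0.0.1:8080` as in the commands above. The helper names (`fetch_health`, `wait_until_healthy`) and the retry settings are illustrative and not part of ELKA-0.

```python
import json
import time
import urllib.request


def fetch_health(base_url: str, timeout: float = 2.0) -> dict:
    """Call GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(base_url, attempts=5, delay=1.0, fetch=fetch_health):
    """Return the first successful health payload, or None if every attempt fails."""
    for attempt in range(attempts):
        try:
            return fetch(base_url)
        except (OSError, ValueError):
            # OSError covers connection failures (URLError); ValueError covers bad JSON.
            if attempt < attempts - 1:
                time.sleep(delay)
    return None
```

Calling `wait_until_healthy("http://127.0.0.1:8080")` right after launch gives the server a few seconds to finish starting before inference traffic is sent.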

API Reference

Runtime Endpoints

ELKA-0 currently exposes four production routes. Security guards can enforce API keys, signatures, tenant boundaries, and rate limits.

GET /health: Returns uptime, profile, memory health, request count, and average latency.
POST /infer: Runs structured reasoning using `InferRequest` and returns typed `InferResponse`.
POST /chat: Turn-based chat interface over `ChatRequest` for session conversation flows.
POST /chat/stream: Streaming chat via `application/x-ndjson` chunks for real-time response delivery.
Response Headers: `X-Request-Id` and `X-ELKA-Latency-Ms` are set by middleware on every response.
Optional Security Headers: `X-API-Key`, `X-Signature`, and a tenant header are supported when configured.

curl -X GET "http://127.0.0.1:8080/health"
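
For `/chat/stream`, a client has to split the `application/x-ndjson` body into one JSON object per line. The sketch below shows only that parsing step. The chunk fields are not specified on this page, so the `delta` key in the usage note is a made-up placeholder.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line of an NDJSON stream."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        yield json.loads(line)
```

A response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed straight to `iter_ndjson`. With sample input such as `[b'{"delta": "Hel"}\n', b'{"delta": "lo"}\n']` the generator yields two dictionaries in order.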

Contracts

Request and Response Schemas

ELKA-0 is typed end to end. The integration surface is stable when you treat these structures as the source of truth.

InferRequest: Wrapper for `ReasoningRequest`, with optional trace flag.
ReasoningRequest: Session id, user text, events, permissions, profile, and safety controls.
InferResponse: Decision status, confidence, structured plan, explanation trace, and latency.
ChatRequest: Session turns, optional permissions set, profile name, and metadata.
Perception Events: `text`, `vision`, `audio`, `sensor`, `device_state`, `room_state`.
ActionPlan: Objective, ordered steps, risk level, permission requirement, fallback strategy.

{
  "request": {
    "session_id": "session-01",
    "user_text": "Review backyard motion event and recommend next safe action",
    "events": [
      {
        "kind": "text",
        "text": "Motion alert received from backyard camera"
      }
    ],
    "permissions": ["observe", "notify"],
    "profile_name": "core",
    "allow_actions": true
  },
  "trace": true
}
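
A client can assemble the same body in Python. This is a minimal sketch that reuses the field names from the sample request above. The helper names, the default values, and the client-side check of event kinds are assumptions rather than documented ELKA-0 behavior.

```python
import json
import urllib.request

# Event kinds listed under Perception Events on this page.
EVENT_KINDS = {"text", "vision", "audio", "sensor", "device_state", "room_state"}


def build_infer_payload(session_id, user_text, events, permissions,
                        profile_name="core", allow_actions=False, trace=False):
    """Assemble an /infer body with the same field names as the sample request."""
    for event in events:
        if event.get("kind") not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {event.get('kind')!r}")
    return {
        "request": {
            "session_id": session_id,
            "user_text": user_text,
            "events": list(events),
            "permissions": list(permissions),
            "profile_name": profile_name,
            "allow_actions": allow_actions,
        },
        "trace": trace,
    }


def make_infer_request(base_url, payload):
    """Wrap the payload in a POST request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        f"{base_url}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of `make_infer_request` to `urllib.request.urlopen` sends the call. Keeping construction and sending separate makes the payload easy to inspect or log first.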

{
  "ok": true,
  "response": {
    "request_id": "...",
    "status": "planned",
    "natural_language_response": "...",
    "confidence": 0.82,
    "plan": { "objective": "...", "steps": [] },
    "explanation": { "summary": "...", "evidence": [] },
    "escalation": "notify",
    "profile_name": "core"
  },
  "model_version": "v0",
  "latency_ms": 91.4
}
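
On the client side, the sample response above can be reduced to the fields an integration usually branches on. The sketch below is one way to do that. The `InferResult` class, the choice of which fields to treat as optional, and the error raised when `ok` is false are assumptions, since this page does not define the failure shape.

```python
from dataclasses import dataclass


@dataclass
class InferResult:
    status: str
    confidence: float
    escalation: str
    steps: list
    latency_ms: float


def parse_infer_response(body: dict) -> InferResult:
    """Extract decision fields from a decoded /infer response body."""
    if not body.get("ok"):
        # Assumption: a falsy "ok" means the response payload should not be trusted.
        raise RuntimeError("ELKA-0 reported a failed inference")
    response = body["response"]
    plan = response.get("plan") or {}
    return InferResult(
        status=response["status"],
        confidence=float(response["confidence"]),
        escalation=response.get("escalation", ""),
        steps=list(plan.get("steps", [])),
        latency_ms=float(body.get("latency_ms", 0.0)),
    )
```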

System Design

Layered Architecture

Dependency direction is explicit: `core -> model -> cognition -> world_model -> policies -> safety -> infrastructure -> evaluation/tests`.

Input Event -> Context Builder -> Reasoning Engine -> Planning and Policy -> Safety Gates -> Structured Output

Core Contracts

Shared interfaces, schema wrappers, constants, and orchestration boundaries.

Cognition and Memory

Inference orchestration, retrieval-native memory usage, uncertainty handling, and explainability.

Policy and Safety

Permission checks, escalation scoring, refusal paths, and risk-aware action constraints.

Infrastructure and Serving

FastAPI runtime, security middleware, observability hooks, and deployment entrypoints.

Input Context -> ELKA Reasoning -> Risk Score -> Structured Plan -> Runtime Execution

Governance

Safety, Permissions, and Uncertainty

ELKA-0 is designed to prefer truthful, low-risk actions over fluent but unsafe behavior.

Check permission requirements before impactful action.
Prefer truth over fluency and ask when context is missing.
Use minimum safe action first, then escalate if needed.
Return explicit refusal with safe fallback when constraints fail.
Preserve privacy and user trust as default runtime behavior.
Attach explanation traces for important decisions.
Allow immediate human override, with no delay in the override path.
Record audit-friendly lifecycle stages for post-review.
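
The first few principles above (check permissions, prefer the minimum safe action, refuse explicitly with a fallback) can be illustrated with a small gate function. This is a hypothetical sketch, not ELKA-0's policy code: the `PlanSketch` class, the `low`/`medium`/`high` risk scale, the `refused` status value, and the `max_risk` threshold are all invented for the example.

```python
from dataclasses import dataclass

RISK_ORDER = ["low", "medium", "high"]  # hypothetical risk scale


@dataclass
class PlanSketch:
    objective: str
    required_permission: str
    risk_level: str
    fallback: str


def gate_plan(plan, granted_permissions, allow_actions, max_risk="medium"):
    """Return a decision dict; refuse with the plan's fallback when a check fails."""
    def refuse(reason):
        return {"status": "refused", "reason": reason, "fallback": plan.fallback}

    if not allow_actions:
        return refuse("actions disabled for this request")
    if plan.required_permission not in granted_permissions:
        return refuse(f"missing permission: {plan.required_permission}")
    if RISK_ORDER.index(plan.risk_level) > RISK_ORDER.index(max_risk):
        return refuse("risk above allowed threshold")
    return {"status": "planned", "reason": "all checks passed", "fallback": None}
```

The checks run from cheapest to most specific, and every refusal carries both a reason and a fallback, so the caller always has something safe to do next.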

API Key Auth
Configured keys are validated with `X-API-Key` when enabled.

Request Signing
Optional `X-Signature` verification protects request integrity.

Rate and Tenant Guard
Rate limits and tenant boundary checks are enforced in middleware.
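
This page does not say how `X-Signature` is computed. As an illustration of request signing in general, the sketch below assumes an HMAC-SHA256 hex digest over the raw request body with a shared secret. Confirm the actual scheme in the repository before relying on it.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Compare in constant time to avoid leaking digest prefixes through timing."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected, signature)
```

Under this assumption a client would send the output of `sign_body` as the `X-Signature` value, and any change to the body after signing makes verification fail.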

Runtime Modes

Deployment Profiles and Operations

ELKA-0 supports local-first operation and can scale into hybrid or high-compute modes without changing core contracts.

Local Mode: On-device inference and memory behavior for privacy and offline resilience.
Hybrid Mode: Local orchestration with optional cloud model augmentation.
Edge Fallback: Resource-aware behavior for constrained hardware and intermittent network.
High-Compute Mode: Deeper multimodal reasoning for complex, multi-step plans.

Core Profile: Balanced reasoning and planning for general workloads.
Vision Profile: Scene and event-grounded reasoning for camera-rich systems.
Audio Profile: Interruptible dialog and urgency signal interpretation.
Edge Profile: Lower-resource cognition with predictable fallback behavior.

uvicorn infrastructure.serving.app:create_service_app --factory --host 0.0.0.0 --port 8080

Quality Targets

Evaluation and Runtime Signals

Use both offline tests and live runtime telemetry to track real quality, not just response style.

Memory recall accuracy
Action safety success rate
Planning completion rate
False escalation rate
Hallucination frequency
Interruption recovery quality
Persona consistency score
Multimodal grounding precision
Latency distribution by profile
Uncertainty honesty rate
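
As an example of turning one of these signals into a number, the sketch below computes latency percentiles per profile from `(profile_name, latency_ms)` pairs collected by a client. The nearest-rank percentile method and the p50/p95 choice are assumptions, as this page does not define how the metric is calculated.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_by_profile(records):
    """Group (profile_name, latency_ms) pairs and summarize each group."""
    grouped = defaultdict(list)
    for profile_name, latency_ms in records:
        grouped[profile_name].append(latency_ms)
    return {
        name: {"p50": percentile(values, 50), "p95": percentile(values, 95)}
        for name, values in grouped.items()
    }
```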

pytest -q

Operational tip: capture `X-Request-Id` and `X-ELKA-Latency-Ms` for each request in your client logs.
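
One way to follow this tip is to reduce the response headers to a small log record. In the sketch below, the function name and output keys are illustrative, and it assumes `X-ELKA-Latency-Ms` carries a plain numeric string.

```python
def request_log_fields(headers):
    """Pick the ELKA-0 tracing headers out of a header mapping, ignoring case."""
    lowered = {key.lower(): value for key, value in headers.items()}
    latency = lowered.get("x-elka-latency-ms")
    return {
        "request_id": lowered.get("x-request-id"),
        # Assumption: the header value parses as a float, e.g. "91.4".
        "latency_ms": float(latency) if latency is not None else None,
    }
```

The `headers` attribute of a `urllib.request.urlopen` response supports `.items()`, so it can be passed in directly. Missing headers come back as `None` instead of raising.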

Open Source

Feedback, Issues, and Contributions

Treat this documentation as part of the engine contract. Improvements to docs, tests, schema notes, and integration examples are first-class contributions.

Report: Open issues for bugs, behavioral regressions, and unclear contracts.
Propose: Use pull requests to improve docs, schema examples, and evaluation coverage.
Collaborate: Keep changes transparent with rationale, test evidence, and migration notes.


Release Path

Roadmap and Priority Order

v0 Priority Sequence

Retrieval-native memory behavior
Multimodal grounding and event fusion
Structured action and tool call planning
Interruptible and stream-safe dialogue
Uncertainty scoring and safety gating
Stable persona adaptation boundaries
World model continuity across sessions
Mission replanning and interruption recovery

Version Path

v0: Core contract and baseline runtime.
v0.1: Memory-native inference quality pass.
v0.2: Multimodal event grounding expansion.
v0.3: Safety policy and plan reliability hardening.
v0.4: Runtime integration maturity and profiling.
v1.0: Stable public release with documented guarantees.

Build With ELKA

Ship integrations that remember, reason, and act with explicit safety constraints.
