ELKA-0 documentation

Reference Guide

Memory-Native Reasoning

ELKA-0 Open Cognitive Engine API

A model-engine layer for memory-native, multimodal, safety-scored reasoning.

This page documents ELKA-0 itself: how to run, call, integrate, and evaluate it. ELKA-0 can power multiple runtimes, including Kael, but this documentation focuses on ELKA-0 contracts and behavior.

Language: Python
Service Version: 0.2.0
Runtime Endpoints: 4 public routes
Status: Open specification

Get Started

Quickstart in 60 Seconds

Start the service locally, verify health, then call inference. These commands reflect the current repo runtime layout.

  1. Install: Create a Python environment and install the dependencies from `requirements.txt`.
  2. Launch: Start ELKA-0 with the uvicorn factory command shown below.
  3. Verify: Call `GET /health` before sending inference traffic.
  4. Iterate: Use `/infer`, `/chat`, or `/chat/stream`, depending on your integration path.
Local Runtime
pip install -r requirements.txt
uvicorn infrastructure.serving.app:create_service_app --factory --host 0.0.0.0 --port 8080
# in a second terminal, once the server is running:
curl http://127.0.0.1:8080/health
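Step 3 can also be scripted. The sketch below is a minimal Python equivalent that assumes only that a healthy service answers `GET /health` with HTTP 200; `http_health_ok` and `wait_until_healthy` are hypothetical helper names, and no fields of the health payload are relied on.

```python
import time
import urllib.request
from typing import Callable


def http_health_ok(base_url: str = "http://127.0.0.1:8080", timeout_s: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts are all OSError subclasses.
        return False


def wait_until_healthy(check: Callable[[], bool], attempts: int = 10, delay_s: float = 0.5) -> bool:
    """Poll `check` until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # no sleep after the final failed attempt
    return False


if __name__ == "__main__":
    ready = wait_until_healthy(http_health_ok)
    print("ELKA-0 ready" if ready else "ELKA-0 not reachable")
```

Passing the check as a callable keeps the polling logic testable without a running server.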

API Reference

Runtime Endpoints

ELKA-0 currently exposes four public routes. When configured, security guards enforce API keys, request signatures, tenant boundaries, and rate limits.

GET /health
Returns uptime, profile, memory health, request count, and average latency.
POST /infer
Runs structured reasoning using `InferRequest` and returns typed `InferResponse`.
POST /chat
Turn-based chat interface over `ChatRequest` for session conversation flows.
POST /chat/stream
Streaming chat via `application/x-ndjson` chunks for real-time response delivery.
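Because `/chat/stream` emits newline-delimited JSON, a client only needs to decode one object per line. This sketch assumes nothing about the chunk fields; the `delta` key used in the test data is hypothetical. Iterating a `urllib` response object yields raw lines, so the same function can consume a live stream.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Union


def iter_ndjson(lines: Iterable[Union[bytes, str]]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per non-empty NDJSON line."""
    for raw in lines:
        text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        text = text.strip()
        if not text:
            continue  # tolerate blank keep-alive lines between chunks
        yield json.loads(text)
```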
Response Headers
`X-Request-Id` and `X-ELKA-Latency-Ms` are set by middleware on every response.
Optional Security Headers
`X-API-Key`, `X-Signature`, and a tenant header are accepted when the corresponding guards are configured.
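A minimal sketch of attaching the optional headers with the standard library. `build_elka_request` is a hypothetical helper; the tenant header is omitted because its name is not given here, and how the `X-Signature` value is computed depends on your deployment.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_elka_request(
    base_url: str,
    path: str,
    payload: Optional[Dict[str, Any]] = None,
    api_key: Optional[str] = None,
    signature: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for an ELKA-0 route."""
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key is not None:
        headers["X-API-Key"] = api_key
    if signature is not None:
        headers["X-Signature"] = signature
    # Requests with a body are POSTed; /health is a bodiless GET.
    method = "POST" if data is not None else "GET"
    # Note: urllib stores header names capitalized ("X-api-key"); HTTP names are case-insensitive.
    return urllib.request.Request(base_url.rstrip("/") + path, data=data, headers=headers, method=method)
```

Send the result with `urllib.request.urlopen(req)` once the service is healthy.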
Health Check Example
curl -X GET "http://127.0.0.1:8080/health"

Contracts

Request and Response Schemas

ELKA-0 is typed end to end. Treat these structures as the source of truth to keep your integration stable.

  • InferRequest: Wrapper for `ReasoningRequest`, with optional trace flag.
  • ReasoningRequest: Session id, user text, events, permissions, profile, and safety controls.
  • InferResponse: Decision status, confidence, structured plan, explanation trace, and latency.
  • ChatRequest: Session turns, optional permissions set, profile name, and metadata.
  • Perception Events: `text`, `vision`, `audio`, `sensor`, `device_state`, `room_state`.
  • ActionPlan: Objective, ordered steps, risk level, permission requirement, fallback strategy.
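Client code can reject unsupported event kinds before a request leaves the process. This sketch mirrors only the `kind` and `text` fields seen in the request example; the `data` field for non-text kinds is a hypothetical placeholder, not part of the documented schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional

# Event kinds listed in the contract table.
EVENT_KINDS = frozenset({"text", "vision", "audio", "sensor", "device_state", "room_state"})


@dataclass
class PerceptionEvent:
    kind: str
    text: Optional[str] = None  # used by "text" events in the documented example
    data: Dict[str, Any] = field(default_factory=dict)  # hypothetical payload for other kinds

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unsupported event kind: {self.kind!r}")

    def to_dict(self) -> Dict[str, Any]:
        # Drop unset optional fields so the wire form stays minimal.
        return {k: v for k, v in asdict(self).items() if v not in (None, {})}
```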
Infer Request Example
{
  "request": {
    "session_id": "session-01",
    "user_text": "Review backyard motion event and recommend next safe action",
    "events": [
      {
        "kind": "text",
        "text": "Motion alert received from backyard camera"
      }
    ],
    "permissions": ["observe", "notify"],
    "profile_name": "core",
    "allow_actions": true
  },
  "trace": true
}
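The same body can be assembled programmatically. `build_infer_request` is a hypothetical helper whose keys copy the example above; defaulting `allow_actions` to `false` is a conservative choice of this sketch, not a documented server default.

```python
import json
from typing import Any, Dict, List, Optional


def build_infer_request(
    session_id: str,
    user_text: str,
    events: Optional[List[Dict[str, Any]]] = None,
    permissions: Optional[List[str]] = None,
    profile_name: str = "core",
    allow_actions: bool = False,
    trace: bool = False,
) -> Dict[str, Any]:
    """Assemble an /infer body shaped like the documented example."""
    return {
        "request": {
            "session_id": session_id,
            "user_text": user_text,
            "events": list(events or []),
            "permissions": list(permissions or []),
            "profile_name": profile_name,
            "allow_actions": allow_actions,
        },
        "trace": trace,
    }


if __name__ == "__main__":
    body = build_infer_request("session-01", "Status check", trace=True)
    print(json.dumps(body, indent=2))
```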
Infer Response Shape
{
  "ok": true,
  "response": {
    "request_id": "...",
    "status": "planned",
    "natural_language_response": "...",
    "confidence": 0.82,
    "plan": { "objective": "...", "steps": [] },
    "explanation": { "summary": "...", "evidence": [] },
    "escalation": "notify",
    "profile_name": "core"
  },
  "model_version": "v0",
  "latency_ms": 91.4
}
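A runtime usually branches on a handful of these fields. The sketch reads them from a decoded body shaped like the example; `InferSummary` and `summarize_infer_response` are hypothetical names, and because the failure payload is not documented here, any body without `"ok": true` is simply treated as an error.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class InferSummary:
    status: str
    confidence: float
    escalation: Optional[str]
    model_version: str
    latency_ms: float


def summarize_infer_response(body: Dict[str, Any]) -> InferSummary:
    """Extract the decision fields from a decoded /infer reply."""
    if not body.get("ok", False):
        raise RuntimeError(f"ELKA-0 reported failure: {body!r}")
    resp = body["response"]
    return InferSummary(
        status=resp["status"],
        confidence=float(resp["confidence"]),
        escalation=resp.get("escalation"),  # may be absent when no escalation applies
        model_version=body["model_version"],
        latency_ms=float(body["latency_ms"]),
    )
```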

System Design

Layered Architecture

Dependency direction is explicit: `core -> model -> cognition -> world_model -> policies -> safety -> infrastructure -> evaluation/tests`.
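The dependency rule can be checked mechanically. This sketch assumes each layer is a top-level package named as in the chain (the serving path `infrastructure.serving.app` suggests this for `infrastructure`) and reads the arrows as "may be imported by": a module may import from its own layer or earlier ones, never later ones. The module names in the test data are made up.

```python
from typing import Dict, Iterable, List, Tuple

# Documented order: core -> model -> cognition -> world_model -> policies
# -> safety -> infrastructure -> evaluation/tests (shared last tier).
LAYER_RANK: Dict[str, int] = {
    "core": 0,
    "model": 1,
    "cognition": 2,
    "world_model": 3,
    "policies": 4,
    "safety": 5,
    "infrastructure": 6,
    "evaluation": 7,
    "tests": 7,
}


def layer_of(module: str) -> str:
    """Top-level package of a dotted path, e.g. 'infrastructure.serving.app' -> 'infrastructure'."""
    return module.split(".", 1)[0]


def dependency_violations(edges: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (importer, imported) pairs where a module imports from a later layer."""
    violations = []
    for importer, imported in edges:
        src, dst = layer_of(importer), layer_of(imported)
        if src in LAYER_RANK and dst in LAYER_RANK and LAYER_RANK[dst] > LAYER_RANK[src]:
            violations.append((importer, imported))
    return violations
```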

  1. Input Event
  2. Context Builder
  3. Reasoning Engine
  4. Planning and Policy
  5. Safety Gates
  6. Structured Output
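The six stages can be pictured as functions that each enrich a shared state. Every stage body below is a toy stand-in written for illustration, not ELKA-0's implementation; the point is the fixed order and the audit trail of completed stages.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


# Stage 2: gather the context the reasoner needs (toy: just the event text).
def context_builder(state: State) -> State:
    return {**state, "context": state["event"].get("text", "")}


# Stage 3: form a finding from the context (toy keyword match).
def reasoning_engine(state: State) -> State:
    finding = "motion" if "motion" in state["context"].lower() else "nothing"
    return {**state, "finding": finding}


# Stage 4: turn the finding into a plan with a risk level.
def planning_and_policy(state: State) -> State:
    steps = ["notify_owner"] if state["finding"] == "motion" else []
    return {**state, "plan": {"steps": steps, "risk_level": "low"}}


# Stage 5: this toy gate approves only low-risk plans.
def safety_gates(state: State) -> State:
    return {**state, "approved": state["plan"]["risk_level"] == "low"}


# Stage 6: emit a structured result ("no_action" is a made-up status).
def structured_output(state: State) -> State:
    planned = state["approved"] and bool(state["plan"]["steps"])
    return {**state, "output": {"status": "planned" if planned else "no_action", "plan": state["plan"]}}


STAGES: List[Callable[[State], State]] = [
    context_builder, reasoning_engine, planning_and_policy, safety_gates, structured_output,
]


def run_pipeline(event: Dict[str, Any]) -> State:
    """Stage 1 is the input event itself; later stages run strictly in order."""
    state: State = {"event": event, "trail": ["input_event"]}
    for stage in STAGES:
        state = stage(state)
        state["trail"] = state["trail"] + [stage.__name__]  # audit trail of completed stages
    return state
```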

Core Contracts

Shared interfaces, schema wrappers, constants, and orchestration boundaries.

Cognition and Memory

Inference orchestration, retrieval-native memory usage, uncertainty handling, and explainability.

Policy and Safety

Permission checks, escalation scoring, refusal paths, and risk-aware action constraints.

Infrastructure and Serving

FastAPI runtime, security middleware, observability hooks, and deployment entrypoints.

Integration Contract
Input Context -> ELKA Reasoning -> Risk Score -> Structured Plan -> Runtime Execution
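On the runtime side of this contract, execution should be gated on the returned risk and permission data. The field names `risk_level`, `permission_required`, and `steps`, and the three risk labels, are assumptions for this sketch; adapt them to the actual `ActionPlan` schema.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical risk labels, ordered from least to most risky.
RISK_ORDER: Dict[str, int] = {"low": 0, "medium": 1, "high": 2}


def executable_steps(plan: Dict[str, Any], granted: Iterable[str], max_risk: str = "low") -> List[str]:
    """Return the plan's steps if the runtime may execute them, else an empty list."""
    if RISK_ORDER[plan["risk_level"]] > RISK_ORDER[max_risk]:
        return []  # risk above this runtime's threshold: surface the plan, do not act
    required = plan.get("permission_required")
    if required and required not in set(granted):
        return []  # required permission not granted: do not act
    return list(plan["steps"])
```

Returning an empty list rather than raising lets the runtime still display the plan and its explanation while declining to act.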

Governance

Safety, Permissions, and Uncertainty

ELKA-0 is designed to prefer truthful, low-risk actions over fluent but unsafe behavior.

  1. Check permission requirements before impactful action.
  2. Prefer truth over fluency and ask when context is missing.
  3. Use minimum safe action first, then escalate if needed.
  4. Return explicit refusal with safe fallback when constraints fail.
  5. Preserve privacy and user trust as default runtime behavior.
  6. Attach explanation traces for important decisions.
  7. Allow human override to take effect immediately, with no delay path.
  8. Record audit-friendly lifecycle stages for post-review.
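Principles 1, 3, 4, and 8 combine naturally into an escalation ladder. This hypothetical sketch walks candidate actions from least to most impactful, skips any whose permission is missing, and refuses with a fallback when nothing suitable remains; `Action`, `Decision`, `choose_action`, and the `ask_user` fallback are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Action:
    name: str
    permission: str  # permission the action requires


@dataclass
class Decision:
    action: Optional[str]
    refused: bool
    fallback: Optional[str]
    audit: List[str]  # lifecycle record for post-review


def choose_action(
    ladder: List[Action],
    granted: Set[str],
    resolves: Callable[[Action], bool],
    fallback: str = "ask_user",
) -> Decision:
    """Pick the least impactful permitted action that resolves the situation."""
    audit: List[str] = []
    for action in ladder:  # ladder is ordered from least to most impactful
        if action.permission not in granted:
            audit.append(f"skip {action.name}: missing {action.permission}")
            continue
        audit.append(f"consider {action.name}")
        if resolves(action):
            audit.append(f"select {action.name}")
            return Decision(action.name, False, None, audit)
    audit.append(f"refuse: fallback {fallback}")
    return Decision(None, True, fallback, audit)
```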
API Key Auth
When enabled, the `X-API-Key` header is validated against the configured keys.
Request Signing
Optional `X-Signature` verification protects request integrity.
Rate and Tenant Guard
Rate limits and tenant boundary checks are enforced in middleware.
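The signing scheme behind `X-Signature` is not specified here. A common choice is a hex HMAC-SHA256 over the raw request body with a shared secret, sketched below as an assumption; confirm the real algorithm, encoding, and signed content with your ELKA-0 configuration before relying on it.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Compare expected and received signatures in constant time."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Sign the exact bytes you send; re-serializing the JSON on either side can change whitespace and break verification.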

Runtime Modes

Deployment Profiles and Operations

ELKA-0 supports local-first operation and can scale into hybrid or high-compute modes without changing core contracts.

  • Local Mode: On-device inference and memory behavior for privacy and offline resilience.
  • Hybrid Mode: Local orchestration with optional cloud model augmentation.
  • Edge Fallback: Resource-aware behavior for constrained hardware and intermittent network connectivity.
  • High-Compute Mode: Deeper multimodal reasoning for complex, multi-step plans.
  • Core Profile: Balanced reasoning and planning for general workloads.
  • Vision Profile: Scene and event-grounded reasoning for camera-rich systems.
  • Audio Profile: Interruptible dialog and urgency signal interpretation.
  • Edge Profile: Lower-resource cognition with predictable fallback behavior.
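A runtime might choose a profile from device capabilities. Only `core` appears as a `profile_name` value in the examples; the lowercase names `vision`, `audio`, and `edge`, the `DeviceTraits` type, and the precedence order are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceTraits:
    has_camera: bool = False
    has_microphone: bool = False
    low_resource: bool = False


def pick_profile(traits: DeviceTraits) -> str:
    """Choose a profile name; constrained hardware wins over sensor richness."""
    if traits.low_resource:
        return "edge"  # predictable fallback on constrained hardware
    if traits.has_camera:
        return "vision"  # camera-rich systems
    if traits.has_microphone:
        return "audio"  # dialog and urgency cues
    return "core"  # balanced general-purpose default
```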
Service Launch Command
uvicorn infrastructure.serving.app:create_service_app --factory --host 0.0.0.0 --port 8080

Quality Targets

Evaluation and Runtime Signals

Use both offline tests and live runtime telemetry to track real quality, not just response style.

  • Memory recall accuracy
  • Action safety success rate
  • Planning completion rate
  • False escalation rate
  • Hallucination frequency
  • Interruption recovery quality
  • Persona consistency score
  • Multimodal grounding precision
  • Latency distribution by profile
  • Uncertainty honesty rate
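Two of these signals can be computed from client-side request logs. The sketch below assumes each record carries `profile_name`, `latency_ms`, `escalation`, and a reviewer-supplied `escalation_unneeded` flag (hypothetical); latency uses nearest-rank percentiles.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Sequence


def nearest_rank(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_by_profile(records: Sequence[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """p50/p95 latency and sample count per profile."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        groups[record["profile_name"]].append(float(record["latency_ms"]))
    summary: Dict[str, Dict[str, Any]] = {}
    for profile, values in groups.items():
        values.sort()
        summary[profile] = {"p50": nearest_rank(values, 50), "p95": nearest_rank(values, 95), "n": len(values)}
    return summary


def false_escalation_rate(records: Sequence[Mapping[str, Any]]) -> float:
    """Share of escalated requests that a reviewer marked as unnecessary."""
    escalated = [r for r in records if r.get("escalation") not in (None, "none")]
    if not escalated:
        return 0.0
    return sum(1 for r in escalated if r.get("escalation_unneeded")) / len(escalated)
```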
Local Test Command
pytest -q
Operational tip: capture `X-Request-Id` and `X-ELKA-Latency-Ms` for each request in your client logs.
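A small helper can pull both headers into a log record. It lowercases names because HTTP header names are case-insensitive; `request_log_fields` is a hypothetical name, and a malformed latency value is logged as missing rather than raising.

```python
from typing import Any, Dict, Mapping, Optional


def request_log_fields(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Extract X-Request-Id and X-ELKA-Latency-Ms for client-side logging."""
    lowered = {k.lower(): v for k, v in headers.items()}
    raw_latency: Optional[str] = lowered.get("x-elka-latency-ms")
    try:
        latency = float(raw_latency) if raw_latency is not None else None
    except ValueError:
        latency = None  # keep logging even if the header is malformed
    return {"request_id": lowered.get("x-request-id"), "server_latency_ms": latency}
```

With `urllib`, pass `resp.headers`, which supports `.items()`.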

Open Source

Feedback, Issues, and Contributions

Treat this documentation as part of the engine contract. Improvements to docs, tests, schema notes, and integration examples are first-class contributions.

  • Report: Open issues for bugs, behavioral regressions, and unclear contracts.
  • Propose: Use pull requests to improve docs, schema examples, and evaluation coverage.
  • Collaborate: Keep changes transparent with rationale, test evidence, and migration notes.

Release Path

Roadmap and Priority Order

v0 Priority Sequence

  1. Retrieval-native memory behavior
  2. Multimodal grounding and event fusion
  3. Structured action and tool call planning
  4. Interruptible and stream-safe dialogue
  5. Uncertainty scoring and safety gating
  6. Stable persona adaptation boundaries
  7. World model continuity across sessions
  8. Mission replanning and interruption recovery

Version Path

  1. v0 Core contract and baseline runtime.
  2. v0.1 Memory-native inference quality pass.
  3. v0.2 Multimodal event grounding expansion.
  4. v0.3 Safety policy and plan reliability hardening.
  5. v0.4 Runtime integration maturity and profiling.
  6. v1.0 Stable public release with documented guarantees.

Build With ELKA

Ship integrations that remember, reason, and act with explicit safety constraints.