Generated from Markdown source docs/data-contracts.md

OpenRelix Data Contracts

Languages: English | 简体中文

This document captures the public, contributor-facing data contracts for OpenRelix. It is not a dump of one maintainer's local state. Examples must stay synthetic and sanitized.

The source of truth remains the code and tests. Treat this page as the shared map that lets contributors make compatible changes without inspecting private runtime files.

Storage Boundary

OpenRelix has three storage zones:

| Zone | Examples | Public repo? | Notes |
| --- | --- | --- | --- |
| Repo source | scripts/, install/, templates/, docs/, tests/ | Yes | Reusable logic, schemas, sanitized examples |
| External state root | raw/, registry/, consolidated/, reports/, runtime/, log/ | No | User runtime data, generated artifacts, caches, logs |
| Host home | CODEX_HOME, CLAUDE_HOME, host-native memory and sessions | No | Owned by the AI host; OpenRelix reads or updates it only through explicit adapter rules |

All paths in reusable code should be resolved through scripts/asset_runtime.py. Public docs and fixtures may use generic placeholder paths such as /tmp/openrelix-demo, ~/Library/Application Support/openrelix, or CODEX_HOME.

State Root Layout

<state-root>/
  raw/
    daily/<date>.json
    windows/<date>/<window_id>.json
  consolidated/
    daily/<date>/summary.json
    daily/<date>/summary.md
    daily/<date>/runs/
  registry/
    assets.jsonl
    usage_events.jsonl
    memory_entries.jsonl
    memory_items.jsonl
    curated_memory_pack.json
  reports/
    overview-data.json
    overview.md
    overview.csv
    panel.html
  runtime/
    config.json
    openrelix-index.sqlite3
    host-context/
      memory_summary.md
      curated-personal-memory-summary.md
  log/

reports/, runtime/, and log/ are generated or machine-local. Do not commit real files from those directories. Sanitized tests may include a small fixture under tests/fixtures/sample-state/ only when every value is artificial.
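The layout above can be mirrored in code as a small path helper. This is a minimal sketch under stated assumptions, not the API of scripts/asset_runtime.py: the `StatePaths` class and its method names are hypothetical, and it only joins path segments under a caller-supplied state root.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StatePaths:
    """Hypothetical helper mapping the documented layout onto one state root."""

    root: Path

    def raw_daily(self, date: str) -> Path:
        return self.root / "raw" / "daily" / f"{date}.json"

    def raw_window(self, date: str, window_id: str) -> Path:
        return self.root / "raw" / "windows" / date / f"{window_id}.json"

    def daily_summary(self, date: str) -> Path:
        return self.root / "consolidated" / "daily" / date / "summary.json"

    def registry(self, name: str) -> Path:
        # name is a registry file such as "memory_entries.jsonl" or "assets.jsonl".
        return self.root / "registry" / name

    def host_context_summary(self) -> Path:
        return self.root / "runtime" / "host-context" / "memory_summary.md"


paths = StatePaths(Path("/tmp/openrelix-demo"))
print(paths.raw_window("2026-04-28", "win-demo-1").as_posix())
# /tmp/openrelix-demo/raw/windows/2026-04-28/win-demo-1.json
```

Keeping every join in one place means a fixture can point the same helper at /tmp/openrelix-demo without touching any other code.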

Raw Daily Contract

Path:

raw/daily/<date>.json

Purpose: one day's collected host activity, grouped by window.

Minimum shape:

{
  "date": "2026-04-28",
  "stage": "manual",
  "generated_at": "2026-04-28T10:10:00+08:00",
  "timezone": "Asia/Shanghai",
  "collection_source": "history",
  "activity_host": "codex",
  "window_count": 1,
  "windows": []
}

Common daily top-level fields:

| Field | Type | Notes |
| --- | --- | --- |
| date | string | ISO collection date |
| stage | string | Collection stage such as manual or final; tests may use synthetic values only when documented |
| generated_at | string | When the collector wrote the file |
| timezone | string | Local timezone used for daily grouping |
| collection_source | string | history, app-server, auto, claude-history, or a compatible source label |
| activity_host | string | codex, claude, or all |
| codex_profile_count | integer | Number of Codex profiles inspected, when available |
| codex_profiles | array | Sanitized profile metadata; do not publish real home paths |
| host_counts | object | Per-host window counts |
| collection_errors | array | Sanitized warnings or failures |
| window_count | integer | Number of included windows |
| excluded_window_count | integer | Number of windows excluded by filtering |
| review_like_window_count | integer | Number of review-like windows detected |
| prompt_count | integer | Total included prompts |
| conclusion_count | integer | Total included conclusions |
| windows | array | Included raw window objects |
| excluded_windows | array | Sanitized excluded-window metadata |
| review_like_windows | array | Sanitized review-like window metadata |
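A contributor changing the collector may want a quick shape check for a daily file. This is a minimal sketch with a hypothetical `check_raw_daily` function; the required keys come from the minimum shape above, while the two consistency rules (window_count equals the number of windows, and the top-level prompt_count equals the per-window total) are assumptions inferred from the field notes, not documented invariants.

```python
# Keys taken from the minimum shape shown above.
REQUIRED_DAILY_KEYS = {
    "date", "stage", "generated_at", "timezone",
    "collection_source", "activity_host", "window_count", "windows",
}


def check_raw_daily(daily: dict) -> list[str]:
    """Return a list of problems; an empty list means the file looks well-formed."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_DAILY_KEYS - set(daily))]
    windows = daily.get("windows", [])
    if not isinstance(windows, list):
        return problems + ["windows must be an array"]
    # Assumed invariant: window_count counts the included window objects.
    if "window_count" in daily and daily["window_count"] != len(windows):
        problems.append("window_count does not match len(windows)")
    # Assumed invariant: the top-level prompt_count totals the per-window counts.
    if "prompt_count" in daily:
        total = sum(window.get("prompt_count", 0) for window in windows)
        if daily["prompt_count"] != total:
            problems.append("prompt_count does not match the per-window total")
    return problems


sample = {
    "date": "2026-04-28",
    "stage": "manual",
    "generated_at": "2026-04-28T10:10:00+08:00",
    "timezone": "Asia/Shanghai",
    "collection_source": "history",
    "activity_host": "codex",
    "window_count": 1,
    "prompt_count": 2,
    "windows": [{"window_id": "win-demo-1", "prompt_count": 2}],
}
print(check_raw_daily(sample))  # []
```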

Common window fields:

| Field | Type | Required for | Notes |
| --- | --- | --- | --- |
| date | string | indexing, panel | ISO date for the collected day |
| window_id | string | all downstream joins | Stable per host window or session for that day |
| ai_host | string | host display and resume | codex or claude; older rows without this field default to codex |
| cwd | string | project grouping | Use generic or redacted paths in fixtures and docs |
| originator | string | diagnostics | Example: codex_cli, codex_app_server, claude_code |
| source | string | diagnostics | Example: history, app-server, claude-history |
| started_at | string | sorting | ISO-like timestamp, with timezone when available |
| session_file | string | local debugging | State-local or host-local path; never publish real paths |
| thread_id | string | app-server resume | Optional |
| session_id | string | host resume | Optional |
| resume_id | string | panel action | Optional, host-specific |
| window_summary | string | display/fallback | Optional host-provided short summary |
| thread_title | string | display/fallback | Optional app-server or host title |
| prompt_count | integer | metrics | Count after tool/system filtering |
| conclusion_count | integer | metrics | Count after review-like filtering |
| raw_conclusion_count | integer | diagnostics | Count before conclusion filtering |
| review_like_window | boolean | filtering | Whether the window is primarily a review workflow |
| review_related_window | boolean | filtering | Whether the window relates to review but still has usable prompts |
| filtered_review_conclusion_count | integer | diagnostics | Filtered conclusion count |
| conclusion_policy | string | diagnostics | Example: included |
| prompts | array | fallback summaries | User questions or requests |
| conclusions | array | fallback summaries | Final assistant answers or useful conclusions |
| app_server | object | diagnostics | Optional, sanitized Codex app-server metadata |
| claude_code | object | diagnostics | Optional, sanitized Claude Code metadata |
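Downstream readers have to cope with rows written before newer fields existed. The sketch below is a hypothetical `normalize_window` helper: the ai_host default of codex is documented above, while deriving missing counts from the prompt and conclusion lists is an assumption made for illustration only.

```python
from typing import Any


def normalize_window(window: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a raw window with defaults for fields older rows may lack."""
    normalized = dict(window)  # never mutate the caller's row
    # Documented default: older rows without ai_host came from Codex.
    normalized.setdefault("ai_host", "codex")
    # Assumed defaults: missing lists become empty and missing counts follow the lists.
    normalized.setdefault("prompts", [])
    normalized.setdefault("conclusions", [])
    normalized.setdefault("prompt_count", len(normalized["prompts"]))
    normalized.setdefault("conclusion_count", len(normalized["conclusions"]))
    return normalized


legacy = {"date": "2026-04-28", "window_id": "win-demo-1", "prompts": [{"text": "demo"}]}
window = normalize_window(legacy)
print(window["ai_host"], window["prompt_count"], window["conclusion_count"])  # codex 1 0
```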

Prompt item:

{
  "turn_id": "turn-demo-1",
  "ts": "2026-04-28T10:00:00+08:00",
  "local_time": "2026-04-28T10:00:00+08:00",
  "text": "Design a sanitized sample state"
}

Conclusion item:

{
  "turn_id": "turn-demo-1",
  "completed_at": "2026-04-28T10:05:00+08:00",
  "text": "The sample state should cover raw, consolidated, registry, and index."
}

Do not store raw tool outputs, raw logs, tokens, cookies, account data, or proprietary snippets in public examples.
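Both the prompt item and the conclusion item carry a turn_id, so a consumer can rebuild question/answer pairs for traceability. This is a minimal sketch assuming pairing by turn_id; the `pair_by_turn` name and its output shape are hypothetical and are not the summary_pairs schema used in consolidated summaries.

```python
def pair_by_turn(prompts: list[dict], conclusions: list[dict]) -> list[dict]:
    """Join prompt and conclusion items that share a turn_id, in prompt order."""
    # If several conclusions share a turn_id, the last one wins.
    by_turn = {c["turn_id"]: c for c in conclusions if "turn_id" in c}
    pairs = []
    for prompt in prompts:
        conclusion = by_turn.get(prompt.get("turn_id"))
        pairs.append({
            "turn_id": prompt.get("turn_id"),
            "question": prompt.get("text", ""),
            # None marks a prompt whose turn produced no kept conclusion.
            "conclusion": conclusion.get("text", "") if conclusion else None,
        })
    return pairs


prompts = [{"turn_id": "turn-demo-1", "text": "Design a sanitized sample state"}]
conclusions = [{
    "turn_id": "turn-demo-1",
    "text": "The sample state should cover raw, consolidated, registry, and index.",
}]
print(pair_by_turn(prompts, conclusions)[0]["conclusion"])
```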

Raw Window Contract

Path:

raw/windows/<date>/<window_id>.json

Purpose: one window record extracted from the daily file for direct lookup and panel source references.

It should contain the same window object used in raw/daily/<date>.json. Downstream code should tolerate older records that lack newer fields, but new collectors should include ai_host, prompt_count, conclusion_count, prompts, and conclusions.
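Because each window file holds the same object as the daily file, splitting is a direct copy. The sketch below uses a hypothetical `write_window_files` function and assumes window_id is already safe to use as a filename; real code should sanitize it.

```python
import json
import tempfile
from pathlib import Path


def write_window_files(state_root: Path, daily: dict) -> list[Path]:
    """Write each window of a raw daily file to raw/windows/<date>/<window_id>.json."""
    out_dir = state_root / "raw" / "windows" / daily["date"]
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for window in daily.get("windows", []):
        # Assumes window_id is filesystem-safe; sanitize it in real code.
        path = out_dir / f"{window['window_id']}.json"
        # The per-window file stores the same object as the daily file.
        path.write_text(json.dumps(window, ensure_ascii=False, indent=2), encoding="utf-8")
        written.append(path)
    return written


daily = {"date": "2026-04-28", "windows": [{"window_id": "win-demo-1", "ai_host": "codex"}]}
with tempfile.TemporaryDirectory() as tmp:
    files = write_window_files(Path(tmp), daily)
    print([f.name for f in files])  # ['win-demo-1.json']
```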

Consolidated Summary Contract

Path:

consolidated/daily/<date>/summary.json

Purpose: model-backed or fallback daily summary.

The model output is constrained by templates/nightly-summary-schema.json. Contributor-facing fields are:

| Field | Type | Notes |
| --- | --- | --- |
| date | string | Same collection date |
| day_summary | string | Human-readable day overview |
| window_summaries | array | Per-window summary rows |
| keywords | array | Day-level keywords |
| next_actions | array | Non-binding follow-up suggestions |
| durable_memories | array | Legacy model output bucket; normalized before host injection |
| session_memories | array | Legacy model output bucket; normalized before host injection |
| low_priority_memories | array | Local or low-signal output |
| global_context_memories | array | Evidence-rich cross-window memory candidates |

Window summary fields:

| Field | Type | Notes |
| --- | --- | --- |
| window_id | string | Joins to the raw window |
| cwd | string | Project grouping |
| window_title | string | Short title for panel and index |
| question_summary | string | Condensed user need |
| question_count | integer | Number of prompts represented |
| conclusion_count | integer | Number of conclusions represented |
| keywords | array | Search and grouping |
| main_takeaway | string | Highest-signal result |
| summary_pairs | array | Question/conclusion pairs for traceability |

The current implementation still accepts older stage, summary_stage, model_status, and summary_generation diagnostics. Use them for selection and fallback behavior, not as user-facing facts.

Common diagnostic fields:

| Field | Notes |
| --- | --- |
| language | Output language for generated text |
| stage / summary_stage | Manual, preliminary, final, or compatibility stage |
| generated_at | Summary generation time |
| model_status / last_run_model_status | Model run status |
| summary_generation | Lightweight, fallback, or model-backed generation label |
| personal_memory_algorithm_version | Memory algorithm version used for fingerprinting |
| compact_payload_source | Input compaction source |
| learning_input_fingerprint | Skip-if-unchanged fingerprint |
| quality / selection_decision | Summary quality and selection metadata |
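learning_input_fingerprint lets a run skip work when its input has not changed, and the algorithm version takes part in fingerprinting. The sketch below shows one way that can work; the SHA-256-over-canonical-JSON scheme and the `input_fingerprint` / `should_skip` names are assumptions, not the project's actual fingerprint format.

```python
import hashlib
import json


def input_fingerprint(payload: dict, algorithm_version: int) -> str:
    """Hash the compact input together with the algorithm version."""
    # sort_keys and fixed separators make the serialization independent of key order.
    canonical = json.dumps(
        {"algorithm_version": algorithm_version, "payload": payload},
        sort_keys=True, separators=(",", ":"), ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_skip(previous_summary: dict, payload: dict, algorithm_version: int) -> bool:
    """Skip regeneration when the stored fingerprint matches the current input."""
    current = input_fingerprint(payload, algorithm_version)
    return previous_summary.get("learning_input_fingerprint") == current


payload = {"windows": [{"window_id": "win-demo-1", "prompt_count": 2}]}
stored = {"learning_input_fingerprint": input_fingerprint(payload, 3)}
print(should_skip(stored, payload, 3), should_skip(stored, payload, 4))  # True False
```

Including the version in the hashed document means an algorithm upgrade invalidates every cached summary without a separate migration step.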

Window Task Summary Contract

Path:

consolidated/task_summaries/<from>_<to>.json

Purpose: model-backed project task clusters built from existing window_summaries. This layer does not read raw transcripts; it lets windows that already have summaries gain a business-level "parallel tasks" view after an install or upgrade.

The model output is constrained by templates/window-task-summary-schema.json. Contributor-facing fields are:

| Field | Type | Notes |
| --- | --- | --- |
| schema_version | integer | Contract version for the task summary artifact |
| task_cluster_algorithm_version | integer | Algorithm version used for migration and cache invalidation |
| date_range | object | Inclusive source summary date range |
| source_summary_dates | array | Daily summaries that contributed windows |
| source_window_ids | array | All compact input window ids; newer artifacts tombstone older overlapping clusters for these windows |
| source_fingerprint | string | Fingerprint of the compact window-summary input |
| project_task_clusters | array | Model task clusters |
| model_status | string | completed or failed; failed artifacts can tombstone stale clusters |
| error | string | Sanitized failure category when model_status=failed |

Task cluster fields:

| Field | Type | Notes |
| --- | --- | --- |
| cluster_id | string | Stable local identifier for grouping windows |
| project_label | string | Project display label inferred from source windows |
| cwd | string | Source working directory, when stable |
| task_title | string | User-visible business/feature task name |
| task_summary | string | Short grouping rationale |
| source_window_ids | array | Window ids from window_summaries; invalid ids are dropped |
| status_tags | array | Process state such as compile, verification, or commit |
| confidence | string | high, medium, or low; panel grouping uses high/medium and falls back for low |
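The notes above combine into a small assignment rule: newer artifacts (including failed ones) tombstone older clusters for their input windows, invalid ids are dropped, and low-confidence clusters fall back to ungrouped display. This sketch assumes artifacts arrive oldest first; the `assign_windows` name and its window-to-title output are hypothetical.

```python
def assign_windows(artifacts: list[dict], known_window_ids: set[str]) -> dict[str, str]:
    """Map window_id -> task_title for panel grouping; artifacts are oldest first."""
    assignment: dict[str, str] = {}
    for artifact in artifacts:
        # A newer artifact, even a failed one, tombstones older clusters
        # for every window it received as input.
        for window_id in artifact.get("source_window_ids", []):
            assignment.pop(window_id, None)
        if artifact.get("model_status") != "completed":
            continue
        for cluster in artifact.get("project_task_clusters", []):
            # Low-confidence clusters are skipped; their windows stay ungrouped.
            if cluster.get("confidence") not in ("high", "medium"):
                continue
            for window_id in cluster.get("source_window_ids", []):
                # Ids that are not present in window_summaries are dropped.
                if window_id in known_window_ids:
                    assignment[window_id] = cluster["task_title"]
    return assignment


old = {
    "model_status": "completed",
    "source_window_ids": ["w1", "w2"],
    "project_task_clusters": [
        {"task_title": "Old task", "confidence": "high", "source_window_ids": ["w1", "w2", "w9"]},
    ],
}
new = {"model_status": "failed", "source_window_ids": ["w2"], "project_task_clusters": []}
print(assign_windows([old, new], {"w1", "w2"}))  # {'w1': 'Old task'}
```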

Memory Registry Contract

Canonical path:

registry/memory_entries.jsonl

Compatibility path:

registry/memory_items.jsonl

memory_entries.jsonl is the canonical personal-memory registry. memory_items.jsonl is a legacy fallback for migration and old state roots.
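A reader can prefer the canonical file and use the legacy file only for old state roots. This sketch assumes the fallback applies when the canonical file is absent (not merely empty) and that malformed lines are counted and skipped rather than fatal; `read_jsonl` and `load_memory_rows` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path: Path) -> tuple[list[dict], int]:
    """Parse a JSONL file, returning (rows, malformed_line_count)."""
    rows, malformed = [], 0
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # blank lines are ignored, not counted as malformed
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            malformed += 1
            continue
        if isinstance(row, dict):
            rows.append(row)
        else:
            malformed += 1  # valid JSON but not an object row
    return rows, malformed


def load_memory_rows(state_root: Path) -> tuple[list[dict], int]:
    """Prefer the canonical registry; fall back to the legacy file if it is absent."""
    registry = state_root / "registry"
    for name in ("memory_entries.jsonl", "memory_items.jsonl"):
        path = registry / name
        if path.exists():
            return read_jsonl(path)
    return [], 0


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "registry").mkdir()
    (root / "registry" / "memory_items.jsonl").write_text(
        '{"id": "mem-demo-1", "title": "Legacy row"}\nnot json\n', encoding="utf-8"
    )
    rows, malformed = load_memory_rows(root)
    print(len(rows), malformed)  # 1 1
```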

Recommended row fields:

| Field | Type | Notes |
| --- | --- | --- |
| memory_id or id | string | Stable id when known; otherwise code can derive a fingerprint |
| date | string | First or latest source date |
| updated_at | string | Optional ISO timestamp |
| language | string | zh or en when display language matters |
| source | string | Example: canonical, nightly_codex, manual, memory_review |
| scope | string | global, project, repo, local, or compatible legacy values |
| injection_policy | string | global_context, project_context, on_demand, local_only, or never |
| project_key | string | Stable project key for project-scoped rows |
| project_label | string | Display label |
| bucket | string | Compatibility grouping such as durable, session, low_priority |
| memory_type | string | preference, procedural, semantic, episodic, task, rule, workflow |
| priority | string | high, medium, low |
| title | string | Short display title |
| value_note | string | The reusable rule or fact |
| source_window_ids | array | Evidence window ids |
| source_dates | array | Evidence dates |
| occurrence_count | integer | Repeated evidence count |
| evidence_contexts | array | Short sanitized reasons, not raw transcript |
| keywords | array | Search terms |
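When a row has neither memory_id nor id, code can derive a fingerprint. This is a minimal sketch with a hypothetical `memory_row_id` function; the choice of identity fields (scope, project_key, title, value_note), the whitespace/case normalization, and the `mem-` prefix are all assumptions for illustration.

```python
import hashlib


def memory_row_id(row: dict) -> str:
    """Return memory_id or id when present; otherwise derive a stable fingerprint."""
    explicit = row.get("memory_id") or row.get("id")
    if explicit:
        return str(explicit)
    # Assumed identity fields; whitespace and case are normalized so formatting-only
    # edits do not produce a new id.
    parts = [
        " ".join(str(row.get(key, "")).split()).lower()
        for key in ("scope", "project_key", "title", "value_note")
    ]
    digest = hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
    return f"mem-{digest[:16]}"


a = {"scope": "global", "title": "Prefer  small diffs", "value_note": "Keep patches reviewable."}
b = {"scope": "global", "title": "prefer small diffs", "value_note": "Keep patches reviewable."}
print(memory_row_id(a) == memory_row_id(b))  # True
```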

injection_policy is the key boundary:

| Policy | Default behavior |
| --- | --- |
| global_context | Eligible for bounded host context within the global budget |
| project_context | Eligible for bounded host context within the project budget |
| on_demand | Searchable in the registry and index; not injected by default |
| local_only | Local display and search only |
| never | Never injected; not surfaced unless someone is explicitly inspecting local data |

Do not use legacy durable/session labels as the product model for new docs. They exist for compatibility; current behavior is governed by scope, injection_policy, and priority.

Asset Registry Contract

Path:

registry/assets.jsonl

Purpose: stable reusable assets that a human or agent can intentionally reuse.

Recommended row fields:

| Field | Type | Notes |
| --- | --- | --- |
| id | string | Stable asset id |
| title | string | Display title |
| type | string | playbook, skill, template, automation, or a related durable asset type |
| domain | string | Area such as openrelix, general, frontend |
| scope | string | personal, repo, project, or global |
| status | string | active, draft, archived |
| created_at | string | ISO date |
| updated_at | string | ISO date |
| source_task | string | Sanitized task id or summary |
| source_review_path | string | Review path, when available |
| reuse_count | integer | Count of recorded reuse events |
| minutes_saved_total | integer | Estimated time saved |
| value_note | string | Why this asset matters |
| artifact_paths | array | Local or repo-relative artifact paths |
| tags | array | Search/grouping tags |

Only long-term reusable assets belong here. One-off task status belongs in a review or local note, not in assets.jsonl.

Asset Candidate Contract

There is currently no stable, canonical writer for registry/asset_candidates.jsonl. Treat this section as a review-stage contract for candidates, not as data the current code persists.

Suggested candidate fields when this becomes a durable workflow:

| Field | Type | Notes |
| --- | --- | --- |
| date | string | Candidate creation date |
| task | string | Sanitized task summary |
| domain | string | Domain or project area |
| repo | string | Optional repo label, not a private path |
| source_review_path | string | Review artifact, when available |
| source_window_ids | array | Evidence window ids |
| decision | string | accept, defer, reject, or needs_user_confirmation |
| candidate_types | array | memory, playbook, template, automation, skill |
| confidence | string | Human-readable confidence or score |
| reuse_trigger | string | When this asset should be reused |
| evidence | array | Sanitized reasons, not raw transcript |
| privacy_risk | string | Privacy classification |
| scope_decision | string | Proposed asset scope |
| user_confirmation | string | Whether the user approved it |
| no_asset_reason | string | Why it was rejected or deferred |
| proposed_artifact_paths | array | Repo-relative or generic local paths |
| generated_asset_ids | array | Created asset ids |
| generated_memory_ids | array | Created memory ids |

Do not let an asset candidate enter assets.jsonl or host context just because it was detected. Human confirmation or an explicit policy gate should decide.
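A promotion gate makes that rule explicit in code. This is a minimal sketch with a hypothetical `may_promote` function: `policy_allows` stands in for an explicit policy gate, and "approved" is an assumed value for user_confirmation, since the contract does not yet fix its vocabulary.

```python
def may_promote(candidate: dict, policy_allows: bool = False) -> bool:
    """Gate an asset candidate before it reaches assets.jsonl or host context.

    Detection alone never promotes: the candidate must be accepted, and then
    either the user confirmed it or an explicit policy gate allows it.
    """
    if candidate.get("decision") != "accept":
        # defer, reject, and needs_user_confirmation never promote on their own.
        return False
    confirmed = candidate.get("user_confirmation") == "approved"  # assumed value
    return confirmed or policy_allows


print(may_promote({"decision": "accept"}))  # False
print(may_promote({"decision": "accept", "user_confirmation": "approved"}))  # True
```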

Usage Event Contract

Path:

registry/usage_events.jsonl

Purpose: evidence that an asset helped in a later task.

Recommended row fields:

| Field | Type | Notes |
| --- | --- | --- |
| event_id | string | Stable event id |
| asset_id | string | Joins to assets.jsonl |
| date | string | ISO date |
| task | string | Sanitized task summary |
| minutes_saved | integer | Estimated impact |
| outcome | string | Short result |
| source_window_id | string | Optional evidence reference |

Usage events should not include raw user data or private logs. They prove reuse, not transcript content.
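Usage events join to assets.jsonl through asset_id, which is how per-asset counters such as reuse_count and minutes_saved_total can be recomputed. This sketch uses a hypothetical `reuse_totals` function; skipping repeated event_id values is an assumption added so re-imported rows are not double counted.

```python
from collections import defaultdict


def reuse_totals(usage_events: list[dict]) -> dict[str, dict[str, int]]:
    """Aggregate usage events per asset_id into reuse_count and minutes_saved_total."""
    totals = defaultdict(lambda: {"reuse_count": 0, "minutes_saved_total": 0})
    seen_event_ids = set()
    for event in usage_events:
        event_id = event.get("event_id")
        if event_id is not None:
            # Assumed rule: a repeated event_id is a duplicate row, not a new reuse.
            if event_id in seen_event_ids:
                continue
            seen_event_ids.add(event_id)
        entry = totals[event["asset_id"]]
        entry["reuse_count"] += 1
        entry["minutes_saved_total"] += int(event.get("minutes_saved", 0))
    return dict(totals)


events = [
    {"event_id": "evt-1", "asset_id": "asset-demo", "minutes_saved": 15},
    {"event_id": "evt-2", "asset_id": "asset-demo", "minutes_saved": 10},
    {"event_id": "evt-2", "asset_id": "asset-demo", "minutes_saved": 10},
]
print(reuse_totals(events))  # {'asset-demo': {'reuse_count': 2, 'minutes_saved_total': 25}}
```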

Curated Pack Contract

Paths:

registry/curated_memory_pack.json
runtime/host-context/curated-personal-memory-summary.md

Purpose: deterministic review artifact generated from registry/memory_entries.jsonl.

The curated pack is non-invasive: it does not replace the active host summary and does not change injection behavior by itself.

Main fields:

| Field | Type | Notes |
| --- | --- | --- |
| schema_version | integer | Current curated pack schema |
| source | string | Input label, normally registry/memory_entries.jsonl |
| model_calls | integer | Should be 0 for the deterministic builder |
| entry_count | integer | Parsed memory rows |
| sections | object | Grouped profile, preferences, rules, playbooks, task groups, local volatile notes |
| diagnostics | object | Duplicate clusters, redaction, timeline-like entries, malformed lines |
| artifact | object | Output metadata |

Section keys are stable:

user_profile
stable_preferences
operating_rules
project_playbooks
task_groups
local_volatile_notes

Curated item fields commonly include section, canonical_key, title, value_note, scope, injection_policy, project_label, memory_type, priority, memory_key, user_feedback, evidence_count, source_entry_ids, source_memory_keys, source_window_ids, source_dates, and diagnostics.

Diagnostics commonly include duplicate_clusters, timeline_like_entries, local_privacy_like_entries, possible_cross_project_leakage, truncation_markers, and malformed_lines.
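Because the section keys are stable and the builder is deterministic, a test can check a pack cheaply. This sketch uses a hypothetical `check_curated_pack` function and assumes the builder emits every stable section key, even when a section is empty; flagging unknown keys is also an assumption.

```python
STABLE_SECTION_KEYS = (
    "user_profile", "stable_preferences", "operating_rules",
    "project_playbooks", "task_groups", "local_volatile_notes",
)


def check_curated_pack(pack: dict) -> list[str]:
    """Return problems found in a curated pack document."""
    problems = []
    if pack.get("model_calls") != 0:
        problems.append("deterministic builder should report model_calls == 0")
    sections = pack.get("sections", {})
    # Assumption: every stable key is present, possibly with an empty section.
    problems += [f"missing section: {key}" for key in STABLE_SECTION_KEYS if key not in sections]
    problems += [f"unknown section: {key}" for key in sorted(set(sections) - set(STABLE_SECTION_KEYS))]
    return problems


pack = {"model_calls": 0, "sections": {key: [] for key in STABLE_SECTION_KEYS}}
print(check_curated_pack(pack))  # []
```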

Active Host Context Contract

Path inside state root:

runtime/host-context/memory_summary.md

Host targets:

CODEX_HOME/memories/memory_summary.md
CLAUDE_HOME/CLAUDE.md

Purpose: bounded summary compiled from eligible memory entries and written into OpenRelix-managed blocks in each enabled host target.

This is a compiled artifact, not source data. The source of truth is registry/memory_entries.jsonl; the active host context should be reproducible from registry rows, config, and compiler policy.

Rules:

  • Preserve host-owned content outside OpenRelix markers.
  • Respect memory_mode and host feature flags.
  • Include only eligible global_context and project_context entries within token budgets.
  • Keep on_demand, local_only, never, and low-priority rows out of default host context.
  • Never write a project repo file just to inject personal memory.
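Preserving host-owned content comes down to rewriting only the text between OpenRelix markers. This is a minimal sketch: the marker strings and the `upsert_managed_block` name are hypothetical, and the real markers are defined by the OpenRelix code, not by this example.

```python
BEGIN = "<!-- openrelix:begin -->"  # hypothetical marker text
END = "<!-- openrelix:end -->"  # hypothetical marker text


def upsert_managed_block(host_text: str, managed_body: str) -> str:
    """Replace the OpenRelix-managed block, leaving host-owned text untouched."""
    block = f"{BEGIN}\n{managed_body.rstrip()}\n{END}"
    start = host_text.find(BEGIN)
    end = host_text.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # Keep everything before BEGIN and after END exactly as the host wrote it.
        return host_text[:start] + block + host_text[end + len(END):]
    # No complete block yet: append one after the existing host content.
    # (A lone BEGIN without END is treated as no block; real code may prefer to report it.)
    separator = "\n\n" if host_text.strip() else ""
    return host_text.rstrip("\n") + separator + block + "\n"


original = "# My notes\nHost-owned line.\n"
once = upsert_managed_block(original, "- Prefer small diffs")
twice = upsert_managed_block(once, "- Prefer small diffs\n- Run tests first")
print(twice)
```

Running the function twice with the same body returns identical text, so repeated syncs do not grow the host file.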

Useful budget/config fields to surface in generated diagnostics include target_tokens, warn_tokens, max_tokens, global_memory_tokens, project_memory_tokens, and personal_memory_tokens.
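The eligibility rules and the per-policy budgets can be combined into one selection pass. This is a minimal sketch under several assumptions: token cost is estimated as roughly four characters per token, rows are taken high priority first, project_context rows must match the current project_key, and rows without a high or medium priority are excluded. The function names are hypothetical.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1}


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token), not the compiler's real counter.
    return max(1, len(text) // 4)


def select_for_host_context(rows, project_key, global_memory_tokens, project_memory_tokens):
    """Pick eligible rows for the active host context within per-policy token budgets."""
    budgets = {"global_context": global_memory_tokens, "project_context": project_memory_tokens}
    used = {policy: 0 for policy in budgets}
    eligible = [
        row for row in rows
        # on_demand, local_only, never, and low-priority rows stay out by default.
        if row.get("injection_policy") in budgets
        and row.get("priority") in PRIORITY_ORDER
        and (row["injection_policy"] == "global_context" or row.get("project_key") == project_key)
    ]
    eligible.sort(key=lambda row: PRIORITY_ORDER[row["priority"]])  # stable sort
    selected = []
    for row in eligible:
        policy = row["injection_policy"]
        cost = estimate_tokens(row.get("value_note", ""))
        if used[policy] + cost <= budgets[policy]:
            used[policy] += cost
            selected.append(row)
    return selected


rows = [
    {"memory_id": "m1", "injection_policy": "global_context", "priority": "high", "value_note": "x" * 40},
    {"memory_id": "m2", "injection_policy": "global_context", "priority": "low", "value_note": "short"},
    {"memory_id": "m3", "injection_policy": "project_context", "priority": "medium",
     "project_key": "demo", "value_note": "y" * 40},
    {"memory_id": "m4", "injection_policy": "on_demand", "priority": "high", "value_note": "z"},
    {"memory_id": "m5", "injection_policy": "global_context", "priority": "medium", "value_note": "w" * 40},
]
picked = select_for_host_context(rows, "demo", global_memory_tokens=15, project_memory_tokens=10)
print([row["memory_id"] for row in picked])  # ['m1', 'm3']
```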

Runtime Config Contract

Path:

runtime/config.json

Minimum contributor-facing fields:

| Field | Type | Notes |
| --- | --- | --- |
| schema_version | integer | Runtime config schema |
| language | string | zh or en |
| memory_mode | string | integrated, local-only, or off |
| personal_memory_enabled | boolean | Whether local personal memory writes are enabled |
| codex_context_enabled | boolean | Whether Codex host context sync is enabled |
| activity_source | string | history, app-server, or auto |
| activity_host | string | codex, claude, or all |
| model_cli | string | codex or claude |
| codex_model | string | Internal Codex model for OpenRelix jobs |
| claude_model | string | Internal Claude model for OpenRelix jobs |
| memory_summary_max_tokens | integer | Maximum supported host-context budget |
| host_context_targets | array | Enabled host context targets |
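The enumerated values in the table translate directly into a validator. This sketch uses a hypothetical `validate_config` function and assumes missing fields are filled with defaults elsewhere, so it only checks fields that are present. Note that memory_mode uses local-only (hyphen) while injection_policy uses local_only (underscore).

```python
import json

ALLOWED_VALUES = {
    "language": {"zh", "en"},
    "memory_mode": {"integrated", "local-only", "off"},
    "activity_source": {"history", "app-server", "auto"},
    "activity_host": {"codex", "claude", "all"},
    "model_cli": {"codex", "claude"},
}
BOOLEAN_FIELDS = ("personal_memory_enabled", "codex_context_enabled")


def validate_config(config: dict) -> list[str]:
    """Check enumerated, boolean, and budget fields that are present in runtime/config.json."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in config and config[field] not in allowed:
            problems.append(f"{field}: unexpected value {config[field]!r}")
    for field in BOOLEAN_FIELDS:
        if field in config and not isinstance(config[field], bool):
            problems.append(f"{field}: expected a boolean")
    tokens = config.get("memory_summary_max_tokens")
    # bool is a subclass of int, so reject it explicitly.
    if tokens is not None and (isinstance(tokens, bool) or not isinstance(tokens, int) or tokens <= 0):
        problems.append("memory_summary_max_tokens: expected a positive integer")
    return problems


config = json.loads(
    '{"language": "en", "memory_mode": "integrated", "personal_memory_enabled": true,'
    ' "activity_host": "all", "model_cli": "codex"}'
)
print(validate_config(config))  # []
```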

Overview Data Contract

Path:

reports/overview-data.json

Purpose: stable renderer input for reports/panel.html and report files.

Current contract validation lives in scripts/openrelix_overview/contract.py. Required top-level keys include:

schema_version, language, generated_at, summary, metrics, mix,
assets, reviews, usage_events, summary_terms, summary_term_views,
pipeline_status, token_usage, window_overview,
memory_registry, memory_policy_views, nightly_memory_views,
codex_native_memory, codex_native_memory_counts,
claude_native_memory, claude_native_memory_counts

Renderer code should consume overview-data.json rather than rereading raw JSONL, host memory files, or raw windows.
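Contract validation lives in scripts/openrelix_overview/contract.py; the standalone sketch below only illustrates a presence check for the keys listed above. It is not that module's code, it does not check value types, and the `missing_overview_keys` name is hypothetical.

```python
REQUIRED_OVERVIEW_KEYS = (
    "schema_version", "language", "generated_at", "summary", "metrics", "mix",
    "assets", "reviews", "usage_events", "summary_terms", "summary_term_views",
    "pipeline_status", "token_usage", "window_overview",
    "memory_registry", "memory_policy_views", "nightly_memory_views",
    "codex_native_memory", "codex_native_memory_counts",
    "claude_native_memory", "claude_native_memory_counts",
)


def missing_overview_keys(data: dict) -> list[str]:
    """Return required top-level keys absent from overview-data.json, in contract order."""
    return [key for key in REQUIRED_OVERVIEW_KEYS if key not in data]


data = {key: None for key in REQUIRED_OVERVIEW_KEYS}
del data["token_usage"]
print(missing_overview_keys(data))  # ['token_usage']
```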

Fixture Contract

Sanitized sample state lives under:

tests/fixtures/sample-state/

Rules for fixtures:

  • Use artificial dates, ids, project names, and paths.
  • Keep paths generic, for example /tmp/openrelix-demo.
  • Include enough data for raw window lookup, daily summary, registry search, and curated pack validation.
  • Do not include real local state, real host transcripts, account names, internal URLs, tokens, screenshots, or generated panel files.
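A heuristic scan can catch some obvious leaks before review. This is a minimal sketch: the patterns (real-looking home directories and a few common token prefixes) and the `scan_text` / `scan_fixture_dir` names are illustrative assumptions, not checks the project ships, and a clean scan does not replace reading the fixture.

```python
import re
from pathlib import Path

# Illustrative patterns only: real-looking home directories and token-like strings.
SUSPICIOUS_PATTERNS = {
    "home path": re.compile(r"/(?:Users|home)/[A-Za-z0-9._-]+"),
    "token-like string": re.compile(r"\b(?:sk|ghp|xox[abp])[-_][A-Za-z0-9_-]{16,}"),
}


def scan_text(text: str) -> list[str]:
    """Return one finding per suspicious match in the text."""
    findings = []
    for label, pattern in SUSPICIOUS_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(f"{label}: {match.group(0)}")
    return findings


def scan_fixture_dir(root: Path) -> dict[str, list[str]]:
    """Map each fixture file with suspicious content to its findings."""
    if not root.is_dir():
        return {}
    results = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in {".json", ".jsonl", ".md"}:
            findings = scan_text(path.read_text(encoding="utf-8"))
            if findings:
                results[str(path.relative_to(root))] = findings
    return results


print(scan_text('{"cwd": "/Users/alice/work/app"}'))  # ['home path: /Users/alice']
print(scan_text('{"cwd": "/tmp/openrelix-demo/app"}'))  # []
```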

After changing fixture shape, run:

python3 -m unittest tests/test_sample_state_fixture.py