OpenRelix Data Contracts
Languages: English | 简体中文
This document captures the public, contributor-facing data contracts for OpenRelix. It is not a dump of one maintainer's local state. Examples must stay synthetic and sanitized.
The source of truth remains the code and tests. Treat this page as the shared map that lets contributors make compatible changes without inspecting private runtime files.
Storage Boundary
OpenRelix has three storage zones:
| Zone | Examples | Public repo? | Notes |
|---|---|---|---|
| Repo source | scripts/, install/, templates/, docs/, tests | Yes | Reusable logic, schemas, sanitized examples |
| External state root | raw/, registry/, consolidated/, reports/, runtime/, log/ | No | User runtime data, generated artifacts, caches, logs |
| Host home | CODEX_HOME, CLAUDE_HOME, host-native memory and sessions | No | Owned by the AI host; OpenRelix reads or updates only through explicit adapter rules |
All paths in reusable code should be resolved through scripts/asset_runtime.py. Public docs and fixtures may use generic placeholder paths such as /tmp/openrelix-demo, ~/Library/Application Support/openrelix, or CODEX_HOME.
State Root Layout
<state-root>/
  raw/
    daily/<date>.json
    windows/<date>/<window_id>.json
  consolidated/
    daily/<date>/summary.json
    daily/<date>/summary.md
    daily/<date>/runs/
  registry/
    assets.jsonl
    usage_events.jsonl
    memory_entries.jsonl
    memory_items.jsonl
    curated_memory_pack.json
  reports/
    overview-data.json
    overview.md
    overview.csv
    panel.html
  runtime/
    config.json
    openrelix-index.sqlite3
    host-context/
      memory_summary.md
      curated-personal-memory-summary.md
  log/
reports/, runtime/, and log/ are generated or machine-local. Do not commit real files from those directories. Sanitized tests may include a small fixture under tests/fixtures/sample-state/ only when every value is artificial.
Raw Daily Contract
Path:
raw/daily/<date>.json
Purpose: one day's collected host activity, grouped by window.
Minimum shape:
{
  "date": "2026-04-28",
  "stage": "manual",
  "generated_at": "2026-04-28T10:10:00+08:00",
  "timezone": "Asia/Shanghai",
  "collection_source": "history",
  "activity_host": "codex",
  "window_count": 1,
  "windows": []
}
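As a rough illustration of how a test or collector might guard this minimum shape, the sketch below checks only the keys shown in the example. The function name `check_raw_daily` and the type mapping are assumptions for illustration, not part of the OpenRelix codebase; the code and tests remain the source of truth.

```python
# Keys come from the "Minimum shape" example; the expected types are an assumption.
REQUIRED_DAILY_FIELDS = {
    "date": str,
    "stage": str,
    "generated_at": str,
    "timezone": str,
    "collection_source": str,
    "activity_host": str,
    "window_count": int,
    "windows": list,
}


def check_raw_daily(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the minimum shape is met."""
    problems = []
    for field, expected in REQUIRED_DAILY_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
            continue
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems
```

The check deliberately ignores optional fields such as host_counts or excluded_windows, so older and newer daily files both pass as long as the minimum keys are present.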
Common daily top-level fields:
| Field | Type | Notes |
|---|---|---|
| date | string | ISO collection date |
| stage | string | Collection stage such as manual or final; tests may use synthetic values only when documented |
| generated_at | string | When the collector wrote the file |
| timezone | string | Local timezone used for daily grouping |
| collection_source | string | history, app-server, auto, claude-history, or compatible source label |
| activity_host | string | codex, claude, or all |
| codex_profile_count | integer | Number of Codex profiles inspected when available |
| codex_profiles | array | Sanitized profile metadata; do not publish real home paths |
| host_counts | object | Per-host window counts |
| collection_errors | array | Sanitized warnings or failures |
| window_count | integer | Included windows |
| excluded_window_count | integer | Windows excluded by filtering |
| review_like_window_count | integer | Review-like windows detected |
| prompt_count | integer | Total included prompts |
| conclusion_count | integer | Total included conclusions |
| windows | array | Included raw window objects |
| excluded_windows | array | Sanitized excluded-window metadata |
| review_like_windows | array | Sanitized review-like window metadata |
Common window fields:
| Field | Type | Required for | Notes |
|---|---|---|---|
| date | string | indexing, panel | ISO date for the collected day |
| window_id | string | all downstream joins | Stable per host window or session for that day |
| ai_host | string | host display and resume | codex or claude; absent older rows default to codex |
| cwd | string | project grouping | Use generic or redacted paths in fixtures and docs |
| originator | string | diagnostics | Example: codex_cli, codex_app_server, claude_code |
| source | string | diagnostics | Example: history, app-server, claude-history |
| started_at | string | sorting | ISO-like timestamp with timezone when available |
| session_file | string | local debugging | State-local or host-local path; never publish real paths |
| thread_id | string | app-server resume | Optional |
| session_id | string | host resume | Optional |
| resume_id | string | panel action | Optional, host-specific |
| window_summary | string | display/fallback | Optional host-provided short summary |
| thread_title | string | display/fallback | Optional app-server or host title |
| prompt_count | integer | metrics | Count after tool/system filtering |
| conclusion_count | integer | metrics | Count after review-like filtering |
| raw_conclusion_count | integer | diagnostics | Count before conclusion filtering |
| review_like_window | boolean | filtering | Whether the window is primarily a review workflow |
| review_related_window | boolean | filtering | Whether the window relates to review but still has usable prompts |
| filtered_review_conclusion_count | integer | diagnostics | Filtered conclusion count |
| conclusion_policy | string | diagnostics | Example: included |
| prompts | array | fallback summaries | User-facing questions or requests |
| conclusions | array | fallback summaries | Final assistant answers or useful conclusions |
| app_server | object | diagnostics | Optional, sanitized Codex app-server metadata |
| claude_code | object | diagnostics | Optional, sanitized Claude Code metadata |
Prompt item:
{
  "turn_id": "turn-demo-1",
  "ts": "2026-04-28T10:00:00+08:00",
  "local_time": "2026-04-28T10:00:00+08:00",
  "text": "Design a sanitized sample state"
}
Conclusion item:
{
  "turn_id": "turn-demo-1",
  "completed_at": "2026-04-28T10:05:00+08:00",
  "text": "The sample state should cover raw, consolidated, registry, and index."
}
Do not store raw tool outputs, raw logs, tokens, cookies, account data, or proprietary snippets in public examples.
Raw Window Contract
Path:
raw/windows/<date>/<window_id>.json
Purpose: one window record extracted from the daily file for direct lookup and panel source references.
It should contain the same window object used in raw/daily/<date>.json. Downstream code should tolerate older records that lack newer fields, but new collectors should include ai_host, prompt_count, conclusion_count, prompts, and conclusions.
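One way downstream code could tolerate older window records is to fill in defaults before use. The sketch below is illustrative only: `normalize_window` is a hypothetical helper, and deriving missing counts from list lengths is an assumption (real counts are taken after filtering). The codex default for ai_host follows the field table above.

```python
def normalize_window(record: dict) -> dict:
    """Return a copy of a raw window record with newer fields filled in."""
    window = dict(record)  # shallow copy so the caller's record is not mutated
    # Documented default: older rows without ai_host are treated as codex.
    window.setdefault("ai_host", "codex")
    prompts = window.setdefault("prompts", [])
    conclusions = window.setdefault("conclusions", [])
    # Assumption: when counts are missing, fall back to the list lengths.
    window.setdefault("prompt_count", len(prompts))
    window.setdefault("conclusion_count", len(conclusions))
    return window
```

Existing values always win over defaults, so records written by current collectors pass through unchanged.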
Consolidated Summary Contract
Path:
consolidated/daily/<date>/summary.json
Purpose: model-backed or fallback daily summary.
The model output is constrained by templates/nightly-summary-schema.json. Contributor-facing fields are:
| Field | Type | Notes |
|---|---|---|
| date | string | Same collection date |
| day_summary | string | Human-readable day overview |
| window_summaries | array | Per-window summary rows |
| keywords | array | Day-level keywords |
| next_actions | array | Non-binding follow-up suggestions |
| durable_memories | array | Legacy model output bucket; normalized before host injection |
| session_memories | array | Legacy model output bucket; normalized before host injection |
| low_priority_memories | array | Local or low-signal output |
| global_context_memories | array | Evidence-rich cross-window memory candidates |
Window summary fields:
| Field | Type | Notes |
|---|---|---|
| window_id | string | Joins to raw window |
| cwd | string | Project grouping |
| window_title | string | Short title for panel and index |
| question_summary | string | Condensed user need |
| question_count | integer | Number of prompts represented |
| conclusion_count | integer | Number of conclusions represented |
| keywords | array | Search and grouping |
| main_takeaway | string | Highest-signal result |
| summary_pairs | array | Question/conclusion pairs for traceability |
The current implementation still accepts older stage, summary_stage, model_status, and summary_generation diagnostics. Use them for selection and fallback behavior, not as user-facing facts.
Common diagnostic fields:
| Field | Notes |
|---|---|
| language | Output language for generated text |
| stage / summary_stage | Manual, preliminary, final, or compatibility stage |
| generated_at | Summary generation time |
| model_status / last_run_model_status | Model run status |
| summary_generation | Lightweight, fallback, or model-backed generation label |
| personal_memory_algorithm_version | Memory algorithm version used for fingerprinting |
| compact_payload_source | Input compaction source |
| learning_input_fingerprint | Skip-if-unchanged fingerprint |
| quality / selection_decision | Summary quality and selection metadata |
Window Task Summary Contract
Path:
consolidated/task_summaries/<from>_<to>.json
Purpose: model-backed project task clusters built from existing window_summaries. This layer does not read raw transcripts; it lets already-organized windows gain a business-level "parallel tasks" summary after install or upgrade.
The model output is constrained by templates/window-task-summary-schema.json. Contributor-facing fields are:
| Field | Type | Notes |
|---|---|---|
| schema_version | integer | Contract version for the task summary artifact |
| task_cluster_algorithm_version | integer | Algorithm version used for migration and cache invalidation |
| date_range | object | Inclusive source summary date range |
| source_summary_dates | array | Daily summaries that contributed windows |
| source_window_ids | array | All compact input window ids; newer artifacts tombstone older overlapping clusters for these windows |
| source_fingerprint | string | Fingerprint of the compact window-summary input |
| project_task_clusters | array | Model task clusters |
| model_status | string | completed or failed; failed artifacts can tombstone stale clusters |
| error | string | Sanitized failure category when model_status=failed |
Task cluster fields:
| Field | Type | Notes |
|---|---|---|
| cluster_id | string | Stable local identifier for grouping windows |
| project_label | string | Project display label inferred from source windows |
| cwd | string | Source working directory when stable |
| task_title | string | User-visible business/feature task name |
| task_summary | string | Short grouping rationale |
| source_window_ids | array | Window ids from window_summaries; invalid ids are dropped |
| status_tags | array | Process state such as compile, verification, or commit |
| confidence | string | high, medium, or low; panel grouping uses high/medium and falls back for low |
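The confidence and source_window_ids notes in the table above can be sketched as a small filter. This is a minimal illustration, not the panel's actual code: `usable_clusters` is a hypothetical name, and dropping a cluster that has no valid window ids left is an assumption.

```python
def usable_clusters(clusters: list[dict], known_window_ids: set[str]) -> list[dict]:
    """Keep clusters suitable for panel grouping and drop invalid window ids."""
    usable = []
    for cluster in clusters:
        # Only high/medium confidence drives grouping; low falls back to ungrouped display.
        if cluster.get("confidence") not in {"high", "medium"}:
            continue
        valid_ids = [
            window_id
            for window_id in cluster.get("source_window_ids", [])
            if window_id in known_window_ids
        ]
        # Assumption: a cluster with no valid windows left is not worth showing.
        if not valid_ids:
            continue
        usable.append({**cluster, "source_window_ids": valid_ids})
    return usable
```

Here known_window_ids would be the set of window_id values present in the contributing window_summaries.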
Memory Registry Contract
Canonical path:
registry/memory_entries.jsonl
Compatibility path:
registry/memory_items.jsonl
memory_entries.jsonl is the canonical personal-memory registry. memory_items.jsonl is a legacy fallback for migration and old state roots.
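The canonical-then-legacy lookup might look like the sketch below. The helper names are hypothetical, and two behaviors are assumptions: the legacy file is read only when the canonical file is absent, and malformed lines are skipped silently. Reusable code should obtain the state root through scripts/asset_runtime.py; this sketch takes it as a parameter to stay self-contained.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per line, skipping blank and malformed lines."""
    rows = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # assumption: malformed lines are ignored here
            if isinstance(row, dict):
                rows.append(row)
    return rows


def load_memory_rows(state_root: Path) -> list[dict]:
    """Prefer memory_entries.jsonl; fall back to the legacy memory_items.jsonl."""
    registry = state_root / "registry"
    for name in ("memory_entries.jsonl", "memory_items.jsonl"):
        path = registry / name
        if path.is_file():
            return read_jsonl(path)
    return []
```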
Recommended row fields:
| Field | Type | Notes |
|---|---|---|
| memory_id or id | string | Stable id when known; otherwise code can derive a fingerprint |
| date | string | First or latest source date |
| updated_at | string | Optional ISO timestamp |
| language | string | zh or en when display language matters |
| source | string | Example: canonical, nightly_codex, manual, memory_review |
| scope | string | global, project, repo, local, or compatible legacy values |
| injection_policy | string | global_context, project_context, on_demand, local_only, never |
| project_key | string | Stable project key for project-scoped rows |
| project_label | string | Display label |
| bucket | string | Compatibility grouping such as durable, session, low_priority |
| memory_type | string | preference, procedural, semantic, episodic, task, rule, workflow |
| priority | string | high, medium, low |
| title | string | Short display title |
| value_note | string | The reusable rule or fact |
| source_window_ids | array | Evidence window ids |
| source_dates | array | Evidence dates |
| occurrence_count | integer | Repeated evidence count |
| evidence_contexts | array | Short sanitized reasons; not raw transcript |
| keywords | array | Search terms |
Injection policy is the important boundary:
| Policy | Default behavior |
|---|---|
| global_context | Eligible for bounded host context within global budget |
| project_context | Eligible for bounded host context within project budget |
| on_demand | Searchable in registry/index, not injected by default |
| local_only | Local display/search only |
| never | Do not inject and avoid surfacing unless explicitly inspecting local data |
Do not use legacy durable/session labels as the product model for new docs. They exist for compatibility; current behavior is governed by scope, injection_policy, and priority.
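The policy boundary can be expressed as a single predicate. The sketch below is an illustration under stated assumptions: `is_injected_by_default` is a hypothetical name, a row without an injection_policy is treated as not injectable, and token budgets are not modeled. Excluding low-priority rows follows the Active Host Context rules later in this document.

```python
# Policies that make a row eligible for bounded host context.
INJECTABLE_POLICIES = {"global_context", "project_context"}


def is_injected_by_default(row: dict) -> bool:
    """Decide eligibility from injection_policy and priority, not legacy buckets."""
    if row.get("injection_policy") not in INJECTABLE_POLICIES:
        return False  # on_demand, local_only, never, or missing policy
    # Low-priority rows stay out of default host context.
    return row.get("priority") != "low"
```

Note that the legacy bucket field is never consulted, which matches the guidance that durable/session labels exist only for compatibility.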
Asset Registry Contract
Path:
registry/assets.jsonl
Purpose: stable reusable assets that a human or agent can intentionally reuse.
Recommended row fields:
| Field | Type | Notes |
|---|---|---|
| id | string | Stable asset id |
| title | string | Display title |
| type | string | playbook, skill, template, automation, or related durable asset type |
| domain | string | Area such as openrelix, general, frontend |
| scope | string | personal, repo, project, or global |
| status | string | active, draft, archived |
| created_at | string | ISO date |
| updated_at | string | ISO date |
| source_task | string | Sanitized task id or summary |
| source_review_path | string | Review path when available |
| reuse_count | integer | Count of recorded reuse events |
| minutes_saved_total | integer | Estimated time saved |
| value_note | string | Why this asset matters |
| artifact_paths | array | Local or repo-relative artifact paths |
| tags | array | Search/grouping tags |
Only long-term reusable assets belong here. One-off task status belongs in a review or local note, not in assets.jsonl.
Asset Candidate Contract
There is currently no stable canonical registry/asset_candidates.jsonl writer. Treat this as a candidate/review-stage contract, not as current persisted repo truth.
Suggested candidate fields when this becomes a durable workflow:
| Field | Type | Notes |
|---|---|---|
| date | string | Candidate creation date |
| task | string | Sanitized task summary |
| domain | string | Domain or project area |
| repo | string | Optional repo label, not a private path |
| source_review_path | string | Review artifact when available |
| source_window_ids | array | Evidence window ids |
| decision | string | accept, defer, reject, or needs_user_confirmation |
| candidate_types | array | memory, playbook, template, automation, skill |
| confidence | string | Human-readable confidence or score |
| reuse_trigger | string | When this asset should be reused |
| evidence | array | Sanitized reasons, not raw transcript |
| privacy_risk | string | Privacy classification |
| scope_decision | string | Proposed asset scope |
| user_confirmation | string | Whether the user approved it |
| no_asset_reason | string | Why rejected or deferred |
| proposed_artifact_paths | array | Repo-relative or generic local paths |
| generated_asset_ids | array | Created asset ids |
| generated_memory_ids | array | Created memory ids |
Do not let an asset candidate enter assets.jsonl or host context just because it was detected. Human confirmation or an explicit policy gate should decide.
Usage Event Contract
Path:
registry/usage_events.jsonl
Purpose: evidence that an asset helped in a later task.
Recommended row fields:
| Field | Type | Notes |
|---|---|---|
| event_id | string | Stable event id |
| asset_id | string | Joins to assets.jsonl |
| date | string | ISO date |
| task | string | Sanitized task summary |
| minutes_saved | integer | Estimated impact |
| outcome | string | Short result |
| source_window_id | string | Optional evidence reference |
Usage events should not include raw user data or private logs. They prove reuse, not transcript content.
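To show how usage events relate to the reuse_count and minutes_saved_total fields in assets.jsonl, the sketch below aggregates events by asset_id. Whether OpenRelix derives those fields exactly this way is an assumption, and `summarize_usage` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_usage(events: list[dict]) -> dict[str, dict[str, int]]:
    """Aggregate usage events into per-asset reuse counts and minutes saved."""
    totals = defaultdict(lambda: {"reuse_count": 0, "minutes_saved_total": 0})
    for event in events:
        asset_id = event.get("asset_id")
        if not asset_id:
            continue  # an event without an asset_id cannot join to assets.jsonl
        totals[asset_id]["reuse_count"] += 1
        # A missing or null minutes_saved counts as zero.
        totals[asset_id]["minutes_saved_total"] += int(event.get("minutes_saved") or 0)
    return dict(totals)
```

Only ids and numbers are read, which keeps the aggregation independent of any task text stored in the events.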
Curated Pack Contract
Paths:
registry/curated_memory_pack.json
runtime/host-context/curated-personal-memory-summary.md
Purpose: deterministic review artifact generated from registry/memory_entries.jsonl.
The curated pack is non-invasive: it does not replace the active host summary and does not change injection behavior by itself.
Main fields:
| Field | Type | Notes |
|---|---|---|
| schema_version | integer | Current curated pack schema |
| source | string | Input label, normally registry/memory_entries.jsonl |
| model_calls | integer | Should be 0 for deterministic builder |
| entry_count | integer | Parsed memory rows |
| sections | object | Grouped profile, preferences, rules, playbooks, task groups, local volatile notes |
| diagnostics | object | Duplicate clusters, redaction, timeline-like entries, malformed lines |
| artifact | object | Output metadata |
Section keys are stable:
user_profile
stable_preferences
operating_rules
project_playbooks
task_groups
local_volatile_notes
Curated item fields commonly include section, canonical_key, title, value_note, scope, injection_policy, project_label, memory_type, priority, memory_key, user_feedback, evidence_count, source_entry_ids, source_memory_keys, source_window_ids, source_dates, and diagnostics.
Diagnostics commonly include duplicate_clusters, timeline_like_entries, local_privacy_like_entries, possible_cross_project_leakage, truncation_markers, and malformed_lines.
Active Host Context Contract
Path inside state root:
runtime/host-context/memory_summary.md
Host targets:
CODEX_HOME/memories/memory_summary.md
CLAUDE_HOME/CLAUDE.md
Purpose: bounded summary compiled from eligible memory entries and written into OpenRelix-managed blocks in each enabled host target.
This is a compiled artifact, not source data. The source of truth is registry/memory_entries.jsonl; the active host context should be reproducible from registry rows, config, and compiler policy.
Rules:
- Preserve host-owned content outside OpenRelix markers.
- Respect memory_mode and host feature flags.
- Include only eligible global_context and project_context entries within token budgets.
- Keep on_demand, local_only, never, and low-priority rows out of default host context.
- Never write a project repo file just to inject personal memory.
Useful budget/config fields to surface in generated diagnostics include target_tokens, warn_tokens, max_tokens, global_memory_tokens, project_memory_tokens, and personal_memory_tokens.
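The first rule, preserving host-owned content outside OpenRelix markers, can be sketched as an in-place block replacement. The marker strings below are hypothetical placeholders, since this page does not specify the real ones, and `upsert_managed_block` is an illustrative name rather than the actual compiler API.

```python
# Hypothetical marker strings; the real OpenRelix markers may differ.
BEGIN_MARKER = "<!-- openrelix:begin -->"
END_MARKER = "<!-- openrelix:end -->"


def upsert_managed_block(host_text: str, compiled: str) -> str:
    """Replace the managed block, or append one, leaving other content untouched."""
    block = f"{BEGIN_MARKER}\n{compiled.strip()}\n{END_MARKER}"
    start = host_text.find(BEGIN_MARKER)
    end = host_text.find(END_MARKER)
    if start == -1 and end == -1:
        # No managed block yet: append after any host-owned content.
        if not host_text.strip():
            return block + "\n"
        return host_text.rstrip("\n") + "\n\n" + block + "\n"
    if start == -1 or end < start:
        # Refuse to guess when markers are unbalanced or out of order.
        raise ValueError("unbalanced OpenRelix markers in host file")
    return host_text[:start] + block + host_text[end + len(END_MARKER):]
```

Because the block is rebuilt from the compiled summary each time, running the function twice with the same input yields the same file, which fits the requirement that the active host context be reproducible.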
Runtime Config Contract
Path:
runtime/config.json
Minimum contributor-facing fields:
| Field | Type | Notes |
|---|---|---|
| schema_version | integer | Runtime config schema |
| language | string | zh or en |
| memory_mode | string | integrated, local-only, or off |
| personal_memory_enabled | boolean | Whether local personal memory writes are enabled |
| codex_context_enabled | boolean | Whether Codex host context sync is enabled |
| activity_source | string | history, app-server, or auto |
| activity_host | string | codex, claude, or all |
| model_cli | string | codex or claude |
| codex_model | string | Internal Codex model for OpenRelix jobs |
| claude_model | string | Internal Claude model for OpenRelix jobs |
| memory_summary_max_tokens | integer | Supported host-context budget max |
| host_context_targets | array | Enabled host context targets |
Overview Data Contract
Path:
reports/overview-data.json
Purpose: stable renderer input for reports/panel.html and report files.
Current contract validation lives in scripts/openrelix_overview/contract.py. Required top-level keys include:
schema_version, language, generated_at, summary, metrics, mix,
assets, reviews, usage_events, summary_terms, summary_term_views,
pipeline_status, token_usage, window_overview,
memory_registry, memory_policy_views, nightly_memory_views,
codex_native_memory, codex_native_memory_counts,
claude_native_memory, claude_native_memory_counts
Renderer code should consume overview-data.json rather than rereading raw JSONL, host memory files, or raw windows.
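A renderer or test could cheaply confirm that the listed top-level keys are present before use. The sketch below is not the validation in scripts/openrelix_overview/contract.py, which may check more than key presence; `missing_overview_keys` is a hypothetical helper, and the tuple only mirrors the keys named above.

```python
# Keys copied from the list above; the real contract may require more.
REQUIRED_OVERVIEW_KEYS = (
    "schema_version", "language", "generated_at", "summary", "metrics", "mix",
    "assets", "reviews", "usage_events", "summary_terms", "summary_term_views",
    "pipeline_status", "token_usage", "window_overview",
    "memory_registry", "memory_policy_views", "nightly_memory_views",
    "codex_native_memory", "codex_native_memory_counts",
    "claude_native_memory", "claude_native_memory_counts",
)


def missing_overview_keys(payload: dict) -> list[str]:
    """Return required top-level keys absent from an overview-data payload."""
    return [key for key in REQUIRED_OVERVIEW_KEYS if key not in payload]
```

An empty result means the payload has every key in the list; it says nothing about whether the values are well-formed.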
Fixture Contract
Sanitized sample state lives under:
tests/fixtures/sample-state/
Rules for fixtures:
- Use artificial dates, ids, project names, and paths.
- Keep paths generic, for example /tmp/openrelix-demo.
- Include enough data for raw window lookup, daily summary, registry search, and curated pack validation.
- Do not include real local state, real host transcripts, account names, internal URLs, tokens, screenshots, or generated panel files.
After changing fixture shape, run:
python3 -m unittest tests/test_sample_state_fixture.py