Memori: A Hierarchical Semantic Memory Dedup-Compaction Engine for Long-Term Recall in LLM Agents

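In LLM agent systems, long-term memory is a key bottleneck for multi-agent collaboration and continual learning. Memori is an open-source, SQL-native memory engine. It combines a hierarchical semantic architecture (entity, process, and session levels) with embedding clustering to implement dedup compaction, which compresses redundant memories and supports efficient long-term recall. Integration takes one line of code and works with LLM clients such as OpenAI and Anthropic; because no dedicated vector database is required, the approach is claimed to cut costs by 80-90%.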

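Memori's hierarchy is modeled on human memory: the entity level records users or things (for example, “user 123”), the process level tracks agents or programs (for example, “my-ai-bot”), and the session level groups sequences of interactions. The attribution API, for example mem.attribution(entity_id="12345", process_id="my-ai-bot"), ties each memory to these levels so that memories can be shared across agents. Version 3 introduces Advanced Augmentation, an agent that runs in a background thread and extracts attributes, events, facts, people, preferences, relationships, rules, and skills without adding latency to the request path, producing structured enrichment. In the words of the official documentation: “Memori Advanced Augmentation enhances memories at each of these levels with attributes, events, facts...”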
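To make the three attribution levels concrete, here is a minimal sketch of how entity, process, and session could be laid out in a relational store. The table names, column names, and the helper functions open_store, remember, and recall_for_entity are hypothetical and chosen for illustration; they are not Memori's actual schema or API.

```python
import sqlite3

# Hypothetical schema for illustration only; Memori's real tables may differ.
SCHEMA = """
CREATE TABLE entity  (id TEXT PRIMARY KEY);
CREATE TABLE process (id TEXT PRIMARY KEY);
CREATE TABLE session (
    id         TEXT PRIMARY KEY,
    entity_id  TEXT NOT NULL REFERENCES entity(id),
    process_id TEXT NOT NULL REFERENCES process(id)
);
CREATE TABLE fact (
    id         INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES session(id),
    content    TEXT NOT NULL
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def remember(conn, entity_id, process_id, session_id, content):
    """Store one fact, attributed to an entity, a process, and a session."""
    conn.execute("INSERT OR IGNORE INTO entity(id) VALUES (?)", (entity_id,))
    conn.execute("INSERT OR IGNORE INTO process(id) VALUES (?)", (process_id,))
    conn.execute(
        "INSERT OR IGNORE INTO session(id, entity_id, process_id) VALUES (?, ?, ?)",
        (session_id, entity_id, process_id),
    )
    conn.execute(
        "INSERT INTO fact(session_id, content) VALUES (?, ?)",
        (session_id, content),
    )


def recall_for_entity(conn, entity_id):
    """Return every fact about an entity, regardless of which process wrote it."""
    rows = conn.execute(
        "SELECT f.content FROM fact f "
        "JOIN session s ON s.id = f.session_id "
        "WHERE s.entity_id = ? ORDER BY f.id",
        (entity_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because facts hang off sessions and sessions carry both an entity and a process, a query by entity alone returns memories written by any agent, which is the cross-agent sharing described above.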

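The core dedup compaction relies on vectorized memories and in-memory semantic search. After each LLM interaction, a threaded extraction agent processes the response off the critical path, so the caller sees no added latency; it generates embeddings and stores them in a third-normal-form schema that includes semantic triples for building a knowledge graph. Embedding clustering then groups similar memories (for example, repeated mentions of “the FastAPI project”), merges redundant ones above a similarity threshold (for example, cosine > 0.85), and compacts each cluster to its centroid vector, which limits storage growth. Retrieval runs full-text search (FTS) first as a safety net, then approximate nearest neighbor (ANN) vector search (top-k=5), ranking frequently accessed or recent clusters first, for a stated recall precision above 90%. In multi-agent scenarios, clusters are shared across processes via session_id, which keeps long-term memory consistent.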
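The threshold-based merging described above can be sketched as a greedy single-pass clustering. This is an assumption about one simple way to do it, not Memori's actual algorithm; the function names cosine and compact and the cluster dictionary layout are hypothetical, and the toy 2-D vectors stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compact(memories, threshold=0.85):
    """Greedily merge (text, vector) pairs into centroid clusters.

    A memory joins the most similar existing cluster if its cosine
    similarity to that cluster's centroid exceeds `threshold`;
    otherwise it starts a new cluster.
    """
    clusters = []
    for text, vec in memories:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": list(vec), "texts": [text], "count": 1})
        else:
            n = best["count"]
            # Running mean keeps the centroid equal to the average member vector.
            best["centroid"] = [
                (c * n + v) / (n + 1) for c, v in zip(best["centroid"], vec)
            ]
            best["texts"].append(text)
            best["count"] = n + 1
    return clusters
```

With this approach only one centroid per cluster needs to be searched at retrieval time, and the per-cluster count can serve as the frequency signal used for ranking. The result depends on insertion order, which is a known limitation of greedy clustering.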

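Compared with traditional RAG, Memori's hierarchical dedup avoids catastrophic forgetting: short-term working memory (conscious_ingest=True) injects the core clusters (such as user preferences) once, while long-term auto_ingest retrieves memories dynamically. Experiments reportedly show a significant performance improvement in v3. A single line, Memori(conn=Session).openai.register(client), builds and migrates the schema automatically, and the engine is datastore-agnostic, supporting SQLite, PostgreSQL, and others.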
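The difference between the two ingestion modes can be illustrated with a small sketch: working memory is injected once at the start of a conversation, while long-term memories are looked up again for every query. The class MemoryContext and the function overlap_score are hypothetical, and the word-overlap score is only a stand-in for the embedding-based semantic search a real system would use.

```python
def overlap_score(query, text):
    """Fraction of query words that also appear in the text (toy relevance score)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


class MemoryContext:
    """Builds the memory snippets to prepend to each prompt."""

    def __init__(self, working_memory, long_term, top_k=3):
        self.working_memory = list(working_memory)  # core facts, injected once
        self.long_term = list(long_term)            # searched on every query
        self.top_k = top_k
        self._injected = False

    def build(self, query):
        parts = []
        # One-shot injection, analogous to the conscious_ingest mode.
        if not self._injected:
            parts.extend(self.working_memory)
            self._injected = True
        # Per-query retrieval, analogous to the auto_ingest mode.
        ranked = sorted(
            self.long_term,
            key=lambda text: overlap_score(query, text),
            reverse=True,
        )
        parts.extend(t for t in ranked[: self.top_k] if overlap_score(query, t) > 0)
        return parts
```

Injecting the core set only once keeps later prompts short, while the per-query lookup lets older memories resurface when they become relevant again.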

Deployment parameter checklist:

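Initialization thresholds: embedding_model="text-embedding-3-small" (OpenAI), similarity_threshold=0.8 (cluster merging), top_k=3 (memories retrieved for injection), ttl_days=365 (memory expiry).
Compaction strategy: the Conscious Agent runs every 6 hours and promotes memories with access_freq > 5 or recency < 7 days to working memory; an HNSW index (ef_construction=128, M=16) accelerates vector search.
Multi-agent configuration: session_id is a generated UUID, entity_id is a hash of the user, and process_id is the agent name; agents share a DB connection pool (pool_size=20), with RBAC isolating namespaces.
Monitoring metrics: hit_rate > 70% (cache recall), extraction_quality (LLM-judge accuracy > 95%), storage_growth < 1 GB per month; metrics are exposed to Prometheus via a /quota endpoint.
Rollback: schema_migration=auto-rollback, with backups taken by a cron job that exports the SQLite database; on failure, fall back to the plain context window.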
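The promotion and expiry rules from the checklist can be written down as a small policy object. This is a sketch under stated assumptions: the class MemoryPolicy, its field names, and the functions should_promote and is_expired are hypothetical, and only the numeric defaults come from the checklist above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MemoryPolicy:
    """Checklist defaults gathered in one place (field names are illustrative)."""
    similarity_threshold: float = 0.8
    top_k: int = 3
    ttl_days: int = 365
    promote_min_access: int = 5     # promote when accessed more than this
    promote_max_age_days: int = 7   # or when last accessed within this window


def should_promote(access_count, last_access, now, policy=MemoryPolicy()):
    """True if a memory qualifies for working memory: frequent OR recent."""
    recent = (now - last_access) < timedelta(days=policy.promote_max_age_days)
    return access_count > policy.promote_min_access or recent


def is_expired(created_at, now, policy=MemoryPolicy()):
    """True once a memory is older than the configured TTL."""
    return (now - created_at) > timedelta(days=policy.ttl_days)
```

A periodic job (every 6 hours in the checklist) could call should_promote over candidate memories and is_expired to prune stale ones; passing now explicitly keeps both functions deterministic and easy to test.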

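Risk control: extraction depends on an LLM and is prone to hallucination, so add a post-extract validator (Claude-3.5-sonnet, prompted to check factual consistency); beyond 10M memories, shard to CockroachDB. Test benchmark: 1,000 sessions, recall F1 = 0.92, latency < 200 ms.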
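For readers who want to reproduce a recall benchmark of this kind, the F1 score for a single query can be computed from the set of retrieved memory IDs and the set of IDs judged relevant. The function name precision_recall_f1 and the ID format are assumptions for illustration; the source does not describe how its benchmark was scored.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query's retrieved vs. relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Averaging the per-query F1 over all test sessions gives a single headline number comparable to the one quoted above.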

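Memori demonstrates that dedup compaction driven by embedding clustering is central to hierarchical memory for LLM agents, and that a parameterized rollout is simple and reliable.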

Sources:

GitHub: https://github.com/MemoriLabs/Memori
Memori v3 documentation: Advanced Augmentation & vectorized memories