Semantic Recall

Vector-indexed long-term memory for retrieving relevant past knowledge using embedding similarity search.

Overview

Semantic Recall is a memory layer designed for vector-indexed long-term storage. It stores items as embeddings and retrieves them by similarity to the current query, making it ideal for large knowledge bases, past interactions, or document retrieval.

  • Slot: 400 (Slot.SEMANTIC_RECALL)
  • Scope: global

Concept

Unlike other memory layers that operate on structured state, Semantic Recall integrates with a vector store to perform approximate nearest-neighbor search. The recall hook embeds the current query and retrieves the most relevant items within the token budget. The store hook embeds new content and indexes it for future retrieval.
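The ranking step underneath this search is usually cosine similarity between the query embedding and each stored embedding (vector databases run approximate variants of it at scale). As a rough illustration of the math, not part of the Noetic API:

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```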

Building a Semantic Recall Layer

Noetic does not ship a built-in vector store. Instead, you implement the MemoryHooks interface with your preferred vector database. Here is the general shape:

import type { MemoryLayer } from '@noetic/core';
import { Slot } from '@noetic/core';

interface VectorStore {
  upsert(id: string, embedding: number[], text: string, metadata?: Record<string, unknown>): Promise<void>;
  query(embedding: number[], topK: number): Promise<Array<{ id: string; text: string; score: number }>>;
}

interface SemanticRecallConfig {
  vectorStore: VectorStore;
  embed: (text: string) => Promise<number[]>;
  topK?: number;
}

function semanticRecall(config: SemanticRecallConfig): MemoryLayer {
  return {
    id: 'semantic-recall',
    name: 'Semantic Recall',
    slot: Slot.SEMANTIC_RECALL,
    scope: 'global',
    budget: { min: 500, max: 3000 },
    hooks: {
      async init() {
        return { state: {} };
      },

      async recall({ query, budget }) {
        const embedding = await config.embed(query);
        const results = await config.vectorStore.query(
          embedding,
          config.topK ?? 10,
        );

        const items = results.map((r) => ({
          id: r.id,
          type: 'message' as const,
          role: 'developer' as const,
          status: 'completed' as const,
          content: [{ type: 'input_text' as const, text: r.text }],
        }));

        return {
          items,
          // Rough estimate: ~4 characters per token. Swap in a real
          // tokenizer if you need accurate budget accounting.
          tokenCount: items.reduce(
            (sum, item) => sum + Math.ceil(item.content[0].text.length / 4),
            0,
          ),
        };
      },

      async store({ newItems }) {
        // Embeds items sequentially for clarity; batch embedding calls
        // in production to reduce API round-trips.
        for (const item of newItems) {
          if (item.type !== 'message') continue;
          const text = item.content
            .filter((c) => c.type === 'output_text' || c.type === 'input_text')
            .map((c) => ('text' in c ? c.text : ''))
            .join('');
          if (!text) continue;

          const embedding = await config.embed(text);
          await config.vectorStore.upsert(item.id, embedding, text);
        }
        return { state: {} };
      },
    },
  };
}

Adapter Examples

You can plug in any vector database:

  • Pinecone: Use @pinecone-database/pinecone as the vector store backend
  • Qdrant: Use @qdrant/js-client-rest
  • ChromaDB: Use chromadb
  • pgvector: Use raw SQL with pg and the vector extension
  • In-memory: Use a simple array with cosine similarity for development
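The in-memory option is the quickest way to develop against the VectorStore interface shown earlier. A minimal sketch (exact cosine similarity over a linear scan; fine for hundreds of items, not a production index):

```typescript
interface StoredItem {
  id: string;
  embedding: number[];
  text: string;
  metadata?: Record<string, unknown>;
}

// Development-only store satisfying the VectorStore interface above.
class InMemoryVectorStore {
  private items = new Map<string, StoredItem>();

  async upsert(
    id: string,
    embedding: number[],
    text: string,
    metadata?: Record<string, unknown>,
  ): Promise<void> {
    this.items.set(id, { id, embedding, text, metadata });
  }

  async query(
    embedding: number[],
    topK: number,
  ): Promise<Array<{ id: string; text: string; score: number }>> {
    // Score every item, then keep the topK highest.
    const scored = [...this.items.values()].map((item) => ({
      id: item.id,
      text: item.text,
      score: cosine(embedding, item.embedding),
    }));
    scored.sort((a, b) => b.score - a.score);
    return scored.slice(0, topK);
  }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}
```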

Configuration Tips

  • Set topK based on your token budget. Each retrieved document consumes tokens.
  • Use budget: { min, max } to let the allocator distribute spare capacity.
  • Consider a scope of 'resource' if you want per-project knowledge isolation instead of global.
  • The store timeout should account for embedding API latency.
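Because each retrieved document consumes part of the layer's budget, one common pattern is to trim the ranked results before returning them from recall. A hedged sketch using the same ~4 characters-per-token heuristic as the recall hook above (trimToBudget and estimateTokens are illustrative helpers, not part of Noetic):

```typescript
// Rough heuristic: ~4 characters per token. Use a real tokenizer for
// your model family if you need accurate counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the highest-ranked results that fit within maxTokens.
// Assumes `results` is already sorted by relevance, as returned by query().
function trimToBudget<T extends { text: string }>(
  results: T[],
  maxTokens: number,
): T[] {
  const kept: T[] = [];
  let used = 0;
  for (const r of results) {
    const cost = estimateTokens(r.text);
    if (used + cost > maxTokens) break;
    kept.push(r);
    used += cost;
  }
  return kept;
}
```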
