
Memory System Backend Configuration

Last Updated: April 14, 2025
Status: Complete

This document describes the configuration system for the storage backends used by the Neuroca memory system. It covers the configuration file structure, the options available for each backend type, and how to use the configuration API.

Overview

The memory system uses a centralized YAML-based configuration system to manage backend settings. This approach:

  • Separates configuration from code
  • Allows for environment-specific configurations
  • Enables easy adjustment of performance parameters
  • Supports multiple backend types with different configuration needs

Configuration files are stored in the config/backends/ directory at the project root.

Configuration File Structure

The configuration system uses two types of files:

  1. Base Configuration: base_config.yaml - Contains common settings shared by all backends
  2. Backend-Specific Configuration: {backend_type}_config.yaml - Contains settings specific to a particular backend type

When a backend is created, the relevant configuration files are loaded and merged, with backend-specific settings taking precedence over base settings.
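For illustration, the merge behaves like a recursive dictionary merge, with backend-specific values overriding base values at every level. The following sketch is not the actual loader implementation; it only shows the idea, assuming both files have been parsed into plain dictionaries:

# Illustrative only: recursively merge a backend-specific config over the base config.
def merge_configs(base: dict, override: dict) -> dict:
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them recursively.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Backend-specific value wins.
            merged[key] = value
    return merged

# e.g. merge_configs(base_config, sqlite_config) would yield the effective SQLite configuration.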

Base Configuration

The base configuration file (base_config.yaml) defines common settings across all backends:

# Common settings for all backends
common:
  # Cache settings
  cache:
    enabled: true
    max_size: 1000
    ttl_seconds: 300  # 5 minutes

  # Batch operation settings
  batch:
    max_batch_size: 100
    auto_commit: true

  # Performance settings
  performance:
    connection_pool_size: 5
    connection_timeout_seconds: 10
    operation_timeout_seconds: 30

  # Logging settings
  logging:
    enabled: true
    level: "INFO"  # DEBUG, INFO, WARNING, ERROR, CRITICAL
    log_queries: false

  # Health check settings
  health_check:
    enabled: true
    interval_seconds: 60
    timeout_seconds: 5
    max_retries: 3

  # Metrics settings
  metrics:
    enabled: true
    collect_detailed_stats: false

# Default backend to use if not specified
default_backend: "in_memory"

Backend-Specific Configurations

In-Memory Backend

Configuration file: in_memory_config.yaml

in_memory:
  # Memory allocation settings
  memory:
    initial_capacity: 1000
    auto_expand: true
    expansion_factor: 2
    max_capacity: 100000

  # Data structure settings
  data_structure:
    index_type: "hashmap"  # Options: hashmap, btree
    enable_secondary_indices: true

  # Persistence settings
  persistence:
    enabled: false
    file_path: "data/in_memory_backup.json"
    auto_save_interval_seconds: 300  # 5 minutes
    save_on_shutdown: true

  # Pruning settings
  pruning:
    enabled: true
    max_items: 10000
    strategy: "lru"  # Options: lru, lfu, fifo, lifo, random
    trigger_threshold: 0.9  # Pruning starts when capacity reaches 90%

  # Performance settings
  performance:
    use_concurrent_map: true
    lock_timeout_ms: 1000

SQLite Backend

Configuration file: sqlite_config.yaml

sqlite:
  # Connection settings
  connection:
    database_path: "data/memory_store.db"
    create_if_missing: true
    timeout_seconds: 5
    foreign_keys: true

  # Performance settings
  performance:
    page_size: 4096
    cache_size: 2000  # Pages in memory
    journal_mode: "WAL"  # Options: DELETE, TRUNCATE, PERSIST, MEMORY, WAL, OFF
    synchronous: "NORMAL"  # Options: OFF, NORMAL, FULL, EXTRA
    temp_store: "MEMORY"  # Options: DEFAULT, FILE, MEMORY
    mmap_size: 0  # 0 to disable

  # Schema settings
  schema:
    auto_migrate: true
    migration_table: "_schema_migrations"
    enable_triggers: true
    enable_fts: true  # Full-text search

  # Query settings
  query:
    max_query_length: 10000
    max_parameters: 999
    enforce_foreign_keys: true
    explain_query_threshold_ms: 100

  # Transaction settings
  transaction:
    auto_vacuum: "INCREMENTAL"  # Options: NONE, FULL, INCREMENTAL
    auto_commit: true
    isolation_level: "IMMEDIATE"  # Options: DEFERRED, IMMEDIATE, EXCLUSIVE

  # Backup settings
  backup:
    enabled: true
    interval_hours: 24
    keep_backups: 7
    backup_path: "data/backups/"

Redis Backend

Configuration file: redis_config.yaml

redis:
  # Connection settings
  connection:
    host: "localhost"
    port: 6379
    database: 0
    username: ""
    password: ""
    use_ssl: false
    timeout_seconds: 5

  # Key settings
  keys:
    prefix: "neuroca:memory:"
    separator: ":"
    encoding: "utf-8"
    expire_ttl_seconds: 0  # 0 means no expiration

  # Performance settings
  performance:
    use_connection_pool: true
    max_connections: 10
    socket_keepalive: true
    socket_timeout_seconds: 5
    retry_on_timeout: true
    retry_on_error: true
    max_retries: 3

  # Data structure settings
  data_structure:
    use_hash_for_metadata: true
    use_sorted_sets_for_indexing: true
    use_lists_for_ordered_data: true
    use_sets_for_tags: true

  # Serialization settings
  serialization:
    format: "json"  # Options: json, msgpack, pickle
    compress: false
    compression_threshold_bytes: 1024
    compression_level: 6

  # Pub/Sub settings
  pubsub:
    enabled: false
    channel_prefix: "neuroca:events:"

  # Lua scripts
  lua_scripts:
    enabled: true
    cache_scripts: true

  # Sentinel settings (if using Redis Sentinel)
  sentinel:
    enabled: false
    master_name: "mymaster"
    sentinels:
      - host: "sentinel-1"
        port: 26379
      - host: "sentinel-2"
        port: 26379

SQL Backend

Configuration file: sql_config.yaml

sql:
  # Connection settings
  connection:
    driver: "postgresql"  # Options: postgresql, mysql, mssql, oracle
    host: "localhost"
    port: 5432
    database: "neuroca_memory"
    username: "neuroca_user"
    password: ""
    schema: "public"
    ssl_mode: "disable"  # Options: disable, allow, prefer, require, verify-ca, verify-full

  # Connection pool settings
  pool:
    min_connections: 2
    max_connections: 10
    max_idle_time_seconds: 300
    max_lifetime_seconds: 3600
    connection_timeout_seconds: 5

  # Schema settings
  schema:
    table_prefix: "mem_"
    metadata_table: "memory_metadata"
    content_table: "memory_content"
    tags_table: "memory_tags"
    relations_table: "memory_relations"
    use_jsonb_for_metadata: true
    auto_create_tables: true
    auto_migrate: true
    migrations_table: "_migrations"

  # Query settings
  query:
    max_query_length: 10000
    max_parameters: 1000
    query_timeout_seconds: 30
    use_prepared_statements: true
    enable_query_logging: false
    explain_query_threshold_ms: 100

  # Transaction settings
  transaction:
    isolation_level: "READ COMMITTED"  # Options: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE
    auto_commit: false

  # Performance settings
  performance:
    use_batch_inserts: true
    max_batch_size: 1000
    use_upsert: true
    enable_statement_cache: true
    statement_cache_size: 100

  # PostgreSQL specific settings
  postgresql:
    enable_ssl: false
    application_name: "neuroca_memory"
    statement_timeout_ms: 30000
    use_advisory_locks: true
    enable_unaccent: true
    enable_pg_trgm: true

  # MySQL specific settings
  mysql:
    charset: "utf8mb4"
    collation: "utf8mb4_unicode_ci"
    enable_local_infile: false
    sql_mode: "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION"

Vector Backend

Configuration file: vector_config.yaml

vector:
  # Storage settings
  storage:
    type: "memory"  # Options: memory, file, hybrid
    file_path: "data/vector_store.bin"
    auto_save: true
    save_interval_seconds: 300  # 5 minutes

  # Vector settings
  vector:
    dimension: 1536  # Default embedding dimension
    distance_metric: "cosine"  # Options: cosine, l2, dot, jaccard, hamming
    normalize_vectors: true

  # Index settings
  index:
    type: "hnsw"  # Options: hnsw, flat, ivf_flat, pq, ivf_pq, ivf_sq
    creation_threshold: 1000  # Create index after this many vectors
    build_on_creation: true
    use_gpu: false

  # HNSW index settings
  hnsw_index:
    ef_construction: 200
    ef_search: 50
    m: 16  # Number of connections per layer
    max_elements: 1000000

  # IVF index settings
  ivf_index:
    nlist: 100  # Number of clusters
    nprobe: 10  # Number of clusters to search

  # PQ index settings
  pq_index:
    code_size: 8  # Number of bytes per vector
    nbits: 8  # Number of bits per component

  # Search settings
  search:
    default_top_k: 10
    max_top_k: 1000
    pre_filter_enabled: true
    post_filter_enabled: true
    min_score_threshold: 0.5
    max_search_time_ms: 50

  # Clustering settings
  clustering:
    enabled: false
    algorithm: "kmeans"  # Options: kmeans, dbscan, hdbscan
    min_cluster_size: 5
    max_clusters: 100

  # Metadata filtering
  metadata:
    enable_filtering: true
    metadata_fields:
      - "source"
      - "timestamp"
      - "importance"
      - "tags"

  # Performance settings
  performance:
    use_multithreading: true
    num_threads: 4
    batch_size: 100
    cache_size_mb: 128

Configuration Loading API

The memory system provides a configuration loading API to access configuration values from code. This API is defined in the neuroca.memory.config.loader module.

Loading Configuration Files

To load configuration for a specific backend:

from neuroca.memory.config.loader import get_backend_config

# Load configuration for the in-memory backend
config = get_backend_config("in_memory")

# Access configuration values
cache_enabled = config["common"]["cache"]["enabled"]
initial_capacity = config["in_memory"]["memory"]["initial_capacity"]

Accessing Configuration Values

To access individual configuration values:

from neuroca.memory.config.loader import get_config_value

# Get a specific configuration value for a backend
cache_enabled = get_config_value("common.cache.enabled", "in_memory")

# Get a value with a default if not found
ttl = get_config_value("common.cache.ttl_seconds", "in_memory", default=300)

Custom Configuration Loader

For more control over configuration loading, you can create a ConfigurationLoader instance:

from neuroca.memory.config.loader import ConfigurationLoader

# Create a loader with a custom configuration directory
loader = ConfigurationLoader("/path/to/config/dir")

# Load configuration for a specific backend
config = loader.load_config("in_memory")

# Access values using dot notation
cache_enabled = loader.get_value("common.cache.enabled")

Backend Configuration in Factory

Backend instances are created using the StorageBackendFactory, which automatically loads the appropriate configuration for each backend type:

from neuroca.memory.backends.factory.backend_type import BackendType
from neuroca.memory.backends.factory.storage_factory import StorageBackendFactory

# Create an in-memory backend with default configuration
backend = StorageBackendFactory.create_backend(BackendType.MEMORY)

# Create a SQLite backend with default configuration
sqlite_backend = StorageBackendFactory.create_backend(BackendType.SQLITE)

Environment-Specific Configuration

To use different configurations for different environments (development, testing, production), place environment-specific configuration files in separate directories and specify the directory when creating the ConfigurationLoader instance:

from neuroca.memory.config.loader import ConfigurationLoader

# Development environment
dev_loader = ConfigurationLoader("config/dev/backends")
dev_config = dev_loader.load_config("in_memory")

# Production environment
prod_loader = ConfigurationLoader("config/prod/backends")
prod_config = prod_loader.load_config("in_memory")
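A common pattern is to select the configuration directory from an environment variable at startup. This is not part of the loader API; the variable name NEUROCA_ENV below is purely an assumption for the sketch:

import os
from neuroca.memory.config.loader import ConfigurationLoader

# Hypothetical convention: NEUROCA_ENV names the environment, defaulting to "dev".
env = os.environ.get("NEUROCA_ENV", "dev")
loader = ConfigurationLoader(f"config/{env}/backends")
config = loader.load_config("in_memory")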

Configuration Best Practices

  1. Keep Configuration Separate: Avoid hardcoding configuration values in code; use the configuration system instead.
  2. Use Reasonable Defaults: Provide a sensible default for every configuration option.
  3. Document Configuration Options: Document every configuration option and its allowed values.
  4. Use Environment Variables: Supply sensitive values such as database passwords through environment variables rather than configuration files (a sketch follows this list).
  5. Validate Configuration: Validate configuration values at startup to catch errors early.
  6. Use Different Configurations for Different Environments: Maintain separate configuration files for development, testing, and production.
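As a sketch of practice 4, sensitive values can be read from the environment and layered over the loaded configuration. The variable name NEUROCA_SQL_PASSWORD is purely illustrative, not an established convention:

import os
from neuroca.memory.config.loader import get_backend_config

config = get_backend_config("sql")

# Illustrative override: keep real passwords out of the YAML files and
# inject them from the environment at startup instead.
password = os.environ.get("NEUROCA_SQL_PASSWORD")
if password:
    config["sql"]["connection"]["password"] = password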

Memory Tier Configuration

Each memory tier can use a different backend type with a specific configuration:

from neuroca.memory.backends.factory.backend_type import BackendType
from neuroca.memory.backends.factory.memory_tier import MemoryTier
from neuroca.memory.backends.factory.storage_factory import StorageBackendFactory

# Short-term memory using in-memory backend
stm_backend = StorageBackendFactory.create_storage(MemoryTier.STM, BackendType.MEMORY)

# Medium-term memory using SQLite backend
mtm_backend = StorageBackendFactory.create_storage(MemoryTier.MTM, BackendType.SQLITE)

# Long-term memory using vector backend
ltm_backend = StorageBackendFactory.create_storage(MemoryTier.LTM, BackendType.VECTOR)

Tier-specific options can be defined by adding a section for each tier to the backend's configuration file:

# Example: in_memory_config.yaml with tier-specific settings
in_memory:
  # General settings...

  # STM-specific settings
  stm:
    max_items: 200

  # MTM-specific settings
  mtm:
    max_items: 5000

These tier-specific settings can be accessed using the configuration API:

from neuroca.memory.config.loader import get_config_value

# Get STM-specific setting
stm_max_items = get_config_value("in_memory.stm.max_items", "in_memory")

Conclusion

The backend configuration system provides a flexible and centralized way to manage configuration options for memory backends. By separating configuration from code, it allows for easy adjustment of backend behavior without code changes.