Skip to content

[MMAP] Add MMapAllocator with file-backed memory mapping support#335

Open
rfsaliev wants to merge 3 commits into
mainfrom
rfsaliev/allocator-mmap
Open

[MMAP] Add MMapAllocator with file-backed memory mapping support#335
rfsaliev wants to merge 3 commits into
mainfrom
rfsaliev/allocator-mmap

Conversation

@rfsaliev
Copy link
Copy Markdown
Member

This pull request introduces a new file-backed, memory-mapped allocator (MMapAllocator) to the codebase, along with a supporting allocation registry and extensive tests. The changes enable efficient, zero-copy data and graph storage using memory-mapped files, and provide mechanisms for cache control and access pattern hints. Additional updates include minor copyright year adjustments and fixes to graph comparing routine.

The most important changes are:

New Memory-mapped Allocator and Registry:

  • Added MMapAllocator template class to svs/core/allocator.h, allowing allocation of memory backed by files, supporting efficient large dataset and graph storage. Includes support for access pattern hints and cache eviction.
  • Introduced MMapAllocationRegistry singleton for tracking and managing all active memory-mapped allocations, including thread-safe allocation, deallocation, and page eviction.
  • Added MMapAccessHint enum and helper to set madvise hints for memory-mapped regions, improving performance for different access patterns.

Testing and Validation:

  • Added comprehensive tests for MMapAllocator in tests/svs/core/allocator.cpp, covering allocation/deallocation, rebinding, equality, integration with SimpleData, BlockedData, and SimpleGraph, and file-backed correctness.

Graph and Data Integration:

  • Updated tests and includes to ensure MMapAllocator works as a drop-in replacement for existing data and graph containers, demonstrating compatibility and correctness. [1] [2]

Minor Fixes:

  • Fixed node counting in graphs_equal by replacing num_nodes() with n_nodes() to match the correct API.
  • Updated copyright years to 2026 in affected files. [1] [2] [3]

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new file-backed memory-mapped allocator to the SVS public C++ API to support large, file-backed data/graph storage and provides tests validating it works as a drop-in allocator for existing containers. Also includes a small fix to graph equality and copyright year updates.

Changes:

  • Introduces svs::MMapAllocator<T> and a process-wide detail::MMapAllocationRegistry for tracking/evicting active mmaps.
  • Adds Catch2 coverage for allocation behavior and integration with SimpleData, BlockedData, and SimpleGraph.
  • Fixes graphs_equal to use n_nodes() consistently with the graph concepts.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
include/svs/core/allocator.h Adds MMapAllocator, allocation registry, access-hint plumbing, and eviction support.
tests/svs/core/allocator.cpp Adds integration/behavior tests for MMapAllocator across data and graph types.
include/svs/concepts/graph.h Fixes graphs_equal to use n_nodes() instead of num_nodes().
Comments suppressed due to low confidence (5)

include/svs/core/allocator.h:918

  • MMapAllocator stores the registry as a reference member (allocation_resource_), which makes the allocator type not CopyAssignable/MoveAssignable (reference members delete assignment). Standard-library containers and the configured propagate_on_container_* = true traits can require allocator assignment, so this can break compilation/behavior. Consider storing a pointer or std::reference_wrapper instead (or drop the propagate traits) so the allocator satisfies the standard allocator requirements.
    std::filesystem::path base_path_;
    MMapAccessHint access_hint_ = MMapAccessHint::Normal;
    detail::MMapAllocationRegistry& allocation_resource_;

  public:
    // C++ allocator type aliases
    using value_type = T;
    using propagate_on_container_copy_assignment = std::true_type;
    using propagate_on_container_move_assignment = std::true_type;
    using propagate_on_container_swap = std::true_type;

include/svs/core/allocator.h:973

  • The file name generation is not guaranteed unique across allocator lifetimes or process runs (it uses allocation_count() which can decrease after deallocation, and files are not removed). Combined with MemoryMapper::MayCreate, a collision will silently reuse an existing file (and potentially stale contents) instead of creating a fresh backing file as documented. Use a uniqueness mechanism that checks for existence (e.g., monotonic counter that never decrements, UUID/mkstemp-style), and/or switch mapping to MustCreate with retry on collision.
    [[nodiscard]] T* allocate(size_t n) {
        size_t bytes = sizeof(T) * n;
        auto file_path = generate_file_path(bytes);
        void* ptr = allocation_resource_.allocate(bytes, file_path);
        detail::apply_mmap_access_hint(ptr, bytes, access_hint_);
        return static_cast<T*>(ptr);

include/svs/core/allocator.h:901

  • This public API doc references MMapFileViewAllocator, but no such type exists in the repository. Please either add the referenced allocator or update the documentation to point to the actual intended API (e.g., the existing MemoryMapper/registry helpers).
/// Each allocate() call creates a fresh temp file under @c base_path_ and
/// returns a writable mmap of that file. Intended for storing data that is
/// produced at runtime (e.g. an index's secondary, full-dimension dataset)
/// in file-backed pages instead of anonymous RAM.
///
/// For zero-copy loading from a pre-existing file, use
/// @ref MMapFileViewAllocator instead.
///

include/svs/core/allocator.h:728

  • MMapAllocationRegistry::allocate is documented as allocating into a "freshly created (or extended) file", but it uses MemoryMapper::MayCreate, which will refuse to resize an existing file (and will not extend it). Either change the policy/implementation to actually extend/truncate as intended, or update the documentation to reflect the real behavior.
    ///
    /// @brief Allocate memory mapped to a freshly created (or extended) file.
    ///
    /// @param bytes Number of bytes to allocate
    /// @param file_path Path to the file for backing storage
    /// @return Pointer to the allocated memory
    ///
    [[nodiscard]] void* allocate(size_t bytes, const std::filesystem::path& file_path) {
        MemoryMapper mapper{MemoryMapper::ReadWrite, MemoryMapper::MayCreate};
        auto mmap_ptr = mapper.mmap(file_path, lib::Bytes(bytes));

include/svs/core/allocator.h:934

  • The constructor exposes detail::MMapAllocationRegistry& as a public parameter, which leaks an internal detail type into the public API surface. If this is only for testing or internal wiring, consider removing it from the public constructor (always use the singleton internally), or provide a separate internal/testing-only constructor to keep the public API cleaner.
    explicit MMapAllocator(
        std::filesystem::path base_path = std::filesystem::temp_directory_path(),
        MMapAccessHint access_hint = MMapAccessHint::Normal,
        detail::MMapAllocationRegistry& allocation_resource =
            detail::MMapAllocationRegistry::instance()
    )

Comment thread include/svs/core/allocator.h Outdated
@rfsaliev rfsaliev force-pushed the rfsaliev/allocator-mmap branch from a751513 to 5f4ac05 Compare May 21, 2026 16:27
@rfsaliev rfsaliev force-pushed the rfsaliev/allocator-mmap branch from 5f4ac05 to 66b6817 Compare May 26, 2026 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants