Add durable background jobs for memory and scanner work #180

Open

massy-o wants to merge 2 commits into XortexAI:main from massy-o:durable-memory-jobs

Conversation


@massy-o massy-o commented May 14, 2026

Refs #162

Summary

  • add a MongoDB-backed durable job queue with idempotency keys, retries, timeouts, stale lease recovery, and dead-letter records
  • enqueue /v1/memory/ingest and /v1/memory/batch-ingest work and expose /v1/jobs/{job_id} for job status/results
  • route scanner start/resume work through the durable queue while preserving the existing scanner job/status records and falling back to in-process tasks if the job store is unavailable
  • add job worker settings for polling, timeout, retry, backoff, and lease duration

This is a focused Phase 1 implementation for the task-queue/status foundation described in the issue discussion.

Validation

  • python3 -m py_compile src/jobs.py src/api/routes/jobs.py src/api/routes/memory.py src/api/routes/scanner.py src/api/app.py src/api/schemas.py
  • git diff --check
  • uv run --with pytest --with pytest-asyncio --with fastapi --with pydantic --with pydantic-settings --with python-jose --with pymongo --with httpx --with beautifulsoup4 pytest tests/api/test_dependencies_and_routes.py -q -> 4 passed


@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a durable background job system using MongoDB to handle long-running tasks in the memory and scanner modules. It adds a new job status endpoint, a worker for asynchronous execution, and relevant configuration settings. The review feedback primarily highlights the need to offload synchronous database operations to worker threads using asyncio.to_thread to avoid blocking the FastAPI event loop. Additionally, improvements were suggested for the stability of idempotency keys, the completeness of dead-letter logs, and the refactoring of duplicated user identification logic.

Comment thread src/api/routes/jobs.py
Comment thread src/api/routes/jobs.py Outdated
Comment thread src/api/routes/memory.py Outdated
Comment thread src/api/routes/memory.py Outdated
Comment thread src/api/routes/scanner.py Outdated
Comment thread src/jobs.py
Comment thread src/jobs.py Outdated
Comment thread src/jobs.py Outdated
Comment thread src/api/routes/jobs.py Outdated
@ishaanxgupta
Member

Hi @massy-o, thank you for the contribution. The PR looks good to me; my main concern is whether we should make these changes in the /v1/memory routes, or instead bump the version and make them in /v2/memory routes, leaving /v1/memory as it is. What do you think? Also, let me know your thoughts on Celery & Redis. Did you try out the ingest endpoint after the job-tracking change and check the latency? Has it increased?

@massy-o
Author

massy-o commented May 16, 2026

Thanks @ishaanxgupta, that is a fair concern.

On the API versioning question: I agree that changing the response contract of /v1/memory/ingest is the riskiest part of this PR. Since the durable job path returns an enqueue/status response instead of the previous synchronous ingest result, my preference would be to keep /v1/memory backward-compatible and expose the async/job-tracked behavior under /v2/memory (or behind an explicit opt-in flag/header if you prefer a smaller surface). I am happy to adjust the PR in that direction so existing /v1 clients do not see a surprise contract change.

On Celery + Redis: I think that is a good production direction, especially once we want multiple worker processes, clearer operational controls, scheduling, and mature retry/dead-letter behavior. I kept this PR on the existing MongoDB dependency to make the first step smaller and avoid introducing Redis/Celery as new required infrastructure. The job store/worker boundary should also make it possible to swap the backend later without changing the route-level API much.
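The "swap the backend later" point could be expressed as a small backend-agnostic interface that both the current Mongo store and a future Celery/Redis adapter implement. This is a hypothetical sketch, not the PR's actual boundary — the `JobQueue` protocol and method names are assumptions:

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class JobQueue(Protocol):
    """Hypothetical backend-agnostic boundary for the job store/worker layer."""

    def enqueue(self, kind: str, payload: dict[str, Any]) -> str: ...
    def status(self, job_id: str) -> dict[str, Any]: ...

class InMemoryJobQueue:
    """Trivial implementation; a Mongo- or Celery-backed class would
    satisfy the same protocol without route-level changes."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict[str, Any]] = {}

    def enqueue(self, kind: str, payload: dict[str, Any]) -> str:
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = {"kind": kind, "payload": payload, "status": "queued"}
        return job_id

    def status(self, job_id: str) -> dict[str, Any]:
        return self._jobs[job_id]
```

Routes would depend only on `JobQueue`, so replacing the Mongo-backed queue with a Celery/Redis adapter in a later PR becomes a wiring change rather than an API change.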

On ingest latency: I have not run a production-like benchmark yet, so I do not want to overstate the numbers. The intended effect is that the request path only persists the job record and returns the status URL, while the expensive ingest pipeline runs out of band. So the interactive request latency should generally decrease versus synchronous ingest, with the tradeoff that completion is now observed via polling. There is a small extra cost for the Mongo job insert/status tracking, but that should be much smaller than the embedding/judge/weaver work. If useful, I can add a lightweight timing note/test or run a before/after local measurement as part of the PR update.
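A before/after measurement could be sketched roughly like this. Everything here is simulated — the 50 ms sleep is an arbitrary stand-in for the embedding/judge/weaver work, not a measured number, and `sync_ingest`/`enqueue_ingest` are hypothetical names, not the PR's functions:

```python
import time

def sync_ingest(doc: str) -> dict:
    # Stand-in for the old synchronous pipeline (embedding/judge/weaver work).
    time.sleep(0.05)   # arbitrary simulated processing cost
    return {"status": "ingested", "doc": doc}

def enqueue_ingest(doc: str, queue: list) -> dict:
    # Durable-job path: only persist the job record and return a status URL.
    queue.append(doc)  # stand-in for the Mongo job insert
    return {"status": "queued", "status_url": f"/v1/jobs/{len(queue)}"}

def measure(fn, *args) -> float:
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

if __name__ == "__main__":
    queue: list = []
    sync_ms = measure(sync_ingest, "doc-1") * 1000
    enqueue_ms = measure(enqueue_ingest, "doc-2", queue) * 1000
    print(f"sync: {sync_ms:.1f} ms, enqueue-only: {enqueue_ms:.1f} ms")
```

The interesting real-world numbers would be the request-path latency of the enqueue route (dominated by the Mongo insert) versus the old synchronous ingest, plus the end-to-end time to job completion via polling.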

So my proposed next step is to move the async, job-tracked ingest behavior to /v2/memory, keep /v1/memory synchronous and backward-compatible, and leave the current Mongo-backed queue as the minimal backend unless you want this PR to switch directly to Celery/Redis.

@ishaanxgupta
Member

@massy-o yes, that would be great. Let's make these changes in /v2/memory and keep /v1/memory as it was. We can do the Celery or Redis integration in the next PR.
Could you please do a before/after local measurement and share the results here in the comments?
