feat(jobs): Add data retention jobs#4128
Conversation
@TheodoreSpeaks let's consolidate the migrations into a single one: just delete the existing ones and run it once over all the changes in schema.ts |
57194bd to a3c1bab
a3c1bab to 63f7b59
|
@BugBot review |
PR Summary
High Risk
Overview
Replaces the previous inline Introduces an Enterprise-only Data Retention settings section and API (
Reviewed by Cursor Bugbot for commit 990d56a. Bugbot is set up for automated code reviews on this repo. |
|
@BugBot review |
0cb8970 to 9cb8dba
Greptile Summary
Adds three data retention background jobs (soft-delete cleanup, log cleanup, task/chat cleanup) dispatched via Trigger.dev or an inline fallback, with an enterprise-gated UI and API for per-workspace configuration. The migration replaces full soft-delete indexes with partial indexes and adds three retention columns to
Confidence Score: 3/5
Not safe to merge as-is: the S3 cleanup gap will permanently orphan workspace_file objects in object storage on every cleanup run.
One confirmed P1 data-integrity bug (workspace_file S3 objects never deleted) that will silently accumulate orphaned cloud storage objects on each cron execution. Everything else (batching logic, auth, migration, Trigger.dev wiring, enterprise UI) is well-structured.
apps/sim/background/cleanup-soft-deletes.ts: cleanupWorkspaceFileStorage must also cover the workspaceFile (singular) table
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Cron as Cron (GET /api/cron/*)
participant Dispatcher as dispatchCleanupJobs
participant Queue as JobQueue (Trigger.dev / DB)
participant Task as Background Task
participant DB as Database
participant S3 as Object Storage
participant Copilot as Copilot Backend
Cron->>Dispatcher: dispatchCleanupJobs(jobType, retentionColumn)
Dispatcher->>Queue: enqueue free-tier job
Dispatcher->>Queue: enqueue paid-tier job
Dispatcher->>DB: query enterprise workspaces with non-NULL retention
Dispatcher->>Queue: batchTrigger enterprise jobs
Queue->>Task: run(payload)
Task->>DB: resolveTierWorkspaceIds or lookup workspace retention
Task->>DB: SELECT expiring rows (batched, LIMIT 2000)
Task->>S3: delete associated files (pre-deletion)
Task->>Copilot: POST /api/tasks/cleanup (chat IDs)
Task->>DB: DELETE rows by ID
Task-->>Queue: complete
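The dispatch step in the diagram above can be sketched as a pure planning function: one job for the free tier, one for the paid tier, and one per enterprise workspace that has a non-NULL retention value. This is a minimal sketch, not the PR's implementation; `planCleanupJobs`, `CleanupJob`, `EnterpriseWorkspace`, and the `retention` field are hypothetical names, and the unit of `retention` is left unspecified.

```typescript
// Hypothetical payload shapes for illustration; the PR's real types may differ.
type CleanupJob =
  | { jobType: string; plan: 'free' | 'paid' }
  | { jobType: string; plan: 'enterprise'; workspaceId: string; retention: number }

interface EnterpriseWorkspace {
  id: string
  retention: number | null // null means no retention is configured
}

function planCleanupJobs(jobType: string, enterprise: EnterpriseWorkspace[]): CleanupJob[] {
  // The free and paid tiers each get exactly one job.
  const jobs: CleanupJob[] = [
    { jobType, plan: 'free' },
    { jobType, plan: 'paid' },
  ]
  // Enterprise workspaces get one job each, but only when retention is set.
  for (const ws of enterprise) {
    if (ws.retention === null) continue
    jobs.push({ jobType, plan: 'enterprise', workspaceId: ws.id, retention: ws.retention })
  }
  return jobs
}
```

Keeping the planning step free of I/O makes it easy to check which jobs would be enqueued before handing them to the queue.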
Reviews (2): Last reviewed commit: "fix lint" |
|
@greptile review |
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 990d56a. Configure here.
if (workspaceIds.length === 0) {
  logger.info(`[${label}] No workspaces to process`)
  return
}
Snapshot cleanup skipped when no free workspaces exist
Medium Severity
The early return when workspaceIds.length === 0 prevents cleanupOrphanedSnapshots from ever running if no free-tier workspaces exist. Since snapshot cleanup is a global operation (deletes orphaned snapshots across all workspaces), it's independent of whether there are free workspaces to process. In a deployment where all users have paid plans, orphaned snapshots would accumulate indefinitely. The old code in the logs cleanup route always ran snapshot cleanup regardless of workspace count — this refactoring introduced a regression.
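One possible shape for a fix, as a minimal sketch: guard only the workspace-scoped work with the length check and let the global snapshot sweep run unconditionally. `runLogCleanup` and the `CleanupSteps` callbacks are hypothetical stand-ins for the PR's real functions, and they are synchronous here only to keep the example short.

```typescript
// Hypothetical signature: the callbacks stand in for the PR's real cleanup functions.
interface CleanupSteps {
  cleanupWorkspaceLogs: (workspaceIds: string[]) => void
  cleanupOrphanedSnapshots: () => void
}

function runLogCleanup(workspaceIds: string[], steps: CleanupSteps): void {
  // Workspace-scoped work is skipped when there is nothing to process...
  if (workspaceIds.length > 0) {
    steps.cleanupWorkspaceLogs(workspaceIds)
  }
  // ...but the global snapshot sweep runs on every invocation.
  steps.cleanupOrphanedSnapshots()
}
```

Whether the sweep belongs in one tier's job or in a separate scheduled step is a design choice; the point is only that it should not sit behind the empty-list early return.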
    )
  )
  .where(and(isNull(workspace.archivedAt), isNotNull(retentionCol)))
Missing DISTINCT causes duplicate enterprise cleanup jobs
Low Severity
The enterprise workspace query uses an INNER JOIN on subscription without DISTINCT. If a billedAccountUserId has multiple matching subscription rows (e.g., one active and one past_due, both included in ENTITLED_SUBSCRIPTION_STATUSES), the same workspace ID appears multiple times. Each duplicate triggers a separate cleanup job, causing redundant concurrent deletes against the same workspace data. The same issue exists in resolveWorkspaceIdsForPlan for pro/team plans, though there duplicates in an inArray clause are harmless.
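A minimal sketch of deduplicating after the query, assuming each joined row carries a workspace ID. `JoinedRow` and `uniqueWorkspaceIds` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical row shape: one row per (workspace, subscription) match from the join.
interface JoinedRow {
  workspaceId: string
}

function uniqueWorkspaceIds(rows: JoinedRow[]): string[] {
  // A Set drops repeated IDs and preserves first-seen order.
  return Array.from(new Set(rows.map((row) => row.workspaceId)))
}
```

Deduplicating in the query itself (for example with a DISTINCT select) would work equally well; either way, each workspace should produce at most one enterprise cleanup job per run.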


Summary
Add data retention jobs. Three jobs are created: soft-delete cleanup, log cleanup, and task/chat cleanup.