feat: infer memberOrganization stint dates from work-email activities (CM-1105) #4054
Pull request overview
Adds infrastructure to infer and persist memberOrganizations stint dates from verified work-email activities, so email-domain affiliations become timeline-aware and can compete with enrichment on overlaps.
Changes:
- Extend affiliation resolution to bias toward email-domain rows when a verified email domain is present, and add a source-priority tier in `decidePrimaryOrganizationId`.
- Buffer `(memberId, orgId, YYYY-MM-DD)` activity evidence in Redis on the ingestion hot path, and introduce a cron job to infer stint insert/update operations from the buffered dates.
- Add a partial Postgres index to speed up per-member fetches of `email-domain` `memberOrganizations`; remove legacy mapping scripts and rename the shared member-organization service file.
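The source-priority tier can be illustrated with a minimal sketch. It assumes the order given in the PR description: `ui` > `email-domain` > `enrichment-*` > other. `rankSource`, `highestPriorityTier`, and the `WorkExperience` shape are hypothetical names for illustration. They are not the PR's `getMemberOrganizationSourceRank`, whose actual implementation may differ.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface WorkExperience {
  organizationId: string
  source: string
}

// Lower rank = higher priority: ui > email-domain > enrichment-* > everything else.
function rankSource(source: string): number {
  if (source === 'ui') return 0
  if (source === 'email-domain') return 1
  if (source.startsWith('enrichment-')) return 2
  return 3
}

// Keep only the candidates that belong to the best (lowest-rank) tier present.
function highestPriorityTier<T extends WorkExperience>(experiences: T[]): T[] {
  if (experiences.length === 0) return []
  const best = Math.min(...experiences.map((e) => rankSource(e.source)))
  return experiences.filter((e) => rankSource(e.source) === best)
}
```

Several rows can survive in the same tier, for example two `enrichment-*` rows. In that case a further tie-break is still needed, and that is the member-count comparison discussed in the review comment on `decidePrimaryOrganizationId`.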
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| services/libs/types/src/organizations.ts | Adds shared types for buffered org-dates and inferred stint changes. |
| services/libs/data-access-layer/src/old/apps/data_sink_worker/repo/memberAffiliation.data.ts | Extends work-experience data shape to include source. |
| services/libs/data-access-layer/src/members/segments.ts | Adds optional email-domain candidate inclusion in findMemberWorkExperience. |
| services/libs/data-access-layer/src/members/organizations.ts | Adds fetchMemberOrganizationsBySource for cron’s targeted reads. |
| services/libs/common_services/src/services/memberOrganization.ts | Deleted (renamed). |
| services/libs/common_services/src/services/member/unmerge.ts | Updates import to new member-organization module path. |
| services/libs/common_services/src/services/member-organization.ts | New module: keeps unmerge helpers and adds stint inference logic + Redis key constants. |
| services/libs/common_services/src/services/index.ts | Re-exports renamed member-organization module. |
| services/libs/common_services/src/services/common.member.service.ts | Threads emailDomain through findAffiliation and adds source-priority selection logic. |
| services/apps/data_sink_worker/src/service/member.service.ts | Buffers per-member per-org activity dates in Redis and enqueues member IDs for cron. |
| services/apps/data_sink_worker/src/service/activity.service.ts | Extracts verified email domain from activity payload and passes it into affiliation lookup. |
| services/apps/data_sink_worker/src/bin/map-tenant-members-to-org.ts | Removed outdated script. |
| services/apps/data_sink_worker/src/bin/map-member-to-org.ts | Removed outdated script. |
| services/apps/data_sink_worker/package.json | Removes script entries for deleted bin scripts. |
| services/apps/cron_service/src/jobs/inferMemberOrganizationStintChanges.job.ts | New cron job to drain Redis buffers and compute stint changes (currently dry-run). |
| backend/src/database/migrations/V1776931245__member-organizations-email-domain-partial-index.sql | Adds partial index to support efficient per-member email-domain org reads. |
Copilot reviewed 14 out of 16 changed files in this pull request and generated 5 comments.
Copilot reviewed 14 out of 16 changed files in this pull request and generated 2 comments.
Copilot reviewed 14 out of 16 changed files in this pull request and generated 2 comments.
Copilot reviewed 15 out of 17 changed files in this pull request and generated 4 comments.
Branch updated from d07c84e to 4f9f391.
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
…and reuse Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Branch updated from 4f9f391 to 68a79fc.
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 8a9f500.
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
… in member organization job Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
Signed-off-by: Yeganathan S <63534555+skwowet@users.noreply.github.com>
```diff
   "private": true,
   "scripts": {
-    "start": "SERVICE=cron-service tsx src/main.ts",
+    "start": "SERVICE=cron-service LOG_LEVEL=trace tsx src/main.ts",
```
Remove `LOG_LEVEL=trace` here.
```diff
   "private": true,
   "scripts": {
-    "start": "SERVICE=data-sink-worker tsx src/main.ts",
+    "start": "SERVICE=data-sink-worker LOG_LEVEL=trace tsx src/main.ts",
```
Remove `LOG_LEVEL=trace` here.
```ts
}

// 1. Fetch current dates for this specific organization
const existing = await this.redisClient.hGet(key, organizationId)
```
The buffer uses a non-atomic `hGet` → JSON parse → push → `hSet` sequence. The comment acknowledges that this may "occasionally drop a date." With many `data_sink_worker` pods writing for an active maintainer, this won't be occasional. Concurrent writes for the same member and org will race: the last `hSet` wins, and every other writer's date is lost.

Worse, the cron job then does `hGetAll` → compute → DB writes → `hDel(datesKey, orgIds)`. Any date buffered between the `hGetAll` and the `hDel` is silently destroyed, because `hDel` removes the entire field, not just the values we read. The cron spends that window on rule evaluation and Postgres writes (potentially seconds), and every concurrent activity arriving for that member during it is at risk.

Fix: store dates as a Redis set keyed per (member, org). `SADD` on that key is naturally idempotent and concurrency-safe. Use `SMEMBERS`, then `SREM` with the exact members read (or `DEL` only after `SDIFF` confirms there are no new entries). Alternatively, use a Lua script for read-and-delete atomicity.
The "self-healing because future activity will re-populate" reasoning only holds for active orgs. For a member who switches jobs and stops generating activity for the old org, the lost date is the data point.
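The set-based buffer suggested above can be sketched as follows, using the `SMEMBERS` + `SREM` variant rather than the Lua alternative. The key layout `member-org-dates:<memberId>:<orgId>` and the helper names are hypothetical. The method names mirror node-redis v4's camelCase commands (`sAdd`, `sMembers`, `sRem`). An in-memory stand-in replaces a real client so the example runs on its own.

```typescript
// Minimal subset of set commands, named after node-redis v4's camelCase methods.
interface SetCommands {
  sAdd(key: string, members: string[]): Promise<number>
  sMembers(key: string): Promise<string[]>
  sRem(key: string, members: string[]): Promise<number>
}

// In-memory stand-in so the sketch runs without a Redis server.
class MemorySets implements SetCommands {
  private data = new Map<string, Set<string>>()

  async sAdd(key: string, members: string[]): Promise<number> {
    const set = this.data.get(key) ?? new Set<string>()
    this.data.set(key, set)
    let added = 0
    for (const m of members) {
      if (!set.has(m)) {
        set.add(m)
        added++
      }
    }
    return added
  }

  async sMembers(key: string): Promise<string[]> {
    const set = this.data.get(key)
    return set ? Array.from(set) : []
  }

  async sRem(key: string, members: string[]): Promise<number> {
    const set = this.data.get(key)
    if (!set) return 0
    let removed = 0
    for (const m of members) if (set.delete(m)) removed++
    if (set.size === 0) this.data.delete(key) // Redis also drops empty sets
    return removed
  }
}

// Hypothetical key layout: one set per (member, org).
const datesKey = (memberId: string, orgId: string): string => `member-org-dates:${memberId}:${orgId}`

// Hot path: SADD is idempotent, so same-day activities collapse to one member
// and concurrent writers add to the set instead of overwriting each other.
async function bufferDate(r: SetCommands, memberId: string, orgId: string, activityAt: Date): Promise<void> {
  await r.sAdd(datesKey(memberId, orgId), [activityAt.toISOString().slice(0, 10)])
}

// Cron: read, process, then remove only the exact dates that were read.
async function drainDates(
  r: SetCommands,
  memberId: string,
  orgId: string,
  handle: (dates: string[]) => Promise<void>,
): Promise<void> {
  const key = datesKey(memberId, orgId)
  const dates = await r.sMembers(key)
  if (dates.length === 0) return
  await handle(dates)
  await r.sRem(key, dates)
}
```

`SREM` deletes only the dates that `SMEMBERS` returned, so a date added while `handle` is running stays in the set for the next pass. If the job crashes before `SREM`, the same dates are reprocessed, which assumes the inference rules are idempotent.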
```ts
// Keep only candidates from the highest-priority source tier
if (highestPrioritySourceExperiences.length === 1) {
  return highestPrioritySourceExperiences[0].organizationId
```
```ts
if (highestPrioritySourceExperiences.length === 1) {
  return highestPrioritySourceExperiences[0].organizationId
}

const memberCounts = await findMemberCountEstimateOfOrganizations(
  this.qx,
  highestPrioritySourceExperiences.map((e) => e.organizationId),
)

if (memberCounts[0].memberCount > memberCounts[1].memberCount)
```
`memberCounts[0]`/`[1]` is hardcoded for two candidates. If three or more experiences land in the same priority tier (very plausible: three `enrichment-*` sources), only the first two are compared and the rest are silently ignored. This bug exists in the original code too, but the new tier filter changes which set is passed, so the behavior shifts.
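A fix can consider every candidate in the tier instead of only indices 0 and 1, as in this minimal sketch. It assumes each entry returned by `findMemberCountEstimateOfOrganizations` carries both `organizationId` and `memberCount`; only `memberCount` is visible in the excerpt. `pickLargestOrganization` is a hypothetical name. Keeping the earlier entry on ties is also an assumption, since the excerpt does not show the tie branch.

```typescript
// Assumed shape of each member-count estimate.
interface OrgMemberCount {
  organizationId: string
  memberCount: number
}

// Returns the organization with the largest member count among ALL candidates.
// Ties keep the earlier entry (assumption); an empty list yields undefined.
function pickLargestOrganization(counts: OrgMemberCount[]): string | undefined {
  let best: OrgMemberCount | undefined
  for (const c of counts) {
    if (!best || c.memberCount > best.memberCount) best = c
  }
  return best?.organizationId
}
```

With three `enrichment-*` candidates, the third one is now compared instead of silently ignored.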

Context

When an activity comes in with a verified work email like `jbeulich@suse.com`, we already create a `memberOrganizations` row linking the member to SUSE, but with NULL `dateStart`/`dateEnd`. That causes two problems:
- An `@suse.com` activity can lose to an unrelated dated enrichment row, because `findAffiliation` treats undated rows as a last-resort fallback.

We want to use the activity timestamp as evidence that the person was at that company at that moment, and write it into `dateStart`/`dateEnd`. There are three constraints. We can't write on every activity, because active maintainers generate hundreds per day. We can't collapse a real multi-stint history like Google → Apple → Google into one wrong range. And we can't override user edits or enrichment data.

How it works
Hot path (`data_sink_worker`): When an activity arrives, we only buffer `(org, date)` in a Redis hash keyed by member. This takes two Redis ops, with no Postgres and no rule evaluation. Hundreds of same-day activities collapse to a single entry.

Cron (`cron_service`): Every 5 minutes, the service pops up to 500 pending members and atomically drains each member's hash. It then loads their existing email-domain rows and walks the buffered dates chronologically, applying 4 rules:
- A date after `dateEnd` → extend forward, with a 30-day debounce and a multi-stint guard. If another org holds a 30+ day stint in the gap, insert a fresh stint instead of bridging.
- A date before `dateStart` → extend backward, with the same multi-stint guard and no debounce (rare; re-ingestion only).

Walking all orgs together in chronological order is what makes multi-stint detection work. By the time the 2008 Google event checks its gap, the 2005-2007 Apple events are already in the working copy, so the guard fires correctly.
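The chronological walk can be sketched under several loud assumptions. Only the two extension rules are legible in the description, so two rules here are guesses: "date already covered → no-op" and "no stint yet for this org → open a one-day stint". The 30-day debounce is read as "skip forward extensions shorter than 30 days". "In the gap" is read as "another org's stint lying strictly between the two dates". `Stint`, `OrgDate`, `applyDate`, and `inferStints` are hypothetical names. The real job emits insert/update operations rather than returning final stints, and it must leave user-edited and enrichment rows alone; this sketch ignores both concerns.

```typescript
// Hypothetical shapes; the PR's MemberOrgDate / MemberOrgStintChange types may differ.
interface Stint {
  orgId: string
  dateStart: string // YYYY-MM-DD
  dateEnd: string // YYYY-MM-DD
}

interface OrgDate {
  orgId: string
  date: string // YYYY-MM-DD
}

const DAY_MS = 24 * 60 * 60 * 1000
const DEBOUNCE_DAYS = 30 // assumed meaning: forward extensions shorter than this are skipped
const GUARD_DAYS = 30 // another org's stint at least this long blocks bridging

// Date-only ISO strings parse as UTC midnight, so this is a whole number of days.
function daysBetween(from: string, to: string): number {
  return Math.round((Date.parse(to) - Date.parse(from)) / DAY_MS)
}

// Assumption: "in the gap" = another org's stint lying strictly between from and to.
function gapHasOtherStint(stints: Stint[], orgId: string, from: string, to: string): boolean {
  return stints.some(
    (s) =>
      s.orgId !== orgId &&
      s.dateStart > from &&
      s.dateEnd < to &&
      daysBetween(s.dateStart, s.dateEnd) >= GUARD_DAYS,
  )
}

function applyDate(stints: Stint[], ev: OrgDate): void {
  const own = stints.filter((s) => s.orgId === ev.orgId)
  const fresh: Stint = { orgId: ev.orgId, dateStart: ev.date, dateEnd: ev.date }

  // Assumed rule: a date already inside one of the org's stints changes nothing.
  if (own.some((s) => s.dateStart <= ev.date && ev.date <= s.dateEnd)) return

  // Latest stint ending before the date, and earliest stint starting after it.
  const before = own
    .filter((s) => s.dateEnd < ev.date)
    .reduce<Stint | undefined>((acc, s) => (!acc || s.dateEnd > acc.dateEnd ? s : acc), undefined)
  const after = own
    .filter((s) => s.dateStart > ev.date)
    .reduce<Stint | undefined>((acc, s) => (!acc || s.dateStart < acc.dateStart ? s : acc), undefined)

  if (before) {
    // Extend forward (debounced) unless another org's stint sits in the gap.
    if (daysBetween(before.dateEnd, ev.date) < DEBOUNCE_DAYS) return
    if (gapHasOtherStint(stints, ev.orgId, before.dateEnd, ev.date)) stints.push(fresh)
    else before.dateEnd = ev.date
  } else if (after) {
    // Extend backward (no debounce) with the same guard.
    if (gapHasOtherStint(stints, ev.orgId, ev.date, after.dateStart)) stints.push(fresh)
    else after.dateStart = ev.date
  } else {
    // Assumed rule: first evidence for this org opens a one-day stint.
    stints.push(fresh)
  }
}

// Walk every org's buffered dates together, oldest first, over a working copy.
function inferStints(existing: Stint[], dates: OrgDate[]): Stint[] {
  const working = existing.map((s) => ({ ...s }))
  const ordered = [...dates].sort((a, b) => a.date.localeCompare(b.date))
  for (const ev of ordered) applyDate(working, ev)
  return working
}
```

On the Google → Apple → Google example, the Apple stint is already in the working copy when the 2008 Google date is evaluated. The guard therefore inserts a second Google stint instead of bridging 2004 to 2008.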
`findAffiliation` gets two changes so email-domain rows start contributing meaningfully:
- A source-priority tier (`ui` > `email-domain` > `enrichment-*` > other) is inserted into `decidePrimaryOrganizationId`. Once email-domain rows have dates, they beat enrichment on overlaps.
- `findMemberWorkExperience` pulls in the matching email-domain row as a candidate even when it is undated. This is the user-visible win that lands immediately: work-email activities resolve to the right org inline, without waiting for the cron to stamp dates.

A partial index on `memberOrganizations("memberId") WHERE source = 'email-domain' AND "deletedAt" IS NULL` backs the cron's per-member fetch, so that fetch is a single index seek.

Cleanup
- Removed the obsolete mapping scripts (`map-member-to-org.ts`, `map-tenant-members-to-org.ts`).
- Renamed `memberOrganization.ts` → `member-organization.ts` to match the folder's kebab-case convention.

Note
Medium Risk

Touches organization affiliation selection and adds automated background writes to `memberOrganizations`, so incorrect inference logic or query changes could affect org attribution and timelines. Risk is moderated by scoping changes to `email-domain` sources and batching work through Redis + cron.

Overview
Infers `memberOrganizations` stint `dateStart`/`dateEnd` for email-domain affiliations using activity timestamps. The data sink worker buffers per-member, per-org activity dates in Redis, and a new cron job runs every 5 minutes to compute and apply inserts/updates to email-domain rows.

Affiliation resolution is adjusted in two ways:
(a) it prefers higher-priority organization sources when overlaps exist (via `getMemberOrganizationSourceRank`);
(b) it biases `findAffiliation` toward an email-domain work-experience candidate that matches the activity's verified email domain, even if the stint is currently undated.

Adds supporting DAL/types (`fetchMemberOrganizationsBySource`, `MemberOrgStintChange`/`MemberOrgDate`) and a Postgres partial index on `memberOrganizations(memberId)` for `source='email-domain'` to speed the cron's per-member fetch. Also removes two obsolete mapping scripts, renames `memberOrganization` utilities to `member-organization`, and bumps default service start scripts to `LOG_LEVEL=trace`.

Reviewed by Cursor Bugbot for commit 498f0c1.