Updated aliases#2326
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
Fresh review of everything, usual caveats, assume we need sapiens from here, but very close to ready, issues seems very minor and nitpicky: PR Review — Process aliases (OTP ≥ 26 semantics)Scope: last 27 commits ( Reviewed: full diff plus Overall this is a careful, well-commented implementation with good test Severity summary
1. Major —
|
Implement Erlang process aliases together with the reference-representation
rework they require, rebased onto release-0.7 and squashed into one commit.
Public API (libs/estdlib/src/erlang.erl):
- alias/0, unalias/1
- monitor/3 with {alias, explicit_unalias | demonitor | reply_demonitor}
- spawn_opt {monitor, [monitor_option()]}
Runtime:
- New process-reference term carrying ref_ticks + owner process id (RefData),
alongside short/local references; BEAM-compliant reference ordering and
binary_to_term/term_to_binary round-tripping (term.{c,h}, external_term.c).
- Monitors store RefData; alias monitors (CONTEXT_MONITOR_ALIAS) with the three
alias modes and their lifecycle (context.{c,h}).
- send/2, the send opcode and the JIT send path resolve an alias to its target
process and drop reply_demonitor aliases after first use (nifs.c,
opcodesswitch.h, jit.c).
- Reference size macros: REF_SIZE -> TERM_BOXED_REFERENCE_SHORT_SIZE, plus
TERM_BOXED_REFERENCE_PROCESS_SIZE and TERM_BOXED_REFERENCE_RESOURCE_SIZE.
Tests: alias coverage added to test_monitor, test_refs_ordering and
test_binary_to_term.
Sending to a process reference resolved and deactivated the alias from the
sending process: it walked and mutated the target's monitor list
(context_find_alias / context_unalias) while holding only the processes_table
read lock. That lock only guards against the target being freed, not against
the target mutating its own monitor list (via alias/0, unalias/1, demonitor or
process teardown), so a send racing with a monitor change could corrupt the
list or free a node used by another scheduler -- a data race / use-after-free
on SMP.
Route alias sends through the mailbox instead, the same way monitor and
demonitor operations are already delivered across processes. The send paths now
post an AliasMessageSignal carrying {Ref, Message} via the shared helper
globalcontext_send_message_to_alias; the owner validates the alias against its
own monitor list when it drains signals (context_process_alias_message_signal)
and either delivers the message or drops it for an inactive alias. All
monitor-list access now happens in the owning process, and the three send paths
(erlang:send/2, the send opcode and the JIT send) share one helper.
Signed-off-by: Davide Bettio <davide@uninstall.it>
A monitor created with {alias, reply_demonitor} previously only deactivated the
alias on the first reply through it; the underlying monitor stayed active, so
the owner could still receive a stale 'DOWN'. Match OTP: on the first message
received through the alias, also remove the monitor and signal the monitored
process to drop its side, as erlang:demonitor/1 would.
Add a regression test asserting that, after a reply through a reply_demonitor
alias, no 'DOWN' is delivered when the monitored process later exits.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Aliases require OTP 24, so the alias tests guarded their alias-specific cases behind a runtime OTP-version check (plus a make_ref-based fallback for older BEAM). AtomVM no longer supports OTP < 26, where aliases are always available, so those checks are dead code: run the alias cases unconditionally and drop the is_atomvm_or_otp_version_at_least/1 helper and the mock-alias fallback. Signed-off-by: Davide Bettio <davide@uninstall.it>
For {alias, demonitor} and {alias, reply_demonitor}, the alias is deactivated
when the monitor is removed, including the automatic removal when a 'DOWN' is
delivered (as erlang:demonitor/1 already does for an explicit demonitor).
context_process_monitor_down_signal removed the monitor but left the alias
active, so a third party could still reach the owner through the alias after
the monitored process had exited.
Deactivate the alias (unless it is explicit_unalias) in both the local and
registered-name 'DOWN' paths, and add a regression test.
Signed-off-by: Davide Bettio <davide@uninstall.it>
nif_erlang_monitor locks the target process, then raises badarg when the object type does not match the target (monitor(process, Port, ...) or vice versa). That error path returned without releasing the processes-table lock, unlike the three sibling out-of-memory error paths in the same function, leaking the lock and deadlocking the next writer on SMP builds. Unlock the target before raising. (A behavioural test would deadlock rather than fail; verified by inspection against the sibling error paths.) Signed-off-by: Davide Bettio <davide@uninstall.it>
Both nif_erlang_monitor and do_spawn published the monitor/alias state (queuing a MonitorSignal to the target and adding monitors to the contexts) and only then reserved heap for the returned reference. An out-of-memory at that last step raised, leaving the monitor half-installed: the target could later send a 'DOWN' for a reference the caller never received. Reserve the result space before publishing, freeing the monitor structs on failure. GC at the new point is safe -- the monitor structs hold only an immediate pid and a plain RefData, none reachable from the heap. Unwinding the separate spawn_opt link side effect on a later OOM is left out of scope. Signed-off-by: Davide Bettio <davide@uninstall.it>
An alias send is delivered via an AliasMessageSignal so the owner validates the alias in its own context (race-free). But mailbox_process_outer_list moved normal messages straight to the inner mailbox while signals were processed afterwards and re-posted, so `Alias ! m1, Pid ! m2` to the same receiver could deliver m2 before m1 -- violating the single-sender message ordering guarantee. Convert alias messages to normal messages in place during the outer-list reverse (new mailbox_process_outer_list_with_aliases), prepending them into the normal stream so they keep send order relative to plain messages from the same sender. context_process_alias_message_signal now returns the payload instead of re-posting it. Teardown and the port scheduler keep the plain mailbox_process_outer_list (which returns alias signals to drop): they must not run the reply_demonitor path, which takes the processes-table lock that context_destroy already holds. Add a regression test (alias send then pid send keep order). Signed-off-by: Davide Bettio <davide@uninstall.it>
term_compare sizes a process reference as 3 words (process_id + ref_ticks) in the local-vs-local branch, but the mixed local-vs-external branch still sized it as 2 words, dropping process_id. A process reference was therefore ordered differently depending on whether it was compared against a local or an external reference, which can make the term order inconsistent. (The reviewer's "two process refs with equal ref_ticks" case cannot actually occur: ref_ticks come from a global monotonic counter, so they are unique.) Size process references as 3 words in the mixed branch too and add the matching extraction, mirroring the local-vs-local logic. Exercising this path requires an external reference (distribution), so it is verified by inspection. Signed-off-by: Davide Bettio <davide@uninstall.it>
binary_to_term of a NEWER_REFERENCE_EXT with len == 3 turned the third word straight into a process reference id without bounding it. Note this was not a memory-safety issue as the reviewer suggested: the id flows into globalcontext_get_process_lock, which performs a linear search of the process table (no out-of-bounds indexing), so an out-of-range value simply matches no process. Still, validate it like a local pid (TERM_MAX_LOCAL_PROCESS_ID) to reject clearly malformed input early. Signed-off-by: Davide Bettio <davide@uninstall.it>
monitor(process, X, [{alias, ...}]) returned a plain short reference (no alias)
when X was self() or an already-dead process, so the returned reference could not
be used to send to the process or be passed to unalias/1. OTP returns a usable
alias in both cases.
Install the alias and return a process reference on both early-return paths. For
self the alias stays active (self never sends a DOWN). On the noproc path the
monitor is immediately removed by the noproc DOWN, so only an explicit_unalias
alias stays active (demonitor / reply_demonitor are deactivated right away, as at
a real DOWN). Add regression tests for both.
Signed-off-by: Davide Bettio <davide@uninstall.it>
A send to a reference that is not an active local process-reference alias (a plain/short ref, a resource ref, or an external ref) is silently dropped, which matches OTP for local references. Distributed aliases (external references) are not supported, so a message addressed to a remote alias is lost rather than routed over distribution. Add comments at the three send sites (erlang:send/2, the send opcode and the JIT send); no behaviour change. Signed-off-by: Davide Bettio <davide@uninstall.it>
On 32-bit, TERM_BOXED_REFERENCE_RESOURCE_SIZE is 5 (one more than header + refc + mso cons) so reference shapes stay size-distinguishable, but term_from_resource only wrote 4 words, leaving the trailing padding word uninitialized. It is not read by comparison or serialization, but it is copied during GC, which is an uninitialized-memory read (sanitizer / valgrind). Initialize it to nil. No effect on 64-bit, where the resource size is exactly 4 words. Signed-off-by: Davide Bettio <davide@uninstall.it>
…onitor The CONTEXT_MONITOR_RESOURCE case fell through to the following monitor cases when the ref_ticks did not match. It is benign today (the next cases only break), but the alias commit added CONTEXT_MONITOR_ALIAS to the same switch, so tighten the fall-through. No behaviour change. Signed-off-by: Davide Bettio <davide@uninstall.it>
globalcontext_init_process appends every new context to the scheduler waiting queue, but context_destroy never removed it. Destroying a process that was never scheduled -- as the spawn NIFs do on option errors, or port drivers on initialization failures -- left a dangling queue entry pointing into freed memory, corrupting the queue on the next scheduler operation. Pre-existing upstream; exposed by the spawn_opt atomicity test added in the next commit, the first to exercise such a path. Dequeue in context_destroy under the processes spinlock, and reset the queue item at the scheduler's own dequeue sites so the destroy-time dequeue is a no-op there. A concurrent re-queue cannot happen: scheduler_make_ready refuses Killed and Spawning processes under the same spinlock, and one of these flags is set on every destroy path. Signed-off-by: Davide Bettio <davide@uninstall.it>
do_spawn published the link before the monitor phase, which can still fail parsing the
monitor options or on out-of-memory, including the result-space reservation. The error
paths destroyed the new context but left the caller's link entry installed, so the
destroyed context's teardown sent the caller a spurious {'EXIT', Pid, normal} -- delivered
as a message when trapping exits -- and process_info(links) reported a link to a dead pid,
for a spawn_opt call that raised. OTP's spawn_opt is atomic: either it returns and the
link exists, or it raises without side effects.
Run every fallible step -- option parsing, link and monitor allocations, and the result
space reservation -- before publishing anything; publishing itself cannot fail. This
closes the unwind residual left out of scope by the result-space commit. Add a regression
test checking that spawn_opt(F, [link, {monitor, [Bad]}]) raises badarg without leaving a
link behind or delivering an 'EXIT' message.
Signed-off-by: Davide Bettio <davide@uninstall.it>
The BEAM conformance CI run (test-erlang -b) showed that on OTP,
monitor(process, self(), [{alias, _}]) is a no-op apart from returning a fresh reference:
no monitor is installed and no alias either -- sends to the returned reference are
dropped, and unalias/1 and demonitor(Ref, [info]) return false (verified on OTP 29).
This reverts the self half of "Return a real alias from monitor/3 on the self and noproc
paths", whose premise was wrong for self; the noproc half is conformant (its test passes
on the BEAM). Return a plain short reference and install nothing, matching the existing
monitor/2 self path, and update the test to assert the OTP behaviour.
Signed-off-by: Davide Bettio <davide@uninstall.it>
On the BEAM the test process is the erl_eval process, which is linked to init, so the
{links, []} assertion failed when running the tests with test-erlang -b. Snapshot the
links before the spawn_opt call and assert they are unchanged after; this still catches
a leaked link on AtomVM and is independent of the harness process's pre-existing links.
Signed-off-by: Davide Bettio <davide@uninstall.it>
The monitor_*_new constructors return a pointer to the struct Monitor embedded in their container struct, and the nifs.c error paths free()d that pointer directly. That is only correct because the monitor happens to be the first member of every container struct; recover the container with CONTAINER_OF instead, through a new monitor_destroy that switches on the monitor type, so a layout change cannot silently corrupt the heap. No functional change. Signed-off-by: Davide Bettio <davide@uninstall.it>
The RefData arguments of the monitor constructors and term_from_ref_data are only read, so take them as const pointers per the coding style. Document monitor_alias_new, term_from_ref_data and term_process_ref_to_process_id like their neighbours, and use the until-now unused REFERENCE_PROCESS_PID_OFFSET for the process id word of a process reference, which also drops a TERM_BYTES conditional. No functional change. Signed-off-by: Davide Bettio <davide@uninstall.it>
erlang:send/2 accepts process alias references since the aliases were
introduced, but send_destination() only allowed pids, ports and atoms.
Add reference() like OTP does, document the alias destination (and the
silent drop of non-alias references) on send/2, and mention the
{monitor, MonitorOpts} option and the {Pid, MonitorRef} result in the
spawn_opt docs.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Signed-off-by: Davide Bettio <davide@uninstall.it>
Nothing ever referenced it: reference flavors are distinguished by their boxed size (see the comment above TERM_BOXED_REFERENCE_SHORT_SIZE), so the enum was dead code. It also didn't follow the coding style, which wants enumerations typedef'd with a lower_snake_case _t name. Signed-off-by: Davide Bettio <davide@uninstall.it>
The alias feature commit accidentally dropped the space in "// TODO" while sweeping REF_SIZE through network_driver.c. Restore the original comment; the line is unrelated to process aliases and clang-format flags the new form. Signed-off-by: Davide Bettio <davide@uninstall.it>
unalias/1 validated its argument with term_is_local_reference, raising badarg for a reference from another node. OTP 29 returns false instead (an external reference can never be an alias of the calling process); only non-references raise badarg. Sends to external references are already silently dropped, like OTP when the node is not alive; the new test pins both behaviours, BEAM-validated. Signed-off-by: Davide Bettio <davide@uninstall.it>
The console driver rebuilt the io_reply reference from the request reference ticks. That erases the flavor of a process reference: an alias passed as ReplyAs came back as a short reference with the same ticks, which the requester's receive does not match, so an io request through an alias hung. It also crashed on a non-reference ReplyAs (the io protocol allows any term) by reading ref ticks from an arbitrary term. Put the incoming ReplyAs term in the reply instead: it lives in the request message storage until the handler returns and port_send_message copies it. This also drops a reference allocation from every console reply. Signed-off-by: Davide Bettio <davide@uninstall.it>
The select machinery (socket nowait handles, enif_select refs used by the posix nifs) stores only the reference ticks and rebuilds the handle as a plain reference in the select notification. An alias passed as a handle would therefore come back as a reference the alias does not match, and the caller would hang waiting for the notification. On the BEAM an alias is a valid select handle and is echoed back verbatim. Supporting that would mean carrying the owner process id through the select machinery for a corner case with no practical use, so raise badarg instead and keep the code simple; the divergence is called out in comments at both validation sites. Signed-off-by: Davide Bettio <davide@uninstall.it>
The alias tests used a single 500 ms recv_one window both for positive receives (the file convention elsewhere is 5000 ms) and for asserting that a message does NOT arrive. Under valgrind or a loaded CI runner the positive uses are the most likely false failure, and the two negative uses were detection windows a reintroduced bug could outlast. Raise recv_one to 5000 ms and assert the negatives behind deterministic fences instead: a later monitor whose DOWN fires after the buggy one (monitors fire in installation order), or a fence reply through the same echo process (sends from one process keep their order). Bound the one unbounded receive, and document that the same-batch DOWN test exercises the same-batch path reliably only on SMP builds. Signed-off-by: Davide Bettio <davide@uninstall.it>
Quit the echo workers the alias tests spawn (the file convention; harmless today only because each module runs in a fresh VM), rewrap a comment that exceeded the 100-column print width, and restore the blank line between two test functions in test_binary_to_term. Signed-off-by: Davide Bettio <davide@uninstall.it>
The helper shadowed the auto-imported spawn_monitor/2 BIF and made erlc emit seven "ambiguous call of overridden auto-imported BIF" warnings on every build of the test sources. Resolution picked the local function, so behavior was correct, but the warnings were noise this branch introduced. Signed-off-by: Davide Bettio <davide@uninstall.it>
The demonitor mode also deactivates on DOWN delivery (not only on
demonitor/1), and reply_demonitor removes the monitor as well as the alias
when the first message sent via the alias is delivered (verified against
OTP 29). Also note that the OTP {tag, Term} option raises unsupported.
Signed-off-by: Davide Bettio <davide@uninstall.it>
Two conformance cells the alias tests did not pin down yet: an alias used as
a map and ets key (ordinary reference term handling, value comparison), and
demonitor(Mon, [flush]) on an {alias, demonitor} monitor, where the alias
died with the monitor at DOWN delivery and flush removes the queued DOWN.
Both BEAM-validated.
Signed-off-by: Davide Bettio <davide@uninstall.it>
context_find_alias walks the whole monitor list and runs for every DOWN delivery (twice: once in the outer-list split for same-batch ordering, once in the down handler) and for every demonitor -- also for the vast majority of processes that never create an alias. In a DOWN-heavy benchmark (50 batches of 1000 monitored process exits) this cost 25% over the pre-alias baseline. Track the number of active aliases in a 16-bit slice of the existing Context bitfield word (no struct growth, owner-written only like the surrounding bits) and short-circuit context_find_alias when it is zero. The benchmark returns to the baseline noise envelope; processes that do use aliases keep the previous behavior. Signed-off-by: Davide Bettio <davide@uninstall.it>
Pin two more behaviors verified against OTP 29: when {alias, _} appears twice
in the monitor options the last occurrence wins, and monitoring the caller
through its own registered name installs nothing, exactly like monitoring
self() directly.
Signed-off-by: Davide Bettio <davide@uninstall.it>
alias/0 becomes alias([]) with explicit_unalias as the default mode. The reply option needs no new machinery: it maps onto the reply_demonitor alias type, whose conversion path deactivates the alias when the first message sent via it is delivered -- with no monitor to remove that is exactly the OTP reply behavior (verified against OTP 29, including that a second same-batch message is dropped). Bad options and a non-list argument raise badarg before any side effect. Signed-off-by: Davide Bettio <davide@uninstall.it>
The JIT send sets x0 to the message in every delivering branch, but the fall-through that silently drops a send to a non-alias reference left x0 holding the recipient, so Ref ! Msg returned the reference instead of Msg on the JIT flavor (the interpreter assigns x0 after the dispatch and was correct). Caught by the new non-local-refs send test, which only ran on the interpreter flavor until now. Signed-off-by: Davide Bettio <davide@uninstall.it>
Reference kinds are distinguished by their boxed size, and on 32-bit the natural process reference size (header + 2 ticks words + pid = 4) collides with the resource reference. Resolving the collision by growing the resource reference put the extra word on the wrong side: resource references are used all over the embedded targets (sockets, files, NIF resources), while process references only exist when aliases are used. Move the padding word to the process reference, so 32-bit resource references return to 4 words and only alias users pay the extra word. The padding word is initialized to nil so GC copies a defined value. The size static asserts now check pairwise distinctness, since the sizes are no longer monotonically ordered (32-bit: short 3, resource 4, process 5; 64-bit: short 2, process 3, resource 4). The wire format is unchanged (the padding word is not serialized), and all allocation sites size through TERM_BOXED_REFERENCE_PROCESS_SIZE. Signed-off-by: Davide Bettio <davide@uninstall.it>
…of RefData Storing a RefData (ref ticks + owner pid) in every local monitor grew each MonitorLocalMonitor, MonitorLocalRegisteredNameMonitor and MonitorAlias by 8 bytes (uint64_t alignment), paid by every plain monitor/2 -- i.e. every gen_server:call -- on the embedded targets. The pid was redundant everywhere: on the monitored side it duplicates monitor_obj (the monitoring process, which is exactly the alias owner), on the monitoring, registered-name and alias entries it was never read. Keep a plain uint64_t ref_ticks in the monitor structs (their baseline layout) and carry the only new bit of information -- "the 'DOWN' reference must be rebuilt alias-shaped" -- as a monitor type, CONTEXT_MONITOR_MONITORED_LOCAL_ALIAS. The 'DOWN' path derives the process reference pid from monitor_obj. monitor_new now takes the monitor type directly, replacing the is_monitoring bool as well. RefData stays as the NIF-local value type used to build result references (term_from_ref_data) in monitor/3, spawn_opt and alias/1. Behavior is unchanged and pinned by the existing tests (test_monitor_down_alias asserts the 'DOWN' carries the very alias reference; process_info(monitored_by) keeps listing alias-shaped monitors). Signed-off-by: Davide Bettio <davide@uninstall.it>
The 16-bit active_alias_count was incremented and decremented unchecked: a process accumulating 65536 simultaneously active aliases would wrap the counter to 0, after which context_find_alias short-circuits and every alias of that process silently stops matching -- messages dropped, unalias/1 returning false, reply_demonitor monitors no longer removed. Silent message loss is the worst failure mode for aliases. Make the counter saturate stickily: at 0xFFFF a new bit, alias_count_saturated, is set and the count is pinned (decrements are skipped). Since the pinned count is never 0, the context_find_alias fast path needs no change -- a saturated process simply always walks its monitor list, degrading the optimization instead of correctness. The bit packs into the existing bitfield word, so sizeof(Context) is unchanged. Not covered by a test: creating 65536 live aliases is infeasible in CI (context_add_monitor's duplicate scan makes it quadratic); the three touched sites (add, unalias, terminate) were reviewed for symmetry instead. Signed-off-by: Davide Bettio <davide@uninstall.it>
Aliases are serialized as 3-word NEWER_REFERENCE_EXT, so a BEAM peer can legitimately hold an AtomVM alias and send to it, emitting DOP_ALIAS_SEND (33) or DOP_ALIAS_SEND_TT (34). Those operations hit the default arm of nif_erlang_dist_ctrl_put_data, which raises badarg in the distribution connection process: a single remote alias send could take the whole connection down instead of, at worst, losing one message. Handle both operations by routing through the existing alias delivery path: decoding our own alias yields a process reference (node and creation match), so the owner pid is in the reference itself and the message is posted with globalcontext_send_message_to_alias, where the owner validates the alias as for a local send. A reference that is not a local process reference (e.g. minted by a previous incarnation of this node) drops the message, exactly like a local send to a non-alias reference. Unlike a pid, the reference is a boxed term, so the control message is kept rooted across the payload decode. The trace token of DOP_ALIAS_SEND_TT is ignored (sequential tracing is not supported). The outbound direction is unchanged: sending to a remote alias from AtomVM is not routed over distribution. test_net_kernel gains test_alias_send_from_beam, which has a real OTP node send through an AtomVM alias (delivery and reference round-trip), then send again after unalias/1 (message dropped, connection survives). The test also passes unchanged on BEAM, pinning the OTP semantics it asserts. Signed-off-by: Davide Bettio <davide@uninstall.it>
Behavior differences go to the differences-with-beam page: sending to a
remote alias is not routed over distribution (the inbound direction works),
and aliases are rejected as select handles where BEAM accepts them. The
unsupported erlang:monitor/3 {tag, Term} option is already stated in the
function documentation, and the builtin port-driver request/reply protocols
are AtomVM-specific, so neither belongs on the differences page.
The alias/1 documentation now states that the OTP 28 priority option is not
supported and raises badarg, and the CHANGELOG records the REF_SIZE C macro
deprecation with its replacements.
Signed-off-by: Davide Bettio <davide@uninstall.it>
The received-order split added for process aliases reverses the LIFO outer list and then walks it a second time, so that alias side effects run in received order (deactivating the alias, and resolving several sends to the same alias in one batch oldest-first). That second traversal only matters when the owner actually holds an active alias, yet it ran for every signal drain of every process -- including the alias-free processes that are the common case on the embedded targets, and the ctx == NULL scheduler and termination callers that can never own an alias. Gate it on active_alias_count: when it is zero (or there is no owner context), split the outer list into the normal and signal sublists in a single prepend pass, exactly as before aliases existed. active_alias_count is written only by the owner and this runs in the owner's own context, so it is stable for the whole batch and the choice is race-free. An AliasMessageSignal reached on this path necessarily targets an inactive alias: drop it when we own a context (it must never reach the signal loop, which treats it as unreachable), or leave it for the ctx == NULL caller to ignore. The full received-order two-pass is kept unchanged for a process that holds an active alias. No behavior change: the fast path produces the same received-order sublists the two-pass produced for an alias-free batch. Signed-off-by: Davide Bettio <davide@uninstall.it>
The alias support reduced the console handler's heap reservation from a conservative 12 terms to TUPLE_SIZE(3), matching the io_reply 3-tuple once the verbatim ReplyAs echo removed the separate ref allocation. But the gen_call paths build their error reply as an error 2-tuple nested inside the reply 2-tuple (port_create_error_tuple + port_send_reply), i.e. TUPLE_SIZE(2) + TUPLE_SIZE(2) = 6 terms, allocated on ctx's heap by helpers that do not run their own memory_ensure_free. Reserving only 4 terms under-allocates by 2: the writes are masked by heap slack today, but violate the ensure_free contract and corrupt the heap whenever it is tight (e.g. after a MEMORY_CAN_SHRINK shrink to the requested size). Reserve 2 * TUPLE_SIZE(2), the largest reply the handler builds (more than the io_reply 3-tuple and the close 2-tuple). Flagged by CodeQL's allocation-exceeding-ensure_free query. Signed-off-by: Davide Bettio <davide@uninstall.it>
|
|
||
| term context_process_alias_message_signal(Context *ctx, struct TermSignal *signal) | ||
| { | ||
| // signal->signal_term is a 2-tuple {Ref, Message}. The alias lookup runs here, in the owner's own |
There was a problem hiding this comment.
Do we want comments that verbose? The drawback is that eventually they end up being unsync with code, and LLM or humans tend to read comments instead of code.
| // Target cannot be NULL as we processed Demonitor signals | ||
| assert(target != NULL); | ||
| int required_terms = REF_SIZE + TUPLE_SIZE(5); | ||
| int required_terms = TERM_BOXED_REFERENCE_PROCESS_SIZE + TUPLE_SIZE(5); |
There was a problem hiding this comment.
This size could be conditional of monitor->monitor_type == CONTEXT_MONITOR_MONITORED_LOCAL_ALIAS, couldn't it?
| // lookups are no longer skipped for this process, trading the optimization for | ||
| // correctness instead of wrapping to 0 and silently deactivating every alias. | ||
| unsigned int active_alias_count : 16; | ||
| unsigned int alias_count_saturated : 1; |
There was a problem hiding this comment.
Not sure this optimization is worth the 17 bits it costs. We could have only one bit and check if there is any alias left when removing an alias. Alternatively, we could reduce it to 8 bits and drop the optimization when the number of aliases reaches (forever) 255.
| { | ||
| Context *p = globalcontext_get_process_lock(glb, process_id); | ||
| if (p) { | ||
| // Post the alias message as a signal carrying {Ref, Message}. The owner validates the alias |
There was a problem hiding this comment.
I don't think this comment adds any value
| * | ||
| * @details Posts an AliasMessageSignal carrying {Ref, Message} to the owner process. The owner | ||
| * validates the alias against its own monitors when it drains signals and either delivers Message | ||
| * as a normal message or drops it. This avoids touching the owner's monitor list from the sender. |
There was a problem hiding this comment.
Last sentence sounds obvious here.
| set_error(ctx, jit_state, 0, BADARG_ATOM); | ||
| return false; | ||
| } else { | ||
| // A reference that is not a local process reference (short/resource/external ref): the |
There was a problem hiding this comment.
A TODO would be welcome for distributed aliases.
Also I would put the term_is_reference first.
| } | ||
|
|
||
| int local_process_id; | ||
| if (term_is_local_pid_or_port(recipient_term)) { |
There was a problem hiding this comment.
Does this test still make sense?
The alias work renamed REF_SIZE to TERM_BOXED_REFERENCE_SHORT_SIZE and left REF_SIZE as a deprecation shim that emits a GCC diagnostic on every use. The libAtomVM tree was fully migrated, but the three USB CDC drivers were not: ten uses remained, so every esp32, stm32 and rp2 build gained deprecation warnings, turning into hard errors when AVM_WARNINGS_ARE_ERRORS is enabled (the drivers inherit libAtomVM's compile options). The generic_unix builds never compile these drivers, which is how the leftovers survived every native test pass. All ten uses size reply allocations whose reference is built with term_from_ref_ticks, i.e. a plain short reference, so TERM_BOXED_REFERENCE_SHORT_SIZE is the exact replacement. The shim expands to that same macro, so the generated code is unchanged and the deprecation is left with zero in-tree users. Flagged by the pre-merge review, which confirmed with a standalone probe that the pragma fires in exactly these expression contexts and becomes a hard error under -Werror. Signed-off-by: Davide Bettio <davide@uninstall.it>
The differences-with-beam page documents core divergences from the BEAM, and the process alias section was not one: its remaining substance is a distribution limitation. Move it to a new Known Issues & Limitations section of the distributed Erlang page, which now carries the only user-facing note: an outbound send to a remote alias is dropped, while inbound delivery from a BEAM peer works. The page-closing invitation to file issues or pull requests moves below the new section, so it keeps closing the page and now reads on both the partial feature list and the known issues. Dropped in the move: the select-handle rejection (an exotic corner that is not worth user-facing documentation -- it remains commented at both validation sites and covered by tests) and the OTP >= 26 semantics claim. The differences-with-beam page is back to its release-0.7 content. Signed-off-by: Davide Bettio <davide@uninstall.it>
Five tests closing the review's coverage gaps (AR-10): - test_alias_send_after_owner_died: a send through the alias of a dead process is dropped, still returns the message, and surfaces nowhere; the spawn churn probes would report a misdelivery. AtomVM assigns process ids monotonically so the dead owner's id is not reused, and globally-unique ref ticks are the invariant that keeps a stale alias unmatchable even by an id reuser. - test_alias_multi_sender_unalias: four senders hammer one alias while the owner unaliases mid-stream. All synchronization is per-sender send order and explicit acks -- no sleeps and no timing windows -- so the test cannot flake on slow or loaded hosts. Phase 1 pins the exact delivered count behind per-sender fences while the alias is active; phase 2 races unalias/1 against the senders and pins no crash, a bounded count and an observably dead alias afterwards. - test_alias_duplicate_options: alias/1 duplicate options are last-wins in both orders, like OTP 29 (the monitor/3 duplicate was already covered). - test_unalias_non_reference_badarg: unalias(42) raises badarg, like OTP. - test_binary_to_term_invalid_process_ref: a wire-format alias with its owner pid word patched to 0 or above the 28-bit maximum must fail to decode -- the untrusted-input validation this branch added to the decoder, previously untested. The BEAM does not read reference words as a pid and decodes plain references there; the test asserts that instead. All five pass on both flavors and on the BEAM (OTP 29) via -b; the module was looped 30x on each flavor (0 failures) and runs under valgrind with 0 errors. Signed-off-by: Davide Bettio <davide@uninstall.it>
Process ids are int32_t throughout (Context.process_id, RefData.process_id, and every assignment of this accessor's result), but term_process_ref_to_process_id returned uint32_t. Return int32_t, making the six int32_t call sites exact; the remaining sites store the always-positive, 28-bit-bounded value into uint32_t wire words, where the conversion is well-defined and unchanged. Also from the review's style pass: drop the stray blank line in the REFERENCE_PROCESS_PID_OFFSET conditional (the neighboring blocks are compact), fix "an unique" in the term_make_process_reference doc, and state the load-bearing layout invariant there: the ticks occupy the same words as in a short reference, so term_to_ref_ticks works on both shapes. The remaining nits from the same review pass were deliberately not taken: explicit casts in term_make_process_reference would decorate conversions that are well-defined and always in range; initializing the parse_monitor_opts alias_type out-param adds a line for a latent-only risk; the context_dump %lu ticks truncation matches the surrounding pre-existing lines and is best fixed together with them. Signed-off-by: Davide Bettio <davide@uninstall.it>
A message sent to an inactive alias is rejected at the owner's mailbox
drain, but both drop sites disposed it with mailbox_message_dispose, which
appends the signal storage to the receiver's heap as a fragment: the dead
{Ref, Message} tree -- including any refc binaries it pins -- stayed on the
victim's heap until its next GC. OTP drops such messages with no receiver
heap impact, so a flood of sends to a deactivated alias produced
AtomVM-only GC pressure, with the receiver paying for messages it never
accepted.
The deferred reclamation is only needed for delivered messages, whose terms
may still be referenced after a receive. A dropped alias message was never
delivered, so nothing can reference its term: free it immediately with
mailbox_message_dispose_unsent (which sweeps the mso_list, releasing refc
binaries), exactly the treatment unsent messages already get. TermSignal
and Message share the layout already asserted for the in-place conversion,
and both drop sites run in the owner's own context (from_task = false).
The reply-mode and duplicate-option tests now also push a >64-byte binary
through each drop path, so the refc sweep is exercised under valgrind and
ASAN.
Flagged by the pre-merge review (round 2, issue 4).
Signed-off-by: Davide Bettio <davide@uninstall.it>
The reply_demonitor branch of context_process_alias_message_signal takes the processes_table read lock from inside the mailbox drain, which is why callers of the with-aliases drain must not hold that lock and why context_destroy stays on the alias-blind variant. The constraint was documented at the mailbox.h API; nothing at the acquisition site itself said so, and the pre-merge review (round 2, issue 5) asked for a guard against a future caller draining with the lock held. A guard already exists, stronger than the suggested debug assertion: on generic_unix smp_rwlock_rdlock aborts on any pthread error, and glibc returns EDEADLK when the calling thread already owns the write lock (probe-verified), so the misuse dies fast at the exact lock call in every build type instead of deadlocking silently. A literal current-thread ownership assertion would need thread-local tracking maintained at all 22 processes_table lock sites across the four platform smp implementations -- out of proportion for a fragility note. State the invariant and its enforcement in a comment at the acquisition site, so the coupling is visible where a future reader would break it. Signed-off-by: Davide Bettio <davide@uninstall.it>
The owner in test_monitor_alias_down_before_send_same_batch busy-waits on whereis by design: receiving would drain its mailbox and defeat the same-batch construction. Under valgrind all threads are serialized with an unfair scheduler, so the spinning owner can starve the relay; the 5M-iteration bound expired once in a full-suite valgrind run (the test passes standalone, on the other flavor, and on rerun, with zero memory errors every time). Raise the bound to 50M -- ten times the scheduling opportunities for the relay -- and say why it is large. The fast path is unchanged: the loop exits as soon as the relay registers. Signed-off-by: Davide Bettio <davide@uninstall.it>
|
AMP: PR Review — Process aliases for AtomVMRange reviewed: Diffstat: ~40 files, +1975 / −173. The series adds OTP process aliases: VerdictStrong, ship-quality work. The design is sound: aliases are validated in The findings below are mostly pre-existing OOM-robustness gaps that this Findings1. (Medium) OOM in
|
This revamps PR #2027
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later