WIP: UI smoke tests for axis, touchy, gmoccapy, qtdragon#3999
grandixximo wants to merge 17 commits into LinuxCNC:master
Phase 1 of LinuxCNC#3756: launch each GUI under xvfb-run against an existing sim config, drive Estop reset / machine on / home all via NML, assert the interpreter reaches IDLE, then shut down cleanly. Verifies the GUI starts and accepts basic commands without crashing.
Skips gracefully (exit 77) when xvfb-run is not installed, matching the precedent set by tests/tooledit and tests/pyvcp.
Shared helpers under _lib/:
- drive.py: common NML driver, prints UI_SMOKE_OK on success
- launch.sh: xvfb-run wrapper with setsid + signal escalation for clean linuxcnc shutdown (preserves shared memory cleanup via scripts/linuxcnc trap)
- checkresult.sh: shared pass/fail check delegated to by per-test checkresult shims
Each per-GUI directory exposes test.sh + checkresult and reuses the existing configs/sim/<gui>/*.ini so no test-only sim configs are introduced.
Functional tests (load G-code, verify final position) and screenshot/video on failure are deferred to follow-up phases.
xvfb is already declared in debian/control (<!nocheck>) so apt-get build-dep installs it on CI; no new system deps required for this phase.
Refs LinuxCNC#3756
CI failed with "Permission denied" exec'ing _lib/launch.sh because the local repo has core.filemode=false so chmod +x was not recorded in the git index. Use git update-index --chmod=+x to mark all test scripts as executable.
Two CI-driven fixes:
1. Per-GUI Python module preflight in launch.sh. test.sh now passes a comma-separated list of modules the GUI needs at import time; if any fail to import the test exits 77 (skipped) rather than wedging linuxcnc waiting for a GUI that will never come up.
- axis: OpenGL.GL
- touchy, gmoccapy: gi
- qtdragon: PyQt5.QtCore, qtvcp
Master CI does not currently install these runtime deps (Bertho's LinuxCNC#3391 work added them only to the 2.9 branch), so without preflight every smoke test failed with a wedged linuxcnc startup or an uninformative timeout. This way the tests skip cleanly until the deps land in master CI.
2. Wait up to 30s for the linuxcnc SIGTERM trap (scripts/linuxcnc Cleanup) to finish before SIGKILL. Earlier tighter window meant Cleanup got cut off mid-run and left shared memory attached, which caused subsequent tests in the same job to fail with SHMERR.
Refs LinuxCNC#3756
The previous launch.sh had `echo "WARN: ..."` inside a `bash -c "..."` heredoc; the inner double quotes closed the outer string and the shutdown block was truncated. Symptom on CI: "linuxcnc: -c: line 34: syntax error: unexpected end of file" before any logs were captured. Switch to single quotes for the warning message. Also add cairo to gmoccapy's import preflight: gladevcp.makepins (loaded by gmoccapy) imports cairo via the led module, which trips on minimal CI without python3-cairo.
scripts/runtests does not honor exit 77 from a test.sh; its skip mechanism is a per-directory `skip` executable that returns non-zero when the test should be skipped. Add a shared _lib/skip-if-missing.sh and per-GUI skip scripts that check for xvfb-run plus the python modules each GUI needs. The launch.sh preflight stays as a fallback.
Modules required:
- axis: OpenGL.GL
- touchy: gi, cairo
- gmoccapy: gi, cairo
- qtdragon: PyQt5.QtCore, qtvcp
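A skip predicate of this kind could be sketched in Python as follows. This is only an illustration, not the contents of _lib/skip-if-missing.sh: the function names are hypothetical, and importlib.util.find_spec only locates a module without executing it, so a module that is found can still fail at import time.

```python
import importlib.util
import shutil


def missing_modules(modules_csv):
    """Return the names from a comma-separated list that cannot be located."""
    missing = []
    for name in (part.strip() for part in modules_csv.split(",")):
        if not name:
            continue
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A missing parent package (e.g. "OpenGL" for "OpenGL.GL")
            # raises ModuleNotFoundError instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


def skip_status(modules_csv, have_tool=lambda tool: shutil.which(tool) is not None):
    """Return non-zero when the test should be skipped (runtests convention)."""
    if not have_tool("xvfb-run"):
        return 1
    return 1 if missing_modules(modules_csv) else 0
```

A per-GUI skip script would then pass its own list, for example `skip_status("gi,cairo")` for touchy, and use the result as its exit status.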
Forward port of the GUI dependency work from 2.9 (LinuxCNC#3391). The runtime deps were already in linuxcnc-uspace's Depends, but apt-get build-dep on CI does not install runtime deps, which left the new ui-smoke tests unable to launch any GUI and forced them to skip. Adds python3-opengl, python3-pyqt5, python3-pyqt5.qsci, python3-cairo, python3-gi, python3-gi-cairo, gir1.2-gtk-3.0 under the !nocheck profile, matching the existing pattern for xvfb and x11-xserver-utils. Edited debian/control.top.in (debian/control is gitignored and regenerated by debian/configure). Refs LinuxCNC#3391, LinuxCNC#3756
CI run after the first dep batch revealed gmoccapy needs the GtkSource-4 typelib, qtdragon needs additional PyQt5 modules (qtsvg/qtopengl/qtwebengine), python3-qtpy, and the dbus mainloop binding. Add these to Build-Depends with !nocheck profile so they install on apt-get build-dep. Also extend skip-if-missing.sh to verify gi typelibs (entries of the form gi:Namespace:version), not just python imports. This catches the GtkSource case where gi imports fine but the typelib is absent, which gladevcp tripped on at gi.require_version time. touchy and gmoccapy skip predicates now require Gtk-3.0 (and GtkSource-4 for gmoccapy). Refs LinuxCNC#3756
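The gi:Namespace:version entries could be handled roughly like this. The sketch assumes the entry syntax described above and uses PyGObject's gi.require_version, which raises ValueError when the requested typelib is not available; the function names are hypothetical and the real check lives in a shell script.

```python
def parse_requirement(entry):
    """Classify an entry as a gi typelib ("gi:Namespace:version") or a module."""
    entry = entry.strip()
    parts = entry.split(":")
    if len(parts) == 3 and parts[0] == "gi":
        return ("typelib", parts[1], parts[2])
    return ("module", entry, None)


def typelib_available(namespace, version):
    """True only if gi imports AND the requested typelib version is present."""
    try:
        import gi
        gi.require_version(namespace, version)
    except (ImportError, ValueError):
        # ImportError: python3-gi itself is missing.
        # ValueError: gi imports fine but the typelib is absent (the GtkSource case).
        return False
    return True
```

For gmoccapy this would mean checking `typelib_available("Gtk", "3.0")` and `typelib_available("GtkSource", "4")` in addition to the plain module imports.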
The previous driver did too much for a smoke layer (Estop reset, machine on, home all, wait for IDLE) and tripped on each GUI's specific startup sequence assumptions. Reduce to: connect to NML, wait for task ready, sleep 3s for GUI construction, recheck task alive, print UI_SMOKE_OK. This is the literal answer to Bertho's "does it start" question. Functional behaviour belongs in tests/ui-functional/ (Phase 2). Also harden shutdown: extend the SIGTERM grace from 30s to 60s, and add a halrun -U + explicit ipcrm fallback if Cleanup still has not finished. Removes /tmp/linuxcnc.lock too. Without this the next ui-smoke test inherited stale shared memory and wedged at startup. Bump LINUXCNC_TIMEOUT to 180s (8s startup + 30s driver + 60s grace + slack) and reduce DRIVER_TIMEOUT to 30s now that the driver work is small. Refs LinuxCNC#3756
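The SIGTERM-then-SIGKILL escalation on a process group can be illustrated in Python. This is a sketch of the general technique, not a port of launch.sh: the function name and defaults are assumptions, and the halrun -U / ipcrm fallback is deliberately left out.

```python
import os
import signal
import subprocess
import time


def stop_process_group(proc, grace_s=60.0, poll_s=0.1):
    """SIGTERM the child's process group, wait up to grace_s, then SIGKILL."""
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)  # lets a shell trap run its cleanup
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return "terminated"
        time.sleep(poll_s)
    os.killpg(pgid, signal.SIGKILL)  # grace expired: force the group down
    proc.wait()
    return "killed"


def start_in_own_group(argv):
    """Start argv as a session/group leader, comparable to running under setsid."""
    return subprocess.Popen(argv, start_new_session=True)
```

Starting the child in its own session is what makes the group signal safe: the test runner itself is not in that group, so it cannot be hit by the escalation.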
CI run after the previous fix made progress (0 shmem errors, axis and gmoccapy passing) but qtdragon hit "bind error: 98 -- Address already in use" on NML port 5005, meaning gmoccapy's linuxcncsvr was still alive when qtdragon tried to start. touchy then cascaded. Add a pre-launch cleanup to launch.sh that pkills the known long-lived processes (linuxcncsvr, milltask, halui, hal_bridge, axis, gmoccapy, touchy, qtvcp, rtapi_app), removes /tmp/linuxcnc.lock, runs halrun -U, and ipcrms any leftover linuxcnc shared memory keys before each test. Refs LinuxCNC#3756
If you manage to create consistent screenshots and want to go to pedantic mode:
Probably overcomplicated, and I don't know how deterministic LinuxCNC is, but this way bugs like #3979 can be easily avoided. When testing manually, these kinds of bugs are often just overlooked.
Three review-driven changes:
1. Fix self-kill regression: pkill -KILL -f "\\bqtdragon\\b" matched
the launch.sh process whose argv contained the path
.../qtdragon_metric.ini, sending SIGKILL to the test itself
(exit 137 across all 4 tests). Use pkill -KILL -x against an
exact daemon name list (linuxcncsvr, milltask, halui, rtapi_app),
not the GUI program names; the GUIs are children of the linuxcnc
script and get reaped via SIGTERM to its process group.
2. Dedupe cleanup. Both pre-launch and post-shutdown blocks repeated
the daemon list and shared-memory key list; extract them to
_lib/cleanup-runtime.sh which is called from launch.sh and from
the heredoc fallback. Single source of truth.
3. Drop the pre-driver `sleep 8` and the python module preflight
inside launch.sh. drive.py polls echo_serial_number for task
readiness so a wall-clock wait is unnecessary. With GUI runtime
deps now declared in debian/control under !nocheck, the python
preflight has nothing to do; missing deps will fail the test
loudly which is what reviewers asked for ("if it skips gracefully
we don't know whether the code is sane"). The skip predicate
only skips on xvfb-run absence (rare local dev environment).
Refs LinuxCNC#3756, PR LinuxCNC#3999
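The difference between full-command-line matching (pkill -f) and exact-name matching (pkill -x) behind the self-kill regression can be approximated in Python. Everything here is illustrative: the command line and path are hypothetical stand-ins, and Python's re module only approximates the regex dialect pkill uses.

```python
import re

# Exact daemon names from the commit message above.
DAEMONS = ("linuxcncsvr", "milltask", "halui", "rtapi_app")


def matches_full_cmdline(pattern, cmdline):
    """Approximates pkill -f: the pattern is searched in the whole argv string."""
    return re.search(pattern, cmdline) is not None


def matches_exact_name(names, process_name):
    """Approximates pkill -x: the process name must equal a listed name."""
    return process_name in names


# Hypothetical argv of the test wrapper; the ini path mentions the GUI name.
wrapper_cmdline = "bash _lib/launch.sh /hypothetical/configs/qtdragon/qtdragon_metric.ini"
```

With these helpers, `matches_full_cmdline(r"\bqtdragon\b", wrapper_cmdline)` is true, so the wrapper would be a kill target, while `matches_exact_name(DAEMONS, "launch.sh")` is false.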
The reference-screenshot diff approach is a good Phase 3 idea, will track it on #3756. For Phase 1 (this PR) I'm staying with NML state assertions only since they're deterministic; rendering will need the screen-stabilization tricks you mentioned.
After dropping the pre-driver sleep, the driver now races linuxcnc
startup. linuxcnc.stat()/command() and the first stat.poll() can
raise linuxcnc.error while linuxcncsvr is still setting up its
buffers ("emcStatusBuffer invalid err=3"). Previously the driver
bailed on the first exception, so all 4 ui-smoke tests failed
within ~1s on CI.
Retry both the constructor calls and stat.poll() until the deadline,
treating these errors as "task not ready yet" rather than fatal.
The wait_for timeout (TIMEOUT_S=30s) bounds the wait.
axis ran fully on CI (27656 task cycles ≈ 28s wall) but the test exited 124 because the inner DRIVER_TIMEOUT=30s clipped the driver which itself can take up to TIMEOUT_S=30s for NML connect retry + 30s for task-up wait + 3s settle. Bump DRIVER_TIMEOUT to 90 so the driver finishes; bump LINUXCNC_TIMEOUT to 240 to accommodate driver + 60s shutdown grace + slack on slower runners.
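The timeout budget from this commit can be checked with a few lines of arithmetic. The values are taken from the message above; the variable names other than DRIVER_TIMEOUT and LINUXCNC_TIMEOUT are made up for the illustration.

```python
# Worst-case driver phases, in seconds.
NML_CONNECT_S = 30   # TIMEOUT_S spent on NML connect retry
TASK_UP_S = 30       # wait for task to come up
SETTLE_S = 3         # GUI settle sleep

DRIVER_TIMEOUT = 90      # inner timeout around the driver
SHUTDOWN_GRACE_S = 60    # SIGTERM grace before SIGKILL
LINUXCNC_TIMEOUT = 240   # outer timeout around the whole run

driver_worst_case = NML_CONNECT_S + TASK_UP_S + SETTLE_S   # 63
run_worst_case = DRIVER_TIMEOUT + SHUTDOWN_GRACE_S         # 150
slack = LINUXCNC_TIMEOUT - run_worst_case                  # 90

# The old DRIVER_TIMEOUT of 30 was below the 63s worst case, hence exit 124.
assert driver_worst_case <= DRIVER_TIMEOUT
assert run_worst_case <= LINUXCNC_TIMEOUT
```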
Two fixes:
1. drive.py: recreate linuxcnc.stat() in the retry loop. The status buffer can be invalid (err=3) for the first ~30s while linuxcncsvr initialises; once a stat object is bound to the invalid buffer it does not recover when the buffer becomes valid. Recreating the object on each retry lets the driver pick up the buffer as soon as it is ready. CONNECT_TIMEOUT_S widened to 60s to accommodate slow CI startups.
2. launch.sh: export LIBGL_ALWAYS_SOFTWARE=1 and GALLIUM_DRIVER=llvmpipe. GitHub Actions runners have no GPU; qtdragon's GLcanon widget segfaults under hardware GL when the only display is xvfb. Force Mesa llvmpipe software rasterizer.
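The recreate-on-retry pattern from the first fix might look like the sketch below. It is not the code in drive.py: NotReady stands in for linuxcnc.error so the example runs without LinuxCNC installed, and wait_for_ready and its parameters are hypothetical names.

```python
import time


class NotReady(Exception):
    """Stand-in for linuxcnc.error in this sketch."""


def wait_for_ready(make_stat, timeout_s=60.0, interval_s=0.2):
    """Build a fresh status object on every attempt until poll() succeeds."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    while time.monotonic() < deadline:
        try:
            stat = make_stat()  # fresh object: never reuse one bound to a bad buffer
            stat.poll()
            return stat
        except NotReady as exc:
            last_error = exc    # treated as "task not ready yet", not fatal
            time.sleep(interval_s)
    raise TimeoutError(f"task not ready after {timeout_s}s: {last_error}")
```

In the real driver the factory would presumably be linuxcnc.stat and the caught exception linuxcnc.error; the deadline keeps the retry bounded so a GUI that never comes up still fails the test.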
CI run after the previous fix: 280/282 passing (axis and gmoccapy green). Two remaining failures isolated:
1. touchy crashes in filechooser.py:29 because os.listdir() of $HOME/linuxcnc/nc_files raises FileNotFoundError on a clean CI $HOME. The path is hardcoded with no try/except in the GUI itself; pre-create it in launch.sh until the underlying bug can be fixed upstream.
2. qtdragon still segfaults despite LIBGL_ALWAYS_SOFTWARE=1, so the rest of Qt's GL stack is also reaching for hardware. Set QT_QUICK_BACKEND=software, QSG_RHI_BACKEND=software, and QT_OPENGL=software to force every Qt path through the software rasterizer.
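Both workarounds amount to preparing the environment before the GUI starts. A Python rendering of that setup is sketched below; launch.sh does this in shell, the helper names are hypothetical, and the variable values are copied from the commit messages rather than verified against Mesa or Qt documentation.

```python
import os

# Software-rendering settings named in the commit messages.
SOFTWARE_GL_ENV = {
    "LIBGL_ALWAYS_SOFTWARE": "1",
    "GALLIUM_DRIVER": "llvmpipe",
    "QT_QUICK_BACKEND": "software",
    "QSG_RHI_BACKEND": "software",
    "QT_OPENGL": "software",
}


def build_env(base=None):
    """Copy the base environment and overlay the software-rendering settings."""
    env = dict(os.environ if base is None else base)
    env.update(SOFTWARE_GL_ENV)
    return env


def ensure_nc_files(home):
    """Pre-create $HOME/linuxcnc/nc_files so os.listdir() on it cannot fail."""
    path = os.path.join(home, "linuxcnc", "nc_files")
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path
```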
qtvcp compiles a QRC (Qt resource) file into Python at first run using `pyrcc5`. On CI without pyqt5-dev-tools the call fails with "No such file or directory: 'rcc'" and qtdragon then segfaults trying to load missing resource symbols. Adding the package to Build-Depends with !nocheck makes apt-get build-dep install it alongside the rest of the GUI runtime deps. This is the last remaining ui-smoke failure: with this in place all four (axis, gmoccapy, qtdragon, touchy) should pass on CI.
qtdragon now launches successfully and the driver prints UI_SMOKE_OK, but Qt segfaults during shutdown when SIGTERM tears the process down mid-cleanup. That is out of scope for a startup smoke test: the GUI came up, accepted NML, and answered Bertho's "does it start" question. Restrict the crash-marker grep to lines before UI_SMOKE_OK so genuine startup crashes (no UI_SMOKE_OK printed) still fail the test, while shutdown-side noise is tolerated. Driver already prints UI_SMOKE_OK only after a successful NML round-trip, so a silent corruption cannot slip through.
The previous "ignore crashes after UI_SMOKE_OK" approach was wrong
because launch.sh prints linuxcnc.{out,err} before ui-smoke.{out,err}
in the captured log, so shutdown-side crashes always appear in the
file before the UI_SMOKE_OK line and got incorrectly flagged.
The driver is the authoritative signal: it only prints UI_SMOKE_OK
after a successful NML round-trip and a re-poll after the GUI settle,
so a healthy startup is guaranteed when that line is present. Genuine
startup crashes (linuxcncsvr fails to come up, GUI dies before driver
connects) result in UI_SMOKE_FAIL or no driver output at all, both of
which we now flag explicitly.
Replaces the crash-marker regex with a simple two-line check:
UI_SMOKE_FAIL absent and UI_SMOKE_OK present.
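Expressed in Python, the two-line check is independent of where crash noise lands in the captured log. This is a sketch of the logic only: checkresult.sh is a shell script, and the function name is hypothetical.

```python
def smoke_passed(log_text):
    """Pass only if UI_SMOKE_OK is present and UI_SMOKE_FAIL is absent."""
    lines = log_text.splitlines()
    has_ok = any("UI_SMOKE_OK" in line for line in lines)
    has_fail = any("UI_SMOKE_FAIL" in line for line in lines)
    return has_ok and not has_fail
```

A log in which a shutdown segfault appears before the UI_SMOKE_OK line, which is the ordering launch.sh produces, still passes, while a log with no driver output at all fails.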
Round 2 pushed; CI now passes 282/282 with all 4 ui-smoke tests running. Changes in this round:
Draft, opening for CI feedback. Refs #3756.
Summary
Phase 1 of the GUI test work tracked in #3756. Each test launches a GUI under xvfb-run against an existing configs/sim/<gui>/*.ini, drives Estop reset / machine on / home all via NML, asserts the interpreter reaches IDLE, then shuts down cleanly. Verifies the GUI starts and accepts basic commands without crashing.
Coverage
Mechanics
- tests/ui-smoke/_lib/launch.sh: xvfb-run wrapper, setsid so the linuxcnc process group can be signalled cleanly, falls back to axis-remote --quit then SIGTERM with grace then SIGKILL. Skips with exit 77 if xvfb-run is unavailable (matches tests/tooledit and tests/pyvcp).
- tests/ui-smoke/_lib/drive.py: NML driver. Tolerant of sim configs that come up already in STATE_ON via auto-estop-release HAL wiring. Falls back to per-joint serial homing if no HOME_SEQUENCE is configured.
- tests/ui-smoke/_lib/checkresult.sh: pass when UI_SMOKE_OK printed and no crash markers in captured logs.
Cleanup discipline
.gitignore covers all runtime artifacts (linuxcnc.{out,err,pid}, ui-smoke.{out,err}, result, stderr)
Deps
xvfb is already declared in debian/control with the <!nocheck> profile so apt-get build-dep installs it on the existing CI without a workflow change. Coordinated with @hdiethelm in #3984: this PR adds no system deps; if his lands first, no rebase needed here.
Out of scope (deferred)
linuxcnc.command.program_open + auto(RUN), verify final position via linuxcnc.stat.position. Per-GUI cross-checks via xdotool or AT-SPI where useful.
Test plan
scripts/runtests tests/ui-smoke, no shmem leaks