Conversation
Fixes pypa#1998

Currently, linkcheck is blocking PRs if any link in the project isn't accessible. Due to the high number of links, this regularly blocks unrelated PRs (see pypa#2018 for a recent example).

This PR moves linkcheck to a separate, non-blocking check. It will still run on PRs and report a failure if a newly added link 404s (pypa#1998 (comment)), but an unrelated failure won't block PRs anymore. We also run the check on a daily cron, so that links going away are surfaced separately from PRs.
I'd rather block PRs on newly added broken links. How can we detect this? Maybe store the Sphinx builder's structured output as an artifact and exclude those links that existed prior to PRs?
I'm not aware of any tool or function that would support only blocking on new URLs, but we can add it separately. This PR is about not blocking otherwise-working PRs on a broken linkcheck, which affects things such as landing changes from accepted PEPs.
My 0.02c is that we can rely on maintainer discretion here -- maintainers should still review the linkcheck job when it fails (especially when a PR has new links being added), but can override it when it fails for transient reasons (like these recent kinds of cert issues).
Yes. But my concern is the additional maintenance burden. When a job isn't marked as required, contributors may be tempted to take the easy way out and rely on the maintainer to tell them that it actually should pass, as opposed to the UI showing them that they need to look into it, which may happen w/o the maintainer's involvement.
So linkcheck produces JSONL like this: https://github.com/ansible/awx-plugins/actions/runs/24041509767#summary-70114297278. If we have that pre-downloaded into the job from artifacts/cache, we could diff the current run's output against it and only fail on newly broken links.
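A minimal sketch of that comparison, assuming Sphinx linkcheck's `output.json` is JSONL with `uri` and `status` fields per entry (as in the linked run); the function names and file paths here are placeholders, not anything this repo already has:

```python
import json


def broken_uris(path):
    """Collect the URIs reported as "broken" in a linkcheck output.json (JSONL)."""
    uris = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if entry.get("status") == "broken":
                uris.add(entry["uri"])
    return uris


def newly_broken(previous_path, current_path):
    """URIs broken in the current run that were not broken in the previous one."""
    return broken_uris(current_path) - broken_uris(previous_path)
```

A PR job could then download the `main` run's `output.json` as `previous_path` and fail only when `newly_broken(...)` is non-empty, leaving pre-existing link rot to the cron run.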
As one of the other reviewers, I don't think pushing the burden of mitigating the risk of invalid links back to human reviewers is a good idea. Another potential source of inspiration for ways to keep link rot from blocking unrelated PRs would be diff cover.
The linkcheck still runs, and it is still reported. What this PR changes is that it stops blocking unrelated PRs in a way that can't be mitigated, as the check becomes optional instead of mandatory. The current situation is that all PEP authors and spec maintainers are randomly affected by this.

If we specifically want to ensure that linked content is still available, we should archive it in the Internet Archive and provide an archived link next to the live link, as e.g. Wikipedia does: https://en.wikipedia.org/wiki/Python_(programming_language)#References
In my previous comment, I offered an idea of exactly how to mitigate this case.
I can offer this PR as an improvement, because I'm trying to improve specs maintenance and this is a frequent problem, but implementing new linkcheck logic here is out of scope for me.
I agree with @konstin here -- I think in principle it's good for us to block on linkcheck failures, but in practice it's exactly the kind of maintenance burden we're discussing trying to avoid now.

Speaking personally, the stochastic linkcheck failures are a major drag on my willingness to contribute to this repo; they make every PR take longer (as in days-to-weeks instead of hours-to-days) than it should, and they act as a forced "context switch" whenever they pop up (making me go from doing whatever I was working on to figuring out why a specific link is now dead).

I think this is especially unfortunate because most of the links in this repo probably aren't being clicked by users; most of the breakage happens with what I suspect are our "coldest" links (old SourceForge pages, etc.) rather than the "hottest" ones (PEPs, GitHub, etc.).

TL;DR: I think we should evaluate another automated solution here (@ncoghlan's idea of doing it by diff cover makes a lot of sense to me!), but in the meantime I'm in favor of making linkcheck a soft-only failure on PRs. (If it helps convince anyone else, I'm willing to fix links when they fail on main -- I don't mind doing that at all! It's primarily the context switch in the middle of a PR that's frustrating.)
@woodruffw if you're willing to take on the burden, I guess we could do this. However, can anyone explain to me why the linkcheck ignore idea doesn't seem to be considered? I don't understand if I'm missing anything. Additionally, I think that having a workflow copy is the wrong way of implementing this.
Yeah, sorry if I was unclear @webknjaz -- your approach also sounds good to me! I'm in favor of just about anything that would automate this further; my main point with this PR in particular was that we shouldn't block it on a more general solution (since broken links are a current and ongoing maintenance drag).
📚 Documentation preview 📚: https://python-packaging-user-guide--2035.org.readthedocs.build/en/2035/