Skip to content

Don't block PRs on linkcheck#2035

Open
konstin wants to merge 4 commits intopypa:mainfrom
konstin:konsti/linkcheck-on-main
Open

Don't block PRs on linkcheck#2035
konstin wants to merge 4 commits intopypa:mainfrom
konstin:konsti/linkcheck-on-main

Conversation

@konstin
Copy link
Copy Markdown
Contributor

@konstin konstin commented Apr 20, 2026

Fixes #1998

Currently, linkcheck is blocking PRs if any link in the project isn't accessible. Due to the high number of links, this regularly blocks unrelated PRs (see #2018 for a recent example).

This PR moves linkcheck to a separate, non-blocking check. It will still run on PRs and report a failure if a newly added link 404s (#1998 (comment)), but an unrelated failure won't block PRs anymore. We also run the check on a daily cron, so that links going away is surfaced separately from PRs.


📚 Documentation preview 📚: https://python-packaging-user-guide--2035.org.readthedocs.build/en/2035/

Fixes pypa#1998

Currently, linkcheck is blocking PRs if any link in the project isn't accessible. Due to the high number of links, this regularly blocks unrelated PRs (see pypa#2018 for a recent example).

This PR moves linkcheck to a separate, non-blocking check. It will still run on PRs and report a failure if a newly added link 404s (pypa#1998 (comment)), but an unrelated failure won't block PRs anymore. We also run the check on a daily cron, so that links going away is surfaced separately from PRs.
@konstin konstin force-pushed the konsti/linkcheck-on-main branch from 5692294 to 3bfdecb Compare April 20, 2026 19:41
Comment thread .github/workflows/linkcheck.yml
@webknjaz
Copy link
Copy Markdown
Member

I'd rather block PRs on newly added broken links. How can we detect this? Maybe store the Sphinx builder's structured output as an artifact and exclude those links that existed prior to PRs?

@konstin
Copy link
Copy Markdown
Contributor Author

konstin commented Apr 21, 2026

I'm not aware of any tool or function that would support only blocking on new URLs, but we can add it separately. This PR is about not blocking working PRs on a broken linkcheck, including things such as getting changes from accepted PEPs linkcheck.

@woodruffw
Copy link
Copy Markdown
Member

My 0.02c is that we can rely on maintainer discretion here -- maintainers should still review the linkcheck job when it fails (especially when a PR has new links being added), but can override it when it fails for transient reasons (like these recent kinds of cert issues).

@webknjaz
Copy link
Copy Markdown
Member

My 0.02c is that we can rely on maintainer discretion here -- maintainers should still review the linkcheck job when it fails (especially when a PR has new links being added), but can override it when it fails for transient reasons (like these recent kinds of cert issues).

Yes. But my concern is the additional maintenance burden. When a job isn't marked as required, the contributors may be tempted to take the easy way out and rely on the maintainer to tell them that it actually should pass as opposed as the UI showing them that they need to look into it, which may happen w/o the maintainer's involvement.

@webknjaz
Copy link
Copy Markdown
Member

I'm not aware of any tool or function that would support only blocking on new URLs, but we can add it separately. This PR is about not blocking working PRs on a broken linkcheck, including things such as getting changes from accepted PEPs linkcheck.

So linkcheck produces JSONL like this: https://github.com/ansible/awx-plugins/actions/runs/24041509767#summary-70114297278. If we have that pre-downloaded into the job from artifacts/cache, we could have docs/conf.py read it and extract all the known URLs.

Then, linkcheck_ignore could be populated with all those known URLs either right on the module level or from within the setup() function, but only in the context of PRs (and merge queues). This way, the PR runs would exclude checks against everything that used to be known prior to the PR merge, and new things would still be checked.

@ncoghlan
Copy link
Copy Markdown
Member

As one of the other other reviewers, I don't think pushing the burden of mitigating the risk of invalid links back to human reviewers is a good idea.

Another potential source of inspiration for ways to keep link rot from blocking unrelated PRs would be diff cover.

@konstin
Copy link
Copy Markdown
Contributor Author

konstin commented Apr 27, 2026

The linkcheck still runs, and it is still reported. What this PR changes is it that it stops blocking unrelated PRs in a way that can't be mitigated, as the check becomes optional instead of mandatory. The current situation is that all PEP authors and spec maintainers are randomly affected by this.

If we specifically want to ensure that linked content is still available, we should archive it in the internet archive and provide an archived link next to live link, as e.g. wikipedia does: https://en.wikipedia.org/wiki/Python_(programming_language)#References

@webknjaz
Copy link
Copy Markdown
Member

in a way that can't be mitigated, as the check becomes optional instead of mandatory.

In my previous comment, I offered an idea of exactly how to mitigate this case..

@konstin
Copy link
Copy Markdown
Contributor Author

konstin commented Apr 27, 2026

I can offer this PR as an improvement, because I'm trying to improve specs maintenance and this is a frequent problem, but implementing a new linkcheck logic here is out of scope for me.

@woodruffw
Copy link
Copy Markdown
Member

I agree with @konstin here -- I think in principle it's good for us to block on linkcheck failures, but in practice it's exactly the kind of maintenance burden we're discussing trying to avoid now.

Speaking personally, the stochastic linkcheck failures are a major drag on my willingness to contribute to this repo; they make every PR take longer (as in days-to-weeks instead of hours-to-days) than they should, and they act as a forced "context switch" whenever they pop up (making me go from doing whatever I was working on to figuring out why a specific link is now dead).

I think this is especially unfortunate when most of the links in this repo probably aren't being clicked by users; most of the breakage happens with what I suspect are our "coldest" links (old SourceForge pages, etc.) rather than the "hottest" ones (PEPs, GitHub, etc.).

TL;DR: I think we should evaluate another automated solution here (@ncoghlan's idea of doing it by diff cover makes a lot of sense to me!), but in the mean time I'm in favor of making linkcheck a soft-only failure on PRs.

(If it helps convince anyone else, I'm willing to fix links when they fail on main -- I don't mind doing that at all! It's primarily the context switch in the middle of a PR that's frustrating.)

@webknjaz
Copy link
Copy Markdown
Member

@woodruffw if you're willing to take in the burden, I guess we could do this. However, can anyone explain me why the linkcheck ignore idea doesn't seem to be considered? I don't understand if I'm missing anything.

Additionally, I think that having a workflow copy is the wrong way of implementing this. alls-greens official recommendation is to use a dynamically computed continue-on-error in the job matrix. Could you change the implementation to do this instead of introducing another workflow with hardcoded steps — this is something I've been wanting to avoid.

Copy link
Copy Markdown
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's unblock merging PRs, and also file an issue to improve this to do the smarter thing that @webknjaz suggests. :)

@woodruffw
Copy link
Copy Markdown
Member

Yeah, sorry if I was unclear @webknjaz -- your approach also sounds good to me! I'm in favor of just about anything that would automate this further, my main opinion with just this PR in particular was that we shouldn't block it on a more general solution (since broken links are a current and ongoing maintenance drag).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Linkchecker blocks merging when certificate on external site is expired

5 participants