gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses#122540
gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses#122540medmunds wants to merge 11 commits intopython:mainfrom
Conversation
d1f0bdc to
2e0696c
Compare
|
This is based on #81074 (comment):
Checking An alternative would be to introduce a new Also, I think |
2e0696c to
cbedf5d
Compare
Email generators had been incorrectly flattening non-ASCII email addresses to RFC 2047 encoded-word format, leaving them undeliverable. (RFC 2047 prohibits use of encoded-word in an addr-spec.) This change raises a ValueError when attempting to flatten an EmailMessage with a non-ASCII addr-spec and a policy with utf8=False. (Exception: If the non-ASCII address originated from parsing a message, it will be flattened as originally parsed, without error.) Non-ASCII email addresses are supported when using a policy with utf8=True (such as email.policy.SMTPUTF8) under RFCs 6531 and 6532. Non-ASCII email address domains (but not localparts) can also be used with non-SMTPUTF8 policies by encoding the domain as an IDNA A-label. (The email package does not perform this encoding, because it cannot know whether the caller wants IDNA 2003, IDNA 2008, or some other variant such as UTS python#46.)
cbedf5d to
faa4006
Compare
bitdancer
left a comment
There was a problem hiding this comment.
Your analysis is solid and the fix looks great. We'll need a follow on PR to have smtplib handle the new error, but that should be a trivial PR.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be poked with soft cushions! |
- Incorporate PR review feedback - Improve docs
5d60c1c to
bd6845d
Compare
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @bitdancer: please review the changes made to this pull request. |
bitdancer
left a comment
There was a problem hiding this comment.
Sorry it took me so long to get back to this.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
- Incorporate PR feedback - Tailor blurbs to individual issues
e6e8a6c to
43eaea1
Compare
It looks like smtplib's send_message() already detects (some) non-ASCII mailboxes and if necessary, automatically clones a policy with There's a potential enhancement, but it's a (little) bit more involved. send_message()'s international detection doesn't check all address headers—e.g., an IDN reply-to would fool it. (And its |
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @bitdancer: please review the changes made to this pull request. |
|
This PR is stale because it has been open for 30 days with no activity. |
This comment was marked as off-topic.
This comment was marked as off-topic.
bitdancer
left a comment
There was a problem hiding this comment.
Well, it turns out that this bug goes a bit deeper than apparent. addr_spec was already supposed to be prevented from being encoded, but the logic around as_ew_allowed is broken: it doesn't "look inside" the token it is encoded to make sure all of it can be encoded. The fix in this PR also has its problems, as proven by the test failures after merge with main: want_encoding can be set when what is wanted is wrapping, rather than needing encoding. Those should probably have been two separate flags...but at this point the folder is a bit of a patchwork mess. (I plan to fix that...someday).
If we fix as_eq_allowed handling I think it will make those test failures go away, as well as simplifying the logic for raising the error, but it won't be certain until we try it.
There is another test failure as well, but I think that's not a bug, but rather a change in behavior that the test needs to adapt to. I haven't proven that yet, though :)
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
This PR fixes gh-83938 and fixes gh-122476, which have the same underlying issue.
Email generators had been incorrectly flattening non-ASCII email addresses to RFC 2047 encoded-word format, leaving them undeliverable. (RFC 2047 prohibits use of encoded-word in an addr-spec.) This change raises a ValueError when attempting to flatten an EmailMessage with a non-ASCII addr-spec and a policy with
utf8=False. (Exception: If the non-ASCII address originated from parsing a message, it will be flattened as originally parsed, without error.)Non-ASCII email addresses are supported when using a policy with
utf8=True(such as email.policy.SMTPUTF8) under RFCs 6531 and 6532.Non-ASCII email address domains (but not localparts) can also be used with non-SMTPUTF8 policies by encoding the domain as an IDNA A-label. (The email package does not perform this encoding, because it cannot know whether the caller wants IDNA 2003, IDNA 2008, or some other variant such as UTS-46.)
📚 Documentation preview 📚: https://cpython-previews--122540.org.readthedocs.build/