Skip to content

gh-118716: Do not enforce ASCII to format address#118717

Closed
cedk wants to merge 2 commits intopython:mainfrom
cedk:email-utils-formataddr-utf-8
Closed

gh-118716: Do not enforce ASCII to format address#118717
cedk wants to merge 2 commits intopython:mainfrom
cedk:email-utils-formataddr-utf-8

Conversation

@cedk
Copy link
Copy Markdown
Contributor

@cedk cedk commented May 7, 2024

RFC 6530 allows UTF-8 strings

@nineteendo
Copy link
Copy Markdown
Contributor

You need to update this test:

def test_unicode_address_raises_error(self):
# issue 1690608. email.utils.formataddr() should be rfc2047 aware.
addr = 'pers\u00f6n@dom.in'
self.assertRaises(UnicodeError, utils.formataddr, (None, addr))
self.assertRaises(UnicodeError, utils.formataddr, ("Name", addr))

@nineteendo
Copy link
Copy Markdown
Contributor

The commit that added this test pre-dates that rfc: 8debacb.

@bitdancer
Copy link
Copy Markdown
Member

IMO the legacy API (which is what you are modifying) should not support utf-8 addresses, since it provides no mechanisms for correctly interoperating with smtplib the way the new API does. Therefore I recommend rejecting this PR. See my comments on the issue for more.

@nineteendo
Copy link
Copy Markdown
Contributor

Could you add a news entry? https://blurb-it.herokuapp.com/add_blurb

Comment thread Lib/email/utils.py
"""
name, address = pair
# The address MUST (per RFC) be ascii, so raise a UnicodeError if it isn't.
address.encode('ascii')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I object to this change. I don't think the legacy API should be modified.

self.assertEqual(local_part.local_part, r'\example\\ example')

def test_get_local_part_unicode_defect(self):
def test_get_local_part_unicode(self):
Copy link
Copy Markdown
Member

@bitdancer bitdancer May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not, IMO, the best solution. But fixing it right is non-trivial. The issue here is that this IS a defect if policy.utf8=False, and that should be maintained. The problem is that currently the header value parser does not have access to the policy. Which was a design mistake.

The reason you want to maintain the old behavior (aside from backward compatibility, which definitely counts) is that the parser is designed to be used in multiple scenarios, and one of those scenarios might be a place were we want to be emulating a non-SMTPUTF8 parser and looking at where a parsed message does not pass muster.

Now, that said, I'm not seeing any good way to fix this right. I think the best compromise might be to leave this defect alone, but to change the Address object so that it specifically ignores this defect (does not raise an error in that case). The smtp library should be raising an error if an attempt is made to actually send using a utf8 address to a server that does not support SMTPUTF8. With this compromise backward compatibility will be maintained, while still allowing utf8 addresses to be specified using Address (which should only be used by library consumers, not the library itself). I believe I am correct that such addresses will not otherwise be rejected by the email library.

This leaves a dangling issue that if we parse a message using utf8=True we technically should not get a defect, but unless and until we teach the header value parser about the policy we can't really fix that.

@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 9, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

…r_spec

RFC 6530 allows UTF-8 strings in the local part.
@cedk cedk force-pushed the email-utils-formataddr-utf-8 branch from ac72c64 to 2bbb200 Compare May 10, 2024 10:39
@cedk
Copy link
Copy Markdown
Contributor Author

cedk commented May 10, 2024

I have made the requested changes; please review again

@bedevere-app
Copy link
Copy Markdown

bedevere-app Bot commented May 10, 2024

Thanks for making the requested changes!

@bitdancer: please review the changes made to this pull request.

@bedevere-app bedevere-app Bot requested a review from bitdancer May 10, 2024 10:44
@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions Bot added the stale Stale PR or inactive for long period of time. label Apr 16, 2026
@bitdancer
Copy link
Copy Markdown
Member

I'm closing this because we are taking a different approach in #81074/#122477.

@bitdancer bitdancer closed this Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

awaiting change review stale Stale PR or inactive for long period of time.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants