Skip to content

gh-73936: Add email.saslprep (RFC 4013) and use it in smtplib for Unicode credentials#148692

Open
blink1073 wants to merge 6 commits intopython:mainfrom
blink1073:unicode-passwords-smtplib
Open

gh-73936: Add email.saslprep (RFC 4013) and use it in smtplib for Unicode credentials#148692
blink1073 wants to merge 6 commits intopython:mainfrom
blink1073:unicode-passwords-smtplib

Conversation

@blink1073
Copy link
Copy Markdown

@blink1073 blink1073 commented Apr 17, 2026

This is a continuation of #103611, updated to address all open reviewer comments.

Changes

  • Lib/email/_saslprep.py (new): RFC 4013 SASLprep implementation,
    adapted from the PyMongo project
    with permission. Apache 2.0 licence header retained; MongoDB corporate CLA is
    signed.
  • Lib/email/__init__.py: exposes saslprep as email.saslprep (public API).
  • Lib/smtplib.py: applies email.saslprep() in auth_plain() (recommended
    by RFC 4616) with optional support in auth_login() and auth_cram_md5() (neither
    RFC recommends it); adds a saslprep= keyword argument to login()None for
    per-mechanism defaults, True to force normalization on all mechanisms, False to
    opt out entirely; switches auth() encoding from 'ascii' to 'utf-8'.
  • Lib/test/test_saslprep.py (new): tests for RFC 4013 examples, character
    mapping, prohibited characters, bidirectional checks, unassigned code points, and
    test cases from the MongoDB JS saslprep library.
  • Lib/test/test_smtplib.py: adds Unicode credentials to the simulated server
    (Devanagari username/password; a password that SASLprep normalises via NFKC);
    exercises all three auth mechanisms; adds tests for the saslprep= parameter
    (True/False/invalid type) and verifies _saslprep is reset after both
    successful and failed logins.
  • Doc/library/email.saslprep.rst (new): documents email.saslprep(),
    including the four-step algorithm, parameter semantics, and doctests.
  • Doc/library/email.rst: links the new email.saslprep page into the package
    table of contents.

Follow up work

email.saslprep could be applied to other stdlib modules that handle
SASL-based authentication: imaplib, poplib, ftplib, and nntplib.


📚 Documentation preview 📚: https://cpython-previews--148692.org.readthedocs.build/

blink1073 and others added 2 commits April 17, 2026 11:02
…e credentials

Co-authored-by: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>
Co-authored-by: Bernie Hackett <bernie.hackett@gmail.com>
@blink1073 blink1073 requested review from a team, gpshead and picnixz as code owners April 17, 2026 16:05
@python-cla-bot
Copy link
Copy Markdown

python-cla-bot Bot commented Apr 17, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

@blink1073
Copy link
Copy Markdown
Author

Kicking to re-run the CLA bot

@blink1073 blink1073 closed this Apr 17, 2026
@blink1073 blink1073 reopened this Apr 17, 2026
@picnixz
Copy link
Copy Markdown
Member

picnixz commented Apr 19, 2026

I am not really fond of having it in hashlib. It has nothing to do with hashing and cryptographic primitives. Unfortunately having a new module requires a PEP usually. I also do not think it is right to change smtplib in the same PR.

I am on mobile so hard to check, but what RFC are we guaranteeing to follow for SMTP?

@warsaw
Copy link
Copy Markdown
Member

warsaw commented Apr 19, 2026

what RFC are we guaranteeing to follow for SMTP?

This is a good question. Supporting all the various RFCs related to email may just be too much for the stdlib, depending on whether we have a dedicated maintainer for it. We split out server-side support into aiosmtpd in Python 3.12. It might be time for client-side as well, but OTOH, it is handy to have basic client-side SMTP support in the stdlib.

I'm not really involved in any of this any more, so I don't really have a say.

Aside: I still think it makes sense to have a standards-compliant email parsing library in the stdlib.

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Apr 20, 2026

I would rather prefer the following:

  • if the current design does not easily allow to override the current support to allow UTF8, make it so (e.g. by adding an encoding parametrr instead of hardcoding "ascii" everywhere).
  • if extending the current interface is easy this should be left to a 3rd party module.
  • hashlib should stay out of it

@blink1073
Copy link
Copy Markdown
Author

The motivating issue asked to support unicode passwords in smtp. #103611 proposed using saslprep.py from MongoDB to accomplish the task.

I added it to hashlib based on this comment from @gpshead.

It could be that there is a simpler approach to adding unicode password support using rfc6531.

My main intent was to help carry the work from #103611 across the finish line, since I am a maintainer of the saslprep.py code.

@bitdancer
Copy link
Copy Markdown
Member

Well, I'm more or less once again the maintainer for smtplib after a long absence, though my primary focus is the email library. Right now I'm mostly focused there, rewriting the header parser to deal with a security-adjacent performance issue, so my head is currently loaded with the email RFCs, not the SMTP RFCs :)

I don't think RFC 6531 helps here; at least, a search for 'pass' does not get any hits. I think it is focused on the data, not on the auth mechanisms, but it has been a while since I scanned it, much less read it.

The issue you picked up from mentions saslprep being required by the RFC. It would be interesting to have a direct reference for that.

I think my comments on the original issue are relevant here: if we change smtplib so that you can at least pass binary passwords, then the user can at least implement what they personally need, even if smtplib doesn't directly support saslprep.

I think, since it is a standard and would directly enhance smtplib, imaplib, and poplib (if I understand correctly), supporting saslprep in the stdlib would be nice, and the contribution of the code is great, but we do probably want some sort of maintenance commitment as well? And then there is the question of where to put it, since it would be shared by all the email client code. So unfortunately a PEP might be needed; I'm not entirely clear on current procedures, hopefully Barry can speak to that. I think there is other auth code shared between those modules (or that could be shared) that could go in a common location, but I'd have to refresh my memory on those modules before I could say for sure.

So I see two orthogonal tasks here, and we could decide we should do either or both: we can get the possibility of non-ascii passwords working by allowing bytes passwords, and we can implement RFC compliant support for non-ascii unicode passwords, which is the much bigger job that this PR is focused on. I don't think there's currently any PR for the first one :)

Oh, another thought on location for saslprep: in an ideal world we might reorganize the stdlib so that poplib, imaplib, and smtplib were all under the 'email' package. In that case saslprep would also go there. So maybe we just wink and put it there? Maybe we could avoid having to do a PEP that way? Or should we do one anyway?

@picnixz
Copy link
Copy Markdown
Member

picnixz commented Apr 21, 2026

I am not sure we can change the layout without a PEP, we needed one for the new compression package. However we can do the same (would be glad to write that PEP).

It would make more sense to have also another PEP for saslprep because 1) it is a new module 2) it is a nontrivial addition

Personally I do not want it to be included in hashlib. It is NOT a cryptographic primitive (I am already skeptical about having file_digest there but and would rather see it in filecmp).

I am ok with allowing bytes password as it would make it much more extensible in some sense and leave input preparation to external librairies.

Concerning the auth mechanism I did some work in the past but I do not remember which parts were shared. I remember that there were different designs though which I was a bit surprised about so I am not entirely sure that we can refactor all mechanisms (again, IMO, we should striclty follow the RFCs here but avoid supporting all extensions; rather we should possibly improve the design so that third-party librairies can provide those extensions).

@blink1073
Copy link
Copy Markdown
Author

The issue you picked up from mentions saslprep being required by the RFC. It would be interesting to have a direct reference for that.

If you mean RFC 2554, the intro has:

"This extension is a profile of the Simple Authentication and Security Layer [SASL]."

@bitdancer
Copy link
Copy Markdown
Member

bitdancer commented Apr 21, 2026 via email

@blink1073
Copy link
Copy Markdown
Author

what RFC are we guaranteeing to follow for SMTP?

The best chain of evidence I can find is:

In the source it states "This should follow...RFC 2554 (SMTP Authentication)".

RFC 2554 calls out the use of SASL, which was defined in RFC 2222 (1999).

An updated RFC for SASL is RFC 4422 (2006). RFC 4422 in turn states: "For simple user names and/or passwords in authentication credentials, SASLprep [RFC4013] (a profile of the StringPrep [RFC3454] preparation algorithm), SHOULD be specified as the preparation algorithm.

@bitdancer
Copy link
Copy Markdown
Member

Yeah, that's what I was looking for. Base SMTP (rfc 5321) has no auth at all, so if we have auth, that says it has to be SASL.

The update RFC for that one (https://datatracker.ietf.org/doc/html/rfc4954) goes further and says "adds a requirement to support SASLprep profile for preparing authorization identities"

So while I haven't read anything in depth at this point, on the face of it it appears that in order to offer auth and be RFC compliant we have to have saslprep.

I am not sure we can change the layout without a PEP, we needed one for the new compression package. However we can do the same (would be glad to write that PEP).

I wasn't talking about changing any layout (that's a pipe dream :) Just of having 'email.saslprep'. The more I think about it the more sense it makes: the email library provides various email services that other libraries (including http!) use, and this would be one more such service. Granted, it is orthogonal to the other components of the email package (shares no code with them), so it's a bit of a stretch, but not, I think, an unreasonable one given historical constraints. This is just a proposal, though. If the people here like the idea, is there another forum in which it should be further discussed first? If we have to write a PEP either way, then it will get discussed there ;)

@warsaw
Copy link
Copy Markdown
Member

warsaw commented Apr 22, 2026

Just of having 'email.saslprep'.

Given the options, that makes sense to me.

is there another forum in which it should be further discussed first?

I think this probably doesn't require a PEP. I'd suggest a topic on discuss.python.org under the Core Development category.

@blink1073
Copy link
Copy Markdown
Author

Thank you both! I'll update the PR to use email.saslprep and make the discourse post.

@blink1073 blink1073 changed the title gh-73936: Add hashlib.saslprep (RFC 4013) and use it in smtplib for Unicode credentials gh-73936: Add email.saslprep (RFC 4013) and use it in smtplib for Unicode credentials Apr 22, 2026
…e credentials

  Move the SASLprep implementation to email._saslprep and expose it as
  email.saslprep. Apply it in smtplib.auth_plain() (recommended by RFC 4616),
  with opt-in support for auth_login() and auth_cram_md5() via a new saslprep=
  keyword argument on SMTP.login().
@picnixz
Copy link
Copy Markdown
Member

picnixz commented Apr 22, 2026

Please, do not make the changes to smtplib in this PR. I would prefer a folloe-up PR. We should consolidate the email.saslprep module first. For now, I would even make it a private module. I am however unsure that this should land in 3.15. I would rather see it in 3.16 because feature freeze is in 2 weeks.

  Update the module docstring reference from RFC 2554 to RFC 4954 (which
  obsoletes it), add a credits line for the RFC 4954 authentication support,
  and move _apply_saslprep() to sit alongside the other module-level helpers.
@blink1073
Copy link
Copy Markdown
Author

Fair points, I think targeting 3.16 and landing all of the changes together makes sense.

@blink1073
Copy link
Copy Markdown
Author

I'll pause work for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants