gh-73936: Add email.saslprep (RFC 4013) and use it in smtplib for Unicode credentials#148692
gh-73936: Add email.saslprep (RFC 4013) and use it in smtplib for Unicode credentials#148692blink1073 wants to merge 6 commits intopython:mainfrom
Conversation
…e credentials Co-authored-by: Arnt Gulbrandsen <arnt@gulbrandsen.priv.no> Co-authored-by: Bernie Hackett <bernie.hackett@gmail.com>
|
Kicking to re-run the CLA bot |
|
I am not really fond of having it in hashlib. It has nothing to do with hashing and cryptographic primitives. Unfortunately having a new module requires a PEP usually. I also do not think it is right to change smtplib in the same PR. I am on mobile so hard to check, but what RFC are we guaranteeing to follow for SMTP? |
This is a good question. Supporting all the various RFCs related to email may just be too much for the stdlib, depending on whether we have a dedicated maintainer for it. We split out server-side support into I'm not really involved in any of this any more, so I don't really have a say. Aside: I still think it makes sense to have a standards-compliant email parsing library in the stdlib. |
|
I would rather prefer the following:
|
|
The motivating issue asked to support unicode passwords in smtp. #103611 proposed using I added it to It could be that there is a simpler approach to adding unicode password support using rfc6531. My main intent was to help carry the work from #103611 across the finish line, since I am a maintainer of the |
|
Well, I'm more or less once again the maintainer for smtplib after a long absence, though my primary focus is the email library. Right now I'm mostly focused there, rewriting the header parser to deal with a security-adjacent performance issue, so my head is currently loaded with the email RFCs, not the SMTP RFCs :) I don't think RFC 6531 helps here; at least, a search for 'pass' does not get any hits. I think it is focused on the data, not on the auth mechanisms, but it has been a while since I scanned it, much less read it. The issue you picked up from mentions saslprep being required by the RFC. It would be interesting to have a direct reference for that. I think my comments on the original issue are relevant here: if we change smtplib so that you can at least pass binary passwords, then the user can at least implement what they personally need, even if smtplib doesn't directly support saslprep. I think, since it is a standard and would directly enhance smtplib, imaplib, and poplib (if I understand correctly), supporting saslprep in the stdlib would be nice, and the contribution of the code is great, but we do probably want some sort of maintenance commitment as well? And then there is the question of where to put it, since it would be shared by all the email client code. So unfortunately a PEP might be needed; I'm not entirely clear on current procedures, hopefully Barry can speak to that. I think there is other auth code shared between those modules (or that could be shared) that could go in a common location, but I'd have to refresh my memory on those modules before I could say for sure. So I see two orthogonal tasks here, and we could decide we should do either or both: we can get the possibility of non-ascii passwords working by allowing bytes passwords, and we can implement RFC compliant support for non-ascii unicode passwords, which is the much bigger job that this PR is focused on. I don't think there's currently any PR for the first one :) Oh, another thought on location for saslprep: in an ideal world we might reorganize the stdlib so that poplib, imaplib, and smtplib were all under the 'email' package. In that case saslprep would also go there. So maybe we just wink and put it there? Maybe we could avoid having to do a PEP that way? Or should we do one anyway? |
|
I am not sure we can change the layout without a PEP, we needed one for the new compression package. However we can do the same (would be glad to write that PEP). It would make more sense to have also another PEP for saslprep because 1) it is a new module 2) it is a nontrivial addition Personally I do not want it to be included in hashlib. It is NOT a cryptographic primitive (I am already skeptical about having file_digest there but and would rather see it in filecmp). I am ok with allowing bytes password as it would make it much more extensible in some sense and leave input preparation to external librairies. Concerning the auth mechanism I did some work in the past but I do not remember which parts were shared. I remember that there were different designs though which I was a bit surprised about so I am not entirely sure that we can refactor all mechanisms (again, IMO, we should striclty follow the RFCs here but avoid supporting all extensions; rather we should possibly improve the design so that third-party librairies can provide those extensions). |
|
I was thinking more in terms of a reference in another RFC that says
that to handle unicode passwords you have to use SASL. But that
may be off base, because it's been a long time since I read the
relevant portions of the SMTP RFCs. I'll have to rectify that, but
it may be a bit before I get to it.
|
The best chain of evidence I can find is: In the source it states "This should follow...RFC 2554 (SMTP Authentication)". RFC 2554 calls out the use of SASL, which was defined in RFC 2222 (1999). An updated RFC for SASL is RFC 4422 (2006). RFC 4422 in turn states: "For simple user names and/or passwords in authentication credentials, SASLprep [RFC4013] (a profile of the StringPrep [RFC3454] preparation algorithm), SHOULD be specified as the preparation algorithm. |
|
Yeah, that's what I was looking for. Base SMTP (rfc 5321) has no auth at all, so if we have auth, that says it has to be SASL. The update RFC for that one (https://datatracker.ietf.org/doc/html/rfc4954) goes further and says "adds a requirement to support SASLprep profile for preparing authorization identities" So while I haven't read anything in depth at this point, on the face of it it appears that in order to offer auth and be RFC compliant we have to have saslprep.
I wasn't talking about changing any layout (that's a pipe dream :) Just of having 'email.saslprep'. The more I think about it the more sense it makes: the email library provides various email services that other libraries (including http!) use, and this would be one more such service. Granted, it is orthogonal to the other components of the email package (shares no code with them), so it's a bit of a stretch, but not, I think, an unreasonable one given historical constraints. This is just a proposal, though. If the people here like the idea, is there another forum in which it should be further discussed first? If we have to write a PEP either way, then it will get discussed there ;) |
Given the options, that makes sense to me.
I think this probably doesn't require a PEP. I'd suggest a topic on discuss.python.org under the |
|
Thank you both! I'll update the PR to use |
…e credentials Move the SASLprep implementation to email._saslprep and expose it as email.saslprep. Apply it in smtplib.auth_plain() (recommended by RFC 4616), with opt-in support for auth_login() and auth_cram_md5() via a new saslprep= keyword argument on SMTP.login().
…hon into unicode-passwords-smtplib
|
Please, do not make the changes to smtplib in this PR. I would prefer a folloe-up PR. We should consolidate the email.saslprep module first. For now, I would even make it a private module. I am however unsure that this should land in 3.15. I would rather see it in 3.16 because feature freeze is in 2 weeks. |
Update the module docstring reference from RFC 2554 to RFC 4954 (which obsoletes it), add a credits line for the RFC 4954 authentication support, and move _apply_saslprep() to sit alongside the other module-level helpers.
|
Fair points, I think targeting 3.16 and landing all of the changes together makes sense. |
|
I'll pause work for now. |
This is a continuation of #103611, updated to address all open reviewer comments.
Changes
Lib/email/_saslprep.py(new): RFC 4013 SASLprep implementation,adapted from the PyMongo project
with permission. Apache 2.0 licence header retained; MongoDB corporate CLA is
signed.
Lib/email/__init__.py: exposessaslprepasemail.saslprep(public API).Lib/smtplib.py: appliesemail.saslprep()inauth_plain()(recommendedby RFC 4616) with optional support in
auth_login()andauth_cram_md5()(neitherRFC recommends it); adds a
saslprep=keyword argument tologin()—Noneforper-mechanism defaults,
Trueto force normalization on all mechanisms,Falsetoopt out entirely; switches
auth()encoding from'ascii'to'utf-8'.Lib/test/test_saslprep.py(new): tests for RFC 4013 examples, charactermapping, prohibited characters, bidirectional checks, unassigned code points, and
test cases from the MongoDB JS saslprep library.
Lib/test/test_smtplib.py: adds Unicode credentials to the simulated server(Devanagari username/password; a password that SASLprep normalises via NFKC);
exercises all three auth mechanisms; adds tests for the
saslprep=parameter(
True/False/invalid type) and verifies_saslprepis reset after bothsuccessful and failed logins.
Doc/library/email.saslprep.rst(new): documentsemail.saslprep(),including the four-step algorithm, parameter semantics, and doctests.
Doc/library/email.rst: links the newemail.saslpreppage into the packagetable of contents.
Follow up work
email.saslprepcould be applied to other stdlib modules that handleSASL-based authentication:
imaplib,poplib,ftplib, andnntplib.📚 Documentation preview 📚: https://cpython-previews--148692.org.readthedocs.build/