Security: parse_url() truncates oversized URL fields because the vendored http-parser stores URL offsets/lengths as uint16_t
tl;dr — httptools.parse_url() uses the vendored http-parser URL parser. That parser stores URL field offsets and lengths in uint16_t inside struct http_parser_url.field_data[]. Once a parsed field crosses UINT16_MAX, the stored off/len values wrap, and parse_url() returns truncated bytes sliced from the wrong place. I reproduced this directly against httptools.parse_url() and also end-to-end through uvicorn's httptools protocol, where the truncated result propagates into the ASGI scope.
This was previously reported in httptools as issue #43, which was closed on 2020-02-06 with the maintainer comment:
This is an upstream issue: nodejs/http-parser#481
That upstream repository is now archived and read-only, so the original escalation path is no longer actionable.
Reproduction
1. Direct reproduction with parse_url()
import httptools

print("httptools", httptools.__version__)

# Path lengths straddling UINT16_MAX (65535); the path is "/" followed by n * "a".
for n in [65534, 65535, 65536, 70000, 131071]:
    url = b"http://h/" + b"a" * n
    r = httptools.parse_url(url)
    expected = 1 + n  # "/" + n * "a"
    actual = len(r.path) if r.path else 0
    print(f"body_a={n:<6} parsed.path_len={actual:<6} expected={expected}")
Observed on httptools 0.7.1:
httptools 0.7.1
body_a=65534 parsed.path_len=65535 expected=65535
body_a=65535 parsed.path_len=0 expected=65536
body_a=65536 parsed.path_len=1 expected=65537
body_a=70000 parsed.path_len=4465 expected=70001
body_a=131071 parsed.path_len=0 expected=131072
For oversized values, the observed pattern is:
actual ≡ expected (mod 65536)
I also confirmed the same root cause affects other string fields that are sliced from field_data[], such as query.
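The observed lengths are what you would get if the true length were stored in an unsigned 16-bit field. A minimal sketch that checks the numbers reported above against that model; the `observed` dictionary restates the output listed earlier, and `wrap_u16` is a hypothetical helper, not part of httptools:

```python
UINT16_MOD = 1 << 16  # 65536: number of distinct values a uint16_t can hold


def wrap_u16(value: int) -> int:
    """Model storing a non-negative integer in a uint16_t (hypothetical helper)."""
    return value % UINT16_MOD


# n -> parsed path length, copied from the output reported above (httptools 0.7.1)
observed = {65534: 65535, 65535: 0, 65536: 1, 70000: 4465, 131071: 0}

mismatches = []
for n, actual in observed.items():
    expected = 1 + n  # "/" plus n * "a"
    predicted = wrap_u16(expected)
    if predicted != actual:
        mismatches.append((n, actual, predicted))
    print(f"n={n:<6} expected={expected:<6} predicted={predicted:<6} observed={actual}")

print("mismatches:", mismatches)
```

Every reported row agrees with the model, including the two cases where the expected length is an exact multiple of 65536 and the parsed path comes back empty.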
2. End-to-end reproduction through uvicorn -> ASGI scope
In a local environment with uvicorn 0.42.0, HttpToolsProtocol.on_headers_complete() still does:
parsed_url = httptools.parse_url(self.url)
raw_path = parsed_url.path
path = raw_path.decode("ascii")
...
self.scope["path"] = full_path
self.scope["raw_path"] = full_raw_path
self.scope["query_string"] = parsed_url.query or b""
This minimal app shows the truncated scope["path"]:
import asyncio, socket, uvicorn

async def app(scope, receive, send):
    if scope["type"] != "http":
        return
    # Report the path exactly as uvicorn placed it in the ASGI scope.
    body = f"path_len={len(scope['path'])} head={scope['path'][:32]!r}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            [b"content-type", b"text/plain"],
            [b"content-length", str(len(body)).encode()],
        ],
    })
    await send({"type": "http.response.body", "body": body})

def fetch(n):
    # Blocking raw-socket client. It runs in a worker thread (see main) so that
    # it does not stall the event loop that is also running the server.
    s = socket.socket()
    s.connect(("127.0.0.1", 29571))
    s.sendall(
        b"GET /" + b"a" * n +
        b" HTTP/1.1\r\nHost: x\r\nConnection: close\r\n\r\n"
    )
    r = b""
    while True:
        c = s.recv(65536)
        if not c:
            break
        r += c
    s.close()
    return r

async def main():
    cfg = uvicorn.Config(app, host="127.0.0.1", port=29571,
                         log_level="error", http="httptools")
    srv = uvicorn.Server(cfg)
    task = asyncio.create_task(srv.serve())
    await asyncio.sleep(0.6)  # give the server time to start listening
    for n in [100, 65534, 65535, 70000]:
        r = await asyncio.to_thread(fetch, n)
        print(r.split(b"\r\n\r\n", 1)[1].strip().decode("utf-8", "replace"))
    srv.should_exit = True
    await task

asyncio.run(main())
Observed output:
path_len=101 head='/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
path_len=65535 head='/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
path_len=0 head=''
path_len=4465 head='/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
So this is not limited to direct callers of parse_url(): when uvicorn runs with http="httptools", the truncated parse result reaches application code via scope["path"], scope["raw_path"], and scope["query_string"].
Root cause
Struct layout
The Cython declaration mirrors the vendored http-parser struct:
httptools/parser/url_cparser.pxd
struct http_parser_url_field_data:
    uint16_t off
    uint16_t len
and in vendor/http-parser/http_parser.h:
struct http_parser_url {
    uint16_t field_set;
    uint16_t port;
    struct {
        uint16_t off;
        uint16_t len;
    } field_data[UF_MAX];
};
Slicing in url_parser.pyx
httptools/parser/url_parser.pyx then trusts those values:
off = parsed.field_data[<int>uparser.UF_PATH].off
ln = parsed.field_data[<int>uparser.UF_PATH].len
path = buf_data[off:off+ln]
The Cython code is not the source of the truncation; the wrapped values are already present in the C struct before slicing happens.
Affected string fields sliced from field_data[] include:
- schema
- host
- path
- query
- fragment
- userinfo
(port is also a uint16_t, but it is semantically different and not sliced from field_data[].)
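To make the effect on slicing concrete, here is a simplified pure-Python model, assuming both `off` and `len` are reduced to 16 bits before the `buf_data[off:off+ln]` slice runs. `model_slice` is a hypothetical stand-in for illustration; it does not call the vendored parser, and the real parser may differ in detail:

```python
def model_slice(buf: bytes, true_off: int, true_len: int) -> bytes:
    """Slice buf as url_parser.pyx does, after forcing off/len into 16 bits.

    Simplified model for illustration only, not the vendored parser.
    """
    off = true_off & 0xFFFF  # value a uint16_t 'off' would hold
    ln = true_len & 0xFFFF   # value a uint16_t 'len' would hold
    return buf[off:off + ln]


url = b"http://h/" + b"a" * 70000 + b"?token=secret"

# True field positions, computed with ordinary Python integers.
path_off = url.index(b"/", len(b"http://"))
query_off = url.index(b"?") + 1
path_len = (query_off - 1) - path_off
query_len = len(url) - query_off

path = model_slice(url, path_off, path_len)
query = model_slice(url, query_off, query_len)

print("true path_len:", path_len, "modelled path_len:", len(path))
print("true query:", url[query_off:], "modelled query:", query)
```

In this model the path keeps its correct start but loses most of its length, while the query keeps its correct length but is read from a wrapped offset inside the path, so it contains padding bytes instead of the real query string.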
Upstream status
- nodejs/http-parser was archived on 2022-11-06 and is now read-only.
- nodejs/http-parser#481 ("http_parser_parse_url fails to handle very long URLs") is still OPEN.
- nodejs/http-parser#480 ("Fix http_parser_parse_url to handle very long URLs") is still OPEN.
So while the original bug was correctly identified as upstream in 2020, that upstream no longer has a realistic maintenance path.
This repo's own history
PR #56 / merge commit 63b5de2
The commit that switched request parsing from http-parser to llhttp explicitly left URL parsing behind:
Replaces the underlying HTTP parser with llhttp as http-parser is
no longer actively maintained. However we still have http-parser
to power our URL parsing function at the moment.
That merge commit is:
- 63b5de2bf7c7fa498c7ecac1eb9d8f7352a00dd4
- merged at 2021-03-30T17:05:59Z
URL parser file history on master
From a fresh clone of this repository:
$ git log --follow --oneline -- httptools/parser/url_parser.pyx
63b5de2 Swap http-parse to llhttp (#56)
$ git log --follow --oneline -- httptools/parser/url_cparser.pxd
63b5de2 Swap http-parse to llhttp (#56)
These two files were introduced in 63b5de2 and have not changed on master since then.
Two unmerged replacement branches still present in origin/
Both are visible in git branch -a on the current repository:
origin/rust-url
origin/rust-uri
Their commits are:
| commit | date | branch | author | message |
|---|---|---|---|---|
| 04d7df7 | 2021-04-23 | origin/rust-url | Fantix King | Local draft |
| 294ac41 | 2021-04-25 | origin/rust-uri | Fantix King | WIP: use the http::Uri crate |
Both delete httptools/parser/url_cparser.pxd and httptools/parser/url_parser.pyx and replace them with a Rust-based implementation.
FastAPI / Uvicorn propagation path
This is not limited to applications that explicitly choose httptools.
FastAPI's current documentation recommends installing FastAPI with:
pip install "fastapi[standard]"
And FastAPI's docs explicitly state that installing fastapi[standard] also gives you uvicorn[standard].
Uvicorn's own installation docs explicitly state that:
- uvicorn[standard] installs httptools
- when httptools is installed, Uvicorn uses it by default for HTTP/1.1 parsing
So the documented "standard" path is:
fastapi[standard]
-> uvicorn[standard]
-> installs httptools
-> Uvicorn uses httptools by default for HTTP/1.1
This does not mean every FastAPI deployment is affected in practice:
- some users install plain fastapi
- some switch Uvicorn to h11
- some use a different ASGI server
- some are protected in practice by frontend URL-length limits
But it does mean this parser is part of the default path for a large set of users following the official FastAPI + Uvicorn "standard" installation flow.
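A quick way to see whether a given environment sits on this path is to check whether httptools is importable there. This is only a rough sketch: `httptools_available` is a hypothetical helper, and it assumes, per the Uvicorn documentation summarized above, that an importable httptools is what makes Uvicorn select it by default:

```python
import importlib.util


def httptools_available() -> bool:
    """Return True if the httptools package can be imported in this environment."""
    return importlib.util.find_spec("httptools") is not None


if httptools_available():
    print("httptools is importable; Uvicorn's default HTTP/1.1 parser would likely use it")
else:
    print("httptools is not importable in this environment")
```

Deployments that want to stay off this path regardless of what is installed can pin the other implementation, for example by passing `http="h11"` to `uvicorn.Config` in the same way the reproduction passes `http="httptools"`.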
Impact
I am not claiming that every application using httptools or uvicorn is automatically exploitable.
I am claiming that the following primitive exists:
- the application stack may receive a parsed path/query that does not match the actual request-target bytes that were sent on the wire.
That primitive can matter in contexts such as:
- path-based auth or routing decisions
- cache-key derivation
- proxy/backend interpretation mismatches
- audit/log correlation
This is exactly the kind of risk called out in nodejs/http-parser#481, which notes that the parsed and actual path may differ in security-relevant code paths.
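As an illustration of the cache-key case, the sketch below models the reported path behaviour (correct start, length reduced modulo 65536) and shows two different request targets collapsing onto one key. `model_parsed_path` is a hypothetical helper that stands in for the parser; no real cache or framework is involved:

```python
def model_parsed_path(wire_path: bytes) -> bytes:
    """Model of the reported behaviour: the path keeps its start, but its
    length is reduced modulo 65536. Hypothetical helper, not httptools code."""
    return wire_path[: len(wire_path) % 65536]


short = b"/" + b"a" * 4464    # 4465 bytes on the wire
long_ = b"/" + b"a" * 70000   # 70001 bytes on the wire

cache = {}
for wire_path in (short, long_):
    key = model_parsed_path(wire_path)  # what the application would see
    cache.setdefault(key, []).append(len(wire_path))

# Map each key's length to the wire lengths that produced it.
print({len(key): wire_lengths for key, wire_lengths in cache.items()})
```

Under this model both requests share a single key even though the bytes sent on the wire differ, which is the mismatch the primitive above describes.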
Suggested paths forward
Any of these would be better than the current silent truncation:
A. Deprecate parse_url() and document the limitation
Point users to urllib.parse.urlsplit or another maintained parser.
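For comparison, a small sketch of the same oversized input going through the standard library. `urllib.parse.urlsplit` accepts bytes and returns bytes fields, and it is not subject to a 16-bit length limit:

```python
from urllib.parse import urlsplit

url = b"http://h/" + b"a" * 70000 + b"?q=1"
parts = urlsplit(url)  # bytes in, bytes fields out

print("path_len:", len(parts.path))
print("query:", parts.query)
```

This is not a drop-in replacement: the result uses different attribute names (for example `scheme` and `netloc` rather than `schema` and `host`), and its validation rules are not identical to http-parser's, so callers would need to review each use before switching.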
B. Patch the vendored http-parser
Carry a local fix in the vendored copy so that oversized off/len values do not silently wrap.
This is likely a small source change, but it should be evaluated carefully for:
- ABI impact
- packaging/build expectations
- use-system-http-parser compatibility
C. Reject overflow explicitly
If changing the struct layout is undesirable, another improvement would be to detect overflow and raise an error instead of returning truncated fields.
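Until something like this exists inside httptools, callers can approximate it with a length guard before parsing. The sketch below is a caller-side stand-in, not the proposed fix itself; `URLTooLong` and `ensure_fits_uint16` are hypothetical names. It relies on the fact that if the whole URL is at most 65535 bytes, no field offset or length within it can exceed UINT16_MAX:

```python
UINT16_MAX = 0xFFFF  # largest value a uint16_t off/len can represent


class URLTooLong(ValueError):
    """Raised by the hypothetical guard below; not an httptools exception."""


def ensure_fits_uint16(url: bytes) -> bytes:
    """Return url unchanged, or raise if any field offset/length could wrap.

    Conservative: it bounds the whole URL, so it also rejects some URLs
    whose individual fields would still have fit.
    """
    if len(url) > UINT16_MAX:
        raise URLTooLong(
            f"URL is {len(url)} bytes; refusing to parse more than {UINT16_MAX}"
        )
    return url


for n in (100, 70000):
    candidate = b"http://h/" + b"a" * n
    try:
        ensure_fits_uint16(candidate)
        print(n, "accepted")
    except URLTooLong as exc:
        print(n, "rejected:", exc)
```

A caller would then write `httptools.parse_url(ensure_fits_uint16(url))`. The trade-off is that the guard is stricter than necessary: the 65543-byte URL from the first reproduction, which parses correctly, would be rejected.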
D. Resume a replacement implementation
The existing origin/rust-url / origin/rust-uri branches suggest that this migration path was already explored.
Environment used for reproduction
- httptools 0.7.1 from PyPI
- current master checked out at 28d1db15eaeaab5bc7d376d2c2035d966b6e1378 (0.8.0.dev0)
- uvicorn 0.42.0 for the end-to-end ASGI reproduction
- Python 3.13 on Windows
The bug is not Windows-specific; the relevant behavior follows from the struct layout and parsing code.
Thanks
httptools has been load-bearing for the Python ASGI ecosystem for a long time. This report is meant to be actionable and specific: there is a concrete reproducible bug, the historical upstream escalation path is closed, and the current repository history suggests this exact corner was already recognized as unfinished when llhttp replaced http-parser for request parsing.