[GHSA-wh4c-j3r5-mjhp] xmldom: XML injection via unsafe CDATA serialization allows attacker-controlled markup insertion#7431
Open
karfau wants to merge 1 commit intokarfau/advisory-improvement-7431from
Conversation
Collaborator
|
Hi there @karfau! A community member has suggested an improvement to your security advisory. If approved, this change will affect the global advisory listed at github.com/advisories. It will not affect the version listed in your project repository. This change will be reviewed by our Security Curation Team. If you have thoughts or feedback, please share them in a comment here! If this PR has already been closed, you can start a new community contribution for this advisory |
karfau
commented
Apr 18, 2026
| ], | ||
| "summary": "xmldom: XML injection via unsafe CDATA serialization allows attacker-controlled markup insertion", | ||
| "details": "## Summary\n\n`@xmldom/xmldom` allows attacker-controlled strings containing the CDATA terminator `]]>` to be inserted into a `CDATASection` node. During serialization, `XMLSerializer` emitted the CDATA content verbatim without rejecting or safely splitting the terminator. As a result, data intended to remain text-only became **active XML markup** in the serialized output, enabling XML structure\ninjection and downstream business-logic manipulation.\n\nThe sequence `]]>` is not allowed inside CDATA content and must be rejected or safely handled during serialization. ([MDN Web Docs](https://developer.mozilla.org/))\n\n### Attack surface\n\n`Document.createCDATASection(data)` is the most direct entry point, but it is not the only one. The WHATWG DOM spec intentionally does not validate `]]>` in mutation methods — only `createCDATASection` carries that guard. The following paths therefore also allow `]]>` to enter a CDATASection node and reach the serializer:\n\n- `CharacterData.appendData()`\n- `CharacterData.replaceData()`\n- `CharacterData.insertData()`\n- Direct assignment to `.data`\n- Direct assignment to `.textContent`\n\n(Note: assigning to `.nodeValue` does **not** update `.data` in this implementation — the serializer reads `.data` directly — so `.nodeValue` is not an exploitable path.)\n\n### Parse path\n\nParsing XML that contains a CDATA section is **not** affected. The SAX parser's non-greedy `CDSect` regex stops at the first `]]>`, so parsed CDATA data never contains the terminator.\n\n---\n\n## Impact\n\nIf an application uses `xmldom` to generate \"trusted\" XML documents that embed **untrusted user input** inside CDATA (a common pattern in exports, feeds, SOAP/XML integrations, etc.), an attacker can inject additional XML elements/attributes into the generated document.\n\nThis can lead to:\n\n- Integrity violation of generated XML documents.\n- Business-logic injection in downstream consumers (e.g., injecting `<approved>true</approved>`, `<role>admin</role>`, workflow flags, or other security-relevant elements).\n- Unexpected privilege/workflow decisions if downstream logic assumes injected nodes cannot appear.\n\nThis issue does **not** require malformed parsers or browser behavior; it is caused by serialization producing attacker-influenced XML markup.\n\n---\n\n## Root Cause (with file + line numbers)\n\n**File:** `lib/dom.js`\n\n### 1. No validation in `createCDATASection`\n\n`createCDATASection: function (data)` accepts any string and appends it directly.\n\n- **Lines 2216–2221** (0.9.8)\n\n### 2. Unsafe CDATA serialization\n\nSerializer prints CDATA sections as:\n\n```\n<![CDATA[ + node.data + ]]>\n```\n\nwithout handling `]]>` in the data.\n\n- **Lines 2919–2920** (0.9.8)\n\nBecause CDATA content is emitted verbatim, an embedded `]]>` closes the CDATA section early and the remainder of the attacker-controlled payload is interpreted as markup in the serialized XML.\n\n---\n\n## Proof of Concept — Fix A: `createCDATASection` now throws\n\nOn patched versions, passing `]]>` directly to `createCDATASection` throws `InvalidCharacterError` instead of silently accepting the payload:\n\n```js\nconst { DOMImplementation } = require('./lib');\n\nconst doc = new DOMImplementation().createDocument(null, 'root', null);\ntry {\n doc.createCDATASection('SAFE]]><injected attr=\"pwn\"/>');\n console.log('VULNERABLE — no error thrown');\n} catch (e) {\n console.log('FIXED — threw:', e.name); // InvalidCharacterError\n}\n```\n\nExpected output on patched versions:\n\n```\nFIXED — threw: InvalidCharacterError\n```\n\n---\n\n## Proof of Concept — Fix B: mutation vector now safe\n\nOn patched versions, injecting `]]>` via a mutation method (`appendData`, `replaceData`, `.data =`, `.textContent =`) no longer produces injectable output. The serializer splits the terminator so the result round-trips as safe text:\n\n```js\nconst { DOMImplementation, XMLSerializer } = require('./lib');\nconst { DOMParser } = require('./lib');\n\nconst doc = new DOMImplementation().createDocument(null, 'root', null);\n\n// Start with safe data, then mutate to include the terminator\nconst cdata = doc.createCDATASection('safe');\ndoc.documentElement.appendChild(cdata);\ncdata.appendData(']]><injected attr=\"pwn\"/><more>TEXT</more><![CDATA[');\n\nconst out = new XMLSerializer().serializeToString(doc);\nconsole.log('Serialized:', out);\n\nconst reparsed = new DOMParser().parseFromString(out, 'text/xml');\nconst injected = reparsed.getElementsByTagName('injected').length > 0;\nconsole.log('Injected element found in reparsed doc:', injected);\n// VULNERABLE: true | FIXED: false\n```\n\nExpected output on patched versions:\n\n```\nSerialized: <root><![CDATA[safe]]]]><![CDATA[><injected attr=\"pwn\"/><more>TEXT</more><![CDATA[]]></root>\nInjected element found in reparsed doc: false\n```\n\n---\n\n## Fix Applied\n\nBoth mitigations were implemented:\n\n### Option A — Strict/spec-aligned: reject `]]>` in `createCDATASection()`\n\n`Document.createCDATASection(data)` now throws `InvalidCharacterError` (per the [WHATWG DOM spec](https://dom.spec.whatwg.org/#dom-document-createcdatasection)) when `data` contains `]]>`. This closes the direct entry point.\n\nCode that previously passed a string containing `]]>` to `createCDATASection` and relied on the silent/unsafe behaviour will now receive `InvalidCharacterError`. Use a mutation method such as `appendData` if you intentionally need `]]>` in a CDATASection node's data (the serializer split in Option B will keep the output safe).\n\n### Option B — Defensive serialization: split the terminator during serialization\n\n`XMLSerializer` now replaces every occurrence of `]]>` in CDATA section data with the split sequence `]]]]><![CDATA[>` before emitting. This closes all mutation-vector paths that Option A alone cannot guard, and means the serialized output is always well-formed XML regardless of how `]]>` entered the node.", | ||
| "details": "## Summary\n\n`@xmldom/xmldom` allows attacker-controlled strings containing the CDATA terminator `]]>` to be inserted into a `CDATASection` node. During serialization, `XMLSerializer` emitted the CDATA content verbatim without rejecting or safely splitting the terminator. As a result, data intended to remain text-only became **active XML markup** in the serialized output, enabling XML structure\ninjection and downstream business-logic manipulation.\n\nThe sequence `]]>` is not allowed inside CDATA content and must be rejected or safely handled during serialization. ([MDN Web Docs](https://developer.mozilla.org/))\n\n### Attack surface\n\n`Document.createCDATASection(data)` is the most direct entry point, but it is not the only one. The WHATWG DOM spec intentionally does not validate `]]>` in mutation methods — only `createCDATASection` carries that guard. The following paths therefore also allow `]]>` to enter a CDATASection node and reach the serializer:\n\n- `CharacterData.appendData()`\n- `CharacterData.replaceData()`\n- `CharacterData.insertData()`\n- Direct assignment to `.data`\n- Direct assignment to `.textContent`\n\n(Note: assigning to `.nodeValue` does **not** update `.data` in this implementation — the serializer reads `.data` directly — so `.nodeValue` is not an exploitable path.)\n\n### Parse path\n\nParsing XML that contains a CDATA section is **not** affected. The SAX parser's non-greedy `CDSect` regex stops at the first `]]>`, so parsed CDATA data never contains the terminator.\n\n---\n\n## Impact\n\nIf an application uses `xmldom` to generate \"trusted\" XML documents that embed **untrusted user input** inside CDATA (a common pattern in exports, feeds, SOAP/XML integrations, etc.), an attacker can inject additional XML elements/attributes into the generated document.\n\nThis can lead to:\n\n- Integrity violation of generated XML documents.\n- Business-logic injection in downstream consumers (e.g., injecting `<approved>true</approved>`, `<role>admin</role>`, workflow flags, or other security-relevant elements).\n- Unexpected privilege/workflow decisions if downstream logic assumes injected nodes cannot appear.\n\nThis issue does **not** require malformed parsers or browser behavior; it is caused by serialization producing attacker-influenced XML markup.\n\n---\n\n## Root Cause (with file + line numbers)\n\n**File:** `lib/dom.js`\n\n### 1. No validation in `createCDATASection`\n\n`createCDATASection: function (data)` accepts any string and appends it directly.\n\n- **Lines 2216–2221** (0.9.8)\n\n### 2. Unsafe CDATA serialization\n\nSerializer prints CDATA sections as:\n\n```\n<![CDATA[ + node.data + ]]>\n```\n\nwithout handling `]]>` in the data.\n\n- **Lines 2919–2920** (0.9.8)\n\nBecause CDATA content is emitted verbatim, an embedded `]]>` closes the CDATA section early and the remainder of the attacker-controlled payload is interpreted as markup in the serialized XML.\n\n---\n\n## Proof of Concept — Fix A: `createCDATASection` now throws\n\nOn patched versions, passing `]]>` directly to `createCDATASection` throws `InvalidCharacterError` instead of silently accepting the payload:\n\n```js\nconst { DOMImplementation } = require('./lib');\n\nconst doc = new DOMImplementation().createDocument(null, 'root', null);\ntry {\n doc.createCDATASection('SAFE]]><injected attr=\"pwn\"/>');\n console.log('VULNERABLE — no error thrown');\n} catch (e) {\n console.log('FIXED — threw:', e.name); // InvalidCharacterError\n}\n```\n\nExpected output on patched versions:\n\n```\nFIXED — threw: InvalidCharacterError\n```\n\n---\n\n## Proof of Concept — Fix B: mutation vector now safe\n\nOn patched versions, injecting `]]>` via a mutation method (`appendData`, `replaceData`, `.data =`, `.textContent =`) no longer produces injectable output. The serializer splits the terminator so the result round-trips as safe text:\n\n```js\nconst { DOMImplementation, XMLSerializer } = require('./lib');\nconst { DOMParser } = require('./lib');\n\nconst doc = new DOMImplementation().createDocument(null, 'root', null);\n\n// Start with safe data, then mutate to include the terminator\nconst cdata = doc.createCDATASection('safe');\ndoc.documentElement.appendChild(cdata);\ncdata.appendData(']]><injected attr=\"pwn\"/><more>TEXT</more><![CDATA[');\n\nconst out = new XMLSerializer().serializeToString(doc);\nconsole.log('Serialized:', out);\n\nconst reparsed = new DOMParser().parseFromString(out, 'text/xml');\nconst injected = reparsed.getElementsByTagName('injected').length > 0;\nconsole.log('Injected element found in reparsed doc:', injected);\n// VULNERABLE: true | FIXED: false\n```\n\nExpected output on patched versions:\n\n```\nSerialized: <root><![CDATA[safe]]]]><![CDATA[><injected attr=\"pwn\"/><more>TEXT</more><![CDATA[]]></root>\nInjected element found in reparsed doc: false\n```\n\n---\n\n## Fix Applied\n\nBoth mitigations were implemented:\n\n### Option A — Strict/spec-aligned: reject `]]>` in `createCDATASection()`\n\n`Document.createCDATASection(data)` now throws `InvalidCharacterError` (per the [WHATWG DOM spec](https://dom.spec.whatwg.org/#dom-document-createcdatasection)) when `data` contains `]]>`. This closes the direct entry point.\n\nCode that previously passed a string containing `]]>` to `createCDATASection` and relied on the silent/unsafe behaviour will now receive `InvalidCharacterError`. Use a mutation method such as `appendData` if you intentionally need `]]>` in a CDATASection node's data (the serializer split in Option B will keep the output safe).\n\n### Option B — Defensive serialization: split the terminator during serialization\n\n`XMLSerializer` now replaces every occurrence of `]]>` in CDATA section data with the split sequence `]]]]><![CDATA[>` before emitting. This closes all mutation-vector paths that Option A alone cannot guard, and means the serialized output is always well-formed XML regardless of how `]]>` entered the node.\n\n## Update — 2026-04-xx (0.9.10 / 0.8.13)\n\n### `splitCDATASections` is deprecated\n\nThe CDATA split behavior introduced as Option B of this fix (replacing `]]>` with\n`]]]]><![CDATA[>` during serialization) is **deprecated** as of 0.9.10 / 0.8.13.\n\nThis release introduces a `requireWellFormed` option on `XMLSerializer.serializeToString()`.\nWhen `{ requireWellFormed: true }` is passed as the second argument, the serializer throws\n`InvalidStateError` if CDATA section data contains `]]>` — this is the spec-aligned behavior\n(W3C DOM Parsing and Serialization, `require well-formed` flag) and the recommended migration\npath going forward.\n\nThe split behavior is now controlled by an explicit `splitCDATASections` option (default\n`true`, preserving the current behavior). The three serialization behaviors are:\n\n| `requireWellFormed` | `splitCDATASections` | Behavior |\n|---|---|---|\n| `false` (default) | `true` (default) | Split `]]>` → `]]]]><![CDATA[>` (current behavior, deprecated) |\n| `true` | — (ignored) | Throw `InvalidStateError` — spec-aligned, recommended |\n| `false` | `false` | Emit verbatim — same as pre-0.9.9 behavior |\n\n`requireWellFormed: true` takes precedence: the split path is unreachable when it is set.\n\n### Migration\n\nReplace any reliance on the default split behavior with an explicit opt-in:\n\n```js\n// Before (implicit split, deprecated):\nconst xml = new XMLSerializer().serializeToString(doc);\n\n// After (explicit guard, spec-aligned):\nconst xml = new XMLSerializer().serializeToString(doc, { requireWellFormed: true });\n// Throws InvalidStateError if any CDATASection contains ']]>'\n```\n\n### Removal timeline\n\nBoth the `splitCDATASections` option and the underlying `]]>` → `]]]]><.\n", |
Author
There was a problem hiding this comment.
sorry, 2026-04-xx needs to be replaced with 2026-04-18
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Updates
Comments
issues with the resolution discovered during research of follow up advisories lead to the reevaluation of the approach for future versions