Add support for string, binary, and base64 payload sanitization with tests#3
Conversation
WalkthroughDie FileSanitizer-Klasse wird um drei neue öffentliche Methoden erweitert, um Roh-Payloads, Binärdaten und Base64-/Data-URI-Eingaben direkt zu verarbeiten. Ein interner Orchestrator normalisiert Eingaben, erstellt temporäre Dateien, erkennt MIME-Typen automatisch, delegiert an die bestehende Verarbeitung und räumt Ressourcen auf. Umfangreiche Tests validieren HTML-Sanitisierung, MIME-Erkennung und Base64-Handling. ChangesNeue Eingabe-Verarbeitungs-Entry-Points
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/Sanitizer/FileSanitizerInputStringTest.php (1)
31-31: ⚡ Quick winOptional: Explizite Base64-Validierung für klarere Fehlermeldungen.
Die aktuelle Assertion funktioniert korrekt, aber wenn
sanitizedBase64ungültiges Base64 enthielte, würdebase64_decode(..., true)falsezurückgeben und die Fehlermeldung wäre nicht optimal. Eine explizite Prüfung könnte das Debugging erleichtern.💡 Optionale Verbesserung für klarere Assertions
- self::assertSame($result['sanitizedData'], base64_decode($result['sanitizedBase64'], true)); + $decoded = base64_decode($result['sanitizedBase64'], true); + self::assertNotFalse($decoded, 'sanitizedBase64 sollte gültiges Base64 enthalten'); + self::assertSame($result['sanitizedData'], $decoded);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/Sanitizer/FileSanitizerInputStringTest.php` at line 31, The test currently compares sanitizedData to base64_decode($result['sanitizedBase64'], true) directly, which yields an unhelpful failure if sanitizedBase64 is invalid; first assert the base64 is valid by calling base64_decode($result['sanitizedBase64'], true) and using an explicit assertion like assertNotFalse (or assertIsString) on that decoded value keyed by 'sanitizedBase64', then perform the assertSame between $result['sanitizedData'] and the decoded value so failures clearly indicate invalid Base64 versus content mismatch.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/FileSanitizer.php`:
- Around line 188-195: The temporary working directory created in
FileSanitizer.php using $tempDirectory is currently created with mode 0755;
change the mkdir invocation to use 0700 to restrict access (so uploads and
artifacts are not world-readable) and, to be extra safe, after successful
creation call chmod($tempDirectory, 0700) to enforce the permission on systems
where umask may interfere; update the mkdir line and add the chmod fallback near
the existing creation/check block (the try/catch that sets $tempDirectory and
the if (!mkdir(...)) check).
- Around line 134-137: defaultOutputPath currently mishandles names without
extensions because it always uses substr with a negative length and produces
".sanitized" instead of appending to the original name; update the function
defaultOutputPath to check if $extension === '' and in that case return
$inputPath . '.sanitized', otherwise compute the base name (using
substr($inputPath, 0, -strlen($extension) - 1)) and return base . '.sanitized.'
. $extension so dateiendungslose und normale Dateien are treated correctly.
- Around line 199-214: Behandle Data-URIs flexibler: in
extractBase64Payload(string $base64Data) prüfe zunächst, ob der String ein
Data-URI ist, indem du bis zum ersten Komma splittest; falls im Header (Teil vor
dem Komma) case-insensitiv das Flag ";base64" vorkommt, gebe den Teil nach dem
Komma (ohne Whitespace) zurück, sonst behandle den gesamten Input als rohes
Base64 wie bisher; in extractDataUriMimeType(string $base64Data) parse ebenfalls
nur den Header bis zum ersten Komma, entferne das führende "data:" und
extrahiere den MIME-Type als Text vor dem ersten Semikolon (oder null, wenn leer
oder Header fehlt), normalisiere mit strtolower/trim und gib null zurück, wenn
kein Data-URI vorliegt oder kein MIME-Type angegeben wurde; referenziere die
Funktionen extractBase64Payload und extractDataUriMimeType beim Patch.
---
Nitpick comments:
In `@tests/Sanitizer/FileSanitizerInputStringTest.php`:
- Line 31: The test currently compares sanitizedData to
base64_decode($result['sanitizedBase64'], true) directly, which yields an
unhelpful failure if sanitizedBase64 is invalid; first assert the base64 is
valid by calling base64_decode($result['sanitizedBase64'], true) and using an
explicit assertion like assertNotFalse (or assertIsString) on that decoded value
keyed by 'sanitizedBase64', then perform the assertSame between
$result['sanitizedData'] and the decoded value so failures clearly indicate
invalid Base64 versus content mismatch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0886b4a-e8aa-4d88-a492-b1b76073bc0d
📒 Files selected for processing (3)
README.mdsrc/FileSanitizer.phptests/Sanitizer/FileSanitizerInputStringTest.php
Summary by CodeRabbit
Neue Features
Dokumentation
Tests