| Threat | Attack surface | Primary defence |
|---|---|---|
| Path traversal | Filesystem paths | Controlled document root; sanitised `Content-Disposition` filenames. |
| XXE | XML and feed parsers | External entity resolution disabled in parsers. |
| SSRF | URL loading | HTTP(S) only; RFC 1918, loopback, and metadata IPs blocked; safe redirect handling. |
| ReDoS | `includePatterns` / `excludePatterns` | Safe regex matcher; dangerous patterns rejected or timed out. |
| ZIP bomb | ZIP archive parsing | `maxEntries` and `maxDepth` caps; per-thread depth tracking. |
| Resource exhaustion | File size, thread count | `maxFileSize`, `maxContentSize`, `maxThreads`; non-positive values rejected. |
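
The SSRF row combines a scheme check with an address check. The following is a minimal, language-neutral sketch of that idea in Python, not the document service's implementation. `is_url_allowed` is a hypothetical helper, and it only inspects literal IP hosts and the name `localhost`; a real loader would also have to resolve hostnames and repeat the check after every redirect.

```python
import ipaddress
from urllib.parse import urlsplit


def is_url_allowed(url: str) -> bool:
    """Hypothetical pre-flight check: HTTP(S) only, no internal addresses."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # Malformed URL (for example an unterminated IPv6 bracket).
        return False
    if parts.scheme not in ("http", "https"):
        return False
    if not host or host.lower() == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP. Assumption: DNS resolution is checked elsewhere.
        return True
    # Covers RFC 1918 ranges, loopback, and link-local (169.254.0.0/16).
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

Checking the parsed host rather than the raw string avoids being fooled by credentials or ports embedded in the URL.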
Path traversal sequences (`../`, `..\`) in configured paths can escape intended directories and read sensitive files. Tests append traversal segments under a controlled document root and expect no system file content in results. For URL loads, `Content-Disposition` filenames are sanitised (path stripped, `..` neutralised) so metadata does not carry traversable paths.

- Loads must not return `/etc/passwd` or SAM content in `docs[1].text`.
- Both Unix (`/../../../etc/passwd`) and Windows (`/..\..\..\..\windows\system32\config\sam`) traversal sequences are tested.
- `Content-Disposition` filenames are sanitised (path stripped, `..` segments neutralised) before being written into document metadata.

```
// Appended to the application document dir; must not return /etc/passwd or SAM content
traversalPaths = [
    "/../../../etc/passwd",
    "/..\\..\\..\\..\\windows\\system32\\config\\sam"
];
for (tPath in traversalPaths) {
    docs = docService.load({ path: application.getDocumentsDir() & tPath });
    // Assert no leaked system content in docs[1].text
}
```

Keep the document root controlled for every path passed to `load()`.

XML and feed parsers must not resolve external entities, which could otherwise disclose local files (for example `/etc/passwd`). Legitimate XML without XXE must still parse.

A valid XML file loaded with `parserType: "xml"` and `parserConfigs` must return parsed content:

```
docs = docService.load({
    path: expandPath("./Documents/valid.xml"),
    parserType: "xml",
    parserConfigs: {
        xml: {
            includeAttributes: true,
            maxDepth: 5
        }
    }
});
```

XML that declares an external entity must not place `root:`, `/bin/bash`, or similar content into `docs[1].text`, and must not leak it through error messages. The test accepts either safe parsing or a controlled failure, but never file disclosure.

These protections cover XML and feed parsers, not only files with an `.xml` extension. Never enable external entity resolution on parsers that process untrusted input.

URL loading can be abused to reach internal services or cloud metadata endpoints (for example `169.254.169.254`) and steal credentials.

- Non-HTTP(S) schemes (`file://`, `ftp://`, etc.) are rejected.
- Private RFC 1918 ranges are blocked (`10.x.x.x`, `172.16–31.x.x`, `192.168.x.x`).
- Loopback addresses are blocked (`127.0.0.1`, `::1`, `localhost`).
- Cloud metadata endpoints are blocked (`169.254.169.254` and equivalents).

```
docService = documentService();
```
Loading a public HTTPS URL:

```
docs = docService.load({
    path: "https://www.example.com/robots.txt"
});
```

`includePatterns` and `excludePatterns` accept regular expressions, so a caller can supply a catastrophically backtracking pattern (for example `(a+)+`) so that matching a URL string hangs the worker. The URL loader is expected to run patterns through a safe matcher that rejects dangerous patterns and handles malformed regex without crashing. Safe patterns must still match when appropriate.

A safe pattern such as `.*robots\.txt` must match the target URL and allow the load to complete:

```
docs = docService.load({
    path: "https://www.example.com/robots.txt",
    includePatterns: [".*robots\.txt"]
});
```

A dangerous include pattern such as `(a+)+` must complete quickly. The URL must not load full content as if the pattern were allowed:

```
docs = docService.load({
    path: "https://www.example.com/robots.txt",
    includePatterns: ["(a+)+"]
});
```

A dangerous exclude pattern must likewise complete quickly without hanging the worker:

```
docs = docService.load({
    path: "https://www.example.com/robots.txt",
    excludePatterns: ["([a-z]+)+$"]
});
```

| Pattern type | Key | Expectation |
|---|---|---|
| Safe include | `includePatterns` | Matches target URL; load completes normally. |
| Dangerous include | `includePatterns` | Completes quickly; URL must not load as if allowed. |
| Dangerous exclude | `excludePatterns` | Completes quickly; does not hang the worker. |
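
The "reject dangerous patterns" half of that expectation can be approximated with a static screen for nested quantifiers. The sketch below is a language-neutral illustration in Python, not the loader's actual safe matcher. `is_pattern_safe` is a hypothetical name, and the heuristic is deliberately conservative: it rejects any quantified group whose body also contains a quantifier, and it treats malformed regex as unsafe instead of raising.

```python
import re


def is_pattern_safe(pattern: str) -> bool:
    """Heuristic screen: reject malformed regex and nested quantifiers."""
    try:
        re.compile(pattern)
    except re.error:
        return False  # malformed regex is rejected, never raised to the caller

    quantifiers = ("+", "*", "{")
    stack = []        # one flag per open group: quantifier seen inside it?
    in_class = False  # inside [...] quantifier characters are literals
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            i += 2    # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            stack.append(False)
        elif ch == ")":
            inner = stack.pop() if stack else False
            nxt = pattern[i + 1] if i + 1 < n else ""
            outer_quantified = nxt in quantifiers
            if inner and outer_quantified:
                return False  # shape like (a+)+ : catastrophic backtracking risk
            if stack and (inner or outer_quantified):
                stack[-1] = True
        elif ch in quantifiers and stack:
            stack[-1] = True
        i += 1
    return True
```

A static screen like this misses some dangerous shapes (for example overlapping alternations), which is why the table above also allows for patterns being timed out.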
`includePatterns` and `excludePatterns` apply to URL loading only. They are not applied to filesystem paths. Use a glob-style pattern on the `path` option for filesystem filtering.

`parserConfigs.zip` supports limits such as `maxEntries` and `maxDepth`. Exceeding a limit should yield errors, empty results, or failure documents, not unbounded expansion. Concurrent parsing should use per-thread depth tracking.

`maxEntries: 1` limits the parser to a single ZIP entry. Archives with more entries must not expand beyond the cap:

```
docs = docService.load({
    path: zipPath,
    parserConfigs: {
        zip: { maxEntries: 1 }
    }
});
```

`maxDepth: 1` prevents the parser from recursing into zip-within-zip structures beyond one level:

```
docs = docService.load({
    path: outerZipPath,
    parserConfigs: {
        zip: { maxDepth: 1 }
    }
});
```

| Option | Scope | Behaviour when exceeded |
|---|---|---|
| `maxEntries` | Number of ZIP entries | Parser stops processing at the entry limit. Excess entries are not expanded. |
| `maxDepth` | Zip-within-zip nesting depth | Parser does not recurse beyond the specified depth. Nested archives at deeper levels are not opened. |
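
To make the two caps concrete, here is a small Python sketch of a bounded ZIP walk. It illustrates the behaviour in the table and is not the parser's own code. `list_entries` and `make_zip` are hypothetical helpers, and two points are assumptions: the outer archive counts as depth 1, and the entry cap applies per archive. Depth is passed as an argument instead of being kept in shared state, which is one simple way to keep the count separate for each concurrent parse.

```python
import io
import zipfile


def list_entries(data: bytes, max_entries: int, max_depth: int, depth: int = 1) -> list[str]:
    """Return entry names, stopping at max_entries and never opening past max_depth."""
    names: list[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Entries beyond the cap are never read or expanded.
        for info in zf.infolist()[:max_entries]:
            names.append(info.filename)
            is_nested = info.filename.lower().endswith(".zip")
            # Assumption: the outer archive is depth 1, so max_depth=1 opens no nested zip.
            if is_nested and depth < max_depth:
                names.extend(list_entries(zf.read(info), max_entries, max_depth, depth + 1))
    return names


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build an in-memory archive for experiments."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


inner = make_zip({"a.txt": b"a", "b.txt": b"b"})
outer = make_zip({"inner.zip": inner, "c.txt": b"c"})
print(list_entries(outer, max_entries=10, max_depth=1))
```

With `max_depth=1` the nested archive is listed by name but not opened. A production parser would also need to bound the uncompressed size of each entry before reading it.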
Always set `maxEntries` and `maxDepth` when processing ZIP files from untrusted sources.

Size limits: use `parserConfigs.xml.maxContentSize` to reject oversized XML and `maxFileSize` on `load()` for both file and URL sources, and verify that a default maximum exists (for example 100 MB) so that "unlimited by omission" does not occur.

XML content size limit (`maxContentSize`):

```
docs = docService.load({
    path: xmlFile,
    parserType: "xml",
    parserConfigs: {
        xml: { maxContentSize: 2048 }
    }
});
```

File size limit (`maxFileSize`):

```
docs = docService.load({
    path: "https://www.example.com/robots.txt",
    maxFileSize: 10
});
```

Thread limits: tests cover `maxThreads` being capped (for example at 64), validation that non-positive `maxThreads` or `maxFileSize` values are rejected, and a check that defaults are real limits (setting `maxFileSize: 1` on a normal file must fail or return a failure document, proving the default is not "unlimited").

```
// Parallel load with the default thread count
parallelDocs = docService.load({
    path: tempDir,
    parallel: true
});
// Explicit thread count; very large values may be capped
cappedDocs = docService.load({
    path: tempDir,
    parallel: true,
    maxThreads: 1000
});
```

| Option | Applies to | Notes |
|---|---|---|
| `maxFileSize` | File and URL loads | Upper bound on file or response size in bytes. Non-positive values are rejected. Setting `maxFileSize: 1` on a normal file must fail or return a failure document. |
| `parserConfigs.xml.maxContentSize` | XML parser | Upper bound on extracted XML content size. Oversized documents are rejected before the content is written into memory. |
| `parallel` | Directory loads | When `true`, files are loaded using multiple threads. Use with `maxThreads` to control concurrency. |
| `maxThreads` | Directory loads | Maximum number of threads for parallel loading. Very large values may be capped (e.g. at 64). Non-positive values are rejected. |
Non-positive values for `maxThreads` or `maxFileSize` are rejected at configuration time. Always provide positive integers. A default maximum exists for `maxFileSize` (e.g. 100 MB), so omitting the option does not mean unlimited.

The two limits are independent: a low `maxFileSize` does not limit thread count, and a low `maxThreads` does not limit individual file size. Configure both when loading from untrusted sources at scale.
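
The validation rules above (reject non-positive values, cap large thread counts, fall back to a real default) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the service's configuration code. `resolve_limits` is a hypothetical function, the 100 MB default and the cap of 64 are taken from the examples in the text, and the default thread count is an assumption.

```python
import os

DEFAULT_MAX_FILE_SIZE = 100 * 1024 * 1024  # assumed default: 100 MB, in bytes
MAX_THREADS_CAP = 64                        # assumed cap, per the example in the text


def _require_positive_int(name: str, value) -> int:
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return value


def resolve_limits(max_file_size=None, max_threads=None) -> tuple[int, int]:
    """Return (max_file_size, max_threads) with defaults and caps applied."""
    if max_file_size is None:
        max_file_size = DEFAULT_MAX_FILE_SIZE  # omission never means unlimited
    if max_threads is None:
        max_threads = os.cpu_count() or 1      # assumption: default follows CPU count
    size = _require_positive_int("maxFileSize", max_file_size)
    threads = min(_require_positive_int("maxThreads", max_threads), MAX_THREADS_CAP)
    return size, threads
```

Validating both options in one place makes it harder for one of them to be left unbounded by accident.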