Security Best Practices
This document provides security guidance for using formatparse in production environments.
Input Validation
Always validate and sanitize user input before passing it to formatparse:
import re
from formatparse import parse
# Validate pattern length
def safe_parse(pattern, text):
# Stricter than the library maximum (10,000 characters for patterns); defense in depth.
if len(pattern) > 1000: # Set your own limit
raise ValueError("Pattern too long")
if len(text) > 1_000_000: # Set your own limit
raise ValueError("Input too long")
# Check for suspicious characters
if '\0' in pattern or '\0' in text:
raise ValueError("Null bytes not allowed")
return parse(pattern, text)
Pattern Complexity
Be cautious when parsing patterns from untrusted sources:
# Safe: Patterns from trusted sources
pattern = "{name}: {age:d}"
result = parse(pattern, "Alice: 30")
# Caution: Patterns from user input
user_pattern = get_user_input() # Validate before use
if is_suspicious_pattern(user_pattern):
raise ValueError("Pattern not allowed")
result = parse(user_pattern, user_text)
The library includes built-in limits to help protect against malicious patterns:
Maximum pattern length: 10,000 characters
Maximum input length: 10MB
Maximum fields: 100
After each successful regex compilation, the library checks that compilation took at most 500ms of wall-clock time (this does not bound matching time; see below)
Regular Expression Denial of Service (ReDoS)
Compilation vs matching: The 500ms check runs only after regex compilation completes. It does not interrupt compilation in progress, and it does not bound matching time for parse, search, findall, or similar calls. Use application-level timeouts and simple patterns for untrusted input.
``findall``: By default there is no fixed cap on how many matches can be returned within the maximum input length. Pass max_matches to formatparse.findall() or formatparse.findall_iter() (or FormatParser.findall_iter()) to stop after a given number of non-overlapping matches when inputs are untrusted.
The library still enforces pattern and input size limits. You should still:
Validate patterns from untrusted sources: Check pattern complexity before use
Monitor parsing performance: Watch for unusually slow parsing operations
Set your own timeouts: If parsing takes too long in your application, implement your own timeout
import signal
from formatparse import parse
class TimeoutError(Exception):
pass
def timeout_handler(signum, frame):
raise TimeoutError("Parsing timeout")
def safe_parse_with_timeout(pattern, text, timeout=1.0):
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(int(timeout))
try:
result = parse(pattern, text)
signal.alarm(0) # Cancel alarm
return result
except TimeoutError:
signal.alarm(0)
raise ValueError("Parsing took too long")
Resource Limits
The library enforces the following limits:
Pattern length: 10,000 characters maximum
Input length: 10,000,000 characters (10MB) maximum
Field count: 100 fields maximum
Field name length: 200 characters maximum
Post-compile timing check: After each successful regex compilation, the library rejects the build if compilation took more than 500ms of wall-clock time (this is not a match-time or interruptible compile timeout; see the ReDoS section above)
If you need different limits for your use case, consider:
Pre-validating inputs in your application
Processing large inputs in chunks
Using streaming parsers for very large inputs
Error Handling
Always handle errors appropriately:
from formatparse import parse
try:
result = parse(pattern, text)
if result is None:
# Pattern didn't match
handle_no_match()
else:
process_result(result)
except ValueError as e:
# Invalid pattern or input
log_error(e)
handle_error()
except Exception as e:
# Unexpected error
log_unexpected_error(e)
handle_error()
Dependencies
Keep your dependencies up to date:
# Check for Rust vulnerabilities
cargo audit
# Check for Python vulnerabilities
pip-audit
# Update dependencies regularly
pip install --upgrade formatparse
Performance Considerations
For best performance and security:
Pre-compile patterns: Use
compile()for repeated patternsCache compiled patterns: Reuse
FormatParserinstancesMonitor resource usage: Watch memory and CPU usage
Set appropriate timeouts: Don’t let parsing operations hang indefinitely
from formatparse import compile
# Compile once, use many times
parser = compile("{name}: {age:d}")
for text in many_texts:
result = parser.parse(text)
process_result(result)
Known Limitations
Very large inputs may consume significant memory
Complex patterns may take longer to compile
Some edge cases in pattern syntax may not be fully validated
Reporting Security Issues
If you discover a security vulnerability, please report it privately:
Email: odosmatthews@gmail.com
Do not open public GitHub issues for security vulnerabilities
See the full policy in the repository: Security policy (SECURITY.md)