PDF-PyCrack includes a comprehensive benchmarking system to measure and optimize performance. This guide covers how to use the benchmarking tools, interpret results, and improve your cracking performance.
The benchmarking system provides:

- A standard benchmark for quick, repeatable measurements
- Custom benchmarks with configurable PDF, charset, length range, process count, and batch size
- Key metrics: passwords per second and efficiency
- JSON result files and a CSV history for tracking performance over time
The easiest way to measure performance is using the standard benchmark:
```bash
# Run the default benchmark
uv run python benchmark/benchmark.py --standard
```
This runs a standardized test that:

- Uses a known test PDF (`tests/test_pdfs/numbers/100.pdf`) with the numeric charset
- Searches lengths 4-5, a space of 110,000 candidate passwords
- Deliberately never finds the password (the real password, "100", is shorter than the minimum length), so every run exhausts the full search space and does the same amount of work
You can also run custom benchmarks with specific parameters:
```bash
# Quick test with numbers only
uv run python benchmark/benchmark.py --min-len 1 --max-len 3 --charset 0123456789

# Test with letters
uv run python benchmark/benchmark.py --pdf tests/test_pdfs/letters/ab.pdf --min-len 1 --max-len 2 --charset abcdefghijklmnopqrstuvwxyz

# Test with custom configuration
uv run python benchmark/benchmark.py --processes 4 --batch-size 200
```
When you run a benchmark, you’ll see output like this:
```text
============================================================
Starting benchmark
============================================================
PDF: tests/test_pdfs/numbers/100.pdf
Charset: 0123456789
Length range: 4-5
Search space: 110,000 passwords
Processes: 8
Batch size: 100
Description: Standard benchmark - numbers 4-5 length
Cracking PDF: 100%|████████████████| 110000/110000 [00:15<00:00, 7234.56pw/s]
✓ Benchmark completed (password not found as expected)
------------------------------------------------------------
Benchmark Results:
Total passwords checked: 110,000
Elapsed time: 15.21s
CPU time: 0.15s
Passwords per second: 7,235
Efficiency: 98.5%
============================================================
Results saved to: benchmark/results/benchmark_20250810_142305.json
Results appended to: benchmark/results/benchmark_history.csv
```
Two metrics in the summary matter most:

- **Passwords per second** is the primary performance metric: the raw rate at which candidate passwords are tested.
- **Efficiency** is the percentage of CPU time actually spent on cracking rather than on overhead such as process coordination and batching.
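As a sanity check, the rate can be recomputed from the totals in the sample run above:

```python
# Recompute the rate from the sample output above
total_checked = 110_000
elapsed_seconds = 15.21
print(f"{total_checked / elapsed_seconds:,.0f} passwords/second")  # ≈ 7,232, in line with the reported 7,235
```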
```text
uv run python benchmark/benchmark.py [OPTIONS]

Options:
  --pdf PATH        Path to PDF file
  --min-len INT     Minimum password length
  --max-len INT     Maximum password length
  --charset STR     Character set to use
  --processes INT   Number of processes
  --batch-size INT  Batch size for workers
  --standard        Run standard benchmark configuration
```
The standard benchmark uses:

- PDF: `tests/test_pdfs/numbers/100.pdf` (password: "100")
- Charset: `0123456789`
- Length range: 4-5 (search space: 110,000 passwords)

This configuration ensures:

- ✅ Quick completion for regular testing
- ✅ Consistent results across runs
- ✅ Meaningful data for optimization
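The 110,000-password search space follows directly from the charset size and length range:

```python
# |charset|^4 + |charset|^5 candidates for lengths 4 and 5
charset = "0123456789"
search_space = sum(len(charset) ** n for n in range(4, 5 + 1))
print(f"{search_space:,}")  # 110,000
```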
All benchmark results are saved and can be tracked over time:
```bash
# View historical results
cat benchmark/results/benchmark_history.csv

# Plot performance trends (if you have plotting tools)
python -c "
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('benchmark/results/benchmark_history.csv')
df['timestamp'] = pd.to_datetime(df['timestamp'])
plt.plot(df['timestamp'], df['passwords_per_second'])
plt.title('Performance Over Time')
plt.ylabel('Passwords per Second')
plt.show()
"
```
You can benchmark different configurations to find optimal settings:
```bash
# Test different process counts
for processes in 2 4 6 8; do
    echo "Testing $processes processes..."
    uv run python benchmark/benchmark.py --processes $processes --standard
done

# Test different batch sizes
for batch in 50 100 200 500; do
    echo "Testing batch size $batch..."
    uv run python benchmark/benchmark.py --batch-size $batch --standard
done
```
Compare performance across different systems:
```bash
# Generate system fingerprint
echo "System: $(uname -a)" > system_benchmark.txt
echo "CPU: $(lscpu | grep 'Model name')" >> system_benchmark.txt
echo "Memory: $(free -h | grep Mem)" >> system_benchmark.txt

# Run benchmark
uv run python benchmark/benchmark.py --standard >> system_benchmark.txt
```
Too few processes underutilize the CPU, while too many add scheduling overhead. The script below sweeps every process count and reports the best:
```python
#!/usr/bin/env python3
"""Find optimal process count for your system."""
import subprocess
import multiprocessing


def benchmark_processes():
    max_processes = multiprocessing.cpu_count()
    results = {}
    for processes in range(1, max_processes + 1):
        print(f"Testing {processes} processes...")
        cmd = [
            "uv", "run", "python", "benchmark/benchmark.py",
            "--processes", str(processes),
            "--min-len", "4", "--max-len", "4",  # quick test
            "--charset", "0123456789",
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Parse results (simplified - you'd want better parsing)
        for line in result.stdout.splitlines():
            if "Passwords per second:" in line:
                rate = int(line.split(":")[1].strip().replace(",", ""))
                results[processes] = rate
                break

    # Find the process count with the highest rate
    optimal = max(results.items(), key=lambda x: x[1])
    print(f"\nOptimal configuration: {optimal[0]} processes")
    print(f"Performance: {optimal[1]:,} passwords/second")
    return results


if __name__ == "__main__":
    results = benchmark_processes()
```
Find the optimal batch size for your workload:
```python
def benchmark_batch_sizes():
    batch_sizes = [10, 25, 50, 100, 200, 500, 1000]
    results = {}
    for batch_size in batch_sizes:
        print(f"Testing batch size {batch_size}...")
        cmd = [
            "uv", "run", "python", "benchmark/benchmark.py",
            "--batch-size", str(batch_size),
            "--standard",
        ]
        # Run and parse results
        # (implementation details omitted for brevity; see the parsing sketch below)
    return results
```
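For the omitted run-and-parse step, a minimal sketch could look like the following; `run_and_parse_rate` is a hypothetical helper, not part of the benchmark tool, and it assumes the `Passwords per second:` line shown in the sample output above:

```python
import re
import subprocess


def run_and_parse_rate(cmd):
    """Run a benchmark command and return the reported passwords/second, or None."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Matches e.g. "Passwords per second: 7,235" from the benchmark summary
    match = re.search(r"Passwords per second:\s*([\d,]+)", result.stdout)
    return int(match.group(1).replace(",", "")) if match else None
```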
Monitor memory usage during benchmarks:
```python
import psutil
import time
import subprocess
import threading


class MemoryProfiler:
    def __init__(self):
        self.max_memory = 0
        self.monitoring = False

    def start_monitoring(self):
        self.monitoring = True
        self.monitor_thread = threading.Thread(target=self._monitor)
        self.monitor_thread.start()

    def stop_monitoring(self):
        self.monitoring = False
        self.monitor_thread.join()
        return self.max_memory

    def _monitor(self):
        # Sample system memory every 100 ms and keep the peak
        while self.monitoring:
            memory = psutil.virtual_memory().used / (1024**3)  # GB
            self.max_memory = max(self.max_memory, memory)
            time.sleep(0.1)


def profile_memory_usage():
    profiler = MemoryProfiler()
    profiler.start_monitoring()
    # Run the benchmark while the profiler samples in the background
    subprocess.run([
        "uv", "run", "python", "benchmark/benchmark.py", "--standard"
    ])
    max_memory = profiler.stop_monitoring()
    print(f"Peak memory usage: {max_memory:.2f} GB")
```
Monitor CPU usage patterns:
```python
import psutil
import time
import matplotlib.pyplot as plt


def monitor_cpu_usage(duration=30):
    cpu_percentages = []
    timestamps = []
    start_time = time.time()
    while time.time() - start_time < duration:
        cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
        cpu_percentages.append(cpu_percent)
        timestamps.append(time.time() - start_time)

    # Plot per-core usage over time
    plt.figure(figsize=(12, 6))
    for i, core_data in enumerate(zip(*cpu_percentages)):
        plt.plot(timestamps, core_data, label=f"Core {i}")
    plt.xlabel("Time (seconds)")
    plt.ylabel("CPU Usage (%)")
    plt.title("CPU Usage During Benchmark")
    plt.legend()
    plt.grid(True)
    plt.show()
```
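`monitor_cpu_usage` blocks while it samples, so to profile an actual run you can start the benchmark in a background process first. A sketch using the function above:

```python
import subprocess

# Launch the standard benchmark in the background...
proc = subprocess.Popen(["uv", "run", "python", "benchmark/benchmark.py", "--standard"])
# ...and sample CPU usage while it runs
monitor_cpu_usage(duration=30)
proc.wait()
```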
| Hardware Type | Expected Performance | Notes |
|---|---|---|
| High-end Desktop | 15,000-25,000 pw/s | 8+ cores, fast RAM |
| Modern Laptop | 8,000-15,000 pw/s | 4-8 cores, good cooling |
| Older Desktop | 3,000-8,000 pw/s | 2-4 cores, slower RAM |
| Budget Laptop | 1,000-3,000 pw/s | 2-4 cores, thermal limits |
When optimizing, aim for:

- High CPU utilization (well above 70% across all cores)
- Efficiency above 80%
- Memory usage below 90% of available RAM
- Stable throughput over time (no thermal throttling)
Watch out for these performance issues:

- ❌ Low CPU utilization (<70%)
- ❌ Poor efficiency (<80%)
- ❌ Memory pressure (>90% RAM usage)
- ❌ Thermal throttling (performance drops over time)
Set up automated benchmarking for development:
```bash
#!/bin/bash
# run_benchmark.sh - Run after code changes

echo "Running benchmark after changes..."
uv run python benchmark/benchmark.py --standard

# Check if a performance regression occurred
LATEST_RESULT=$(tail -n 1 benchmark/results/benchmark_history.csv | cut -d',' -f6)
BASELINE=5000  # Your baseline performance

if (( $(echo "$LATEST_RESULT < $BASELINE" | bc -l) )); then
    echo "⚠️ Performance regression detected!"
    echo "Current: $LATEST_RESULT pw/s"
    echo "Baseline: $BASELINE pw/s"
    exit 1
else
    echo "✅ Performance maintained or improved"
    echo "Current: $LATEST_RESULT pw/s"
fi
```
Add benchmarking to your development workflow:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: performance-check
        name: Performance benchmark
        entry: ./run_benchmark.sh
        language: script
        pass_filenames: false
        stages: [manual]  # Only run when explicitly requested
```
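Because the hook is assigned to the `manual` stage, it only runs when requested explicitly, for example:

```bash
pre-commit run performance-check --hook-stage manual
```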
Each benchmark run produces a JSON file with detailed results:
```json
{
  "timestamp": "2025-08-10T14:23:05.123456",
  "pdf_file": "tests/test_pdfs/numbers/100.pdf",
  "charset": "0123456789",
  "min_length": 4,
  "max_length": 5,
  "search_space": 110000,
  "processes": 8,
  "batch_size": 100,
  "total_passwords_checked": 110000,
  "elapsed_time": 15.21,
  "cpu_time": 0.15,
  "passwords_per_second": 7235,
  "efficiency": 98.5,
  "system_info": {
    "cpu_count": 8,
    "memory_gb": 16,
    "platform": "Linux-5.4.0-80-generic-x86_64"
  },
  "result_type": "PasswordNotFound",
  "description": "Standard benchmark - numbers 4-5 length"
}
```
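The JSON files are easy to post-process. For example, to load the most recent result, assuming the timestamped `benchmark/results/benchmark_*.json` naming shown above:

```python
import glob
import json

# Timestamped filenames sort chronologically, so the last one is the newest
latest = sorted(glob.glob("benchmark/results/benchmark_*.json"))[-1]
with open(latest) as f:
    result = json.load(f)
print(f"{result['passwords_per_second']:,} pw/s at {result['efficiency']}% efficiency")
```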
Historical data is maintained in CSV format for easy analysis:
```csv
timestamp,pdf_file,charset,min_length,max_length,passwords_per_second,efficiency,processes,batch_size
2025-08-10T14:23:05,tests/test_pdfs/numbers/100.pdf,0123456789,4,5,7235,98.5,8,100
2025-08-10T14:25:12,tests/test_pdfs/numbers/100.pdf,0123456789,4,5,7180,97.8,8,200
...
```
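Because the history includes the configuration columns, it also supports quick comparisons, e.g. average throughput per batch size with pandas:

```python
import pandas as pd

df = pd.read_csv("benchmark/results/benchmark_history.csv")
# Mean passwords/second for each batch size tried so far, best first
print(df.groupby("batch_size")["passwords_per_second"].mean().sort_values(ascending=False))
```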