build.bat: added upx minify for caesiumclt.exe

2026-06-17 11:34:32 +02:00
parent bb1cf98aba
commit 697ed6dc84
3 changed files with 539 additions and 13 deletions
+236
@@ -0,0 +1,236 @@
# AGENTS.md — pptx-image-compress
Guidelines for AI agents and contributors working in this codebase.
---
## Project Overview
Single-file Python CLI tool (`pptx_image_compress.py`) that compresses images
inside `.pptx` files using the external binary `caesiumclt`. Supports single-
file and batch modes, multi-threaded processing, and CSV logging.
**Entry point:** `pptx_image_compress.py` → `main()`
**Tests:** `test_pptx_image_compress.py` (stdlib `unittest`, run via `pytest`)
**External dependency:** `caesiumclt` must be on `PATH`
---
## Running Tests
```bash
python -m pytest test_pptx_image_compress.py -v
```
All 5 tests must pass before any change is considered complete.
Never remove or weaken an existing test. Always add a test for new behaviour.
---
## Code Readability
- **One responsibility per function.** If a function does more than one thing,
split it.
- **Descriptive names.** Avoid single-letter variables outside of short loops.
Prefer `img_path` over `p`, `result` over `r`.
- **Type-annotate every function signature** — parameters and return type.
Use `Optional[X]` / `X | None` consistently (the codebase uses both; prefer
`X | None` for new code on Python 3.10+).
- **Constants at module level**, UPPER_SNAKE_CASE. Never hardcode magic values
inline (e.g. file extensions, prefix strings, bar lengths).
- **Section comments** (`# --- Section ---`) are used to separate logical
blocks. Keep them and add new ones when introducing a new logical group.
- **German UI strings are intentional** (progress output, error messages shown
to the end-user). Keep them in German. Internal code identifiers stay in
English.
- **No dead code.** Remove commented-out blocks and unused functions before
committing.
---
## Testability
- **Inject external dependencies via callable parameters.** The `compressor`
parameter on `process_image_file` and `process_single_deck` is the canonical
pattern — always use it for any new external-process call.
- **Never call `shutil.which` or `subprocess` directly inside a function under
test.** Route through an injectable or mockable seam.
- **Tests use `tempfile.TemporaryDirectory`** for isolation. Every test must
clean up after itself — rely on the context manager, not `tearDown`.
- **Do not test private implementation details.** Test observable behaviour:
return values, file contents, log output.
- **One assertion focus per test.** A test named `test_X` should assert exactly
what `X` does, with a minimal setup.
- **Use `fake_compressor` pattern** (as seen in existing tests) to decouple
image-compression logic from the real `caesiumclt` binary in all unit tests.
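
The injection pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `replace_if_smaller` is a hypothetical stand-in for `process_image_file`, and the argument values (`80`, `"5%"`) are assumed for the example only.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical alias mirroring the compressor call shape used in this codebase.
CompressorFn = Callable[[Path, Path, Optional[int], int, str], Optional[Path]]


def replace_if_smaller(img_path: Path, scratch_dir: Path, compressor: CompressorFn) -> bool:
    """Hypothetical stand-in for process_image_file: keep the result only if it is smaller."""
    candidate = compressor(img_path, scratch_dir, 1, 80, "5%")  # assumed argument values
    if candidate is None or candidate.stat().st_size >= img_path.stat().st_size:
        return False
    candidate.replace(img_path)
    return True


def fake_compressor(original, out_dir, threads, quality, min_savings):
    """Test double: writes half of the original bytes instead of calling caesiumclt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / original.name
    data = original.read_bytes()
    out_path.write_bytes(data[: len(data) // 2])
    return out_path


def run_demo() -> tuple:
    """Exercise the seam inside a TemporaryDirectory, as the tests do."""
    with tempfile.TemporaryDirectory() as tmp:
        img_path = Path(tmp) / "image1.png"
        img_path.write_bytes(b"x" * 100)
        replaced = replace_if_smaller(img_path, Path(tmp) / "scratch", fake_compressor)
        return replaced, img_path.stat().st_size
```

Because the function under test only ever sees a callable, no test needs `caesiumclt` on `PATH`, and the context manager handles cleanup.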
---
## Performance
- **Thread pool sizing:** outer thread count is controlled by `-t/--threads`
(default 16). When `threads > 1`, each `caesiumclt` subprocess is launched
with `--threads 1` to prevent CPU over-subscription. Do not change this
without benchmarking.
- **Scratch directories are per-image** (`img_{idx:06d}` sub-dirs) to avoid
filename collisions across threads without locking.
- **`Lock` scope must be minimal.** Only counter increments and `log_lines`
appends are inside the lock — never I/O or subprocess calls.
- **Avoid redundant filesystem walks.** `build_image_slide_index` is called
once per deck, not per image. Keep it that way.
- **`zip_dir_to_pptx` collects all files before writing** so `[Content_Types].xml`
can be placed first. Do not revert this to a streaming walk.
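
A rough sketch of the per-image scratch directory and minimal-lock rules, assuming a hypothetical `process_all` helper (the real worker lives in `process_single_deck`; the placeholder write stands in for the compression call):

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from threading import Lock


def process_all(image_names: list, scratch_root: Path, threads: int) -> tuple:
    """Hypothetical worker loop: returns (processed count, log lines)."""
    lock = Lock()
    done = 0
    log_lines = []

    def work(idx: int, name: str) -> None:
        nonlocal done
        # Per-image scratch dir: two images with the same file name never collide.
        scratch = scratch_root / f"img_{idx:06d}"
        scratch.mkdir(parents=True, exist_ok=True)
        (scratch / name).write_bytes(b"")  # placeholder for the compression call
        # Only the counter and the log list are touched under the lock.
        with lock:
            done += 1
            log_lines.append(f"{idx},{name}")

    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(work, idx, name) for idx, name in enumerate(image_names)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return done, log_lines
```

All filesystem work happens before the `with lock:` block, so threads never wait on each other for I/O.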
---
## Architecture
### Current state
Single-file design (`pptx_image_compress.py`) is intentional for zero-install
distribution. It is acceptable as long as the file stays under ~700 lines.
### Target layout (clean architecture — migrate when the file grows)
When the project needs to grow, extract to a package following these layers.
Dependencies must only point **inward** (CLI → Application → Domain ←
Infrastructure implements Domain ports).
```
pptx_compress/
├── __init__.py
├── __main__.py # python -m pptx_compress entry point
├── domain/ # innermost — zero external imports
│ ├── __init__.py
│ ├── models.py # DeckResult, ImageProcessResult (dataclasses)
│ ├── constants.py # ALLOWED_EXT, TEMP_PREFIX, defaults
│ └── ports.py # Compressor Protocol (typing.Protocol), SlideIndex ABC
├── application/ # orchestration — imports domain only
│ ├── __init__.py
│ ├── compress_deck.py # process_single_deck() use-case
│ └── batch.py # batch loop, overall summary logic
├── infrastructure/ # implements domain ports — imports domain + stdlib/3rd-party
│ ├── __init__.py
│ ├── caesium_adapter.py # compress_with_caesium() (caesiumclt subprocess)
│ ├── pptx_reader.py # discover_images(), build_image_slide_index()
│ ├── pptx_writer.py # zip_dir_to_pptx()
│ └── temp_manager.py # cleanup_old_temps(), TEMP_PREFIX lifecycle
└── cli/ # outermost — imports application only
├── __init__.py
├── args.py # argparse definition, expand_inputs(), collect_from_dir()
└── output.py # print_progress(), format_duration(), human_mb/kb
```
### Layer rules
| Layer | May import | Must NOT import |
|---|---|---|
| `domain` | stdlib only | everything else |
| `application` | `domain` | `infrastructure`, `cli` |
| `infrastructure` | `domain`, stdlib, 3rd-party | `application`, `cli` |
| `cli` | `application`, `domain.models` | `infrastructure` directly |
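
The table above could be enforced mechanically. The following is a sketch only: `layer_violations` and `ALLOWED_PREFIXES` are hypothetical names, imports within a module's own layer are assumed to be allowed, and relative, stdlib, and third-party imports are not checked.

```python
import ast

PACKAGE = "pptx_compress"

# Allowed intra-package import prefixes per layer, transcribed from the table.
# Assumption: a module may always import from its own layer.
ALLOWED_PREFIXES = {
    "domain": ("domain",),
    "application": ("application", "domain"),
    "infrastructure": ("infrastructure", "domain"),
    "cli": ("cli", "application", "domain.models"),
}


def layer_violations(layer: str, source: str) -> list:
    """Return the absolute pptx_compress imports in `source` that `layer` must not use."""
    allowed = ALLOWED_PREFIXES[layer]
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            if not module.startswith(PACKAGE + "."):
                continue  # stdlib and third-party imports are out of scope here
            rest = module[len(PACKAGE) + 1:]
            if not any(rest == p or rest.startswith(p + ".") for p in allowed):
                violations.append(module)
    return violations
```

A test could read every file under a layer directory and assert that this function returns an empty list.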
### Key architectural decisions
- **`Compressor` is a `typing.Protocol`** (in `domain/ports.py`), not a bare
`Callable`. This makes the contract explicit and IDE-checkable without
creating an import cycle:
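```python
from pathlib import Path
from typing import Protocol


class Compressor(Protocol):
    """Callable contract that every compressor implementation must satisfy."""

    def __call__(
        self,
        original: Path,
        out_dir: Path,
        threads: int | None,
        quality: int,
        min_savings: str,
    ) -> Path | None: ...
```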
- **`DeckResult` and `ImageProcessResult` live in `domain/models.py`** — they
are pure data, no logic, no I/O.
- **`compress_deck.py` receives a `Compressor` instance via constructor or
parameter** — never imports `caesium_adapter` directly. This is what makes
the use-case fully unit-testable with a `fake_compressor`.
- **`main()` (in `cli/args.py`) owns argument parsing and wiring only.** It
resolves paths, builds the `Compressor` adapter, and calls
`application.compress_deck` or `application.batch`. No processing logic
belongs there.
- **`expand_inputs` / `collect_from_dir` live in `cli/args.py`** — path
resolution is a CLI concern. All layers below receive `Path` objects.
- **Temp directory lifecycle belongs in `infrastructure/temp_manager.py`.**
Always use `TEMP_PREFIX` so orphaned dirs from crashed runs are recoverable.
### Migration guide (single file → package)
1. Create the `pptx_compress/` directory.
2. Move dataclasses and constants to `domain/`.
3. Move `compress_with_caesium` → `infrastructure/caesium_adapter.py`.
4. Move PPTX read/write helpers → `infrastructure/pptx_reader.py` and
`pptx_writer.py`.
5. Move `process_image_file` + `process_single_deck` → `application/compress_deck.py`.
6. Move `main()` + input helpers → `cli/args.py`.
7. Add `__main__.py` with `from pptx_compress.cli.args import main; main()`.
8. Update `test_pptx_image_compress.py` imports accordingly — test logic does
not need to change because the public API surface is identical.
### Refactoring plan (aligned with this AGENTS.md)
- Keep the same layer direction: `cli` → `application` → `domain`; only
  `infrastructure` implements domain ports.
- Add dedicated raster/vector implementations behind domain ports, not in CLI:
  - `domain/ports.py`: `RasterCompressor`, `VectorCompressor` protocols
    (or one `Compressor` protocol + typed strategies)
  - `infrastructure/caesium_adapter.py`: raster implementation
  - `infrastructure/svg_polish_adapter.py`: vector implementation
- Add routing in `application` (not `infrastructure`):
  - `application/compress_deck.py`: `CompressorRouter` decides by extension
  - no direct `subprocess` / external library calls in `application`
- Split image workflow into explicit application steps:
  - `compress_step`
  - `optimal_format_step` (PNG → JPEG optimization step; not a fallback)
  - `replace_step` (atomic replace via `.tmp` + `Path.replace()`)
- Centralize PPTX metadata handling in infrastructure modules:
  - keep relationship/content-type updates in `infrastructure/pptx_reader.py`
    and/or `infrastructure/pptx_writer.py`
  - `application` only orchestrates and passes domain models
- Introduce configuration object in `domain/constants.py` or a dedicated
  domain config model; avoid new magic values in `application`.
- Preserve public behaviour and CLI surface during migration; refactor in
  small commits with green tests after each step.
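
One possible shape for the extension-based router, as a sketch under assumptions: the `CompressorRouter` constructor signature and the extension sets below are illustrative, not taken from the codebase, and the two fakes only stand in for the caesium and svg-polish adapters.

```python
from pathlib import Path
from typing import Optional, Protocol


class Compressor(Protocol):
    """Same call shape as the Compressor protocol defined in this file."""

    def __call__(self, original: Path, out_dir: Path, threads: Optional[int],
                 quality: int, min_savings: str) -> Optional[Path]: ...


class CompressorRouter:
    """Hypothetical router that picks a compressor by file extension."""

    def __init__(self, raster: Compressor, vector: Compressor,
                 raster_ext: frozenset, vector_ext: frozenset) -> None:
        self._raster = raster
        self._vector = vector
        self._raster_ext = raster_ext
        self._vector_ext = vector_ext

    def __call__(self, original: Path, out_dir: Path, threads: Optional[int],
                 quality: int, min_savings: str) -> Optional[Path]:
        ext = original.suffix.lower()
        if ext in self._vector_ext:
            return self._vector(original, out_dir, threads, quality, min_savings)
        if ext in self._raster_ext:
            return self._raster(original, out_dir, threads, quality, min_savings)
        return None  # unknown type: leave the file untouched


def fake_raster(original, out_dir, threads, quality, min_savings):
    return out_dir / f"raster-{original.name}"


def fake_vector(original, out_dir, threads, quality, min_savings):
    return out_dir / f"vector-{original.name}"


router = CompressorRouter(
    raster=fake_raster,
    vector=fake_vector,
    raster_ext=frozenset({".png", ".jpg", ".jpeg"}),  # assumed values
    vector_ext=frozenset({".svg"}),  # assumed values
)
```

Since the router itself satisfies the `Compressor` call shape, it can be passed wherever a single compressor is accepted today, and it never touches `subprocess`.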
### Suggested commit sequence
1. Extract domain models/constants/ports unchanged.
2. Extract caesium adapter + add svg_polish adapter seam.
3. Introduce router in `application` with extension-based dispatch.
4. Refactor image processing into `compress_step` + `optimal_format_step` +
`replace_step`.
5. Extract PPTX metadata update helpers to infrastructure modules.
6. Move CLI parsing/output concerns into `cli/` only.
7. Remove dead monolith code paths and keep tests passing.
---
## Security
- **Never pass unsanitised user input directly to `subprocess`.** The
`compress_with_caesium` function builds the command as a list (not a shell
string). Keep it that way — do not use `shell=True`.
- **Validate file extensions before compression.** `compress_with_caesium`
checks `ext not in ALLOWED_EXT` and returns `None` for unrecognised types.
Do not bypass or widen this check without explicit justification.
- **Validate input paths early.** `process_single_deck` checks that the input
exists and has a `.pptx` suffix before doing any filesystem work.
- **Temp files are written atomically.** Image replacement uses a `.tmp`
intermediate and `Path.replace()` (atomic rename) — do not change this to a
direct overwrite.
- **`capture_output=True`** is set on all subprocess calls so that stdout/stderr
from `caesiumclt` cannot interfere with or inject into the tool's own output.
- **Do not log file contents**, only metadata (name, size, slide references).
The CSV log must never contain image binary data or path information outside
the output directory.
- **`ignore_errors=True` on `shutil.rmtree`** is acceptable for temp cleanup
only. Never suppress errors on writes to the output PPTX or its log file.
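
The atomic-replacement rule can be illustrated with a short sketch. `replace_atomically` is a hypothetical helper name; the rename is only atomic when the `.tmp` file and the target are on the same filesystem, which holds here because the temp file is created next to the target.

```python
import tempfile
from pathlib import Path


def replace_atomically(target: Path, new_bytes: bytes) -> None:
    """Write to a sibling .tmp file first, then swap it into place with one rename."""
    tmp_path = target.with_name(target.name + ".tmp")
    tmp_path.write_bytes(new_bytes)
    # Path.replace() overwrites the destination; readers see the old file
    # or the new one, never a partially written file.
    tmp_path.replace(target)
```

If the write to the `.tmp` file fails midway, the original image is still intact, which is why a direct overwrite is not an acceptable substitute.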
+288
@@ -0,0 +1,288 @@
# PPTX Image Compressor - Packaging & Distribution Guide
## Overview
This project now supports 3 different deployment approaches:
1. **Embedded Python** (Current - Development & Local Use)
2. **PyInstaller Portable** (Recommended for End Users)
3. **Hybrid Approach** (Recommended for Maximum Flexibility)
---
## Approach 1: Embedded Python (Development Setup)
### Use Case
- ✅ Development with VS Code
- ✅ Debugging with Python Debugger
- ✅ Running tests with pytest
- ✅ Source code control & modifications
### What You Have
- `install_and_run.bat` - Main launcher
- `.venv/` - Virtual environment for development
- Source code - Fully editable
### Usage
```bash
# Development (with --debug flag to see pip output)
.\install_and_run.bat --debug -i "path\to\file.pptx" -o "path\to\output.pptx"
# With VS Code debugger
# Open pptx_image_compress.py and click "Debug" or press F5
# Run tests
.\.venv\Scripts\pytest tests/
```
### Setup Instructions
1. Ensure Python 3.9+ is installed on Windows
2. Run: `.\install_and_run.bat`
3. Virtual environment is created/updated automatically
### Files
- `install_and_run.bat` - Handles Python setup and execution
- `requirements.txt` - Python package dependencies for runtime
- `requirements-dev.txt` - Development dependencies (pytest, coverage, etc.)
---
## Approach 2: PyInstaller Portable (User Distribution)
### Use Case
- ✅ Distribute to end users (No Python installation needed)
- ✅ Single-file executable
- ✅ Professional appearance
- ✅ "Just download and run" experience
- ✅ Drag-and-drop support for non-technical users
### What Gets Generated
- `dist/pptx-image-compress.exe` - Standalone executable
- `dist/run.bat` - Simple command-line wrapper
- `dist/dragdrop.bat` - Drag-and-drop wrapper (easiest for users)
### Usage by End User
```bash
# Option 1: Drag-and-drop (Easiest!)
# Drag a .pptx file onto dragdrop.bat
# Output: filename_compressed.pptx
# Option 2: Command line
.\pptx-image-compress.exe -i "path\to\file.pptx" -o "path\to\output.pptx"
# Option 3: Wrapper
.\run.bat -i "path\to\file.pptx" -o "path\to\output.pptx"
# Get help
.\pptx-image-compress.exe --help
```
### Build Instructions
1. Ensure you have the development environment set up (Approach 1)
2. Run: `.\build.bat`
3. Wait for build to complete (first run takes 2-3 minutes)
4. Generated files are in `dist/` folder
### Build Files
- `build.bat` - Automated build script that:
  - Installs PyInstaller if needed
  - Compiles Python to standalone executable
  - Creates wrapper batch file
  - Prepares distribution package
### Advantages
- No Python installation required on user's machine
- Smaller footprint than full Python installation
- Professional distribution option
- Can be code-signed and timestamped
### Limitations
- Larger file size (~80-150 MB) due to bundled Python
- First launch slightly slower (unpacking)
- Harder to debug if issues occur
### Distribution Notes
- Ensure `caesiumclt.exe` is in the same directory as the .exe
- Can optionally add `.venv\Lib\site-packages\svg_polish\*` if svg-polish needs updating
- All dependencies are pre-bundled
- Users can:
  1. **Drag-and-drop files** onto `dragdrop.bat` for easy compression
  2. Use command line for batch operations
  3. Call `.exe` directly with custom parameters
### Drag-and-Drop Feature
The `dragdrop.bat` wrapper provides the easiest user experience:
- Drag a `.pptx` file onto `dragdrop.bat`
- Automatically creates `[filename]_compressed.pptx` in the same directory
- Shows progress and completion status
- No command-line knowledge required
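
The output naming rule can be expressed in a few lines of Python. This is only an illustration of the rule; the wrapper itself is a batch file, and `compressed_output_path` is a hypothetical helper name.

```python
from pathlib import Path


def compressed_output_path(input_path: Path) -> Path:
    """Derive '<name>_compressed<ext>' in the same directory as the input."""
    return input_path.with_name(f"{input_path.stem}_compressed{input_path.suffix}")
```

For example, `decks/report.pptx` maps to `decks/report_compressed.pptx`.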
---
## Approach 3: Hybrid (Recommended)
### Use Case
- ✅ Flexible development workflow
- ✅ Easy distribution to users
- ✅ Best of both worlds
### How It Works
**For Developers:**
- Use Embedded Python approach (Approach 1)
- Edit code, debug, run tests
- Keep development lightweight
**For Users:**
- Use PyInstaller Portable (Approach 2)
- Download and run .exe
- No installation or configuration needed
### Workflow
```
Development Phase:
├── Edit code
├── Test with: .\.venv\Scripts\pytest
├── Debug with VS Code
└── Use: .\install_and_run.bat --debug -i file.pptx
Release Phase:
├── Run: .\build.bat
├── Test the .exe: .\dist\pptx-image-compress.exe -i file.pptx
├── Package: Copy dist/* to users
└── Users just run: pptx-image-compress.exe
```
---
## Dependency Management
### Runtime Dependencies (required for execution)
See `requirements.txt`:
- `svg-polish==1.0.0` - SVG optimization library (brings defusedxml)
The core script uses only Python Standard Library modules:
- No need for external image libraries
- Uses system's `caesiumclt.exe` for image compression
- Handles PPTX files using only built-in zipfile module
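
As an illustration of the stdlib-only approach, a `.pptx` file is a ZIP archive whose embedded images sit under `ppt/media/`. The sketch below lists them with `zipfile`; `list_media_entries` and `build_demo_pptx` are hypothetical names, not functions from the script.

```python
import io
import zipfile
from pathlib import Path
from typing import BinaryIO, Union

MEDIA_PREFIX = "ppt/media/"


def list_media_entries(pptx: Union[str, Path, BinaryIO]) -> list:
    """Return sorted (archive name, uncompressed size) pairs for embedded media."""
    with zipfile.ZipFile(pptx) as archive:
        return sorted(
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.startswith(MEDIA_PREFIX) and not info.is_dir()
        )


def build_demo_pptx() -> io.BytesIO:
    """Build a tiny in-memory archive with a PPTX-like layout for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("ppt/slides/slide1.xml", "<sld/>")
        archive.writestr("ppt/media/image1.png", b"abc")
    buffer.seek(0)
    return buffer
```

The demo archive is not a valid presentation; it only mirrors the directory layout so the listing logic can be tried without a real deck.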
### Development Dependencies (for testing/development)
See `requirements-dev.txt`:
- `pytest==9.0.3` - Testing framework
- `pytest-cov==7.1.0` - Coverage reporting
- Plus all runtime dependencies
### Managing Dependencies
**Update packages:**
```bash
.\.venv\Scripts\pip install --upgrade -r requirements.txt
```
**Add new package:**
```bash
.\.venv\Scripts\pip install package_name
.\.venv\Scripts\pip freeze > requirements.txt
```
**For development:**
```bash
.\.venv\Scripts\pip install -r requirements-dev.txt
```
---
## Troubleshooting
### Development Setup Issues
**Problem:** "Python not found"
- **Solution:** Run `.\install_and_run.bat`, which will download and set up Python
**Problem:** "svg_polish not found"
- **Solution:** Run with `--debug` flag to see installation details
- Or manually: `.\.venv\Scripts\pip install svg-polish`
### PyInstaller Build Issues
**Problem:** Build takes too long
- **Solution:** First build is slower due to PyInstaller analysis. Subsequent builds are faster.
**Problem:** .exe won't run
- **Solution:** Ensure `caesiumclt.exe` is in the same directory or system PATH
**Problem:** "PyInstaller not installed"
- **Solution:** Run `build.bat` again - it will auto-install PyInstaller
---
## Technical Comparison
| Aspect | Embedded Python | PyInstaller | Hybrid |
|--------|-----------------|-------------|--------|
| **Installation** | Auto (via batch) | None (single .exe) | Mixed |
| **Disk Space** | ~50 MB | ~100-150 MB | Both available |
| **Execution Speed** | Fast | Fast | Fast |
| **Debuggability** | Excellent | Difficult | Excellent (dev) |
| **Distribution** | Manual setup | Just .exe | Just .exe (users) |
| **Development** | Quick iteration | Requires rebuild | Quick iteration |
---
## Recommended Workflow
### For You (Developer)
```
1. Daily Development:
   - Use: .\install_and_run.bat --debug
   - Edit code in VS Code
   - Test with pytest
   - Use debugger with F5
2. Before Release:
   - Run: .\build.bat
   - Test: .\dist\pptx-image-compress.exe
   - Package and distribute
```
### For End Users
```
1. First Time:
   - Download pptx-image-compress.exe
   - Download run.bat (optional)
   - Download caesiumclt.exe (if image compression needed)
2. Usage:
   - Double-click run.bat
   - Or: pptx-image-compress.exe -i input.pptx -o output.pptx
```
---
## Next Steps
1. **Test Embedded Python Setup:**
```bash
.\install_and_run.bat --debug -i test.pptx -o test_output.pptx
```
2. **Build Portable Distribution:**
```bash
.\build.bat
```
3. **Test the Executable:**
```bash
.\dist\pptx-image-compress.exe -h
```
4. **Check Generated Files:**
- `dist/pptx-image-compress.exe` - Main executable
- `dist/run.bat` - Batch wrapper
---
## Support & Maintenance
- Keep Python version updated for security
- Update dependencies: `pip install --upgrade -r requirements.txt`
- Rebuild .exe when updating dependencies: `.\build.bat`
- Test both approaches before major updates
+15 -13
@@ -85,19 +85,6 @@ if errorlevel 1 (
 exit /b 1
 )
-rem =========================
-rem Optional UPX
-rem =========================
-if defined MINIFY (
-    if exist "%UPX_DIR%\upx.exe" (
-        echo [INFO] Running UPX compression...
-        "%UPX_DIR%\upx.exe" --best --force "%BUILD_DIR%\pptx-image-compress.exe"
-    ) else (
-        echo [WARN] UPX not found at %UPX_DIR%
-    )
-)
 rem =========================
 rem Copy templates
 rem =========================
@@ -111,6 +98,21 @@ rem Copy caesiumclt.exe
 rem =========================
 copy "%SELF_DIR%bin\caesiumclt.exe" "%BUILD_DIR%\caesiumclt.exe"
+rem =========================
+rem Optional UPX
+rem =========================
+if defined MINIFY (
+    if exist "%UPX_DIR%\upx.exe" (
+        echo [INFO] Running UPX compression...
+        "%UPX_DIR%\upx.exe" --best --force "%BUILD_DIR%\pptx-image-compress.exe"
+        "%UPX_DIR%\upx.exe" --best --force "%BUILD_DIR%\caesiumclt.exe"
+    ) else (
+        echo [WARN] UPX not found at %UPX_DIR%
+    )
+)
 rem =========================
 rem Done
 rem =========================