From 697ed6dc842defcec2014e193b2e13c82592c701 Mon Sep 17 00:00:00 2001
From: Frank Conrads
Date: Wed, 17 Jun 2026 11:34:32 +0200
Subject: [PATCH] build.bat: added upx minify for caesiumclt.exe

---
 AGENTS.md          | 236 +++++++++++++++++++++++++++++++++++++
 PACKAGING_GUIDE.md | 288 +++++++++++++++++++++++++++++++++++++++++++++
 build.bat          |  28 +++--
 3 files changed, 539 insertions(+), 13 deletions(-)
 create mode 100644 AGENTS.md
 create mode 100644 PACKAGING_GUIDE.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..5e65408
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,236 @@
+# AGENTS.md — pptx-image-compress
+
+Guidelines for AI agents and contributors working in this codebase.
+
+---
+
+## Project Overview
+
+Single-file Python CLI tool (`pptx_image_compress.py`) that compresses images
+inside `.pptx` files using the external binary `caesiumclt`. Supports
+single-file and batch modes, multi-threaded processing, and CSV logging.
+
+**Entry point:** `pptx_image_compress.py` → `main()`
+**Tests:** `test_pptx_image_compress.py` (stdlib `unittest`, run via `pytest`)
+**External dependency:** `caesiumclt` must be on `PATH`
+
+---
+
+## Running Tests
+
+```bash
+python -m pytest test_pptx_image_compress.py -v
+```
+
+All 5 tests must pass before any change is considered complete.
+Never remove or weaken an existing test. Always add a test for new behaviour.
+
+---
+
+## Code Readability
+
+- **One responsibility per function.** If a function does more than one thing,
+  split it.
+- **Descriptive names.** Avoid single-letter variables outside of short loops.
+  Prefer `img_path` over `p`, `result` over `r`.
+- **Type-annotate every function signature** — parameters and return type.
+  The codebase mixes `Optional[X]` and `X | None`; prefer `X | None` for new
+  code on Python 3.10+, and stay consistent within a module.
+- **Constants at module level**, UPPER_SNAKE_CASE. Never hardcode magic values
+  inline (e.g. file extensions, prefix strings, bar lengths).
+- **Section comments** (`# --- Section ---`) are used to separate logical
+  blocks. Keep them and add new ones when introducing a new logical group.
+- **German UI strings are intentional** (progress output, error messages shown
+  to the end user). Keep them in German. Internal code identifiers stay in
+  English.
+- **No dead code.** Remove commented-out blocks and unused functions before
+  committing.
+
+---
+
+## Testability
+
+- **Inject external dependencies via callable parameters.** The `compressor`
+  parameter on `process_image_file` and `process_single_deck` is the canonical
+  pattern — always use it for any new external-process call.
+- **Never call `shutil.which` or `subprocess` directly inside a function under
+  test.** Route through an injectable or mockable seam.
+- **Tests use `tempfile.TemporaryDirectory`** for isolation. Every test must
+  clean up after itself — rely on the context manager, not `tearDown`.
+- **Do not test private implementation details.** Test observable behaviour:
+  return values, file contents, log output.
+- **One assertion focus per test.** A test named `test_X` should assert exactly
+  what `X` does, with a minimal setup.
+- **Use the `fake_compressor` pattern** (as seen in existing tests) to decouple
+  image-compression logic from the real `caesiumclt` binary in all unit tests.
+
+---
+
+## Performance
+
+- **Thread pool sizing:** outer thread count is controlled by `-t/--threads`
+  (default 16). When `threads > 1`, each `caesiumclt` subprocess is launched
+  with `--threads 1` to prevent CPU over-subscription. Do not change this
+  without benchmarking.
+- **Scratch directories are per-image** (`img_{idx:06d}` sub-dirs) to avoid
+  filename collisions across threads without locking.
+- **`Lock` scope must be minimal.** Only counter increments and `log_lines`
+  appends are inside the lock — never I/O or subprocess calls.
+- **Avoid redundant filesystem walks.** `build_image_slide_index` is called
+  once per deck, not per image. Keep it that way.
+- **`zip_dir_to_pptx` collects all files before writing** so `[Content_Types].xml`
+  can be placed first. Do not revert this to a streaming walk.
+
+---
+
+## Architecture
+
+### Current state
+
+Single-file design (`pptx_image_compress.py`) is intentional for zero-install
+distribution. It is acceptable as long as the file stays under ~700 lines.
+
+### Target layout (clean architecture — migrate when the file grows)
+
+When the project needs to grow, extract to a package following these layers.
+Dependencies must only point **inward** (CLI → Application → Domain ←
+Infrastructure implements Domain ports).
+
+```
+pptx_compress/
+├── __init__.py
+├── __main__.py             # python -m pptx_compress entry point
+│
+├── domain/                 # innermost — zero external imports
+│   ├── __init__.py
+│   ├── models.py           # DeckResult, ImageProcessResult (dataclasses)
+│   ├── constants.py        # ALLOWED_EXT, TEMP_PREFIX, defaults
+│   └── ports.py            # Compressor Protocol (typing.Protocol), SlideIndex ABC
+│
+├── application/            # orchestration — imports domain only
+│   ├── __init__.py
+│   ├── compress_deck.py    # process_single_deck() use-case
+│   └── batch.py            # batch loop, overall summary logic
+│
+├── infrastructure/         # implements domain ports — imports domain + stdlib/3rd-party
+│   ├── __init__.py
+│   ├── caesium_adapter.py  # compress_with_caesium() (caesiumclt subprocess)
+│   ├── pptx_reader.py      # discover_images(), build_image_slide_index()
+│   ├── pptx_writer.py      # zip_dir_to_pptx()
+│   └── temp_manager.py     # cleanup_old_temps(), TEMP_PREFIX lifecycle
+│
+└── cli/                    # outermost — imports application only
+    ├── __init__.py
+    ├── args.py             # argparse definition, expand_inputs(), collect_from_dir()
+    └── output.py           # print_progress(), format_duration(), human_mb/kb
+```
+
+### Layer rules
+
+| Layer | May import | Must NOT import |
+|---|---|---|
+| `domain` | stdlib only | everything else |
+| `application` | `domain` | `infrastructure`, `cli` |
+| `infrastructure` | `domain`, stdlib, 3rd-party | `application`, `cli` |
+| `cli` | `application`, `domain.models` | `infrastructure` directly |
+
+### Key architectural decisions
+
+- **`Compressor` is a `typing.Protocol`** (in `domain/ports.py`), not a bare
+  `Callable`. This makes the contract explicit and IDE-checkable without
+  creating an import cycle:
+  ```python
+  class Compressor(Protocol):
+      def __call__(
+          self,
+          original: Path,
+          out_dir: Path,
+          threads: int | None,
+          quality: int,
+          min_savings: str,
+      ) -> Path | None: ...
+  ```
+- **`DeckResult` and `ImageProcessResult` live in `domain/models.py`** — they
+  are pure data, no logic, no I/O.
+- **`compress_deck.py` receives a `Compressor` instance via constructor or
+  parameter** — never imports `caesium_adapter` directly. This is what makes
+  the use-case fully unit-testable with a `fake_compressor`.
+- **`main()` (in `cli/args.py`) owns argument parsing only.** It resolves
+  paths, builds the `Compressor` adapter, and calls `application.compress_deck`
+  or `application.batch`. No processing logic belongs there.
+- **`expand_inputs` / `collect_from_dir` live in `cli/args.py`** — path
+  resolution is a CLI concern. All layers below receive `Path` objects.
+- **Temp directory lifecycle belongs in `infrastructure/temp_manager.py`.**
+  Always use `TEMP_PREFIX` so orphaned dirs from crashed runs are recoverable.
+
+### Migration guide (single file → package)
+
+1. Create the `pptx_compress/` directory.
+2. Move dataclasses and constants to `domain/`.
+3. Move `compress_with_caesium` → `infrastructure/caesium_adapter.py`.
+4. Move PPTX read/write helpers → `infrastructure/pptx_reader.py` and
+   `pptx_writer.py`.
+5. Move `process_image_file` + `process_single_deck` → `application/compress_deck.py`.
+6. Move `main()` + input helpers → `cli/args.py`.
+7. Add `__main__.py` with `from pptx_compress.cli.args import main; main()`.
+8. Update `test_pptx_image_compress.py` imports accordingly — test logic does
+   not need to change because the public API surface is identical.
+
+### Refactoring plan (aligned with this AGENTS.md)
+
+- Keep the same layer direction: `cli` → `application` → `domain`; only
+  `infrastructure` implements domain ports.
+- Add dedicated raster/vector implementations behind domain ports, not in CLI:
+  - `domain/ports.py`: `RasterCompressor`, `VectorCompressor` protocols
+    (or one `Compressor` protocol + typed strategies)
+  - `infrastructure/caesium_adapter.py`: raster implementation
+  - `infrastructure/svg_polish_adapter.py`: vector implementation
+- Add routing in `application` (not `infrastructure`):
+  - `application/compress_deck.py`: `CompressorRouter` decides by extension
+  - no direct `subprocess` / external library calls in `application`
+- Split the image workflow into explicit application steps:
+  - `compress_step`
+  - `optimal_format_step` (PNG → JPEG optimization step; not a fallback)
+  - `replace_step` (atomic replace via `.tmp` + `Path.replace()`)
+- Centralize PPTX metadata handling in infrastructure modules:
+  - keep relationship/content-type updates in `infrastructure/pptx_reader.py`
+    and/or `infrastructure/pptx_writer.py`
+  - `application` only orchestrates and passes domain models
+- Introduce a configuration object in `domain/constants.py` or a dedicated
+  domain config model; avoid new magic values in `application`.
+- Preserve public behaviour and CLI surface during migration; refactor in
+  small commits with green tests after each step.
+
+### Suggested commit sequence
+
+1. Extract domain models/constants/ports unchanged.
+2. Extract caesium adapter + add svg_polish adapter seam.
+3. Introduce router in `application` with extension-based dispatch.
+4. Refactor image processing into `compress_step` + `optimal_format_step` +
+   `replace_step`.
+5. Extract PPTX metadata update helpers to infrastructure modules.
+6. Move CLI parsing/output concerns into `cli/` only.
+7. Remove dead monolith code paths and keep tests passing.
+
+---
+
+## Security
+
+- **Never pass unsanitised user input directly to `subprocess`.** The
+  `compress_with_caesium` function builds the command as a list (not a shell
+  string). Keep it that way — do not use `shell=True`.
+- **Validate file extensions before compression.** `compress_with_caesium`
+  checks `ext not in ALLOWED_EXT` and returns `None` for unrecognised types.
+  Do not bypass or widen this check without explicit justification.
+- **Validate input paths early.** `process_single_deck` checks that the input
+  exists and has a `.pptx` suffix before doing any filesystem work.
+- **Temp files are written atomically.** Image replacement uses a `.tmp`
+  intermediate and `Path.replace()` (atomic rename) — do not change this to a
+  direct overwrite.
+- **`capture_output=True`** is set on all subprocess calls so that stdout/stderr
+  from `caesiumclt` cannot interfere with or inject into the tool's own output.
+- **Do not log file contents**, only metadata (name, size, slide references).
+  The CSV log must never contain image binary data or path information outside
+  the output directory.
+- **`ignore_errors=True` on `shutil.rmtree`** is acceptable for temp cleanup
+  only. Never suppress errors on writes to the output PPTX or its log file.
diff --git a/PACKAGING_GUIDE.md b/PACKAGING_GUIDE.md
new file mode 100644
index 0000000..35428af
--- /dev/null
+++ b/PACKAGING_GUIDE.md
@@ -0,0 +1,288 @@
+# PPTX Image Compressor - Packaging & Distribution Guide
+
+## Overview
+
+This project now supports 3 different deployment approaches:
+1. **Embedded Python** (Current - Development & Local Use)
+2. **PyInstaller Portable** (Recommended for End Users)
+3. **Hybrid Approach** (Recommended for Maximum Flexibility)
+
+---
+
+## Approach 1: Embedded Python (Development Setup)
+
+### Use Case
+- ✅ Development with VS Code
+- ✅ Debugging with the Python debugger
+- ✅ Running tests with pytest
+- ✅ Source code control & modifications
+
+### What You Have
+- `install_and_run.bat` - Main launcher
+- `.venv/` - Virtual environment for development
+- Source code - Fully editable
+
+### Usage
+```bash
+# Development (with --debug flag to see pip output)
+.\install_and_run.bat --debug -i "path\to\file.pptx" -o "path\to\output.pptx"
+
+# With VS Code debugger
+# Open pptx_image_compress.py and click "Debug" or press F5
+
+# Run tests
+.\.venv\Scripts\pytest test_pptx_image_compress.py -v
+```
+
+### Setup Instructions
+1. Ensure Python 3.9+ is installed on Windows
+2. Run: `.\install_and_run.bat`
+3. The virtual environment is created/updated automatically
+
+### Files
+- `install_and_run.bat` - Handles Python setup and execution
+- `requirements.txt` - Python package dependencies for runtime
+- `requirements-dev.txt` - Development dependencies (pytest, coverage, etc.)
+
+---
+
+## Approach 2: PyInstaller Portable (User Distribution)
+
+### Use Case
+- ✅ Distribute to end users (no Python installation needed)
+- ✅ Single-file executable (plus `caesiumclt.exe` alongside it)
+- ✅ Professional appearance
+- ✅ "Just download and run" experience
+- ✅ Drag-and-drop support for non-technical users
+
+### What Gets Generated
+- `dist/pptx-image-compress.exe` - Standalone executable
+- `dist/run.bat` - Simple command-line wrapper
+- `dist/dragdrop.bat` - Drag-and-drop wrapper (easiest for users)
+
+### Usage by End User
+```bash
+# Option 1: Drag-and-drop (Easiest!)
+# Drag a .pptx file onto dragdrop.bat
+# Output: filename_compressed.pptx
+
+# Option 2: Command line
+.\pptx-image-compress.exe -i "path\to\file.pptx" -o "path\to\output.pptx"
+
+# Option 3: Wrapper
+.\run.bat -i "path\to\file.pptx" -o "path\to\output.pptx"
+
+# Get help
+.\pptx-image-compress.exe --help
+```
+
+### Build Instructions
+1. Ensure you have the development environment set up (Approach 1)
+2. Run: `.\build.bat`
+3. Wait for the build to complete (the first run takes 2-3 minutes)
+4. Generated files are in the `dist/` folder
+
+### Build Files
+- `build.bat` - Automated build script that:
+  - Installs PyInstaller if needed
+  - Bundles the Python script into a standalone executable
+  - Creates the wrapper batch files
+  - Prepares the distribution package (copies `caesiumclt.exe`, optional UPX compression)
+
+### Advantages
+- No Python installation required on the user's machine
+- Smaller footprint than a full Python installation
+- Professional distribution option
+- Can be code-signed and digitally stamped
+
+### Limitations
+- Large file size (~80-150 MB) due to bundled Python
+- First launch slightly slower (unpacking)
+- Harder to debug if issues occur
+
+### Distribution Notes
+- Ensure `caesiumclt.exe` is in the same directory as the .exe
+- Can optionally add `.venv\Lib\site-packages\svg_polish\*` if svg-polish needs updating
+- All Python dependencies are pre-bundled
+- Users can:
+  1. **Drag-and-drop files** onto `dragdrop.bat` for easy compression
+  2. Use the command line for batch operations
+  3. Call the `.exe` directly with custom parameters
+
+### Drag-and-Drop Feature
+The `dragdrop.bat` wrapper provides the easiest user experience:
+- Drag a `.pptx` file onto `dragdrop.bat`
+- Automatically creates `[filename]_compressed.pptx` in the same directory
+- Shows progress and completion status
+- No command-line knowledge required
+
+---
+
+## Approach 3: Hybrid (Recommended)
+
+### Use Case
+- ✅ Flexible development workflow
+- ✅ Easy distribution to users
+- ✅ Best of both worlds
+
+### How It Works
+**For Developers:**
+- Use the Embedded Python approach (Approach 1)
+- Edit code, debug, run tests
+- Keep development lightweight
+
+**For Users:**
+- Use PyInstaller Portable (Approach 2)
+- Download and run the .exe
+- No installation or configuration needed
+
+### Workflow
+```
+Development Phase:
+├── Edit code
+├── Test with: .\.venv\Scripts\pytest
+├── Debug with VS Code
+└── Use: .\install_and_run.bat --debug -i file.pptx
+
+Release Phase:
+├── Run: .\build.bat
+├── Test the .exe: .\dist\pptx-image-compress.exe -i file.pptx
+├── Package: Copy dist/* to users
+└── Users just run: pptx-image-compress.exe
+```
+
+---
+
+## Dependency Management
+
+### Runtime Dependencies (required for execution)
+See `requirements.txt`:
+- `svg-polish==1.0.0` - SVG optimization library (brings defusedxml)
+
+Beyond that, the core script uses only Python Standard Library modules:
+- No need for external image libraries
+- Calls the external `caesiumclt.exe` binary for image compression
+- Handles PPTX files using the built-in `zipfile` module
+
+### Development Dependencies (for testing/development)
+See `requirements-dev.txt`:
+- `pytest==9.0.3` - Testing framework
+- `pytest-cov==7.1.0` - Coverage reporting
+- Plus all runtime dependencies
+
+### Managing Dependencies
+
+**Update packages:**
+```bash
+.\.venv\Scripts\pip install --upgrade -r requirements.txt
+```
+
+**Add new package:**
+```bash
+.\.venv\Scripts\pip install package_name
+# Then pin package_name==<version> in requirements.txt by hand; `pip freeze > requirements.txt` would also capture dev-only packages such as pytest
+```
+
+**For development:**
+```bash
+.\.venv\Scripts\pip install -r requirements-dev.txt
+```
+
+---
+
+## Troubleshooting
+
+### Development Setup Issues
+
+**Problem:** "Python not found"
+- **Solution:** Run `.\install_and_run.bat`, which will download and set up Python
+
+**Problem:** "svg_polish not found"
+- **Solution:** Run with the `--debug` flag to see installation details
+- Or manually: `.\.venv\Scripts\pip install svg-polish`
+
+### PyInstaller Build Issues
+
+**Problem:** Build takes too long
+- **Solution:** The first build is slower due to PyInstaller analysis. Subsequent builds are faster.
+
+**Problem:** .exe won't run
+- **Solution:** Ensure `caesiumclt.exe` is in the same directory or on the system PATH
+
+**Problem:** "PyInstaller not installed"
+- **Solution:** Run `build.bat` again - it will auto-install PyInstaller
+
+---
+
+## Technical Comparison
+
+| Aspect | Embedded Python | PyInstaller | Hybrid |
+|--------|-----------------|-------------|--------|
+| **Installation** | Auto (via batch) | None (single .exe) | Mixed |
+| **Disk Space** | ~50 MB | ~100-150 MB | Both available |
+| **Execution Speed** | Fast | Fast | Fast |
+| **Debuggability** | Excellent | Difficult | Excellent (dev) |
+| **Distribution** | Manual setup | Just .exe | Just .exe (users) |
+| **Development** | Quick iteration | Requires rebuild | Quick iteration |
+
+---
+
+## Recommended Workflow
+
+### For You (Developer)
+```
+1. Daily Development:
+   - Use: .\install_and_run.bat --debug
+   - Edit code in VS Code
+   - Test with pytest
+   - Use debugger with F5
+
+2. Before Release:
+   - Run: .\build.bat
+   - Test: .\dist\pptx-image-compress.exe
+   - Package and distribute
+```
+
+### For End Users
+```
+1. First Time:
+   - Download pptx-image-compress.exe
+   - Download dragdrop.bat and run.bat (optional)
+   - Download caesiumclt.exe (required; keep it next to the .exe)
+
+2. Usage:
+   - Drag a .pptx file onto dragdrop.bat
+   - Or: pptx-image-compress.exe -i input.pptx -o output.pptx
+```
+
+---
+
+## Next Steps
+
+1. **Test Embedded Python Setup:**
+   ```bash
+   .\install_and_run.bat --debug -i test.pptx -o test_output.pptx
+   ```
+
+2. **Build Portable Distribution:**
+   ```bash
+   .\build.bat
+   ```
+
+3. **Test the Executable:**
+   ```bash
+   .\dist\pptx-image-compress.exe -h
+   ```
+
+4. **Check Generated Files:**
+   - `dist/pptx-image-compress.exe` - Main executable
+   - `dist/run.bat`, `dist/dragdrop.bat` - Batch wrappers
+
+---
+
+## Support & Maintenance
+
+- Keep the Python version up to date for security
+- Update dependencies: `pip install --upgrade -r requirements.txt`
+- Rebuild the .exe when updating dependencies: `.\build.bat`
+- Test both approaches before major updates
diff --git a/build.bat b/build.bat
index f5ac90b..da5f1da 100644
--- a/build.bat
+++ b/build.bat
@@ -85,19 +85,6 @@ if errorlevel 1 (
     exit /b 1
 )
 
-rem =========================
-rem Optional UPX
-rem =========================
-
-if defined MINIFY (
-    if exist "%UPX_DIR%\upx.exe" (
-        echo [INFO] Running UPX compression...
-        "%UPX_DIR%\upx.exe" --best --force "%BUILD_DIR%\pptx-image-compress.exe"
-    ) else (
-        echo [WARN] UPX not found at %UPX_DIR%
-    )
-)
-
 rem =========================
 rem Copy templates
 rem =========================
@@ -111,6 +98,21 @@ rem Copy caesiumclt.exe
 rem =========================
 
 copy "%SELF_DIR%bin\caesiumclt.exe" "%BUILD_DIR%\caesiumclt.exe"
+
+rem =========================
+rem Optional UPX
+rem =========================
+
+if defined MINIFY (
+    if exist "%UPX_DIR%\upx.exe" (
+        echo [INFO] Running UPX compression...
+        "%UPX_DIR%\upx.exe" --best --force "%BUILD_DIR%\pptx-image-compress.exe"
+        "%UPX_DIR%\upx.exe" --best --force "%BUILD_DIR%\caesiumclt.exe"
+    ) else (
+        echo [WARN] UPX not found at %UPX_DIR%
+    )
+)
+
 rem =========================
 rem Done
 rem =========================
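
The testability and security rules in AGENTS.md (inject the `compressor`, type it as a `Compressor` protocol, swap the image in through a `.tmp` file and `Path.replace()`) can be illustrated with a minimal, self-contained Python sketch. It reuses the `Compressor` call signature from AGENTS.md, but `replace_with_compressed`, the body of `fake_compressor`, and the argument values passed to the compressor (`threads=1`, `quality=80`, `min_savings="5%"`) are hypothetical stand-ins, not the actual API of `pptx_image_compress.py`.

```python
"""Sketch of the injectable-compressor seam described in AGENTS.md.

Hypothetical: replace_with_compressed() and the argument values passed to the
compressor are illustrative; they are not the real pptx_image_compress.py API.
"""
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Protocol


class Compressor(Protocol):
    """Call signature copied from the Protocol proposed in AGENTS.md."""

    def __call__(
        self,
        original: Path,
        out_dir: Path,
        threads: int | None,
        quality: int,
        min_savings: str,
    ) -> Path | None: ...


def replace_with_compressed(
    img_path: Path, scratch_dir: Path, compressor: Compressor
) -> bool:
    """Hypothetical helper: compress img_path, then swap the result in.

    Returns False (leaving img_path untouched) when the compressor returns None.
    """
    # Illustrative values only: one thread, quality 80, "5%" minimum savings.
    compressed = compressor(img_path, scratch_dir, 1, 80, "5%")
    if compressed is None:
        return False
    tmp_path = img_path.with_name(img_path.name + ".tmp")
    tmp_path.write_bytes(compressed.read_bytes())
    # Path.replace() delegates to os.replace(); the .tmp file sits next to the
    # target, so this is a same-filesystem rename rather than an overwrite.
    tmp_path.replace(img_path)
    return True


def fake_compressor(
    original: Path,
    out_dir: Path,
    threads: int | None,
    quality: int,
    min_savings: str,
) -> Path | None:
    """Test double: writes a fixed small payload instead of running caesiumclt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / original.name
    out_path.write_bytes(b"small")
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        image = root / "image1.png"
        image.write_bytes(b"original image bytes")
        # Per-image scratch dir, mirroring the img_{idx:06d} convention.
        replaced = replace_with_compressed(image, root / f"img_{1:06d}", fake_compressor)
        print(replaced, image.read_bytes())  # prints: True b'small'
```

Because the helper only depends on the `Compressor` protocol, a unit test can pass `fake_compressor` (or a callable that returns `None` to exercise the "no savings / unsupported type" path) without `caesiumclt` being on `PATH`, and `TemporaryDirectory` handles cleanup as AGENTS.md requires.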