Indexly Developer Guide
This guide is for contributors who want to ship reliable changes without breaking CLI stability.
Development Principles
Indexly is maintained with these priorities:
- Keep the core install lightweight and brew-friendly.
- Preserve CLI backward compatibility where possible.
- Fail gracefully when optional dependencies are missing.
- Prefer clear modules over tightly coupled logic.
- Keep changes testable and easy to review.
Local Setup
If you want the maintained contributor workstation flow, start with:
- Windows Development Environment Setup for the maintained Windows workflow
- Linux Development Environment Setup for the maintained Ubuntu/Linux workflow
This page focuses on repo-local development once your shell and workstation are ready.
Clone and create a virtual environment:
git clone https://github.com/kimsgent/project-indexly.git
cd project-indexly
python -m venv .venv
Activate:
- macOS/Linux:
source .venv/bin/activate
- Windows (PowerShell):
.venv\Scripts\Activate.ps1
Install editable package:
python -m pip install --upgrade pip
python -m pip install -e .
Install optional packs for full feature development:
python -m pip install -e ".[documents,analysis,visualization,pdf_export,backup]"
Install dev tooling:
python -m pip install pytest pytest-cov flake8 black isort mypy build twine hatch
Quick verification:
python -m indexly --help
indexly --version
Windows Contributor Shortcut
On Windows, this repository also ships a repo-native setup script:
.\setup.ps1 -CheckOnly
.\setup.ps1
That script currently:
- validates winget, Python, and expected repo files
- applies system dependencies from winget.yaml
- creates or reuses .venv
- installs both requirements.txt and requirements-dev.txt
For platform install notes, see Install Indexly. For maintained workstation setup, see Windows Development Environment Setup and Linux Development Environment Setup.
Repository Structure
project-indexly/
├── pyproject.toml
├── README.md
├── README_PYPI.md
├── scripts/
│   └── generate_brew_formula.py
├── tests/
├── docs/
│   └── content/documentation/
└── src/indexly/
    ├── __main__.py
    ├── indexly.py
    ├── cli_utils.py
    ├── optional_deps.py
    ├── filetype_utils.py
    ├── db_utils.py
    ├── fts_core.py
    ├── backup/
    ├── compare/
    ├── inference/
    ├── observers/
    ├── organize/
    ├── visualization/
    └── assets/
Key Modules And Responsibilities
| Area | Main modules | Purpose |
|---|---|---|
| CLI entry | __main__.py, indexly.py, cli_utils.py | Parses commands and routes to feature handlers |
| Indexing/search | fts_core.py, search_core.py, delete_search.py, db_utils.py, db_pipeline.py | FTS5 indexing, query execution, safe index deletion, and persistence |
| File extraction | filetype_utils.py, extract_utils.py, optional_deps.py | File-type routing and lazy optional imports |
| Analysis | csv_analyzer.py, analysis_orchestrator.py, analyze_json.py, analyze_db.py, autodoctor_*.py, inference/ | CSV/data profiling, structured-data analysis, AutoDoctor-aware summaries, and statistical inference |
| Organization | organize/organizer.py, organize/lister.py, organize/cli_wrapper.py | Folder structuring, logs, lister views |
| Compare | compare/compare_engine.py, compare/file_compare.py, compare/folder_compare.py | File/folder diff and similarity checks |
| Backup/restore | backup/cli.py, backup/restore.py, backup/compress.py | Full/incremental backup and restore workflows |
| Monitoring | watcher.py, observers/runner.py, observers/registry.py | Live folder watch, semantic observer runs, event callbacks, metrics, and audits |
| Health diagnostics | doctor.py, db_update.py, db_schema_utils.py | Runtime health checks, search/analysis DB diagnostics, guarded repair flow |
Dependency Policy (Important)
Indexly is designed for lightweight core installation and optional feature packs. When adding dependencies:
- Keep core dependencies minimal, pure Python where possible.
- Put heavy/compiled libraries in extras (documents, analysis, visualization, pdf_export).
- Never import optional dependencies at module import time for core paths.
- Use lazy imports and user-friendly install hints.
Use this pattern for optional imports:
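try:
    import pandas as pd  # heavy optional dependency from the analysis extra
except ModuleNotFoundError as exc:
    # Re-raise with an actionable install hint instead of a bare import error.
    raise ModuleNotFoundError(
        "Feature requires optional dependency 'pandas'. "
        "Install with: pip install indexly[analysis]"
    ) from exc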
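To keep the import out of module import time, the same pattern can be wrapped in a small helper that is called only when a feature actually runs. The sketch below is one possible shape, not the contents of optional_deps.py: the names require_optional and summarize_csv are hypothetical and exist only for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional module on first use, or fail with an install hint.

    Hypothetical helper for illustration; not taken from optional_deps.py.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name != module_name:
            # A transitive import failed; keep the original error untouched.
            raise
        raise ModuleNotFoundError(
            f"Feature requires optional dependency '{module_name}'. "
            f"Install with: pip install indexly[{extra}]"
        ) from exc


def summarize_csv(path: str) -> int:
    """Hypothetical feature entry point: pandas is imported only when this runs."""
    pd = require_optional("pandas", "analysis")
    return len(pd.read_csv(path))
```

With this layout, importing the module that defines summarize_csv never touches pandas, so the core CLI can still start when the analysis extra is not installed.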
How Commands Are Wired
Typical command flow:
- src/indexly/__main__.py starts the CLI.
- src/indexly/indexly.py builds/dispatches command handlers.
- Handler calls the feature module (for example indexing, analysis, compare).
- Output helpers print results and optional exports.
When adding a command:
- Add parser arguments in cli_utils.py or command parser section.
- Add handler logic in indexly.py (or a dedicated module).
- Keep default behavior safe and non-destructive.
- Add tests and update relevant docs page(s).
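As a rough illustration of these steps, the sketch below wires a made-up stats command with argparse. It does not reproduce the real parser in cli_utils.py: the command name, the --apply flag, and the functions build_parser, handle_stats, and main are all hypothetical.

```python
import argparse
from typing import Optional, Sequence


def handle_stats(args: argparse.Namespace) -> int:
    """Hypothetical handler: preview by default, returns a process exit code."""
    mode = "apply" if args.apply else "preview"
    print(f"stats for {args.folder} ({mode})")
    return 0


def build_parser() -> argparse.ArgumentParser:
    """Declare arguments in one place and attach the handler to the subcommand."""
    parser = argparse.ArgumentParser(prog="indexly")
    subparsers = parser.add_subparsers(dest="command", required=True)

    stats = subparsers.add_parser("stats", help="Hypothetical example command")
    stats.add_argument("folder")
    # State-changing behavior is opt-in, so the default run stays non-destructive.
    stats.add_argument("--apply", action="store_true", help="Write changes")
    stats.set_defaults(func=handle_stats)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments and dispatch to whichever handler the subcommand set."""
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Keeping the handler a plain function that takes the parsed namespace makes it straightforward to call from tests without spawning a subprocess.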
Common Extension Points
- New file type extraction: filetype_utils.py, extract_utils.py
- Search behavior: fts_core.py, search_core.py, delete_search.py
- CSV/data features: csv_analyzer.py, analysis_orchestrator.py, inference/
- Export and rendering: output_utils.py, export_utils.py, visualization/
- Ignore behavior: ignore/ and ignore_defaults/
- Backup behavior: backup/
Observer Internals
The observer system lives in src/indexly/observers/.
Core modules:
- base.py: observer contract (applies_to, extract, compare, format_event)
- registry.py: built-in registration, enable/disable controls, event handlers
- runner.py: execution order, dependency wiring, snapshot lifecycle, logging, metrics
- snapshot_store.py: latest generic observer snapshots in observer_snapshots
- csv/csv_snapshot_store.py: historical CSV snapshots in csv_snapshots
- aggregator.py: optional event aggregation for programmatic runs
- metrics.py: in-process observer execution metrics
Built-in observer names are identity, field, state, health_identity, health_fields, health_events, and csv.
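To make the four-method contract concrete, here is a minimal stand-alone sketch of an observer that reports file size changes. Only the method names (applies_to, extract, compare, format_event) come from this guide. The class name, the signatures, the snapshot shape, and the event dictionary are assumptions, so check base.py before modelling real code on it.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


class SizeObserver:
    """Hypothetical observer; signatures and return shapes are assumed."""

    name = "size"

    def applies_to(self, path: Path) -> bool:
        """Decide whether this observer should run for the given path."""
        return path.is_file()

    def extract(self, path: Path) -> Dict[str, Any]:
        """Build the snapshot that will be stored and compared later."""
        return {"size": path.stat().st_size}

    def compare(
        self, old: Optional[Dict[str, Any]], new: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        """Return change events; no previous snapshot means nothing to report."""
        if old is None or old["size"] == new["size"]:
            return []
        return [{"type": "SIZE_CHANGED", "old": old["size"], "new": new["size"]}]

    def format_event(self, event: Dict[str, Any]) -> str:
        """Render one event as a single human-readable line."""
        return f"{event['type']}: {event['old']} -> {event['new']} bytes"
```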
The public CLI surface is intentionally small:
indexly observe --help
indexly observe run /path/to/file
indexly observe run /path/to/folder --recursive
indexly observe audit
indexly observe audit --id 20260201-patient-00001
Programmatic controls such as disable_observer(), enable_observer(), register_event_handler(), run_observers_batch(), event aggregation, and MetricsCollector.get_summary() are Python APIs rather than CLI flags.
When changing observer behavior, run:
python -m pytest tests/test_observers_config.py tests/test_health_event_observer.py tests/test_csv_snapshot_store.py tests/test_csv_observer.py tests/test_observer_runner.py
python -m indexly observe --help
python -m indexly observe run --help
python -m indexly observe audit --help
Clear Search Internals
The clear-search command is implemented in src/indexly/delete_search.py and wired through cli_utils.py.
It is intentionally separate from fts_core.py because deletion has different safety requirements than indexing.
Responsibility Boundary
delete_search.py only operates on the FTS search database configured by config.DB_FILE, normally fts_index.db.
It does not delete source files and does not modify the separate cleaned-data stats database used by analysis commands.
The deletion surface is limited to:
- file_index: FTS5 virtual table rows
- file_tags: tag rows for deleted paths
- file_metadata: structured metadata rows for deleted paths
- search_cache.json: cache entries referencing deleted paths
Control Flow
The high-level flow is:
- Validate that exactly one criterion is supplied: path, tag, or all.
- Resolve matching paths using normalized path, prefix, basename, or exact tag semantics.
- Build a deletion plan with per-table counts and an operation ID.
- Print the plan and, in CLI mode, request confirmation unless --yes is set.
- Log SEARCH_DELETE_INITIATED before changing the database.
- Delete rows inside one SQLite transaction.
- Verify deleted counts against the plan.
- Invalidate search cache entries on a best-effort basis.
- Log completion events and print the final summary.
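The validate, plan, delete, and verify steps of this flow can be sketched with the standard sqlite3 module. This is a simplified illustration rather than the code in delete_search.py: plain tables with a path column stand in for the real schema (including the FTS5 file_index), the function names are hypothetical, and confirmation, logging, and cache invalidation are left out.

```python
import sqlite3
import uuid
from typing import Any, Dict, List, Optional

TABLES = ("file_index", "file_tags", "file_metadata")


def validate_criteria(path: Optional[str], tag: Optional[str], delete_all: bool) -> None:
    """Require exactly one of path, tag, or all."""
    supplied = sum([path is not None, tag is not None, bool(delete_all)])
    if supplied != 1:
        raise ValueError("Supply exactly one of: path, tag, all")


def build_plan(conn: sqlite3.Connection, paths: List[str]) -> Dict[str, Any]:
    """Count the rows each table would lose, without modifying anything."""
    marks = ",".join("?" for _ in paths)
    counts = {
        table: conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE path IN ({marks})", paths
        ).fetchone()[0]
        for table in TABLES
    }
    return {"operation_id": uuid.uuid4().hex, "paths": paths, "counts": counts}


def execute_plan(conn: sqlite3.Connection, plan: Dict[str, Any]) -> Dict[str, int]:
    """Delete planned rows in one transaction and verify counts against the plan."""
    paths = plan["paths"]
    marks = ",".join("?" for _ in paths)
    deleted: Dict[str, int] = {}
    with conn:  # commits on success, rolls back if anything inside raises
        for table in TABLES:
            cur = conn.execute(f"DELETE FROM {table} WHERE path IN ({marks})", paths)
            deleted[table] = cur.rowcount
        if deleted != plan["counts"]:
            raise RuntimeError(f"Count mismatch for operation {plan['operation_id']}")
    return deleted
```

In this sketch the count check sits inside the transaction, so a mismatch rolls the delete back. Whether the real implementation verifies before or after commit is not stated here.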
Safety Guarantees
Destructive behavior should stay conservative:
- Keep --dry-run read-only.
- Keep --yes as the only way to skip confirmation in non-dry-run CLI use.
- Keep path and tag deletion inside a transaction.
- Treat cache and logging failures as warnings after database success, not as rollback triggers.
- Preserve operation IDs in user output and logs for auditability.
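The "warn, do not roll back" rule for cache failures might look like the sketch below. The cache layout is an assumption (a JSON object mapping each query to a list of result paths), as are the function name invalidate_cache and the logger name. The real search_cache.json format may differ.

```python
import json
import logging
from pathlib import Path
from typing import Iterable

log = logging.getLogger("indexly.delete_search")  # logger name is an assumption


def invalidate_cache(cache_file: Path, deleted_paths: Iterable[str], operation_id: str) -> bool:
    """Drop cache entries that reference deleted paths; never raise.

    Assumes the cache is a JSON object of {query: [result paths]}.
    Returns True on success and False when the cache could not be updated.
    """
    deleted = set(deleted_paths)
    try:
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
        kept = {
            query: results
            for query, results in cache.items()
            if not deleted.intersection(results)
        }
        cache_file.write_text(json.dumps(kept), encoding="utf-8")
        return True
    except (OSError, ValueError, AttributeError, TypeError) as exc:
        # The database delete has already committed, so warn instead of failing.
        log.warning("[%s] cache invalidation failed: %s", operation_id, exc)
        return False
```

Returning a boolean lets the caller report a partial result alongside the operation ID without turning a cache problem into a failed deletion.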
Path And Tag Semantics
Path matching uses normalized strings from path_utils.normalize_path():
- exact path first
- directory-like prefix next
- basename fallback for legacy compatibility
Tag matching reads comma-separated values from both file_tags.tags and file_index.tag.
Multiple tags are combined with OR logic: a file matches if it carries any of the supplied tags. Do not change this to AND logic without a CLI flag and migration note, because existing help and tests document OR behavior.
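A small sketch of these matching rules follows. It assumes paths are already normalized to forward slashes (the real normalization lives in path_utils.normalize_path()) and that the first tier producing a match wins. The function names match_paths, split_tags, and matches_any_tag are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set


def match_paths(query: str, indexed: Iterable[str]) -> List[str]:
    """Resolve a query in priority order: exact, directory prefix, then basename."""
    candidates = list(indexed)
    exact = [p for p in candidates if p == query]
    if exact:
        return exact
    # Add a trailing slash so "/data" does not match "/database/x.txt".
    prefix = query.rstrip("/") + "/"
    under = [p for p in candidates if p.startswith(prefix)]
    if under:
        return under
    # Legacy fallback: compare only the final path component.
    return [p for p in candidates if PurePosixPath(p).name == query]


def split_tags(raw: str) -> Set[str]:
    """Parse a comma-separated tag column into a set of trimmed, non-empty tags."""
    return {tag.strip() for tag in raw.split(",") if tag.strip()}


def matches_any_tag(raw: str, wanted: Iterable[str]) -> bool:
    """OR semantics: a row matches if it carries at least one wanted tag."""
    return bool(split_tags(raw) & set(wanted))
```

For example, with indexed paths /data/a.txt, /data/sub/b.txt, and /other/a.txt, the query /data resolves by prefix to the first two, while the query a.txt falls through to the basename tier and returns both a.txt entries.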
Testing Requirements
When changing delete_search.py, update or run:
python -m pytest tests/test_delete_search.py -q
python -m pytest tests/test_search.py tests/test_tagging.py -q
Add tests for:
- confirmation and cancellation
- dry-run read-only behavior
- cache save failures
- database lock or corruption diagnostics
- transaction rollback when a table delete fails
- large batch progress output
- path normalization edge cases
Doctor Internals
The indexly doctor command is implemented in src/indexly/doctor.py.
It is a health and maintenance orchestration layer, not a replacement for search, indexing, analysis, or migration modules.
Responsibility Boundary
Plain indexly doctor must stay read-only.
It may inspect:
- runtime paths from config.py
- search database health for fts_index.db or an explicit --db
- analysis persistence at ~/.indexly/indexly.db
- search_cache.json
- optional dependency availability
- external tools such as ExifTool and Tesseract
State-changing actions require explicit flags:
- --clear-cache writes {} to the search cache file
- --fix-db applies schema migrations after preflight checks and confirmation unless --auto-fix is used
- --rebuild-fts allows FTS5 virtual table rebuilds during repair
--full-integrity is intentionally read-only. It enables SQLite PRAGMA integrity_check for inspected databases and should not imply repair.
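One way to keep such a check read-only is to open the database through a SQLite URI with mode=ro, so even an accidental write is rejected by SQLite itself. The sketch below illustrates that technique and is not the implementation in doctor.py. The function name integrity_check is hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import List


def integrity_check(db_path: str) -> List[str]:
    """Run PRAGMA integrity_check on a read-only connection.

    Returns ["ok"] for a healthy database, otherwise the reported problems.
    """
    # as_uri() yields a file: URI; mode=ro makes SQLite refuse any write.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
```

Because the connection is read-only, a missing database file raises sqlite3.OperationalError instead of silently creating an empty file, which suits a diagnostic command.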
Command Wiring
Doctor flags are declared in cli_utils.py and forwarded by handle_doctor() in indexly.py.
When adding or renaming a Doctor flag:
- update the parser in cli_utils.py
- forward the value in indexly.py
- include the flag in show-help --details if it is high-signal
- update docs/content/documentation/indexly-doctor.md
- add or update tests/test_doctor.py
FTS5 Safety Rule
Do not silently rebuild FTS5 virtual tables.
FTS5 table definitions do not guarantee that all path values can be reconstructed safely from a damaged or legacy virtual table.
The repair layer in db_update.py therefore skips FTS5 rebuilds unless allow_fts_rebuild=True, which is exposed through:
indexly doctor --fix-db --rebuild-fts
Prefer re-indexing source folders or restoring a known-good backup when FTS data is suspect.
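The guard can be pictured as a planning step that sets FTS5 virtual tables aside unless the caller opts in. The sketch below is an assumption about shape only: allow_fts_rebuild is the parameter named above, while plan_repairs and its input format (table name mapped to its CREATE statement, as found in sqlite_master) are hypothetical and not taken from db_update.py.

```python
from typing import Dict, List


def plan_repairs(
    table_sql: Dict[str, str], allow_fts_rebuild: bool = False
) -> Dict[str, List[str]]:
    """Split tables into those to repair and those skipped for safety."""
    plan: Dict[str, List[str]] = {"repair": [], "skipped": []}
    for name, sql in table_sql.items():
        # Collapse whitespace and case so formatting differences do not matter.
        normalized = " ".join(sql.lower().split())
        is_fts5 = (
            normalized.startswith("create virtual table")
            and "using fts5" in normalized
        )
        if is_fts5 and not allow_fts_rebuild:
            plan["skipped"].append(name)
        else:
            plan["repair"].append(name)
    return plan
```

Reporting skipped tables explicitly, rather than dropping them from the plan, gives the user a visible reason to re-run with --rebuild-fts or to re-index instead.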
Testing Requirements
Run the focused Doctor suite after changes:
python -m pytest tests/test_doctor.py tests/test_search.py::test_search_cli_defaults_to_runtime_db_unless_db_is_explicit
For path deletion or cache-adjacent changes, also run:
python -m pytest tests/test_delete_search.py
On Windows, use an explicit writable --basetemp when local ACLs make default pytest temp folders unreadable.
Testing And Quality Checks
Run fast checks during development:
pytest -q
flake8 src tests
black --check src tests
isort --check-only src tests
mypy src/indexly
Smoke-test critical commands after larger changes:
indexly --help
indexly show-help
indexly doctor
If you modify indexing, analysis, compare, backup, or migration behavior, run targeted command tests in a local sandbox folder.
Build, Package, And Brew Formula
Build package artifacts:
python -m build
twine check dist/*
Generate Homebrew formula:
python scripts/generate_brew_formula.py --out Formula/indexly.rb
Dry-run formula generation with local source artifact:
python scripts/generate_brew_formula.py --dry-run --source dist/indexly-<version>.tar.gz --out Formula/indexly.rb
Brew-oriented review checklist:
- Formula uses virtualenv_install_with_resources.
- Dependency resource list stays small and stable.
- No heavy scientific stack in core runtime dependencies.
- CLI starts correctly with only core dependencies installed.
Documentation Responsibilities
When behavior changes, update docs in the same PR:
- User-facing install/usage: README.md, README_PYPI.md
- Website docs: docs/content/documentation/
- Packaging behavior: scripts/generate_brew_formula.py docs and examples
Keep examples copy-paste ready and aligned with indexly --help.
When you change AutoDoctor-related analysis behavior, update both sides of the documentation boundary:
- Indexly-side operational usage: docs/content/documentation/analyze-autodoctor-artifacts.md
- AutoDoctor-side artifact meaning: docs/content/documentation/autodoctor/
This keeps “how to analyze the artifact” separate from “what the artifact means inside AutoDoctor,” which mirrors the current code separation.
Contribution Workflow
- Create a feature branch from latest main.
- Keep commits focused and descriptive.
- Run quality checks locally.
- Include a risk note in your PR for production-sensitive changes.
- Document any compatibility impact (especially brew/package/install changes).
See Contributing for collaboration details.