Indexly Logging System – NDJSON Standard and Legacy .log Support
Categories: Index & Watch Logging (NDJSON Standard + Legacy .log Support)
Indexly now ships with a modern, structured, analysis-ready logging engine based on NDJSON (newline-delimited JSON). This document explains:
- How NDJSON logging works
- How the legacy .log system works and how to use its CLI utilities
- How the two systems relate
- A workflow diagram showing the full log pipeline
Both parts are documented separately for clarity.
Part 1 — NDJSON Logging (Standard Logging System)
NDJSON is the default and recommended logging format in Indexly. Every logged event is written as a structured JSON object, one per line.
Key benefits
- Ready for analysis with analyze-json and analyze-file (see Data Analysis Pipeline)
- Clean metadata extraction (year, month, customer)
- Supports compression for large fields
- Async logging engine with batching and retention
- Rotates automatically based on size or date partitioning
- Fully compatible with downstream processing tools (Python, jq, Splunk, BigQuery)
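Compression of large fields can be illustrated with a minimal sketch. It assumes, purely for illustration, that a long string field is gzip-compressed and base64-encoded inside a marker object so that each line stays valid JSON. The threshold, the `gz_b64` marker key, the `content_preview` field and both function names are hypothetical; Indexly's actual compression scheme may differ.

```python
import base64
import gzip
import json

# Hypothetical threshold: string fields longer than this get compressed.
COMPRESS_THRESHOLD = 256


def compress_large_fields(entry: dict, threshold: int = COMPRESS_THRESHOLD) -> dict:
    """Return a copy of entry with long string fields gzip-compressed and base64-encoded."""
    out = {}
    for key, value in entry.items():
        if isinstance(value, str) and len(value) > threshold:
            packed = gzip.compress(value.encode("utf-8"))
            # Wrap in a marker dict so readers know the value must be decompressed.
            out[key] = {"gz_b64": base64.b64encode(packed).decode("ascii")}
        else:
            out[key] = value
    return out


def decompress_field(value):
    """Reverse compress_large_fields for a single field value."""
    if isinstance(value, dict) and "gz_b64" in value:
        return gzip.decompress(base64.b64decode(value["gz_b64"])).decode("utf-8")
    return value


# Usage: the compressed entry still serializes to a single JSON line.
entry = {"event": "indexed", "content_preview": "x" * 1000}
packed = compress_large_fields(entry)
line = json.dumps(packed)
restored = decompress_field(json.loads(line)["content_preview"])
```

Keeping the compressed value inside an ordinary JSON object (rather than writing raw bytes) is what lets NDJSON readers such as jq still parse every line.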
How NDJSON Logging Works
The NDJSON log system is handled by a dedicated component:
LogManager (indexly/log_utils.py)
It manages:
- Log queue
- Async worker
- Partitioned filenames
- Rotation & retention
- Compression
- Clean shutdown
Indexly uses it automatically inside:
- async def scan_and_index_files()
- def handle_index()
- Watch mode (watcher.py)
You don’t need to configure anything unless you want custom behavior.
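The queue, async worker, batching and clean-shutdown responsibilities listed for LogManager can be sketched with asyncio. `MiniLogManager`, its methods and the `demo` coroutine are hypothetical stand-ins, not the real LogManager API; partitioned filenames, rotation, retention and compression are left out to keep the example short.

```python
from __future__ import annotations

import asyncio
import json
from pathlib import Path


class MiniLogManager:
    """Hypothetical, simplified stand-in for LogManager: queue + async batch writer."""

    def __init__(self, path: Path, batch_size: int = 50) -> None:
        self.path = path
        self.batch_size = batch_size
        self.queue: asyncio.Queue = asyncio.Queue()
        self._worker: asyncio.Task | None = None

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._worker = asyncio.create_task(self._run())

    def log(self, entry: dict) -> None:
        # Non-blocking: callers only enqueue; the worker does the file I/O.
        self.queue.put_nowait(entry)

    async def _run(self) -> None:
        while True:
            first = await self.queue.get()
            if first is None:  # shutdown sentinel
                return
            batch = [first]
            # Drain entries that are already queued, up to batch_size.
            while len(batch) < self.batch_size and not self.queue.empty():
                item = self.queue.get_nowait()
                if item is None:  # sentinel reached mid-batch: flush, then stop
                    self._write(batch)
                    return
                batch.append(item)
            self._write(batch)

    def _write(self, batch: list[dict]) -> None:
        # One JSON object per line, one file open per batch.
        with self.path.open("a", encoding="utf-8") as fh:
            for entry in batch:
                fh.write(json.dumps(entry) + "\n")

    async def shutdown(self) -> None:
        # The sentinel is queued behind all pending entries, so nothing is lost.
        self.queue.put_nowait(None)
        if self._worker is not None:
            await self._worker


async def demo(path: Path) -> None:
    manager = MiniLogManager(path, batch_size=2)
    manager.start()
    for i in range(5):
        manager.log({"event": "indexed", "n": i})
    await manager.shutdown()
```

Running `asyncio.run(demo(Path("events.ndjson")))` appends five lines, written in batches of at most two. A sentinel value for shutdown is one common design; the real LogManager may signal shutdown differently.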
NDJSON Log Structure
Every log line is a JSON dict similar to:
{
"timestamp": "2025-12-08 12:15:32",
"event": "indexed",
"path": "documents/2024/11/ClientA/invoice_202411.pdf",
"filename": "invoice_202411.pdf",
"extension": "pdf",
"customer": "ClientA",
"year": "2024",
"month": "11"
}
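Because each record is a self-contained JSON object on its own line, writing and reading NDJSON needs nothing beyond Python's standard `json` module. The first record below is the example above; the second is hypothetical, added only to show that records are independent lines.

```python
import json

entries = [
    # The example record shown above.
    {"timestamp": "2025-12-08 12:15:32", "event": "indexed",
     "path": "documents/2024/11/ClientA/invoice_202411.pdf",
     "filename": "invoice_202411.pdf", "extension": "pdf",
     "customer": "ClientA", "year": "2024", "month": "11"},
    # Hypothetical second record, for illustration only.
    {"timestamp": "2025-12-08 12:15:33", "event": "indexed",
     "path": "documents/2024/11/ClientB/report_202411.pdf",
     "filename": "report_202411.pdf", "extension": "pdf",
     "customer": "ClientB", "year": "2024", "month": "11"},
]

# Writing: json.dumps without indent escapes newlines inside strings,
# so each record occupies exactly one line.
ndjson_text = "".join(json.dumps(entry) + "\n" for entry in entries)

# Reading: every non-empty line is parsed on its own; no enclosing array is needed.
parsed = [json.loads(line) for line in ndjson_text.split("\n") if line.strip()]
```

This per-line independence is what allows appending without rewriting the file and streaming large logs line by line.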
Automatic Metadata Extraction
Metadata is extracted from three sources:
- Folder structure, following the pattern:
  path/to/document/<year>/<month>/<customer>/<file>
- Filename detection, which recognizes these date patterns:
  - YYYY
  - YYYYMM
  - YYYY-MM
  - YYYYMMDD
- Filesystem fallback: the last-modified timestamp supplies year/month
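The three sources can be combined in one function, sketched below. `extract_metadata` is a hypothetical name, and the precedence (folder structure, then filename, then modification time), the month-range check and the returned keys are assumptions modeled on the log structure above rather than Indexly's actual code.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional

# Filename date patterns, most specific first; the lookarounds stop a longer
# digit run from matching a shorter pattern.
FILENAME_PATTERNS = [
    re.compile(r"(?<!\d)(\d{4})(\d{2})\d{2}(?!\d)"),  # YYYYMMDD
    re.compile(r"(?<!\d)(\d{4})-(\d{2})(?!\d)"),      # YYYY-MM
    re.compile(r"(?<!\d)(\d{4})(\d{2})(?!\d)"),       # YYYYMM
    re.compile(r"(?<!\d)(\d{4})(?!\d)"),              # YYYY
]


def extract_metadata(path: str, mtime: Optional[float] = None) -> dict:
    p = PurePosixPath(path.replace("\\", "/"))
    meta = {"filename": p.name, "extension": p.suffix.lstrip(".").lower(),
            "customer": None, "year": None, "month": None}

    # 1) Folder structure: .../<year>/<month>/<customer>/<file>
    parts = p.parts
    if (len(parts) >= 4 and re.fullmatch(r"\d{4}", parts[-4])
            and re.fullmatch(r"\d{2}", parts[-3])):
        meta.update(year=parts[-4], month=parts[-3], customer=parts[-2])
        return meta

    # 2) Filename detection, skipping matches with an impossible month.
    for pattern in FILENAME_PATTERNS:
        m = pattern.search(p.stem)
        if not m:
            continue
        month = m.group(2) if m.lastindex == 2 else None
        if month is not None and not 1 <= int(month) <= 12:
            continue
        meta.update(year=m.group(1), month=month)
        return meta

    # 3) Filesystem fallback: last-modified timestamp -> year/month.
    if mtime is not None:
        dt = datetime.fromtimestamp(mtime)
        meta.update(year=f"{dt.year:04d}", month=f"{dt.month:02d}")
    return meta
```

For the example path documents/2024/11/ClientA/invoice_202411.pdf, the folder rule alone yields year 2024, month 11 and customer ClientA; a caller would pass `os.path.getmtime(path)` as `mtime` to enable the fallback.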
NDJSON Log Workflow Diagram
┌──────────────────────────┐
│   scan_and_index_files   │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│    _unified_log_entry    │
│  (metadata extraction)   │
└────────────┬─────────────┘
             ▼
┌──────────────────────────┐
│     LogManager.log()     │
└────────────┬─────────────┘
             ▼
┌──────────────────────────────────────┐
│  Async Queue → Batch → NDJSON Writer │
└──────────────────┬───────────────────┘
                   ▼
┌──────────────────────────┐
│ Rotated NDJSON log files │
└──────────────────────────┘
Example NDJSON Log Files
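Files are saved under the following path, where current_year and current_month stand for the year and month folders: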
/log/current_year/current_month/indexly-YYYY-MM-DD_index_events.ndjson
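For example (a numeric suffix such as _1 marks an additional file for the same day, created by size-based rotation):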
/log/current_year/current_month/2025-12-08_index_events.ndjson
/log/current_year/current_month/2025-12-08_index_events_1.ndjson
/log/current_year/current_month/2025-12-09_index_events.ndjson
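Date partitioning combined with size-based rotation could work roughly as sketched below. The function name, the 10 MiB default limit, the <year>/<month> folder names and the filename pattern (taken from the examples above, without the indexly- prefix) are all assumptions for illustration.

```python
from datetime import date
from pathlib import Path


def log_path_for(base: Path, day: date, max_bytes: int = 10 * 1024 * 1024) -> Path:
    """Return the NDJSON file to append to for `day`, rotating to _1, _2, ... by size."""
    # Date partitioning: one folder per year and month (assumed layout).
    folder = base / f"{day.year:04d}" / f"{day.month:02d}"
    folder.mkdir(parents=True, exist_ok=True)
    stem = f"{day.isoformat()}_index_events"
    candidate = folder / f"{stem}.ndjson"
    index = 0
    # Size rotation: skip files that have already reached the limit.
    while candidate.exists() and candidate.stat().st_size >= max_bytes:
        index += 1
        candidate = folder / f"{stem}_{index}.ndjson"
    return candidate
```

A writer would call this before each batch; a new day automatically starts a new file, and a full file pushes new entries into the next numbered one.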
Analyzing NDJSON Logs
NDJSON logs can be passed directly to either analysis command:
indexly analyze-json file.ndjson
indexly analyze-file file.ndjson
Both commands accept:
- filtering
- metadata grouping
- date-range analysis
- statistics
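To show what filtering, metadata grouping and date-range analysis mean on raw NDJSON, here is a small standard-library sketch. It is not how analyze-json or analyze-file work internally; `summarize` and its parameters are hypothetical, and it assumes the timestamp format and metadata keys from the log structure above.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def summarize(lines: Iterable[str], start: Optional[str] = None,
              end: Optional[str] = None, event: Optional[str] = None) -> Counter:
    """Count events per (customer, "YYYY-MM"), optionally filtered by date range and event."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        day = record.get("timestamp", "")[:10]  # "YYYY-MM-DD" prefix of the timestamp
        # ISO dates compare correctly as plain strings.
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        if event is not None and record.get("event") != event:
            continue
        key = (record.get("customer"), f"{record.get('year')}-{record.get('month')}")
        counts[key] += 1
    return counts
```

Because a file object iterates line by line, `summarize(open(path, encoding="utf-8"), event="indexed").most_common(5)` streams a log of any size without loading it into memory.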
Part 2 — Legacy .log System (Old System – Still Supported)
While NDJSON is the active standard, Indexly still supports the old .log format, mainly for workflows involving:
- Historical log migration
- CSV/JSON conversions
- Combining multiple log files
- Cleaning file names to regenerate metadata
You can continue using .log if you:
🔹 Want to keep the old behavior
AND
🔹 Restore these two legacy functions:
- async def scan_and_index_files() (legacy version)
- def handle_index() (legacy version)
Once restored, the system automatically detects .log mode via log_utils.py and config.py.
Features of the Legacy .log System
- Logs are plain text
- No structure → cleaning required
- Must be processed before converting
- Metadata not embedded → extracted by tools
- CLI utilities available
Legacy Log CLI Utilities
Convert a single .log file
indexly log-clean file.log --to csv
indexly log-clean file.log --to json
indexly log-clean file.log --to ndjson
Combine multiple .log files
indexly log-clean --combine-log *.log --to ndjson
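The raw legacy .log line format is not described here, so this sketch starts from records that have already been parsed into dicts and shows only the output side of the three --to targets and of combining several inputs. `write_records` and the sample records are hypothetical; this is not the log-clean implementation.

```python
import csv
import io
import json
from itertools import chain
from typing import Iterable, TextIO


def write_records(records: Iterable[dict], fmt: str, out: TextIO) -> None:
    """Write records as csv, json (one array) or ndjson (one object per line)."""
    rows = list(records)
    if fmt == "ndjson":
        for row in rows:
            out.write(json.dumps(row) + "\n")
    elif fmt == "json":
        json.dump(rows, out, indent=2)
    elif fmt == "csv":
        # Union of keys in first-seen order, so rows with missing fields still fit.
        fields = list(dict.fromkeys(key for row in rows for key in row))
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)  # missing keys become empty cells
    else:
        raise ValueError(f"unsupported format: {fmt}")


# Combining: chain the records from several cleaned inputs into one output.
from_file_a = [{"year": "2024", "month": "11", "customer": "ClientA"}]
from_file_b = [{"year": "2024", "month": "12", "customer": "ClientB", "extension": "pdf"}]
buffer = io.StringIO()
write_records(chain(from_file_a, from_file_b), "csv", buffer)
```

When writing CSV to a real file, open it with `newline=""` so the csv module controls line endings.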
Clean & process metadata
The cleaner extracts:
- year
- month
- customer
- extension
- normalized path
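The document does not define what a normalized path is. One plausible reading, sketched here under that assumption, is unified forward-slash separators with redundant, ./ and ../ segments collapsed, plus a lower-cased extension. `normalize_path` is a hypothetical helper; year, month and customer could then be derived from its output with the folder and filename rules described for NDJSON logging.

```python
import posixpath


def normalize_path(raw: str) -> dict:
    """Hypothetical cleaner step: tidy a raw path string and derive its extension."""
    # Drop surrounding whitespace and quotes, then unify Windows separators.
    cleaned = raw.strip().strip('"').replace("\\", "/")
    # Collapse redundant separators and ./ or ../ segments.
    normalized = posixpath.normpath(cleaned)
    _, ext = posixpath.splitext(normalized)
    return {"path": normalized, "extension": ext.lstrip(".").lower()}
```

Normalizing before extracting metadata means the same file logged with different separators or relative segments is grouped under one path.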
Transition from Legacy to NDJSON
Legacy .log documentation includes a link directing users here, so they can switch to NDJSON seamlessly.
Similarly, this NDJSON documentation includes a reference back to the legacy system.
Part 3 — Summary
| Feature | NDJSON (Standard) | Legacy .log |
|---|---|---|
| Structured | ✔ | ✘ |
| Metadata extracted | ✔ Auto | ✔ After cleaning |
| Async logging | ✔ | ✘ |
| Rotating | ✔ | Limited |
| Analysis compatible | ✔ analyze-file | After conversion (analyze-json, analyze-file) |
| Recommended | ✔ | Only for old workflows |
To continue, see: ➡️ Legacy Logging (Legacy Standard)