Legacy .log Logging System – Full Documentation
This part of the documentation explains how the old .log-based logging system works. Although Indexly now uses ndjson logging by default, some users may still rely on the classic workflow for exporting, cleaning, or analysing older logs.
To learn how the current logging system works, see: ➡️ “Logging with ndjson”
1. What Are Legacy .log Files?
Before the ndjson upgrade, Indexly saved indexing activity in daily files such as:
2024-11-03_index.log
2024-11-04_index.log
These logs contain raw indexing paths and timestamps. Unlike ndjson logs, they require cleaning and processing before being used for analysis or conversion.
2. How Indexly Parses .log Files
Indexly uses the following logic:
✓ Identifying log files
Files must match the pattern:
YYYY-MM-DD_index.log
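The naming rule above can be checked with a small regular expression. This is a minimal sketch, not Indexly's actual matcher: the names LOG_NAME and log_date are hypothetical, and rejecting impossible dates such as 2024-13-40 is an assumption about how strict the check should be.

```python
import re
from datetime import datetime

# Hypothetical pattern mirroring the documented YYYY-MM-DD_index.log naming.
LOG_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})_index\.log$")


def log_date(filename):
    """Return the date encoded in a legacy log filename, or None if it does not match."""
    match = LOG_NAME.match(filename)
    if not match:
        return None
    try:
        # Assumption: calendar-invalid dates are treated as non-matching.
        return datetime.strptime(match.group(1), "%Y-%m-%d").date()
    except ValueError:
        return None
```

A caller could filter a directory listing with this helper and skip every name for which it returns None.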
✓ Extracting metadata
Each line is scanned for:
- timestamp
- file path
- filename and extension
- optional metadata from directory structure:
/year/month/customer/filename
Example:
2024-11-04T10:32:22Z /projects/2024/05/acme/report.docx
Extracted result:
{
"path": "projects/2024/05/acme/report.docx",
"filename": "report.docx",
"extension": "docx",
"customer": "acme",
"year": "2024",
"month": "05"
}
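The extraction step can be sketched as follows. This is an illustration under stated assumptions, not Indexly's parser: parse_log_line and STRUCTURE are hypothetical names, each line is assumed to be a timestamp followed by a single space and the path, and the year/month/customer fields are only filled in when the path ends in the documented /year/month/customer/filename layout. The sketch also keeps the timestamp in the result, which the example output above does not show.

```python
import re
from pathlib import PurePosixPath

# Assumed layout at the end of the path: <year>/<month>/<customer>/<filename>
STRUCTURE = re.compile(r"(?:^|/)(\d{4})/(\d{2})/([^/]+)/[^/]+$")


def parse_log_line(line):
    """Split one legacy log line into timestamp, path, and derived metadata."""
    # Split on the first space only, so paths containing spaces stay intact.
    timestamp, _, raw_path = line.strip().partition(" ")
    path = raw_path.strip().lstrip("/")
    posix = PurePosixPath(path)
    entry = {
        "timestamp": timestamp,
        "path": path,
        "filename": posix.name,
        "extension": posix.suffix.lstrip("."),
    }
    match = STRUCTURE.search(path)
    if match:
        # Optional metadata, present only when the directory layout matches.
        entry["year"], entry["month"], entry["customer"] = match.groups()
    return entry
```

Applied to the example line above, the sketch yields the same path, filename, extension, customer, year, and month values as the documented result.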
3. Cleaning and Normalization
Before exporting, Indexly automatically:
- normalizes slashes
- fixes duplicate separators
- cleans filenames (spaces → dashes)
- removes duplicates across logs
- extracts year/month/customer if present
- computes SHA-1 hash of each log for integrity tracking
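The cleaning steps listed above might look roughly like this. The function names normalize_path, dedupe, and sha1_of_file are hypothetical, and two details are assumptions rather than documented behaviour: only the filename (not the directories) gets spaces replaced, and duplicates are detected by exact match on the normalized path.

```python
import hashlib
import re


def normalize_path(raw):
    """Normalize slashes, collapse duplicate separators, and dash the filename."""
    path = raw.replace("\\", "/")        # backslashes to forward slashes
    path = re.sub(r"/{2,}", "/", path)   # collapse repeated separators
    head, sep, name = path.rpartition("/")
    # Assumption: spaces become dashes in the filename only.
    name = re.sub(r"\s+", "-", name.strip())
    return f"{head}{sep}{name}"


def dedupe(paths):
    """Drop repeated paths while keeping first-seen order."""
    seen = set()
    unique = []
    for path in paths:
        if path not in seen:
            seen.add(path)
            unique.append(path)
    return unique


def sha1_of_file(path):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading the log in chunks keeps memory use flat for large files. SHA-1 is used here only as an integrity fingerprint, matching the list above.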
4. Exporting Legacy Logs
Legacy logs can be exported to JSON, NDJSON, or CSV.
Single Log Example
indexly log-clean ./2024-11-03_index.log --export json
Batch / Directory Example
indexly log-clean ./logs/ --export ndjson --combine-log
Export functions:
- _export_json()
- _export_ndjson()
- _export_csv()
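To illustrate how the three output formats differ, here is a minimal sketch using only the Python standard library. The functions to_json, to_ndjson, and to_csv are hypothetical stand-ins, not the internal export functions named above, whose signatures are not documented here. The sketch assumes each cleaned entry is a flat dictionary.

```python
import csv
import io
import json


def to_json(entries):
    """Serialize all entries as one JSON array."""
    return json.dumps(entries, indent=2)


def to_ndjson(entries):
    """Serialize one JSON object per line (newline-delimited JSON)."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)


def to_csv(entries):
    """Serialize entries as CSV; columns are the sorted union of all keys."""
    fields = sorted({key for entry in entries for key in entry})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)  # missing keys are written as empty cells
    return buffer.getvalue()
```

JSON produces a single document that must be loaded whole, NDJSON can be processed line by line, and CSV flattens the optional metadata into columns that may be empty for some rows.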
5. Combined vs Individual Export
Individual Mode
Each .log file becomes its own cleaned output:
2024-11-03_cleaned.json
2024-11-04_cleaned.json
Combined Mode
All logs → one merged output:
index-cleaned-all.ndjson
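The naming difference between the two modes can be expressed in a few lines. This is a sketch inferred from the example filenames above; output_names is a hypothetical helper, and the assumption is that the output extension always equals the chosen export format.

```python
from pathlib import Path


def output_names(log_files, fmt, combine=False):
    """Return the output filenames for individual or combined export."""
    if combine:
        # Combined mode: every log is merged into a single file.
        return [f"index-cleaned-all.{fmt}"]
    # Individual mode: one cleaned file per input log, keeping the date prefix.
    return [
        Path(log).name.replace("_index.log", f"_cleaned.{fmt}")
        for log in log_files
    ]
```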
6. Summary Output
A human-readable summary is generated:
- log dates
- number of entries
- earliest/latest timestamps
- per-customer file count
- duplicate path detection
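The summary fields listed above can be derived from parsed entries along these lines. The function summarize and its output keys are hypothetical. The sketch assumes each entry is a dictionary with "timestamp" and "path" keys plus an optional "customer", that timestamps share one ISO 8601 UTC format so they sort correctly as strings, and that log dates can be taken from the timestamps rather than from the filenames.

```python
from collections import Counter


def summarize(entries):
    """Build a summary dictionary from a list of parsed log entries."""
    timestamps = sorted(entry["timestamp"] for entry in entries)
    path_counts = Counter(entry["path"] for entry in entries)
    customers = Counter(
        entry["customer"] for entry in entries if entry.get("customer")
    )
    return {
        # Assumption: the date is the first 10 characters of the timestamp.
        "dates": sorted({stamp[:10] for stamp in timestamps}),
        "entries": len(entries),
        "earliest": timestamps[0] if timestamps else None,
        "latest": timestamps[-1] if timestamps else None,
        "per_customer": dict(customers),
        "duplicate_paths": sorted(p for p, n in path_counts.items() if n > 1),
    }
```

Running the duplicate check on raw entries, before any de-duplication step, is what makes the duplicate list meaningful.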
7. Migration Note
Although .log files remain fully supported, the new ndjson logging system is recommended because:
- metadata extraction is automatic
- no cleaning step is required
- analysis is faster (stream-friendly format)
- works directly with analyze-json and analyze-file
To continue, see: ➡️ Logging with ndjson (New Standard)