Analyze JSON And NDJSON Files
Who This Page Is For
- Users analyzing JSON or NDJSON datasets from exports, APIs, logs, or search caches
- Developers validating how Indexly routes JSON through the universal loader and orchestrator
- Operators working with large newline-delimited JSON files that should be sampled safely
What Indexly Supports
Indexly can analyze these JSON shapes directly:
| Input shape | Example | Best command |
|---|---|---|
| JSON array of objects | [{"id": 1}, {"id": 2}] | indexly analyze-json <path> |
| NDJSON / record-list JSON | one JSON object per line | indexly analyze-json <path> --chunk-size 10000 |
| JSON file with .json extension but NDJSON content | exported logs or event streams | indexly analyze-json <path> --chunk-size 10000 |
| Compressed JSON | records.json.gz | indexly analyze-json <path> |
| Socrata-style JSON | { "columns": [...], "data": [...] } | indexly analyze-json <path> |
| Indexly search cache | search_cache.json | indexly analyze-file <path> --summarize-search |
You can also use the generic route:
indexly analyze-file data.json --show-summary
indexly analyze-file events.ndjson --show-summary
Use analyze-file when you want Indexly to auto-detect the file type as part of a mixed structured-data workflow.
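To make the Socrata-style shape from the table concrete, here is a minimal Python sketch of how a columns/data pair could be flattened into row records. It is an illustration only, not Indexly's code: the function name socrata_to_rows and the assumption that each column entry is either a plain string or an object with a "name" key are hypothetical.

```python
from typing import Any


def socrata_to_rows(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each positional row in "data" with the names declared in "columns"."""
    names = []
    for col in doc["columns"]:
        # Assumption: a column is either a plain string or an object with a "name" key.
        names.append(col["name"] if isinstance(col, dict) else str(col))
    # zip() stops at the shorter sequence, so extra cells in a row are ignored.
    return [dict(zip(names, row)) for row in doc["data"]]


sample = {
    "columns": [{"name": "id"}, {"name": "city"}],
    "data": [[1, "Oslo"], [2, "Lima"]],
}
rows = socrata_to_rows(sample)
print(rows)
```

Each resulting dictionary corresponds to one table row, which is the form the summary statistics later on this page operate on.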
Why Use analyze-json
The JSON-focused command is the safest starting point when the input may be large, newline-delimited, or table-shaped.
It:
- detects NDJSON from content, even when the file extension is .json
- reads only a bounded prefix for initial JSON/NDJSON detection
- uses --chunk-size to limit materialized NDJSON rows
- rejects malformed NDJSON lines instead of silently dropping them
- preserves mixed identifier-like string columns such as id, code, zip, and phone
- maps Socrata-style columns and data blocks into a real table
For newline-delimited datasets, --chunk-size controls how many records Indexly materializes for analysis. Use analyze-json when you need this sampling control. The generic analyze-file route does not expose --chunk-size.
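The two ideas above, content-based detection from a bounded prefix and a cap on materialized records, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Indexly's detection logic: PREFIX_CHARS, looks_like_ndjson, and sample_records are hypothetical names, and the "at least two object lines" rule is an assumed heuristic.

```python
import io
import json
from itertools import islice
from typing import TextIO

PREFIX_CHARS = 64 * 1024  # hypothetical bound on how much is read for detection


def looks_like_ndjson(prefix: str, truncated: bool) -> bool:
    """Heuristic: the leading non-blank lines each parse as a JSON object."""
    lines = prefix.splitlines()
    if truncated and lines:
        lines = lines[:-1]  # the last line may have been cut mid-record
    records = [ln for ln in lines if ln.strip()]
    if len(records) < 2:
        return False
    for ln in records[:5]:
        try:
            if not isinstance(json.loads(ln), dict):
                return False
        except json.JSONDecodeError:
            return False
    return True


def sample_records(stream: TextIO, chunk_size: int) -> list[dict]:
    """Materialize at most chunk_size records; later lines are never read."""
    nonblank = (ln for ln in stream if ln.strip())
    return [json.loads(ln) for ln in islice(nonblank, chunk_size)]


stream = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
prefix = stream.read(PREFIX_CHARS)
is_ndjson = looks_like_ndjson(prefix, truncated=len(prefix) == PREFIX_CHARS)
stream.seek(0)
sample = sample_records(stream, chunk_size=2)
print(is_ndjson, sample)
```

Because the sample stops after chunk_size records, any statistics computed from it describe only those records, which matches the sampling caveat in Statistics And Assumptions.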
Recommended Workflows
1. Analyze a normal JSON dataset
indexly analyze-json .\data.json --show-summary
Use this when the file is a standard JSON object, a list of records, or an exported Indexly JSON analysis file.
2. Analyze NDJSON safely
indexly analyze-json .\events.json --chunk-size 10000 --show-summary
Use this when the file has one JSON object per line.
If a malformed line is encountered inside the sampled range, Indexly stops the load and reports the invalid line instead of analyzing a partial record set as if it were complete.
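The fail-fast behavior described above can be illustrated with a short Python sketch. It is not Indexly's loader: MalformedRecordError and load_strict are hypothetical names, and skipping blank lines is an assumption made for the example.

```python
import io
import json
from typing import TextIO


class MalformedRecordError(ValueError):
    """Hypothetical error type carrying the 1-based line number of the bad record."""

    def __init__(self, line_number: int, reason: str) -> None:
        super().__init__(f"line {line_number}: {reason}")
        self.line_number = line_number


def load_strict(stream: TextIO, chunk_size: int) -> list[dict]:
    """Load up to chunk_size records and abort on the first invalid line."""
    records: list[dict] = []
    for line_number, line in enumerate(stream, start=1):
        if len(records) >= chunk_size:
            break  # lines beyond the sampled range are never inspected
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Raise instead of skipping, so no partial set is mistaken for a full one.
            raise MalformedRecordError(line_number, exc.msg) from exc
    return records


good_then_bad = '{"id": 1}\n{"id": 2\n{"id": 3}\n'
failed_line = None
try:
    load_strict(io.StringIO(good_then_bad), chunk_size=10)
except MalformedRecordError as err:
    failed_line = err.line_number
print(failed_line)
```

Note that with chunk_size=1 the same input loads without error, because the malformed second line falls outside the sampled range.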
3. Analyze compressed JSON
indexly analyze-json .\records.json.gz --show-summary
Compressed JSON uses the same detection path as normal JSON.
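One common way to give compressed and plain files the same downstream path is to pick the opener from the gzip magic bytes and hand back a text stream either way. The Python sketch below shows that idea under assumptions: open_text is a hypothetical helper, and the page does not say whether Indexly checks magic bytes or the .gz extension.

```python
import gzip
import json
import os
import tempfile
from typing import IO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def open_text(path: str) -> IO[str]:
    """Return a UTF-8 text stream, decompressing transparently when needed."""
    with open(path, "rb") as raw:
        magic = raw.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump([{"id": 1}, {"id": 2}], fh)
    # From here on, the caller does not need to know the file was compressed.
    with open_text(path) as fh:
        loaded = json.load(fh)
print(loaded)
```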
4. Summarize Indexly search-cache JSON
indexly analyze-file .\search_cache.json --summarize-search --sortdate-by week
Use this when the JSON contains cached Indexly search results with timestamps, snippets, tags, and derived dates.
Generic Route Equivalents
These commands can also analyze JSON through the orchestrator:
indexly analyze-file .\data.json --show-summary
indexly analyze-file .\events.ndjson --show-summary
Use the generic route when:
- you are exploring mixed file types with one command style
- you want AutoDoctor JSON detection to happen automatically
- you do not need --chunk-size
Use analyze-json when:
- the artifact is definitely JSON or NDJSON
- the input may be large
- the file has a .json extension but NDJSON content
- you need chunk-limited NDJSON analysis
Statistics And Assumptions
Indexly summarizes numeric columns with count, nulls, mean, median, standard deviation, sum, minimum, maximum, quartiles, and IQR.
Important assumptions:
- string columns are converted to numeric only when they are overwhelmingly numeric
- identifier-like strings are preserved to avoid turning mixed IDs or codes into missing values
- sampled summaries describe the materialized sample, not the full source file
- table output includes sampling metadata when row or column limits are applied
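For readers who want to check a reported summary by hand, the following Python sketch computes the same set of statistics with the standard library. The exact conventions are assumptions, since this page does not specify them: the sketch uses the sample standard deviation and the "inclusive" quartile method, and summarize is a hypothetical name.

```python
import statistics
from typing import Optional


def summarize(values: list[Optional[float]]) -> dict[str, float]:
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    # Needs at least two non-null values for quartiles and standard deviation.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(present),
        "nulls": len(values) - len(present),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "std": statistics.stdev(present),  # assumption: sample, not population
        "sum": sum(present),
        "min": min(present),
        "max": max(present),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


summary = summarize([1, 2, 3, 4, 5, None])
print(summary)
```

If the values were drawn from a chunk-limited NDJSON sample, these numbers describe that sample only, not the full file.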
Troubleshooting
A large .json file is actually NDJSON
Use:
indexly analyze-json .\events.json --chunk-size 10000 --show-summary
Indexly detects record-list JSON from content, not only from the file extension.
A malformed NDJSON file no longer produces partial output
That is intentional. Fix or remove the malformed line before analysis so summary statistics are based on a known record set.
A numeric-looking code stayed textual
That is usually correct for business identifiers. Columns such as id, code, key, zip, postal, and phone are protected from automatic numeric coercion when they arrive as strings.
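A simplified Python sketch of this kind of protection is shown below. It reuses the name hints listed above, but everything else is an assumption for illustration: the 0.95 threshold for "overwhelmingly numeric", the token-based name match, and the function names are hypothetical and may differ from what Indexly does.

```python
import re
from typing import Optional

# Name hints taken from the paragraph above.
PROTECTED_HINTS = ("id", "code", "key", "zip", "postal", "phone")
NUMERIC_THRESHOLD = 0.95  # hypothetical cut-off for "overwhelmingly numeric"


def is_identifier_like(column: str) -> bool:
    """Match hints against name tokens, e.g. "zip_code" -> ["zip", "code"]."""
    tokens = re.split(r"[^a-z0-9]+", column.lower())
    return any(tok in PROTECTED_HINTS for tok in tokens)


def _to_number(text: str) -> Optional[float]:
    try:
        return float(text)
    except ValueError:
        return None


def coerce_column(name: str, values: list[str]) -> list:
    """Convert a string column to floats only when that is safe to do."""
    if is_identifier_like(name) or not values:
        return list(values)  # protected: leading zeros and mixed codes survive
    parsed = [_to_number(v) for v in values]
    numeric_share = sum(p is not None for p in parsed) / len(values)
    if numeric_share < NUMERIC_THRESHOLD:
        return list(values)  # mostly text: keep as strings
    return parsed  # rare non-numeric cells become None


print(coerce_column("zip", ["02134", "10001"]))
print(coerce_column("amount", ["1.5", "2", "3"]))
```

Keeping "02134" as text preserves the leading zero, which a numeric conversion would silently drop.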
AutoDoctor JSON needs operational meaning
Use the dedicated AutoDoctor route:
indexly analyze-autodoctor .\AutoDoctor_Report.json --show-summary
For details, see Analyze AutoDoctor Artifacts.