Analyze CSV Data
Who This Page Is For
- Users who want quick CSV statistics from the command line
- Analysts preparing CSV data for charts, reports, observers, or inference
- Developers checking the current analyze-csv parser and pipeline behavior
Quick Start
indexly analyze-csv sales.csv --show-summary
Add cleaning and a terminal histogram:
indexly analyze-csv sales.csv \
--auto-clean \
--show-summary \
--show-chart ascii \
--chart-type hist \
--transform auto
For folders that contain mixed file types, use the universal dispatcher:
indexly analyze-file sales.csv --auto-clean --show-summary
What analyze-csv Does
The CSV pipeline runs in this order:
- Detect the delimiter by checking common candidates such as comma, semicolon, tab, pipe, colon, and tilde.
- Load the CSV with UTF-8 handling.
- Optionally run the cleaning pipeline when --auto-clean is set.
- Infer numeric columns when most values in a text column can be converted to numbers.
- Compute statistics for numeric columns or derived timestamp columns.
- Optionally render charts or export analysis output.
- Persist analysis results unless --no-persist is set.
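The first two steps can be approximated with Python's standard csv module. This is a sketch, not Indexly's implementation: csv.Sniffer restricted to the six delimiters named above stands in for Indexly's own detector, the comma fallback is an assumption, and detect_delimiter, parse_csv_text, and load_csv are hypothetical names.

```python
from __future__ import annotations

import csv
import io

# Candidate delimiters named on this page: comma, semicolon, tab, pipe, colon, tilde.
CANDIDATES = ",;\t|:~"


def detect_delimiter(sample: str, default: str = ",") -> str:
    """Guess the delimiter of a CSV text sample, falling back to `default`."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when no candidate appears consistently.
        return default


def parse_csv_text(text: str) -> list[dict[str, str]]:
    """Parse CSV text into row dicts using the detected delimiter."""
    delimiter = detect_delimiter(text[:4096])  # sniff only the first few KB
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def load_csv(path: str) -> list[dict[str, str]]:
    # utf-8-sig decodes UTF-8 and drops a leading byte-order mark if present.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return parse_csv_text(fh.read())
```

Sniffing only the start of the file keeps detection cheap on large CSVs, at the cost of misreading files whose first rows are unrepresentative.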
Use --auto-clean when the CSV needs datetime parsing, missing-value filling, derived date features, normalization, or outlier removal. See Clean CSV Data for the detailed cleaning behavior.
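The "Infer numeric columns" step in the pipeline above can be pictured as a conversion with a success-rate cut-off. The page says only "most values", so the 0.8 threshold here is hypothetical, as is the choice to turn unparseable cells into nulls; Indexly's actual rule may differ.

```python
from __future__ import annotations


def to_float(cell: str) -> float | None:
    """Return the cell as a float, or None if it is blank or not numeric."""
    text = cell.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def infer_numeric(cells: list[str], threshold: float = 0.8) -> list[float | None] | None:
    """Convert a text column to floats when at least `threshold` of its
    non-blank cells parse. Returns None to mean "keep this column as text".
    The 0.8 default is a hypothetical reading of "most values"."""
    converted = [to_float(c) for c in cells]
    non_blank = sum(1 for c in cells if c.strip())
    if non_blank == 0:
        return None
    parsed = sum(1 for v in converted if v is not None)
    if parsed / non_blank >= threshold:
        return converted  # cells that failed to parse become None (nulls)
    return None
```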
Statistics Produced
Indexly computes these statistics for each numeric column:
| Metric | Meaning |
|---|---|
| Count | Non-null values used in analysis |
| Nulls | Missing values in the source column |
| Mean | Average value |
| Median | Middle value |
| Std Dev | Standard deviation |
| Sum | Column total |
| Min / Max | Range endpoints (smallest and largest values) |
| Q1 / Q3 | First and third quartiles (25th and 75th percentiles) |
| IQR | Interquartile range, Q3 minus Q1 |
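The metrics in the table above can be reproduced for one column with Python's statistics module. This sketch assumes linearly interpolated quartiles (method="inclusive", the same rule as pandas' default quantile) and a sample standard deviation; Indexly's exact definitions are not stated on this page. column_stats is a hypothetical name, and it needs at least two non-null values.

```python
from __future__ import annotations

import statistics


def column_stats(values: list[float | None]) -> dict[str, float]:
    """Compute the listed metrics for one numeric column; None marks a missing value."""
    data = [v for v in values if v is not None]
    # Linear interpolation between order statistics (R type 7).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "nulls": len(values) - len(data),
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "std_dev": statistics.stdev(data),  # sample standard deviation (n - 1)
        "sum": sum(data),
        "min": min(data),
        "max": max(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }
```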
If a cleaned CSV only contains datetime values, derived _timestamp fields can provide numeric columns for analysis.
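A derived timestamp column can be thought of as each datetime converted to a number, for example seconds since the Unix epoch. The unit, the UTC handling of naive values, and the order_date_timestamp name below are assumptions for illustration; this page does not specify how Indexly encodes _timestamp fields.

```python
from __future__ import annotations

from datetime import datetime, timezone


def to_timestamp(text: str) -> float:
    """Parse an ISO 8601 date or datetime and return seconds since the Unix epoch.
    Naive values are treated as UTC (an assumption for this sketch)."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


# Hypothetical derived column built from a datetime-only column.
order_dates = ["2024-01-01", "2024-01-02 12:00:00"]
order_date_timestamp = [to_timestamp(d) for d in order_dates]
```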
Visualization Options
Use --show-chart to choose where the chart renders:
| Mode | Behavior |
|---|---|
| ascii | Renders terminal charts. |
| static | Uses Matplotlib for static charts. |
| interactive | Uses Plotly-style interactive output where supported. |
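In ascii mode a histogram is drawn with text characters. The sketch below is a hypothetical renderer, not Indexly's; it uses equal-width bins and shows why a square-root bar scale (the --bar-scale sqrt default noted under the transform options) keeps one tall bin from flattening the rest. Treating log as log(1 + count) is an assumption.

```python
from __future__ import annotations

import math

# The two --bar-scale choices named on this page; each compresses tall bars.
BAR_SCALES = {"sqrt": math.sqrt, "log": math.log1p}


def ascii_histogram(values: list[float], bins: int = 5, width: int = 40,
                    bar_scale: str = "sqrt") -> str:
    """Render a terminal histogram with equal-width bins."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against zero width when all values match
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of past it.
        counts[min(int((v - lo) / span * bins), bins - 1)] += 1
    scale = BAR_SCALES[bar_scale]
    top = scale(max(counts))  # > 0 because at least one bin is non-empty
    lines = []
    for i, count in enumerate(counts):
        left_edge = lo + span * i / bins
        bar = "#" * round(width * scale(count) / top)
        lines.append(f"{left_edge:>10.2f} | {bar} {count}")
    return "\n".join(lines)
```

With counts of 4 and 1, a linear scale would draw bars in a 4:1 ratio; the square-root scale draws them 2:1.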
Supported chart types:
--chart-type bar
--chart-type line
--chart-type box
--chart-type hist
--chart-type scatter
--chart-type pie
Common chart controls:
--x-col date
--y-col revenue profit
--export-plot chart.html
--agg sum
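The --x-col, --y-col, and --agg controls amount to a group-by: rows are bucketed by the x column and each y column is reduced. This sketch models only sum and mean; the full set of --agg values Indexly accepts is not listed on this page. aggregate and AGGREGATORS are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict

AGGREGATORS = {
    "sum": sum,
    "mean": lambda xs: sum(xs) / len(xs),
}


def aggregate(rows: list[dict[str, str]], x_col: str, y_cols: list[str],
              agg: str = "sum") -> dict[str, dict[str, float]]:
    """Group rows by x_col and reduce each y column with the chosen aggregator."""
    groups: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for y in y_cols:
            groups[row[x_col]][y].append(float(row[y]))
    reduce = AGGREGATORS[agg]
    return {x: {y: reduce(vals) for y, vals in by_y.items()} for x, by_y in groups.items()}
```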
For histograms and boxplots, a transformation can make skewed distributions easier to read:
--transform none
--transform log
--transform sqrt
--transform softplus
--transform exp-log
--transform auto
--transform auto chooses a transformation from column skew. ASCII histogram bars use --bar-scale sqrt by default and also accept --bar-scale log.
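One way a skew-driven --transform auto could work is sketched below: measure skewness, then pick a transform. The 0.5 and 1.0 cut-offs, the log1p form of log, and the softplus fallback for negative data are all assumptions; exp-log is omitted because its definition is not given here.

```python
from __future__ import annotations

import math
import statistics


def skewness(xs: list[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness (population form)."""
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m3 = sum((x - mean) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5 if m2 else 0.0


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x): avoids overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


TRANSFORMS = {
    "none": lambda x: x,
    "log": math.log1p,   # log(1 + x), an assumed form that tolerates zeros
    "sqrt": math.sqrt,   # requires x >= 0
    "softplus": softplus,
}


def choose_transform(xs: list[float]) -> str:
    """Pick a transform name from skew. The 0.5 and 1.0 cut-offs are hypothetical."""
    s = skewness(xs)
    if abs(s) < 0.5:
        return "none"
    if min(xs) < 0:
        return "softplus"  # sqrt (and log for values <= -1) would fail here
    return "sqrt" if abs(s) < 1.0 else "log"
```

The chosen transform is then applied element-wise, for example [TRANSFORMS[choose_transform(xs)](x) for x in xs].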
Time-Series Analysis
For date-indexed CSVs:
indexly analyze-csv sales.csv \
--auto-clean \
--timeseries \
--x order_date \
--y revenue,profit \
--freq M \
--agg sum \
--rolling 3 \
--mode interactive \
--output sales-trend.html
See Time-Series Visualization for dedicated examples.
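Conceptually, --freq M with --agg sum buckets rows by calendar month and sums each bucket, and --rolling 3 smooths the result over three periods. The sketch assumes --rolling is a trailing mean; it also skips empty months, whereas pandas-style resampling would emit them with a zero sum. monthly_sum and rolling_mean are hypothetical names.

```python
from __future__ import annotations

from datetime import date


def monthly_sum(rows: list[tuple[date, float]]) -> dict[str, float]:
    """Bucket (date, value) pairs by calendar month and sum each bucket."""
    totals: dict[str, float] = {}
    for day, value in sorted(rows):
        key = f"{day.year:04d}-{day.month:02d}"
        totals[key] = totals.get(key, 0.0) + value
    return totals


def rolling_mean(values: list[float], window: int) -> list[float | None]:
    """Trailing rolling mean; the first window - 1 positions have no full window."""
    out: list[float | None] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out
```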
Boxplot Engine
For focused boxplot work, use the isolated boxplot engine:
indexly analyze-csv sales.csv \
--boxplot \
--group-by region \
--y-col revenue profit \
--show-mean
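Each box in a grouped boxplot summarizes one group. The sketch below computes those per-group numbers with linearly interpolated quartiles (an assumption); min and max stand in for whiskers here, although boxplots usually end whiskers at an outlier fence instead. grouped_box_stats is a hypothetical name, and each group needs at least two values.

```python
from __future__ import annotations

import statistics
from collections import defaultdict


def grouped_box_stats(rows: list[dict[str, str]], group_by: str,
                      y_col: str) -> dict[str, dict[str, float]]:
    """Per-group quartiles, extremes, and the mean that --show-mean would mark."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(float(row[y_col]))
    summary = {}
    for name, vals in groups.items():
        q1, med, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        summary[name] = {
            "min": min(vals), "q1": q1, "median": med, "q3": q3,
            "max": max(vals), "mean": statistics.fmean(vals),
        }
    return summary
```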
Useful boxplot options include:
| Option | Values or behavior |
|---|---|
| --use-raw | Use raw data for boxplot rendering. |
| --use-cleaned | Use cleaned data for boxplot rendering. |
| --use-clean | Deprecated alias for --use-cleaned (boxplot paths only). |
| --norm | zscore, minmax |
| --outliers | classic, robust, show, hide |
| --merge-on | Merge column for multi-file comparison. |
| --merge-how | inner, left, right, outer |
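The --norm and --outliers values map onto well-known techniques. This sketch assumes zscore uses the population standard deviation, classic means Tukey's 1.5 x IQR fences, and robust means the MAD-based modified z-score with the common 3.5 cut-off; the page does not confirm these definitions. show and hide appear to control display only and are not modeled.

```python
from __future__ import annotations

import statistics


def zscore(xs: list[float]) -> list[float]:
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / sd if sd else 0.0 for x in xs]


def minmax(xs: list[float]) -> list[float]:
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def classic_outliers(xs: list[float], k: float = 1.5) -> list[float]:
    """Tukey fences: flag values more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in xs if x < q1 - k * iqr or x > q3 + k * iqr]


def robust_outliers(xs: list[float], cutoff: float = 3.5) -> list[float]:
    """Modified z-score based on the median absolute deviation (MAD)."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    if mad == 0:
        return []  # more than half the values are identical; no spread to measure
    return [x for x in xs if abs(0.6745 * (x - med) / mad) > cutoff]
```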
To manage the lifecycle of analytical artifacts:
- By default, re-analysis prunes superseded hash-versioned Parquet artifacts.
- --keep-artifact-history preserves old artifacts.
- Use indexly clear-data --all --prune-artifacts to remove unreferenced artifact history later.
Export Results
Export the analysis table:
indexly analyze-csv sales.csv --export-path reports/sales.md --format md
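The md format writes a Markdown table. The exact layout Indexly produces is not shown on this page, so this sketch assumes one row per analyzed column and one column per metric; to_markdown is a hypothetical helper.

```python
from __future__ import annotations


def to_markdown(stats: dict[str, dict[str, float]]) -> str:
    """Render {column: {metric: value}} as a Markdown table, one row per column."""
    metrics = list(next(iter(stats.values())).keys())
    header = "| Column | " + " | ".join(metrics) + " |"
    divider = "|" + "---|" * (len(metrics) + 1)
    rows = [
        f"| {col} | " + " | ".join(f"{vals[m]:.2f}" for m in metrics) + " |"
        for col, vals in stats.items()
    ]
    return "\n".join([header, divider, *rows])
```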
Supported analyze-csv formats are txt, md, and json.
Use --compress-export with JSON output when you want .json.gz:
indexly analyze-csv sales.csv --export-path reports/sales.json --format json --compress-export
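A .json.gz export is ordinary JSON compressed with gzip, so Python's standard gzip and json modules can write and read it. The sketch treats the payload as arbitrary JSON because the page does not document its structure, and the helper names are hypothetical.

```python
from __future__ import annotations

import gzip
import json


def to_json_gz_bytes(data: object) -> bytes:
    """Serialize to JSON and gzip-compress the UTF-8 bytes."""
    return gzip.compress(json.dumps(data, indent=2).encode("utf-8"))


def from_json_gz_bytes(blob: bytes) -> object:
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def read_json_gz(path: str) -> object:
    # "rt" opens the compressed file as decoded text for json.load.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

Assuming --compress-export appends .gz to the export path, read_json_gz("reports/sales.json.gz") would load the export from the command above.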
--export-format is accepted as an alias for --format in analyze-csv. For broader tabular export formats such as CSV, Parquet, Excel, or SQLite, use indexly analyze-file with --format.
Practical Workflow
indexly rename-file ./exports --pattern "{date}-{title}" --recursive --dry-run
indexly analyze-csv ./exports/sales.csv --auto-clean --show-summary
indexly analyze-csv ./exports/sales.csv --show-chart ascii --chart-type hist --transform auto
indexly observe audit
The rename step previews stable {date}-{title} filenames (--dry-run), the next two commands clean, summarize, and chart the CSV, and observe audit then lets observers compare persisted snapshots when observer data is available.