Database Analysis – Analyze SQLite Databases

Analyze SQLite databases with Indexly to extract table summaries, detect relationships, generate ER diagrams, and export structured insights in JSON, Markdown, or HTML.

Tags:

Indexly provides a powerful pipeline for inspecting SQLite databases, allowing you to extract meaningful insights and visualize table relationships efficiently. Whether you’re exploring your own datasets or working with sample DBs such as Chinook, analyze-db gives you the ability to summarize, profile, and export your data.

Key Features

Analyze SQLite databases (single or multi-table)
Summarize table structure, numeric/non-numeric stats, nulls, and top values
Detect explicit foreign keys and infer relationships heuristically
Build Mermaid ER diagrams and adjacency graphs
Export results in JSON, Markdown, HTML, or diagram formats
Persist analysis at multiple levels: none, summary, detailed, raw

Using `indexly analyze-db`

Basic Syntax

indexly analyze-db <db_path> [--show-summary] [--table <table_name>] [--all-tables]
                     [--sample-size N]
                     [--persist-level {none,summary,detailed,raw}]
                     [--no-persist]
                     [--export {json,md,html}]
                     [--max-preview ROWS]

Example: Analyze a Single Table and Export Markdown

indexly analyze-db Chinook.db --show-summary --table customers --export md --diagram mermaid
✔ Persisted summary to Chinook.db.analysis.json

Learn how to read JSON summary export

──────────────────── Dataset Summary Preview ──────────────────── DB Tables Overview

Table	Rows	Cols	PK	Status
albums	347	3	-	OK
artists	275	2	-	OK
customers	59	13	CustomerId	OK
employees	8	15	-	OK
genres	25	2	-	OK
invoices	412	9	-	OK
invoice_items	2240	5	-	OK
media_types	5	2	-	OK
playlists	18	2	-	OK
playlist_track	8715	2	-	OK
tracks	3503	9	-	OK

🔗 Detected Relations

From	→	To
albums.ArtistId	→	artists.ArtistId
customers.SupportRepId	→	employees.EmployeeId
employees.ReportsTo	→	employees.EmployeeId
invoices.CustomerId	→	customers.CustomerId
invoice_items.TrackId	→	tracks.TrackId
invoice_items.InvoiceId	→	invoices.InvoiceId
playlist_track.TrackId	→	tracks.TrackId
playlist_track.PlaylistId	→	playlists.PlaylistId
tracks.MediaTypeId	→	media_types.MediaTypeId
tracks.GenreId	→	genres.GenreId
tracks.AlbumId	→	albums.AlbumId

📊 Sample Rows (see Sample Size Considerations)

CustomerId	FirstName	LastName	Fax	Email	SupportRepId
1	Luís	Gonçalves	+55 (12) 3923-5566	luisg@embraer.com.br	3
2	Leonie	Köhler	None	leonekohler@surfeu.de	5
3	François	Tremblay	None	ftremblay@gmail.com	3

Profile: customers

Metric	Value
rows	59
cols	13
key_hints	-
CustomerId (mean)	30.0
CustomerId (std)	17.18
SupportRepId (mean)	3.95
SupportRepId (std)	0.82
FirstName (unique)	57
LastName (unique)	59
Company (unique)	10
City (unique)	53
State (unique)	25
Country (unique)	24

Exported Markdown → Chinook.db.analysis.md

Other Examples

# Analyze all tables, export JSON + Markdown
indexly analyze-db Chinook.db --all-tables --persist-level full --export json,md

# Generate relationships diagram for all tables in Markdown
indexly analyze-db Chinook.db --all-tables --export md --diagram mermaid

# Limit row sample to 20 for faster profiling of a specific table
indexly analyze-db Chinook.db --table tracks --sample-size 20 --show-summary

Analyze-DB Subcommand

Arguments

Argument	Type / Action	Description
`db_path`	str	Path to the SQLite database file.
`--table`	str	Analyze a specific table only.
`--all-tables`	flag	Analyze all tables instead of auto-selecting one.

Sampling Controls

Argument	Type / Action	Description
`--sample-size`	int	Max rows per table for profiling. Adaptive sampling if omitted.
`--all-data`	flag	Disable sampling, use full table data.
`--fast`	flag	Lighter profiling for huge tables.
`--fast-mode`	flag	Enable fast profiling mode (lighter metrics, faster).
`--timeout`	int	Per-table profiling timeout in seconds.

Output Controls

Argument	Type / Action	Description
`--show-summary`	flag	Print analysis overview to terminal.
`--no-persist`	flag	Do not write summary file to disk.
`--persist-level`	choice: minimal/full/none	Level of detail to persist. Default: full
`--export`	choice: json/md/html	Export summary in the chosen format.
`--diagram`	choice: mermaid	Include diagrams in MD/HTML export.
`--parallel`	flag	Profile multiple tables in parallel.
`--max-workers`	int	Max parallel workers (default = CPU count).

Sample Size Considerations

When performing table profiling with Indexly, sample size selection plays a crucial role in balancing accuracy and performance. For instance, smaller sample sizes speed up profiling and reduce memory usage, but they may underrepresent rare values or subtle numeric patterns. Conversely, larger samples improve statistical stability, capturing both frequent and uncommon categories more reliably.

Key considerations:

Central Limit Theorem (CLT) – For numeric columns, a minimum of 30–100 rows per column generally suffices to stabilize the sampling distribution of means, medians, and standard deviations.
Margin of Error – For proportion estimates, the number of rows required depends mainly on the desired confidence level and precision, not the total table size. Typical guidance:

Margin of Error	Approx. Sample Size
±5%	~385 rows
±2%	~2,401 rows
±1%	~9,604 rows

Even for very large tables, accurate summary statistics can often be derived from a well-chosen subset.

Rare-value detection – If a category appears in 0.1% of rows, a sample of 10,000 ensures near certainty of capturing it. Smaller samples may miss these rare occurrences entirely.

Conclusion: Indexly applies adaptive sampling to balance speed and accuracy. Users can override defaults with --sample-size or --all-data depending on whether speed or completeness is prioritized.

For an example analysis into Chinook’s ecosystem, see the Chinook DB Analysis