Database Analysis – Analyze SQLite Databases
Categories:
Indexly provides a powerful pipeline for inspecting SQLite databases, allowing you to extract meaningful insights and visualize table relationships efficiently. Whether you’re exploring your own datasets or working with sample DBs such as Chinook, analyze-db gives you the ability to summarize, profile, and export your data.
If the SQLite schema matches AutoDoctor’s persistence tables, Indexly switches from generic DB inspection to a specialized operational summary. Use Analyze AutoDoctor Artifacts when your input is autodoctor.db.
Key Features
- Analyze SQLite databases (single or multi-table)
- Summarize table structure, numeric/non-numeric stats, nulls, and top values
- Detect explicit foreign keys and infer relationships heuristically
- Build Mermaid ER diagrams and adjacency graphs
- Export results in JSON, Markdown, HTML, or diagram formats
- Persist analysis at multiple levels:
none,summary,detailed,raw
Using indexly analyze-db
Basic Syntax
indexly analyze-db <db_path> [--show-summary] [--table <table_name>] [--all-tables]
[--sample-size N]
[--persist-level {minimal,full,none}]
[--no-persist]
[--export json,md,html]
[--max-preview ROWS]
Example: Analyze a Single Table and Export Markdown
indexly analyze-db Chinook.db --show-summary --table customers --export md --diagram mermaid
✔ Persisted summary to Chinook.db.analysis.json
Read the persisted JSON summary later:
indexly read-json Chinook.db.analysis.json --show-summary
indexly read-json Chinook.db.analysis.json --treeview --preview 5
read-json is a reader for persisted Indexly summary JSON. It displays stored summary blocks from *.analysis.json files and does not re-analyze the original database.
──────────────────── Dataset Summary Preview ──────────────────── DB Tables Overview
| Table | Rows | Cols | PK | Status |
|---|---|---|---|---|
| albums | 347 | 3 | - | OK |
| artists | 275 | 2 | - | OK |
| customers | 59 | 13 | CustomerId | OK |
| employees | 8 | 15 | - | OK |
| genres | 25 | 2 | - | OK |
| invoices | 412 | 9 | - | OK |
| invoice_items | 2240 | 5 | - | OK |
| media_types | 5 | 2 | - | OK |
| playlists | 18 | 2 | - | OK |
| playlist_track | 8715 | 2 | - | OK |
| tracks | 3503 | 9 | - | OK |
🔗 Detected Relations
| From | → | To |
|---|---|---|
| albums.ArtistId | → | artists.ArtistId |
| customers.SupportRepId | → | employees.EmployeeId |
| employees.ReportsTo | → | employees.EmployeeId |
| invoices.CustomerId | → | customers.CustomerId |
| invoice_items.TrackId | → | tracks.TrackId |
| invoice_items.InvoiceId | → | invoices.InvoiceId |
| playlist_track.TrackId | → | tracks.TrackId |
| playlist_track.PlaylistId | → | playlists.PlaylistId |
| tracks.MediaTypeId | → | media_types.MediaTypeId |
| tracks.GenreId | → | genres.GenreId |
| tracks.AlbumId | → | albums.AlbumId |
📊 Sample Rows (see Sample Size Considerations)
| CustomerId | FirstName | LastName | Fax | SupportRepId | |
|---|---|---|---|---|---|
| 1 | Luís | Gonçalves | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
| 2 | Leonie | Köhler | None | leonekohler@surfeu.de | 5 |
| 3 | François | Tremblay | None | ftremblay@gmail.com | 3 |
Profile: customers
| Metric | Value |
|---|---|
| rows | 59 |
| cols | 13 |
| key_hints | - |
| CustomerId (mean) | 30.0 |
| CustomerId (std) | 17.18 |
| SupportRepId (mean) | 3.95 |
| SupportRepId (std) | 0.82 |
| FirstName (unique) | 57 |
| LastName (unique) | 59 |
| Company (unique) | 10 |
| City (unique) | 53 |
| State (unique) | 25 |
| Country (unique) | 24 |
Exported Markdown → Chinook.db.analysis.md
Other Examples
# Analyze all tables, export JSON + Markdown
indexly analyze-db Chinook.db --all-tables --persist-level full --export json,md
# Generate relationships diagram for all tables in Markdown
indexly analyze-db Chinook.db --all-tables --export md --diagram mermaid
# Limit row sample to 20 for faster profiling of a specific table
indexly analyze-db Chinook.db --table tracks --sample-size 20 --show-summary
Analyze-DB Subcommand
Arguments
| Argument | Type / Action | Description |
|---|---|---|
db_path |
str | Path to the SQLite database file. |
--table |
str | Analyze a specific table only. |
--all-tables |
flag | Analyze all tables instead of auto-selecting one. |
Sampling Controls
| Argument | Type / Action | Description |
|---|---|---|
--sample-size |
int | Max rows per table for profiling. Adaptive sampling if omitted. |
--all-data |
flag | Disable sampling and profile full table data. Use cautiously on large DBs. |
--fast |
flag | Lighter profiling for huge tables. Uses bounded previews and skips expensive metrics. |
--fast-mode |
flag | Alias-style fast profiling control retained for compatibility. |
--timeout |
int | Per-table profiling timeout in seconds. |
Output Controls
| Argument | Type / Action | Description |
|---|---|---|
--show-summary |
flag | Print analysis overview to terminal. |
--no-persist |
flag | Do not write summary file to disk. |
--persist-level |
choice: minimal/full/none | Level of detail to persist. Default: full |
--export |
comma-separated json/md/html | Export summary in one or more formats, for example --export json,md. |
--diagram |
choice: mermaid | Include diagrams in MD/HTML export. |
--parallel |
flag | Profile multiple tables in parallel. |
--max-workers |
int | Max parallel workers (default = CPU count). |
--max-preview |
int | Max sample rows to show in terminal preview. Default: 10. |
Sample Size Considerations
When performing table profiling with Indexly, sample size selection plays a crucial role in balancing accuracy and performance. For instance, smaller sample sizes speed up profiling and reduce memory usage, but they may underrepresent rare values or subtle numeric patterns. Conversely, larger samples improve statistical stability, capturing both frequent and uncommon categories more reliably.
Key considerations:
- Central Limit Theorem (CLT) – For numeric columns, a minimum of 30–100 rows per column generally suffices to stabilize the sampling distribution of means, medians, and standard deviations.
- Margin of Error – For proportion estimates, the number of rows required depends mainly on the desired confidence level and precision, not the total table size. Typical guidance:
| Margin of Error | Approx. Sample Size |
|---|---|
| ±5% | ~385 rows |
| ±2% | ~2,401 rows |
| ±1% | ~9,604 rows |
Even for very large tables, accurate summary statistics can often be derived from a well-chosen subset.
- Rare-value detection – If a category appears in 0.1% of rows, a sample of 10,000 ensures near certainty of capturing it. Smaller samples may miss these rare occurrences entirely.
Conclusion: Indexly applies adaptive sampling to balance speed and accuracy. Users can override defaults with --sample-size for a bounded profile or --all-data when full-table completeness is more important than speed and memory use.
When To Use analyze-db vs analyze-file
| Goal | Recommended command |
|---|---|
| Quick inspection of an unknown SQLite file | indexly analyze-file <db> --show-summary |
| Deep schema and profiling work on a generic SQLite DB | indexly analyze-db <db> ... |
Operational summary of autodoctor.db |
indexly analyze-autodoctor <db> or indexly analyze-db <db> |
For an example analysis into Chinook’s ecosystem, see Chinook DB Analysis.
Related pages: