# Prometheus / sql_exporter Path
SQLMonitor's classic architecture pulls everything into SQL Server tables. For shops that already run Prometheus + Grafana — or want lower-latency, cardinality-rich metrics — there is a parallel path that publishes the same signal set as Prometheus metrics via burningalchemist/sql_exporter.
The two paths are complementary, not mutually exclusive. Most of the Grafana dashboards can switch between a SQL datasource and a Prometheus datasource using the same panel queries wrapped in $__interval.
## What you get

| Component | Details |
|---|---|
| Exporter | `sql_exporter` (Go binary) running on every monitored instance on TCP 9399. |
| Scrape cadence | 15s default (configurable in `prometheus.yml`). |
| Collectors | Collector YAML files under `sql_exporter/` — one per signal family (see the file table below). |
| Prometheus dashboard | `SQL-Exporter-Metrics-Dashboard-External.json` — drop-in replacement for "Monitoring — Live — All Servers". |
| Alerting rules | `sql_exporter/alert-engine/Alert-Rules-SQLMonitor.yaml` — Grafana unified alerting format. |
## Files in the repo

| File | Purpose |
|---|---|
| `sql_exporter.yml` | Top-level exporter config — targets, credentials, job groups. |
| `mssql_dba_cached.collector.yml` | Low-churn metrics cached server-side (disk space, databases, file IO). |
| `mssql_dba_regular.collector.yml` | The bread-and-butter volatile metrics (CPU, memory, active/blocked requests, wait stats). |
| `mssql_dba_stableinfo.collector.yml` | Instance- and host-level stable facts (SQL version, core count, RAM). |
| `mssql_dba_aghealth.collector.yml` | AG primary/secondary state, redo/log-send queue. |
| `mssql_dba_whoisactive.collector.yml` | Top concurrent queries — higher cardinality. |
| `mssql_sqlagent_jobs.collector.yml` | SQL Agent job status / outcome / duration / next-run / 24h-failure count from msdb. |
| `mssql_backup_history.collector.yml` | Per-(database, backup_type) last-time / size / duration / age from `msdb.dbo.backupset`. |
| `mssql_xevent.collector.yml` | 5-minute aggregates from `DBA.dbo.xevent_metrics` (guarded with an existence check; no-op where the XEvent collector proc isn't installed). |
| `mssql_standard.collector.yml` | The upstream sql_exporter stock collector. |
| `windows_exporter_config.yml` | Matching config for windows_exporter — host-level CPU, disk, network. |
| `tessell-metrics.collector.yml` | Managed-instance variant (no xp_cmdshell). |
Full metric reference: SQL-Exporter-Metrics-Documentation.md and the quick-ref.
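For orientation, each collector file follows the upstream sql_exporter schema: a `collector_name`, then a list of metrics that each bind a T-SQL query to a Prometheus metric. A minimal hypothetical example (not one of the shipped files; the metric and query are illustrative only):

```yaml
collector_name: mssql_dba_example        # hypothetical — not a shipped collector
metrics:
  - metric_name: mssql_user_connections
    type: gauge
    help: 'Active user connections per database.'
    key_labels: [database]               # query columns promoted to labels
    values: [connections]                # query column read as the metric value
    query: |
      SELECT DB_NAME(database_id) AS [database],
             COUNT(*)             AS connections
      FROM sys.dm_exec_sessions
      WHERE is_user_process = 1
      GROUP BY database_id
```

The shipped collectors follow the same shape, just with the production queries behind each signal family.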
## Install — Windows

```powershell
# 1. Drop the binary + config in place
#    (download sql_exporter.exe from https://github.com/burningalchemist/sql_exporter/releases)
Copy-Item .\sql_exporter.exe 'C:\Program Files\sql_exporter\'
Copy-Item .\sql_exporter*.yml 'C:\Program Files\sql_exporter\'

# 2. Register as a service
New-Service -Name sql_exporter `
    -BinaryPathName '"C:\Program Files\sql_exporter\sql_exporter.exe" --config.file "C:\Program Files\sql_exporter\sql_exporter.yml"' `
    -StartupType Automatic `
    -DisplayName 'SQL Exporter for Prometheus'

# 3. Set the service logon account to the sqlmonitor proxy user, then start
Start-Service sql_exporter

# 4. Open the scrape port
New-NetFirewallRule -DisplayName 'SQL Exporter (TCP/9399)' `
    -Direction Inbound -Protocol TCP -LocalPort 9399 -Action Allow

# 5. Smoke test — first 20 lines of the metrics payload
(Invoke-WebRequest http://localhost:9399/metrics -UseBasicParsing).Content -split "`n" |
    Select-Object -First 20
```
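Step 1 above copies `sql_exporter.yml` into place; its top-level shape follows the upstream sql_exporter schema. A sketch with placeholder credentials — the DSN is an assumption, and the `collectors` entries assume the collector names match the shipped file basenames:

```yaml
global:
  scrape_timeout_offset: 500ms   # finish queries this far before the Prometheus scrape timeout
  max_connections: 3
  max_idle_connections: 3
target:
  # assumption: a 'sqlmonitor' proxy login — substitute real credentials or integrated auth
  data_source_name: 'sqlserver://sqlmonitor:CHANGEME@localhost:1433'
  collectors: [mssql_dba_regular, mssql_dba_cached]
collector_files:
  - '*.collector.yml'            # load every collector file sitting next to the binary
```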
## Install — macOS (launchd)

Full step-by-step (including plist and log rotation) is in README-sql_exporter.md. Summary:

```bash
brew install prometheus grafana
sudo cp com.sql_exporter.plist /Library/LaunchDaemons/
sudo launchctl load /Library/LaunchDaemons/com.sql_exporter.plist
curl http://localhost:9399/metrics | head
```
## Wire into Prometheus

Add a job to `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: sqlmonitor_mssql
    scrape_interval: 15s
    static_configs:
      - targets:
          - sqlnode-01.corp:9399
          - sqlnode-02.corp:9399
          - aghost-1a.corp:9399
          - aghost-1b.corp:9399
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):.*'
        target_label: sql_instance
        replacement: '$1'
For AGs, add a `sqlmonitor_ag` job with `scrape_interval: 30s` pointing at the same targets — the `mssql_dba_aghealth` collector is slightly heavier.
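The AG job described above can be sketched as a second `scrape_configs` entry; the hostnames are the same sample targets used for the main job, so adjust them to your fleet:

```yaml
scrape_configs:
  - job_name: sqlmonitor_ag        # heavier AG-health collector on its own cadence
    scrape_interval: 30s
    static_configs:
      - targets:
          - aghost-1a.corp:9399
          - aghost-1b.corp:9399
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+):.*'
        target_label: sql_instance
        replacement: '$1'
```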
## Wire into Grafana

- Add a Prometheus datasource named `Prometheus` (the dashboards query it by that exact name).
- Import `sql_exporter/SQL-Exporter-Metrics-Dashboard-External.json` — this is the Prometheus counterpart of the SQL-backed distributed dashboard.
- Optionally import the unified-alerting YAMLs from `sql_exporter/alert-engine/` — they define the same CPU / memory / wait / blocking alerts the Python engine provides, but run inside Grafana Alerting instead.
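If you provision Grafana from files rather than clicking through the UI, the required datasource can be declared with Grafana's datasource-provisioning format. A sketch; the URL is an assumption, so point it at your actual Prometheus server:

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus                     # must match exactly — the dashboards query it by this name
    type: prometheus
    access: proxy
    url: http://prometheus.corp:9090     # assumption: replace with your Prometheus address
    isDefault: true
```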
## Prometheus-backed dashboard pack
The repo ships a parallel set of Grafana dashboards under sql_exporter/Prometheus-Dashboards/ that mirror the SQL-backed dashboards in Grafana-Dashboards/ but source every panel from Prometheus metrics.
Phase 1 ships 12 dashboards covering the fleet's core signal set:
| UID | Title | Data panels |
|---|---|---|
| `prom_core_metrics_trend` | Core Metrics - Trend | 9 |
| `prom_wait_stats` | Wait Stats | 4 |
| `prom_disk_space` | Disk Space | 5 |
| `prom_ag_health_state` | Ag Health State | 3 |
| `prom_sql_agent_jobs` | SQL Agent Jobs | 6 |
| `prom_backup_history` | Backup History | 6 |
| `prom_xevent_trend` | XEvent - Trend | 4 |
| `prom_database_file_io_stats` | Database File IO Stats | 12 |
| `prom_dba_inventory` | DBA Inventory | 6 (+8 deep-links) |
| `prom_monitoring_live_all_servers` | Monitoring - Live - All Servers | 15 (+6 deep-links) |
| `prom_monitoring_live_distributed` | Monitoring - Live - Distributed | 52 (+6 deep-links) |
| `prom_monitoring_perfmon_quest` | Monitoring - Perfmon Counters - Quest Softwares - Distributed | 51 (+4 deep-links) |
Panels that depend on the SQLMonitor central inventory database (alert history, AG-vs-nonAG backup split, LAMA config-change deltas, dm_os_memory_clerks snapshot, tempdb/log_space cache tables, sql_server_patching) render as markdown deep-link tiles that jump back to the SQL-backed dashboard so every source section remains visible.
### Regenerating the dashboards

Each dashboard is built from a small Python spec:

```bash
cd sql_exporter/Prometheus-Dashboards
python3 generate.py           # rebuild every *.json
python3 generate.py backup    # filter: rebuild only backup_history
python3 _tools/validate.py    # structural + expr sanity check
```
High-fidelity PromQL patterns used across the specs:
- `increase(metric[$__range])` — selective-duration deltas (File IO, Wait Stats).
- `@ end() offset $__range` — prior-window comparison tables (day-over-day).
- `quantile_over_time($percentile_q, (expr)[$trend_window:])` — percentile trends.
- `topk($top_n, sum by (...) (...))` — XEvent / wait-type / memory-consumer trends.
- `time() - timestamp(up == 1)` — data-collection-issue detection.
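The data-collection-issue pattern above can also be precomputed as a Prometheus recording rule, which keeps the dashboard query cheap and gives alert rules a stable name to reference. A sketch; the rule name and the `job` label value are assumptions:

```yaml
groups:
  - name: sqlmonitor_recordings
    rules:
      # seconds since the exporter last answered a scrape; a climbing value
      # on an instance means metric collection has stalled there
      - record: sql_instance:scrape_staleness_seconds
        expr: time() - timestamp(up{job="sqlmonitor_mssql"} == 1)
```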
See the folder's README.md for the full spec-driven workflow and Grafana API bulk-import snippet.
## Choosing between the two paths
| Consideration | SQL table path (classic) | Prometheus path |
|---|---|---|
| Installation on monitored hosts | SQL Agent jobs only | Requires sql_exporter + windows_exporter service |
| Retention | You decide (SQL partitioned tables, months-to-years) | Prometheus TSDB (weeks) or Mimir/Thanos for long-term |
| Query shape | Ad-hoc T-SQL, great for forensic deep dives | PromQL, great for fleet aggregates and alerting |
| Historical dashboards | Fast (pre-aggregated tables) | Fast (per-scrape time series) |
| WhoIsActive / Blitz | Native | Cardinality spike — run on a separate scrape job |
| Alerting | `Alerting/` Python engine | Grafana Alerting / Alertmanager |
A common production topology uses both: the SQL path for long retention + forensic tooling, and Prometheus for low-latency fleet dashboards and PagerDuty-grade alert routing.
## Migrating from Perfmon file-collection to sql_exporter
Historically SQLMonitor captured host Perfmon counters into BLG files and parsed them server-side. That path is still available, but shops that already run Prometheus usually switch to windows_exporter + sql_exporter instead. The migration story (mapping old counters to Prometheus metric names) is tracked in Perfmon-SQL-Exporter-Migration-Progress.md.
## PromQL quick-start
A tutorial tailored to the SQLMonitor metric set lives at _README-PromQL-Querying.md — covers the common rate(), increase(), topk() patterns for wait stats, CPU, IOPS and AG health.