Data Flow¶
How one metric travels from "event happened on the host" to "you see it in Grafana and get paged".
The canonical flow (Perfmon CPU example)¶
sequenceDiagram
autonumber
participant OS as Windows Perfmon (BLG)
participant HOST as Monitored host disk
participant JOB as (dba) Collect-PerfmonData
participant T as dbo.performance_counters
participant INVJ as (dba) Get-AllServerVolatileInfo
participant INV as Inventory.DBA.dbo.all_server_volatile_info
participant G as Grafana
participant A as Alert Engine
OS->>HOST: Writes BLG every 10s via logman
JOB->>HOST: Every 2 min: consume BLG → XML → bulk insert
JOB->>T: INSERT rows (event_time, counter, value)
INVJ->>T: SELECT delta over linked server
INVJ->>INV: UPSERT aggregates (per server)
G->>INV: Dashboard panel query
G-->>-G: Render
A->>INV: usp_get_active_alert_by_* procs
A-->>A: Threshold check
A-->>A: Route to PagerDuty / Slack / Email Pattern: three kinds of collection¶
SQLMonitor uses three distinct collection patterns depending on what is being measured.
1. Host-side file ingest (Perfmon, OS processes, Tasklist)¶
Files on disk → PowerShell reader → bulk insert.
| Job | Script | Destination table | Notes |
|---|---|---|---|
(dba) Collect-PerfmonData | SQLMonitor/perfmon-collector-push-to-sqlserver.ps1 | dbo.performance_counters | Reads .blg from logman data collector |
(dba) Collect-OSProcesses | SQLMonitor/perfmon-collector-push-to-sqlserver.ps1 (tasklist branch) | dbo.os_task_list | CPU / memory per process |
(dba) Collect-DiskSpace | SQLMonitor/disk-space-collector.ps1 | dbo.disk_space | Uses Get-Volume on the host |
2. In-engine T-SQL capture (WaitStats, FileIO, MemoryClerks, XEvents, BlitzFirst, …)¶
Stored procedure → table. Delta or snapshot depending on metric.
| Job | Procedure | Destination table | Capture style |
|---|---|---|---|
(dba) Collect-WaitStats | dbo.usp_collect_wait_stats | dbo.wait_stats | Cumulative snapshot (compute delta at read-time) |
(dba) Collect-FileIOStats | dbo.usp_collect_file_io_stats | dbo.file_io_stats | Cumulative snapshot |
(dba) Collect-MemoryClerks | dbo.usp_collect_memory_clerks | dbo.memory_clerks | Point-in-time |
(dba) Collect-XEvents | dbo.usp_collect_xevent_metrics | dbo.xevent_metrics | Consumes XE files, inserts normalized+hashed rows |
(dba) Collect-AgHealthState | dbo.usp_collect_ag_health_state | dbo.ag_health_state | Point-in-time |
(dba) Run-WhoIsActive | dbo.usp_run_WhoIsActive | dbo.WhoIsActive | Append per invocation |
(dba) Run-Blitz / Run-BlitzIndex | dbo.sp_Blitz / dbo.sp_BlitzIndex | dbo.blitz* tables | Append weekly/daily |
(dba) Run-LogSaver | dbo.usp_LogSaver | dbo.log_space_consumers | Finds txn-log fillers; can kill |
(dba) Run-TempDbSaver | dbo.usp_TempDbSaver | dbo.tempdb_space_usage | Finds tempdb consumers; can kill |
3. Inventory aggregation (Get-AllServer*)¶
Per-instance tables → linked-server pull → all_server_* table on inventory.
| Job | Reads | Writes |
|---|---|---|
(dba) Get-AllServerVolatileInfo | Remote DBA DB on each server via linked server | dbo.all_server_volatile_info, dbo.all_server_volatile_info_history (Memory-Optimized) |
(dba) Get-AllServerStableInfo | Remote DBA DB | dbo.all_server_stable_info, dbo.all_server_stable_info_history |
(dba) Get-AllServerBackups | msdb.dbo.backupset on each server | dbo.backups_all_servers |
(dba) Get-AllServerSqlAgentJobs | msdb on each server | dbo.sql_agent_jobs_all_servers |
(dba) Get-AllServerAgHealthState | dbo.ag_health_state on each server | dbo.ag_health_state_all_servers |
(dba) Get-AllServerDiskSpace | dbo.disk_space on each server | dbo.disk_space_all_servers |
(dba) Get-AllServerLogSpaceConsumers | dbo.log_space_consumers on each server | dbo.log_space_consumers_all_servers |
(dba) Get-AllServerTempdbSpaceUsage | dbo.tempdb_space_usage on each server | dbo.tempdb_space_usage_all_servers |
(dba) Get-AllServerServices | sys.dm_server_services on each server | dbo.services_all_servers |
(dba) Get-AllServerCollectionLatencyInfo | Metadata on each server | dbo.all_server_collection_latency_info |
Aggregation is orchestrated by dbo.usp_GetAllServerInfo and its variants.
Lifecycle: partition → compress → purge¶
flowchart LR
INS[Collector INSERT] --> P1[Hourly partition<br/>created by usp_partition_maintenance]
P1 --> CMP[Page compression<br/>via usp_enable_page_compression]
CMP --> OLD[Old partition<br/>exceeds retention]
OLD --> PURGE[usp_purge_tables<br/>truncates partition] Retention is declarative: dbo.purge_table holds one row per table with its retention days. The (dba) Partitions-Maintenance and (dba) Purge-Tables jobs do the rest.
Alert path¶
flowchart LR
T[(Collection tables)] --> USP[usp_get_active_alert_by_key<br/>usp_get_active_alert_by_state]
USP --> ENG[Alert Engine - Python Flask]
ENG -->|new| SLACK[Slack]
ENG -->|new| PD[PagerDuty]
ENG -->|new| MAIL[DatabaseMail]
ENG -->|update| SLACK
ENG -->|ack/resolve| PD Alert state is persisted in dbo.sma_alerts so re-raises, updates and resolutions are idempotent across restarts. See Alerting for the full engine description.