Skip to content

Data Flow

How one metric travels from "event happened on the host" to "you see it in Grafana and get paged".

The canonical flow (Perfmon CPU example)

sequenceDiagram
    autonumber
    participant OS as Windows Perfmon (BLG)
    participant HOST as Monitored host disk
    participant JOB as (dba) Collect-PerfmonData
    participant T as dbo.performance_counters
    participant INVJ as (dba) Get-AllServerVolatileInfo
    participant INV as Inventory.DBA.dbo.all_server_volatile_info
    participant G as Grafana
    participant A as Alert Engine

    OS->>HOST: Writes BLG every 10s via logman
    JOB->>HOST: Every 2 min: consume BLG → XML → bulk insert
    JOB->>T: INSERT rows (event_time, counter, value)
    INVJ->>T: SELECT delta over linked server
    INVJ->>INV: UPSERT aggregates (per server)
    G->>INV: Dashboard panel query
    G-->>-G: Render
    A->>INV: usp_get_active_alert_by_* procs
    A-->>A: Threshold check
    A-->>A: Route to PagerDuty / Slack / Email

Pattern: three kinds of collection

SQLMonitor uses three distinct collection patterns depending on what is being measured.

1. Host-side file ingest (Perfmon, OS processes, Tasklist)

Files on disk → PowerShell reader → bulk insert.

Job Script Destination table Notes
(dba) Collect-PerfmonData SQLMonitor/perfmon-collector-push-to-sqlserver.ps1 dbo.performance_counters Reads .blg from logman data collector
(dba) Collect-OSProcesses SQLMonitor/perfmon-collector-push-to-sqlserver.ps1 (tasklist branch) dbo.os_task_list CPU / memory per process
(dba) Collect-DiskSpace SQLMonitor/disk-space-collector.ps1 dbo.disk_space Uses Get-Volume on the host

2. In-engine T-SQL capture (WaitStats, FileIO, MemoryClerks, XEvents, BlitzFirst, …)

Stored procedure → table. Delta or snapshot depending on metric.

Job Procedure Destination table Capture style
(dba) Collect-WaitStats dbo.usp_collect_wait_stats dbo.wait_stats Cumulative snapshot (compute delta at read-time)
(dba) Collect-FileIOStats dbo.usp_collect_file_io_stats dbo.file_io_stats Cumulative snapshot
(dba) Collect-MemoryClerks dbo.usp_collect_memory_clerks dbo.memory_clerks Point-in-time
(dba) Collect-XEvents dbo.usp_collect_xevent_metrics dbo.xevent_metrics Consumes XE files, inserts normalized+hashed rows
(dba) Collect-AgHealthState dbo.usp_collect_ag_health_state dbo.ag_health_state Point-in-time
(dba) Run-WhoIsActive dbo.usp_run_WhoIsActive dbo.WhoIsActive Append per invocation
(dba) Run-Blitz / Run-BlitzIndex dbo.sp_Blitz / dbo.sp_BlitzIndex dbo.blitz* tables Append weekly/daily
(dba) Run-LogSaver dbo.usp_LogSaver dbo.log_space_consumers Finds txn-log fillers; can kill
(dba) Run-TempDbSaver dbo.usp_TempDbSaver dbo.tempdb_space_usage Finds tempdb consumers; can kill

3. Inventory aggregation (Get-AllServer*)

Per-instance tables → linked-server pull → all_server_* table on inventory.

Job Reads Writes
(dba) Get-AllServerVolatileInfo Remote DBA DB on each server via linked server dbo.all_server_volatile_info, dbo.all_server_volatile_info_history (Memory-Optimized)
(dba) Get-AllServerStableInfo Remote DBA DB dbo.all_server_stable_info, dbo.all_server_stable_info_history
(dba) Get-AllServerBackups msdb.dbo.backupset on each server dbo.backups_all_servers
(dba) Get-AllServerSqlAgentJobs msdb on each server dbo.sql_agent_jobs_all_servers
(dba) Get-AllServerAgHealthState dbo.ag_health_state on each server dbo.ag_health_state_all_servers
(dba) Get-AllServerDiskSpace dbo.disk_space on each server dbo.disk_space_all_servers
(dba) Get-AllServerLogSpaceConsumers dbo.log_space_consumers on each server dbo.log_space_consumers_all_servers
(dba) Get-AllServerTempdbSpaceUsage dbo.tempdb_space_usage on each server dbo.tempdb_space_usage_all_servers
(dba) Get-AllServerServices sys.dm_server_services on each server dbo.services_all_servers
(dba) Get-AllServerCollectionLatencyInfo Metadata on each server dbo.all_server_collection_latency_info

Aggregation is orchestrated by dbo.usp_GetAllServerInfo and its variants.

Lifecycle: partition → compress → purge

flowchart LR
    INS[Collector INSERT] --> P1[Hourly partition<br/>created by usp_partition_maintenance]
    P1 --> CMP[Page compression<br/>via usp_enable_page_compression]
    CMP --> OLD[Old partition<br/>exceeds retention]
    OLD --> PURGE[usp_purge_tables<br/>truncates partition]

Retention is declarative: dbo.purge_table holds one row per table with its retention days. The (dba) Partitions-Maintenance and (dba) Purge-Tables jobs do the rest.

Alert path

flowchart LR
    T[(Collection tables)] --> USP[usp_get_active_alert_by_key<br/>usp_get_active_alert_by_state]
    USP --> ENG[Alert Engine - Python Flask]
    ENG -->|new| SLACK[Slack]
    ENG -->|new| PD[PagerDuty]
    ENG -->|new| MAIL[DatabaseMail]
    ENG -->|update| SLACK
    ENG -->|ack/resolve| PD

Alert state is persisted in dbo.sma_alerts so re-raises, updates and resolutions are idempotent across restarts. See Alerting for the full engine description.