Commit Graph

15 Commits (e9b19c327967f1a5d985aa842735ea7f3cc712a2)

Author SHA1 Message
Mike Lang bb3814f9f7 overview.jsonnet: Use a template variable to allow restricting to certain nodes
Mike Lang 9a1369cf98 overview: Fix job -> service
Mike Lang e1993c6a79 overview dashboard: Look up services by 'service' label, not job
Job can't be repeated across scrape jobs, but service can.
Mike Lang ac98d67853 overview dashboard: Hide UNEDITED and DONE states so the others are visible
Mike Lang 4b04f70b6f overview dashboard: Add system-level metrics
Mike Lang cff5c38691 Add new dashboard
Mike Lang 89a9e5554c sheetsync: Record counts of rows in the DB, segmented by various columns
This lets us view a number of useful graphs in dashboards, e.g. rows by state,
errored rows, rows by day, rows by category, meltdowns per day, and the fraction of
events that are poster moments by category.

Sheetsync was the natural place to do this since it was already periodically scanning
the entire events table.
Mike Lang 72172024be overview dashboard: Stop reporting stream delay after stream stops
It just goes up forever and isn't helpful.
Mike Lang 77f23d775a overview dashboard: Show offending instance in error log rate graph
Mike Lang e5a7c8adfa monitoring: Add "role" concept
This lets us know if a service is MEANT to be running or not.
Mike Lang b84d4de085 Add segment_coverage service to be monitored
Mike Lang 37c9eff587 Provide a rendered version of the dashboard
so that it's possible to use without the tooling I can't share.
Mike Lang 1721fbd92e fix dashboards for channel/quality naming
Mike Lang ca925ae2e6 dashboard: Add some extra detail sections for backfiller and downloader
Mike Lang 39e7a5c2e6 Add overview dashboard