Commit Graph

29 Commits (70c8afe779da2d617c44e0f66df68447e8fae619)

Author SHA1 Message Date
Mike Lang 3e836b935b grafana: Disable analytics 1 year ago
Mike Lang 3df15b5784 grafana: Make it HTTPS
Note this introduces two files the user must provide.
We set up the makefile to fail if they don't exist.
1 year ago
Mike Lang 1e6b8e576e grafana: Use config file instead of env vars 1 year ago
Mike Lang ab3a89e6a6 Pin grafana version
This is required due to grafana changes invalidating our old dashboards.
We don't have time to fix them right now.
1 year ago
Mike Lang c18df50a7c monitoring: only request for services matching role 3 years ago
Mike Lang bb3814f9f7 overview.jsonnet: Use a template variable to allow restricting to certain nodes 5 years ago
Mike Lang 3fabb2944f prometheus: Add scheme to url 5 years ago
Mike Lang 6e067fab83 prometheus: fix mistake 5 years ago
Mike Lang d76f38bf20 prometheus: include url as a label
for coverage maps
5 years ago
Mike Lang 9a1369cf98 overview: Fix job -> service 5 years ago
Mike Lang e1993c6a79 overview dashboard: Look up services by 'service' label, not job
Job can't be repeated across scrape jobs, service can
5 years ago
Mike Lang ac98d67853 overview dashboard: Hide UNEDITED and DONE states so the others are visible 5 years ago
Mike Lang 8a65d18f74 prometheus config: Support mixed http and https scraping 5 years ago
Mike Lang 4b04f70b6f overview dashboard: Add system-level metrics 5 years ago
Mike Lang cff5c38691 Add new dashboard 5 years ago
Mike Lang 89a9e5554c sheetsync: Record counts of rows in the DB, segmented by various columns
This lets us view a number of useful graphs in dashboards, eg. rows by state,
errored rows, rows by day, rows by category, meltdowns per day, fraction of
events that are poster moments by category.

Sheetsync was the natural place to do this since it was already periodically scanning
the entire events table.
5 years ago
Mike Lang 72172024be overview dashboard: Stop reporting stream delay after stream stops
It just goes up forever and isn't helpful.
5 years ago
Mike Lang 77f23d775a overview dashboard: Show offending instance in error log rate graph 5 years ago
Mike Lang e5a7c8adfa monitoring: Add "role" concept
This lets us know if a service is MEANT to be running or not.
5 years ago
Mike Lang 21a46a66bb monitoring: Set instance to friendly name for each node we're monitoring
So that you get eg. "charm" instead of "IP:PORT"
5 years ago
Mike Lang 51adeeab19 monitoring: Fix problems with the prometheus container 5 years ago
Mike Lang b84d4de085 Add segment_coverage service to be monitored 5 years ago
Mike Lang 37c9eff587 Provide a rendered version of the dashboard
so that it's possible to use without the tooling I can't share.
5 years ago
Mike Lang 1721fbd92e fix dashboards for channel/quality naming 5 years ago
Mike Lang ca925ae2e6 dashboard: Add some extra detail sections for backfiller and downloader 6 years ago
Mike Lang 39e7a5c2e6 Add overview dashboard 6 years ago
Mike Lang 7273ee071e monitoring fixes 6 years ago
Mike Lang 5a6d443efd grafana: View-only anonymous access 6 years ago
Mike Lang a767760f02 Add some existing scripts for setting up prometheus 6 years ago