Mike Lang
3e836b935b
grafana: Disable analytics
11 months ago
Mike Lang
3df15b5784
grafana: Make it HTTPS
Note this introduces two files the user must provide.
We set up the makefile to fail if they don't exist.
11 months ago
Mike Lang
1e6b8e576e
grafana: Use config file instead of env vars
11 months ago
Mike Lang
ab3a89e6a6
Pin grafana version
This is required due to grafana changes invalidating our old dashboards.
We don't have time to fix them right now.
11 months ago
Mike Lang
c18df50a7c
monitoring: only request for services matching role
3 years ago
Mike Lang
bb3814f9f7
overview.jsonnet: Use a template variable to allow restricting to certain nodes
5 years ago
Mike Lang
3fabb2944f
prometheus: Add scheme to url
5 years ago
Mike Lang
6e067fab83
prometheus: fix mistake
5 years ago
Mike Lang
d76f38bf20
prometheus: include url as a label
for coverage maps
5 years ago
Mike Lang
9a1369cf98
overview: Fix job -> service
5 years ago
Mike Lang
e1993c6a79
overview dashboard: Look up services by 'service' label, not job
Job can't be repeated across scrape jobs, service can
5 years ago
Mike Lang
ac98d67853
overview dashboard: Hide UNEDITED and DONE states so the others are visible
5 years ago
Mike Lang
8a65d18f74
prometheus config: Support mixed http and https scraping
5 years ago
Mike Lang
4b04f70b6f
overview dashboard: Add system-level metrics
5 years ago
Mike Lang
cff5c38691
Add new dashboard
5 years ago
Mike Lang
89a9e5554c
sheetsync: Record counts of rows in the DB, segmented by various columns
This lets us view a number of useful graphs in dashboards, eg. rows by state,
errored rows, rows by day, rows by category, meltdowns per day, fraction of
events that are poster moments by category.
Sheetsync was the natural place to do this since it was already periodically scanning
the entire events table.
5 years ago
Mike Lang
72172024be
overview dashboard: Stop reporting stream delay after stream stops
It just goes up forever and isn't helpful.
5 years ago
Mike Lang
77f23d775a
overview dashboard: Show offending instance in error log rate graph
5 years ago
Mike Lang
e5a7c8adfa
monitoring: Add "role" concept
This lets us know if a service is MEANT to be running or not.
5 years ago
Mike Lang
21a46a66bb
monitoring: Set instance to friendly name for each node we're monitoring
So that you get eg. "charm" instead of "IP:PORT"
5 years ago
Mike Lang
51adeeab19
monitoring: Fix problems with the prometheus container
5 years ago
Mike Lang
b84d4de085
Add segment_coverage service to be monitored
5 years ago
Mike Lang
37c9eff587
Provide a rendered version of the dashboard
so that it's possible to use without the tooling I can't share.
5 years ago
Mike Lang
1721fbd92e
fix dashboards for channel/quality naming
5 years ago
Mike Lang
ca925ae2e6
dashboard: Add some extra detail sections for backfiller and downloader
5 years ago
Mike Lang
39e7a5c2e6
Add overview dashboard
5 years ago
Mike Lang
7273ee071e
monitoring fixes
5 years ago
Mike Lang
5a6d443efd
grafana: View-only anonymous access
5 years ago
Mike Lang
a767760f02
Add some existing scripts for setting up prometheus
5 years ago