Commit Graph

80 Commits (mike/prizebot)

Author SHA1 Message Date
Mike Lang 3606fadaa8 Pin gevent version to work around build issues
Seeing the following error on latest versions of gevent:

 Traceback (most recent call last):
   File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
     return _run_code(code, main_globals, None,
   File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
     exec(code, run_globals)
   File "/usr/lib/python3.9/site-packages/zulip_bots/schedulebot.py", line 2, in <module>
     import gevent.monkey
   File "/usr/lib/python3.9/site-packages/gevent/__init__.py", line 72, in <module>
     from gevent._hub_local import get_hub
   File "/usr/lib/python3.9/site-packages/gevent/_hub_local.py", line 150, in <module>
     import_c_accel(globals(), 'gevent.__hub_local')
   File "/usr/lib/python3.9/site-packages/gevent/_util.py", line 148, in import_c_accel
     mod = importlib.import_module(cname)
   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
     return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'gevent._gevent_c_hub_local'
1 year ago
Mike Lang 78c053000e Upgrade pip in order to make wheels work 1 year ago
Mike Lang fccec1ace0 downloader: Fix a bug where hashes are miscalculated
due to not including the map data in the hash calculation.

This is only relevant for streams with map data, which does not include twitch or youtube URLs.
1 year ago
Mike Lang 5f42b8419e downloader: Hard-code source quality for non-twitch providers
since they can't do any other quality, but we still want to be able to set other qualities
for twitch streams.

Really qualities should be per-channel but I'm being lazy.
1 year ago
Mike Lang 304e7b0fd9 downloader: Install yt-dlp in docker container
for youtube support
1 year ago
Mike Lang 18b71e6f5f downloader: Add youtube support
Adds very simple youtube stream support where we only ever use the "best" quality,
which we call "source" for consistency with twitch.

We use yt-dlp to do the heavy lifting of getting the playlist url out of youtube.
1 year ago
Mike Lang 05349a62df downloader: Prepend any HLS "map" data to every segment
In some formats, most notably DASH, there is a "initialization data" that is required
in order to play the segment. The data is common to all segments so it is served as a seperate URL
under EXT-X-MAP. However, redundant copies of this data are benign and it's very small, so
we just put it in front of EVERY segment so that we can play every one independently (but
concatenating them still works).

We use a very simple cache to avoid downloading it again for every segment.
1 year ago
Mike Lang e4252544be downloader: Add ability to download from an arbitrary playlist URL
This is a very limited feature, it can't handle multiple qualities.
You must use "source" only and it takes the first available rendition.
1 year ago
Mike Lang 7a75442243 downloader: rename twitch.py to providers.py
Since it will now contain non-twitch providers also.
1 year ago
Mike Lang bc08d97e56 downloader: Add framework to allow alternate "providers" besides twitch
This abstracts out the process of obtaining media playlists so that we can support non-twitch
streaming services.
1 year ago
Mike Lang 044dfb8084 Pin argh to avoid stupid breaking changes 1 year ago
Mike Lang d636817b36 downloader: Add optional ability to authenticate when getting master playlist
Authenticating to a particular twitch account can give benefits, most notably not being served ads.
1 year ago
Mike Lang 30d5ccc483 Fix all old references to github.com/ekimekim/wubloader 1 year ago
Mike Lang 1add3c5c22 Implement tombstoning to allow for segment deletion
Rarely, we find ourselves needing to explicitly delete some data, eg. something that shouldn't
have been public and should be removed from all records.

It would also be nice if we could "clean up" bad versions of the same segment,
which occasionally come up when downloaders have issues.

With our distributed segment database, this is actually rather difficult as deleting the data
from any one server would cause it to be restored from the others. It was only possible
by stopping all backfill, deleting the data on all servers, then starting backfill again.

Here we introduce a more practical approach. An operator creates an empty flag file
with the same name as the segment to be deleted, but with a `.tombstone` extension.
eg. to delete a file `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.ts`,
you would create a tombstone `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.tombstone`.

These tombstone files do two important things:
* They hide the segment from being listed, which both means:
  * It can't be restreamed or put into a video
  * It can't be backfilled to other nodes
* The tombstone files themselves do get backfilled to other nodes, so you only need to mark them on one server.

Once the tombstone has propagated to all nodes, the segment file can be deleted independently on each one.

We chose not to have a tombstone automatically trigger a segment deletion for safety reasons.
2 years ago
Mike Lang a47c29fff4 Link images to github repo by adding a LABEL
When pushed, this tells github to associate the ghcr.io repo that was pushed to
with the github repo specified (the owner needs to match).

This does a few things.
Most importantly, this automatically gives github actions credentials to push to these
repositories when run in the context of the wubloader repo.
3 years ago
Mike Lang 62bd6539ea Unpin gevent as that was a workaround for a py2 issue 3 years ago
Mike Lang 21856c68aa Fix all instances of file.write() for py3
In python 3, file.write() may do a partial write and returns the number of characters written.
In order to not lose data, we need to wrap every instance of file.write() with our new
common.writeall() wrapper that loops until the data is actually written.
3 years ago
Mike Lang a56f6859bb more py3 fixes 3 years ago
Mike Lang f2a8007bf7 Fix build dependency issues 3 years ago
Mike Lang 6a98addac8 py3 fixes for downloader 3 years ago
HubbeKing 6d790a1b36 Do a first naive pass for py3 compatibility
Check that open() calls for reading and writing use binary modes
Use alpine version with py3-pip package
Use python3 in Dockerfile CMD
Remove sys.setdefaultencoding() "hack"
Simplify ensure_directory() in common.common package
3 years ago
Mike Lang f0546e2ee3 Pin gevent to 1.5a2 to avoid https://github.com/gevent/gevent/issues/1711 3 years ago
Mike Lang 32138bbd43 downloader: Update to work with twitch's new access token API
Twitch removed their old access token endpoint and now use a GraphQL endpoint.
The old endpoint would just always return 404, which we sadly interpreted as "stream not up".

Thankfully streamlink has already done the reverse engineering work so I was able to
update it to work again fairly easily, it's just a bit more convoluted.
4 years ago
Mike Lang 6d55f01de6 downloader: Fix a bug when we can't find a particular stream quality
The intended behaviour was to log a warning message and retry next time,
but still allow workers to be started for any streams found.

However, due to a missing continue, we fall through to attempting to start a worker
for a non-existent quality which causes a KeyError when looking up
`self.latest_urls[quality]`. This exception means we don't run through the other qualities,
so we never start any other quality.
4 years ago
HubbeKing 86f7823348 Replace calls to gevent.signal() with gevent.signal_handler()
gevent.signal() was removed in gevent 1.5a4, see http://www.gevent.org/api/gevent.signal.html
Removed on Feb 5th, see https://github.com/gevent/gevent/pull/1530
4 years ago
Mike Lang a53786dc2d Add file and make as build dependencies
gevent now requires these to build. I'm not sure when this changed.
4 years ago
Mike Lang 5ccb5afff1 Track number of ignored ads 4 years ago
Mike Lang 50770651ce Detect new style of twitch in-stream ads
New ad segments always have a title like "Amazon|...".
This is the same fix as used by streamlink at time of writing.
4 years ago
Mike Lang 21d5548980 Add new segment type "suspect"
We've noticed that when nodes have connection problems, they get full segments
with different hashes. Inspection of these segments shows that
they all have identical data up to a point.

Segments that fetched normally will then have the remainder of the data.
Segments that had issues will have a slightly corrupted end.
The data is still valid, and no errors are raised. It just doesn't have all the data.

We noticed that these corrupted segments all were cut off exactly 60sec after their requests
began. We believe this is a server-side timeout on the request that returns whatever data
it has, then closes the container file cleanly before returning successfully.

We detect segments that take > 59 seconds to recieve, and label them as "suspect".

Suspect segments are treated identically to partial segments, except they are always preferred
over partials.
5 years ago
Mike Lang 72003f28d0 downloader: Don't check the age of a worker we just spawned
Not only is this redundant, but it creates a race condition where
the worker fails before the latest_worker = workers[-1] check,
and we get an IndexError.
5 years ago
Mike Lang 94d81d708f Downloader: Change access_token call to match website
It stopped working, these changes bring it back in line with the website
so it works.
5 years ago
Christopher Usher abb9193705 fixed outdated "stream", "variant" in metric 5 years ago
Mike Lang 6c6c1ae637 downloader: Make a few things quieter for non-important channels 5 years ago
Mike Lang b2a07ef114
Merge pull request #140 from ekimekim/mike/build-improvements
Refactor dockerfiles for more shared layers
5 years ago
Mike Lang 731ef9e2d0 Refactor dockerfiles for more shared layers
By carefully ensuring most of our dockerfiles are identical in their first few layers,
we only need to build those layers once instead of every time.

In particular, we move installing gevent to before installing common,
so that even when common changes gevent doesn't need to be reinstalled.

This is important because gevent takes ages to install.

Also fixes segment_coverage, which wasn't being installed.
5 years ago
Mike Lang 83750da37b downloader: Create concept of an "important" channel
In our usage, we have one channel where we really care / want to know if it's down,
but also a bunch of other channels where they're expected to not be streaming most/all of the time.

To prevent these extra channels making a ton of noise, we introduce the concept of an "important"
channel, indicated by appending a '!' to the channel name in the command line.

So for example, you might specify channels as "foo! foo_backup foo_behindthescenes".

Important channels have the same behaviour as previously.

Non-important channels:
* Have a 20-second retry on a master playlist fetch failure, instead of 5
* Log at debug when the stream is down, instead of info.
5 years ago
Mike Lang 8ad61e9870 downloader: Collect metrics on http calls 5 years ago
Mike Lang b4655f18c6 downloader: Track total duration of downloaded segments 5 years ago
Mike Lang 04ef0d3823 fix a few remaining usages of StreamWorker.stream instead of .quality 5 years ago
Christopher Usher 361e577474 fixes based on ekimekims suggestions 5 years ago
Christopher Usher 3564643613 refactoring downloader 5 years ago
Mike Lang 73d5941e05 downloader: Track timestamp of latest segment
This gives us a "stream delay" metric.

Prom doesn't have any native way to check the current value of a metric,
in order to take max(). It only offers increment and set.

We reach into some internals to do this in a hacky way,
but the cleaner way would be to track the value ourselves and have a prom callback
that gets the value.

Sigh, I hate this prom library. I might write my own that's less dumb.
5 years ago
Mike Lang f8d10dacdf Audit and fix all usage of dateutil
We wrap direct dateutil calls to handle two distinct cases:

* `common.dateutil.parse()`: We want to handle arbitrary timestamps including tz info,
then convert them to UTC.

This is used in HLS parsing, and for command line input for backfiller

* `common.dateutil.parse_utc_only()`: We want to only handle UTC timestamps,
but datetime.strptime isn't flexible enough (eg. can't handle missing fractional component).

This is used for restreamer request params.
5 years ago
Mike Lang df66553b38 downloader: Start backdoor later so workers is in locals 6 years ago
Mike Lang 86da9d9fe8 downloader: Support watching multiple channels
This is useful eg. for watching db_admin or other testing channels in addition to the main channel.
6 years ago
Mike Lang f0d9aa82c2 Ignore segments that are marked as ads
* Checks for the SCTE35-OUT/SCTE35-IN marks in the HLS stream that indicate an ad start/end
* Ignores those segments completely
* Doesn't mark the StreamWorker as up until it sees the first non-ad segment

Some other operational notes:
* The main risk this adds is that re-connecting / refreshing master playlist takes longer.
  If all downloaders are doing this at the same time (ie. because the stream only just came up,
  or during a deployment rollout), all downloaders might be waiting for ads to finish and
  you'll miss segments.
* We should run more downloaders to compensate. This also increases the chance at least one of
  them won't get any ads, so we get everything right from stream-up.
* The other mitigation we can do is have geographically diverse downloaders. This decreases the risk
  that they all get served an ad, and at least at time of writing it seems that no in-stream ads
  are served outside of these regions:

> US, Canada, Germany, France, Sweden, Belgium, Poland, Norway, Finland, Denmark, Netherlands, Italy, Spain, Switzerland, Austria, Portugal, UK, Australia, New Zealand
6 years ago
Mike Lang c0f94059aa downloader: Stop retrying in SegmentGetter after a long timeout
In resource contention scenarios, all calls can start failing due to
not being able to read the response in a timely manner.
This means SegmentGetters never stop retrying, leading to further contention
and a feedback loop.
We attempt to put at least some cap on this scenario by giving up
if an amount of time has elapsed to the point that we know our URL couldn't be valid anymore.
Since we don't actually know how long segment URLs are valid, we are very conservative about
this time, for now setting it to 20min.
6 years ago
Mike Lang 81aee0ee1e Increase hard timeout for getting segment headers
When we're under CPU or disk contention, doing other work
can become very slow. We want to avoid spurious errors in this situation
as this causes further retries and further contention.

One easy way to do this is to increase the time we have to finish fetching headers.
6 years ago
Mike Lang b75b9a9b00 Add stacksampler to all services 6 years ago
Mike Lang a5213ccb3b downloader: Pool connections when we can
To preserve independence between workers and ensure that a
retry (a worker re-create) actually starts from scratch, we only pool connections
on a per-worker basis.
Furthermore, for the same reason, we only let SegmentGetters use the worker's
pool on their first attempt. After that, they create a new pool to ensure they have a clean retry.

Despite this, the result should be that we're almost always re-using an existing connection
when getting segments or media playlists, unless something goes wrong.

SSL connection setup was measured as almost half the CPU time used by the process,
so this change should result in a signifigant CPU usage reduction.
6 years ago