wubloader

Commit Graph

Author	SHA1	Message	Date
Mike Lang	3a1e4b0aef	restreamer: Fix missing dependency. This was hidden because common included it.	7 years ago
Mike Lang	997c1242b2	get_best_segments: Let other things run. get_best_segments can sometimes take a very long time, and we don't want to stop other work from happening while it's ongoing. So we ask gevent to run other things until there's no other work to do, then we do one hour, then check back with gevent again. In combination with the performance improvements, this should mean we don't block other things from running for more than a few hundred ms at most.	7 years ago
Mike Lang	bf08aa29b8	parse_segment_path: Use datetime.strptime instead of dateutil.parser. strptime is much faster but can't handle as varied formats. But in this case we fully control the format, so there's no reason not to use it. Profiling suggests we spend about 80% of our time in get_best_segments just parsing dates, so this is a significant performance gain.	7 years ago
Mike Lang	bcdb268ce8	Also need to replace locks on the counter float values to prevent deadlocks. See comment for full details.	7 years ago
Mike Lang	10cca18922	Fix a deadlock due to signal interactions with prometheus client. The prometheus client uses a threading.Lock() to prevent concurrent access to certain metric state. This lock is taken as part of doing collection, as well as during metric.labels(). We hit a deadlock where our stack sampler signal arrived during a collection, when the lock was held. This meant that flamegraph.labels() blocked forever, and the lock was never released, hanging all metrics collection. Our solution is a hack, which is to reach into the internals of our metric object and replace its lock with a dummy one. This is reasonably safe, but only as long as the prometheus_client internal structure doesn't change significantly.	7 years ago
Mike Lang	c9cc8a73a7	generate-flamegraph: Script to create a flamegraph by querying prometheus	7 years ago
Mike Lang	b75b9a9b00	Add stacksampler to all services	7 years ago
Mike Lang	b9c2921242	common.stats: Add a stacksampler that records sampled stacks to prometheus. This can then be used to generate flamegraphs.	7 years ago
Mike Lang	a5213ccb3b	downloader: Pool connections when we can. To preserve independence between workers and ensure that a retry (a worker re-create) actually starts from scratch, we only pool connections on a per-worker basis. Furthermore, for the same reason, we only let SegmentGetters use the worker's pool on their first attempt. After that, they create a new pool to ensure they have a clean retry. Despite this, the result should be that we're almost always re-using an existing connection when getting segments or media playlists, unless something goes wrong. SSL connection setup was measured as almost half the CPU time used by the process, so this change should result in a significant CPU usage reduction.	7 years ago
Mike Lang	5175b099af	common: Split segment-related stuff into its own module. We still import them into __init__.py so they're accessible externally just the same.	7 years ago
Mike Lang	6f84a23ba6	common: Split stats-related stuff into its own module. We still import them into __init__.py so they're accessible externally just the same.	7 years ago
Mike Lang	8fe2fec958	common: convert from module to package	7 years ago
MasterGunner	96c1566d21	Merge pull request #34 from ekimekim/gunner/restreamer/additional-routes. Added additional routes for listing available streams and variants.	7 years ago
MasterGunner	a9569d9e96	Removed unneeded '@has_path_args'.	7 years ago
MasterGunner	306ac53d08	Added additional routes for listing available streams and variants.	7 years ago
Mike Lang	901cda4814	Enable backdoor in all services, and add telnet to containers	7 years ago
Mike Lang	9af7795f34	Add gevent.backdoor as an optional arg to all services. Backdoor allows the operator to telnet into the given port and get a python shell running inside the process, from which you can debug, modify state (e.g. set the log level), or whatever. This is extremely useful for debugging weird states that you encounter randomly but can't easily reproduce, without restarting the process and needing to wait until it happens again.	7 years ago
Mike Lang	47ff92b155	downloader: Fix bug where mark_working wasn't called. This meant that old workers would never shut down, causing us to fetch the same media playlist and same segments multiple times for no reason, and to never give up in the face of (non-403/404) errors even once we had something else working.	7 years ago
Mike Lang	3042d00516	downloader: Give up on 404 in addition to 403. Also fix some logging. When we're out of touch with twitch for long enough, our segment URL gets so old that twitch stops returning 403s (because our token has expired) and starts returning 404s, presumably because the underlying resource has gone away. We want to treat these the same.	7 years ago
Mike Lang	7f9a1dbe45	downloader: Remove implicit source quality arg. This brings it in line with backfiller, and is more flexible and less surprising.	7 years ago
Mike Lang	89d6b3a6be	docker-compose: Add list of peers to backfill from	7 years ago
Mike Lang	0d627715f3	downloader: Track number of downloaded segments. This is the most important metric; we can add more later.	7 years ago
Mike Lang	90ccc6d827	backfiller: Track number of successful backfills. Other stats can come later, but this one is important as it tells us if a downloader hasn't been doing its job.	7 years ago
Mike Lang	c59892e148	backfiller: Add ability to set nodes as CLI arg	7 years ago
Mike Lang	bdcb217d20	docker-compose: Expose metrics ports for other services	7 years ago
Mike Lang	b4b315b6bc	Expose prometheus metrics for backfiller and downloader	7 years ago
Mike Lang	d90f01b8ce	common: Create general function for timing things, and use it to time get_best_segments. The function is quite customizable and therefore quite complex, but it allows us to easily annotate a function to be timed with labels based on input and output, as well as normalize results based on amount of work done to get a better picture of the actual amount of time taken per unit of work. This will help us monitor for performance issues.	7 years ago
Mike Lang	b0ded641c3	Add a logging handler which counts logs for prometheus stats. This isn't as good as having a full centralised logging system, but should suffice to know if anything funny is happening.	7 years ago
Mike Lang	c9d02b3318	restreamer: Prevent prom client blowing up after two different endpoints are hit. Prom client doesn't like you creating two stats with the same name, even though they have different labels and this makes perfect sense. I feel like I just need to rewrite the prom client at some point - it doesn't actually do all that much except get in your way, apart from the actual text encoding, which I can steal. Anyway, in the meantime, we get around this by breaking up metrics into two names, a "foo_all" and a "foo_ENDPOINT". The foo_all metrics lack the detailed labels, but are still labelled by endpoint and can be used more easily. The foo_ENDPOINT metrics have more labels but require messier PromQL, as you need to match on a name regex if you want to look at more than one specific endpoint.	7 years ago
Mike Lang	30c4bbec1d	restreamer: Return the actual response from after_request even if untracked. Otherwise, any untracked endpoints don't work.	7 years ago
Christopher Usher	96e6904c85	Added monotonic to restreamer setup.py	7 years ago
Christopher Usher	225288980a	Added the backfiller to docker-compose	7 years ago
Christopher Usher	3fcd374449	Moved encode_strings to common	7 years ago
Christopher Usher	93dd216f89	Fixes and suggestions from ekimekim	7 years ago
Christopher Usher	db1b4e6539	Updated logging to match the other components	7 years ago
Christopher Usher	bae039977b	Trying to get the backfiller to actually start	7 years ago
Christopher Usher	1fcd9b5b36	Adding in stuff to hopefully get this to run	7 years ago
Christopher Usher	013ad65c68	added a Dockerfile for the backfiller	7 years ago
Christopher Usher	48d11045d4	Changed backfiller.main to backfill the last 3 hours on startup before doing a full backfill	7 years ago
Christopher Usher	176633bf7d	More messing around with backfill_node to allow finer-grained control of the order in which segments are fetched	7 years ago
Christopher Usher	3a7624b107	added a setup file for the backfiller	7 years ago
Christopher Usher	ba499fe835	added more logging to backfiller	7 years ago
Mike Lang	7525b7c135	restreamer: Add basic prometheus stats to all endpoints. I had to go to some effort to get nice labelling, which also meant none of the existing libs for this were any good, but this works well enough. Exposes the metrics on /metrics.	7 years ago
Mike Lang	17972b87aa	Allow setting of log level via WUBLOADER_LOG_LEVEL env var. By using an env var, it is universal and happens prior to arg parsing, at the same point we do other logging setup.	7 years ago
Mike Lang	c0357680cf	downloader: Use caller's logger inside soft_hard_timeout	7 years ago
Mike Lang	a628676e74	downloader: Log to subloggers instead of the root logger. This gives us some context when logging, and is best practice.	7 years ago
Mike Lang	57e665df2e	generate-docker-compose: Clean up the container afterwards. I'll never understand why this isn't the default, docker.	7 years ago
Mike Lang	c8cc4a68a0	cutter: Fix bugs that meant things wouldn't actually be cut. The calculations were backwards, so instead of cutting a video by, say, 2 seconds, it would cut by -2 seconds, which was clamped to 0. So it would never actually cut; it would always use the closest segment. Also, once we were actually cutting, we hit an issue where ffmpeg would finish and close its input early, because we'd reached the end of the cut video, but not all input had been written yet. This resulted in an EPIPE error (write to closed pipe) in the input feeder. We now ignore that.	7 years ago
Mike Lang	6bf709287a	cutter: Introduce an alternate cutting approach that is much faster. This cutter works by only cutting the first and last segments to size, then concatenating them with the other segments, so we only ever process a few seconds of video instead of the entire video duration. However, to make this work, care must be taken that the cut segments use the same codecs as the other segments. The reason it's experimental is that we are not yet confident in its ability to cut accurately and without sync issues. We have seen some minor issues when trying to play back the raw output files, but youtube's re-encoding has consistently smoothed out those issues and they seem to be highly player-specific. Vigorous testing is needed. Also note that both methods right now (cat then cut, and cut then cat) only work if all the segments are cattable, that is, they all use the same codecs, have the same resolution, etc. If a stream were to change its encoding settings, and we were cutting over that change, neither approach would work. We should add checks for that scenario (which can only happen over a stream drop), and if it occurs, fall back to a slow method using ffmpeg's concat filter, which will work even for disparate codecs, though reconciling mismatched resolutions or frame rates may require further work.	7 years ago
Mike Lang	6815924097	Fix some bugs and linter errors introduced by backfiller. I ran `pyflakes` on the repo and found these bugs: `./common/common.py:289: undefined name 'random'`; `./downloader/downloader/main.py:7: 'random' imported but unused`; `./backfiller/backfiller/main.py:150: undefined name 'variant'`; `./backfiller/backfiller/main.py:158: undefined name 'timedelta'`; `./backfiller/backfiller/main.py:171: undefined name 'sort'`; `./backfiller/backfiller/main.py:173: undefined name 'sort'` (ok, the "imported but unused" one isn't a bug, but the rest are). This fixes those, as well as a further issue I saw with sorting of hours. Iterables in general can't be sorted in place: only lists have a .sort() method, and (as an obvious example) an infinite iterable could never be sorted at all. We avoid this by coercing to a list, fully realising the iterable and putting it into a form that python will let us sort. It also avoids the nasty side-effect of mutating the list that gets passed into us, which the caller may not expect. Consider this example of the old behaviour (Python 2): `>>> my_hours = ['one', 'two', 'three']`, then `>>> print my_hours` prints `['one', 'two', 'three']`, then `>>> backfill_node(base_dir, node, stream, variants, hours=my_hours, order='forward')`, then `>>> print my_hours` prints `['one', 'three', 'two']`. Also, one of the linter errors was non-trivial to fix: we were trying to get a list of hours (which is an API call for a particular variant), but at a time when we weren't dealing with a single variant. My solution was to get a list of hours for ALL variants, and take the union.	7 years ago
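Commit 997c1242b2 describes get_best_segments yielding to gevent between hours of work. The following is a minimal sketch of that pattern. It assumes a hypothetical per-hour callable `process_hour` and an injected `idle` callable. In a gevent service, `idle` would be `gevent.idle`, which blocks the current greenlet until the event loop has nothing else to run. Injecting it keeps the sketch runnable without gevent installed.

```python
from typing import Callable, Iterable, List, TypeVar

H = TypeVar("H")
R = TypeVar("R")


def process_in_chunks(
    hours: Iterable[H],
    process_hour: Callable[[H], List[R]],
    idle: Callable[[], object],
) -> List[R]:
    """Do one hour of work at a time, yielding to the scheduler before each hour.

    Hypothetical helper: `process_hour` stands in for the per-hour work of
    get_best_segments, and `idle` would be gevent.idle in a gevent process.
    """
    results: List[R] = []
    for hour in hours:
        # Let every other ready task run until the loop is idle...
        idle()
        # ...then do exactly one hour of work before checking back again.
        results.extend(process_hour(hour))
    return results
```

In a gevent process this would be called as `process_in_chunks(hours, process_hour, gevent.idle)`. Yielding once per hour means other greenlets never wait much longer than the time it takes to process a single hour.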
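Commit bf08aa29b8 replaces dateutil.parser with datetime.strptime because wubloader fully controls the segment path format. The sketch below illustrates the idea with a hypothetical layout: an hour directory named like `2019-01-05T12`, holding files whose names start with `MM:SS.fff`. The real format strings in parse_segment_path may differ.

```python
import datetime
import os

# Hypothetical format strings; wubloader's real layout may differ.
HOUR_FORMAT = "%Y-%m-%dT%H"
TIME_FORMAT = "%M:%S.%f"


def parse_segment_start(path: str) -> datetime.datetime:
    """Parse a segment's start time from a path like '.../2019-01-05T12/34:56.789-full.ts'.

    strptime only accepts the exact format given. That is fine here because we
    generate these names ourselves, and it skips dateutil's slower, more
    permissive parsing.
    """
    hour_dir = os.path.basename(os.path.dirname(path))
    time_part = os.path.basename(path).split("-", 1)[0]
    hour = datetime.datetime.strptime(hour_dir, HOUR_FORMAT)
    within_hour = datetime.datetime.strptime(time_part, TIME_FORMAT)
    return hour.replace(
        minute=within_hour.minute,
        second=within_hour.second,
        microsecond=within_hour.microsecond,
    )
```

The trade-off is strictness. A path that deviates from the expected format raises `ValueError` instead of being guessed at.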
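Commits 10cca18922 and bcdb268ce8 avoid a deadlock by swapping a metric's internal lock for a dummy one. The sketch below shows the shape of that hack on a hypothetical `Metric` class with a private `_lock`. It does not reproduce prometheus_client's internals, so the attribute names there are an assumption to check against the installed version. The dummy lock gives up mutual exclusion in exchange for a guarantee: a signal handler that calls `labels()` while the same thread holds the lock can never block.

```python
import threading


class DummyLock:
    """Lock-shaped object that never blocks (gives up mutual exclusion)."""

    def acquire(self, blocking: bool = True, timeout: float = -1) -> bool:
        return True

    def release(self) -> None:
        pass

    def __enter__(self) -> "DummyLock":
        return self

    def __exit__(self, *exc_info: object) -> None:
        return None  # never suppress exceptions


class Metric:
    """Hypothetical stand-in for a metric that guards its state with a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._children: dict = {}

    def labels(self, *values: str) -> int:
        # Returns an index per distinct label tuple, creating it on first use.
        with self._lock:
            return self._children.setdefault(values, len(self._children))


def neuter_lock(metric: Metric) -> None:
    # Reach into the object's internals and replace its lock, as the commits describe.
    metric._lock = DummyLock()
```

With a real `threading.Lock`, a second acquire from the same thread (the signal handler case) would wait forever. With the dummy lock it proceeds immediately.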
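Commit b9c2921242 adds a stack sampler whose samples feed flamegraphs, which the generate-flamegraph script from c9cc8a73a7 then renders. Below is a minimal, standard-library-only sketch of the technique. It is Unix only, because it uses `signal.setitimer`.

- A SIGPROF interval timer interrupts the process periodically.
- The handler folds the interrupted call stack into the semicolon-separated "folded" format that flamegraph tools read.
- Each distinct stack is counted.

The `collections.Counter` stands in for the prometheus counter the commit uses, and all names here are illustrative.

```python
import collections
import signal
import sys
import types
from typing import Dict, List, Optional

# Stand-in for the prometheus counter: folded stack -> number of samples.
samples = collections.Counter()


def fold_stack(frame: Optional[types.FrameType]) -> str:
    """Render a frame's call stack root-first as 'file:outer;file:inner'."""
    names = []
    while frame is not None:
        code = frame.f_code
        names.append("{}:{}".format(code.co_filename, code.co_name))
        frame = frame.f_back
    return ";".join(reversed(names))


def current_stack() -> str:
    """Fold the caller's stack (handy for testing without signals)."""
    return fold_stack(sys._getframe(1))


def _on_sample(signum: int, frame: Optional[types.FrameType]) -> None:
    # Python runs signal handlers in the main thread; `frame` is where it was interrupted.
    samples[fold_stack(frame)] += 1


def start_sampling(interval: float = 0.005) -> None:
    """Take a sample every `interval` seconds of process CPU time."""
    signal.signal(signal.SIGPROF, _on_sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)


def stop_sampling() -> None:
    signal.setitimer(signal.ITIMER_PROF, 0, 0)
    signal.signal(signal.SIGPROF, signal.SIG_DFL)


def to_folded_lines(counts: Dict[str, int]) -> List[str]:
    """Emit 'stack count' lines, the input format of flamegraph.pl."""
    return ["{} {}".format(stack, n) for stack, n in sorted(counts.items())]
```

To use it, call `start_sampling()`, run the workload, call `stop_sampling()`, and write `to_folded_lines(samples)` to a file. That file is suitable input for Brendan Gregg's `flamegraph.pl`.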
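Commit a5213ccb3b shares one connection pool per worker, but lets a SegmentGetter use it only on its first attempt, so a retry always starts from a clean slate. The following is a minimal sketch of that policy. It assumes a hypothetical `Pool` class; in practice something like a `requests.Session` would play this role. No real HTTP is performed.

```python
import itertools
from typing import List


class Pool:
    """Hypothetical connection pool; the id just makes reuse visible."""

    _ids = itertools.count()

    def __init__(self) -> None:
        self.id = next(self._ids)


class Worker:
    """Each worker owns one pool, so re-creating a worker resets its connections."""

    def __init__(self) -> None:
        self.pool = Pool()


class SegmentGetter:
    def __init__(self, worker: Worker) -> None:
        self.worker = worker
        self.attempts = 0
        self.pools_used: List[int] = []

    def _pool_for_attempt(self) -> Pool:
        # First attempt: reuse the worker's warm connections (no new TLS handshake).
        if self.attempts == 0:
            return self.worker.pool
        # Any retry: a fresh pool, so a bad connection can't poison the retry.
        return Pool()

    def attempt(self) -> Pool:
        pool = self._pool_for_attempt()
        self.attempts += 1
        self.pools_used.append(pool.id)
        return pool
```

Under this policy, the common case (first attempt succeeds) pays for TLS setup once per worker rather than once per segment.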
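Commit 47ff92b155 fixes a bug where mark_working was never called, so superseded workers kept running. The following is a minimal sketch of the intended handoff. It uses a hypothetical `StreamManager` that keeps workers oldest-first: when a worker reports success, every older worker is told to stop. Apart from mark_working, all names are illustrative.

```python
import threading
from typing import List


class StreamWorker:
    """Hypothetical worker; a real one would loop until `stopping` is set."""

    def __init__(self) -> None:
        self.stopping = threading.Event()

    def stop(self) -> None:
        self.stopping.set()


class StreamManager:
    """Hypothetical manager tracking workers for one stream, oldest first."""

    def __init__(self) -> None:
        self.workers: List[StreamWorker] = []
        self.lock = threading.Lock()

    def start_worker(self) -> StreamWorker:
        worker = StreamWorker()
        with self.lock:
            self.workers.append(worker)
        return worker

    def mark_working(self, worker: StreamWorker) -> None:
        """Called by a worker once it has fetched a playlist successfully.

        Older workers are now redundant; leaving them running re-fetches the same
        playlists and segments, which is the bug the commit fixes.
        """
        with self.lock:
            if worker not in self.workers:
                return  # this worker was itself already superseded
            index = self.workers.index(worker)
            old, self.workers = self.workers[:index], self.workers[index:]
        for old_worker in old:
            old_worker.stop()
```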
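Commit d90f01b8ce adds a general timing helper. It labels measurements from a function's inputs and outputs, and it can normalise by the amount of work done. Below is a much-simplified sketch of that idea, recording into a plain dict rather than prometheus histograms. The `timed` name and its `labels` and `normalize` parameters are assumptions, not the commit's actual API.

```python
import functools
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

# (function name, sorted label items) -> observed seconds per unit of work.
observations: Dict[tuple, List[float]] = defaultdict(list)


def timed(
    labels: Optional[Callable[..., Dict[str, str]]] = None,
    normalize: Optional[Callable[..., float]] = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical decorator factory.

    labels(ret, *args, **kwargs) returns label values derived from input and output.
    normalize(ret, *args, **kwargs) returns units of work, so we record time per unit.
    """

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.monotonic()
            ret = fn(*args, **kwargs)
            elapsed = time.monotonic() - start
            label_values = labels(ret, *args, **kwargs) if labels else {}
            units = normalize(ret, *args, **kwargs) if normalize else 1.0
            key = (fn.__name__, tuple(sorted(label_values.items())))
            # With zero units of work, record the raw time instead of dividing by zero.
            observations[key].append(elapsed / units if units else elapsed)
            return ret

        return wrapper

    return decorator
```

For example, decorating a segment lookup with `normalize=lambda ret, stream: len(ret)` records seconds per segment returned rather than seconds per call. This makes calls that return very different amounts of data comparable.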
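Commit b0ded641c3 adds a logging handler that counts log records for prometheus. Below is a minimal sketch using the standard `logging.Handler` API, with a `collections.Counter` standing in for the prometheus counter. Keying by logger name and level is an assumption about which labels are useful.

```python
import collections
import logging


class CountingHandler(logging.Handler):
    """Counts every record by (logger name, level) instead of writing it anywhere.

    The Counter stands in for a prometheus counter in this sketch.
    """

    def __init__(self) -> None:
        super().__init__()
        self.counts = collections.Counter()

    def emit(self, record: logging.LogRecord) -> None:
        self.counts[(record.name, record.levelname)] += 1
```

Attaching it to the root logger with `logging.getLogger().addHandler(CountingHandler())` counts records from every logger that propagates to the root.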
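Commit 17972b87aa reads the log level from the WUBLOADER_LOG_LEVEL environment variable during logging setup, before argument parsing. A minimal sketch follows. The INFO default and the case-insensitive handling are assumptions, not taken from the commit.

```python
import logging
import os
from typing import Mapping


def log_level_from_env(environ: Mapping[str, str] = os.environ) -> str:
    # Assumed default of INFO; upper() lets "debug" work as well as "DEBUG".
    return environ.get("WUBLOADER_LOG_LEVEL", "INFO").upper()


def setup_logging(environ: Mapping[str, str] = os.environ) -> None:
    """Configure root logging; runs before argparse, so it applies to every service."""
    # logging accepts level names as strings and raises ValueError for unknown ones.
    logging.basicConfig(level=log_level_from_env(environ))
```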
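Commit c8cc4a68a0 fixes a backwards offset calculation and makes the cutter's input feeder tolerate ffmpeg closing its stdin early. Below is a sketch of both, with hypothetical names `cut_offset`, `feed_input` and `feed_child`. A short Python child process stands in for ffmpeg so the example runs without it. In Python 3, writing to a pipe whose reader has gone away raises `BrokenPipeError`, which is the exception for EPIPE.

```python
import subprocess
import sys
from typing import BinaryIO, Iterable


def cut_offset(cut_start: float, segment_start: float) -> float:
    """Seconds to trim from the front of the first segment.

    Subtracting the other way round gives a negative value that clamps to 0,
    so nothing is ever cut (the bug the commit describes).
    """
    return max(0.0, cut_start - segment_start)


def feed_input(chunks: Iterable[bytes], pipe: BinaryIO) -> bool:
    """Write chunks to a child's stdin; return False if the child stopped reading early.

    For a cutter that is expected: ffmpeg may exit once the cut video is complete.
    """
    try:
        for chunk in chunks:
            pipe.write(chunk)
        pipe.close()
    except BrokenPipeError:
        try:
            pipe.close()  # flushing leftover buffered data may hit EPIPE again
        except BrokenPipeError:
            pass
        return False
    return True


def feed_child(chunks: Iterable[bytes], reader_code: str) -> bool:
    """Hypothetical harness: a Python child process stands in for ffmpeg."""
    proc = subprocess.Popen([sys.executable, "-c", reader_code], stdin=subprocess.PIPE)
    ok = feed_input(chunks, proc.stdin)
    proc.wait()
    return ok
```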
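Commit 6bf709287a cuts only the first and last segments and concatenates the untouched middle ones. Below is a sketch of the planning step. It assumes segments are contiguous and described by a hypothetical `Segment(start, duration)` in seconds. It decides which segments need cutting and how much to trim from each, without invoking ffmpeg.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical segment description, times in seconds."""

    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


@dataclass
class Piece:
    segment: Segment
    trim_start: float = 0.0           # seconds to drop from the front
    trim_end: Optional[float] = None  # keep up to this offset; None means to the end

    @property
    def needs_cut(self) -> bool:
        return self.trim_start > 0 or self.trim_end is not None


def plan_fast_cut(segments: List[Segment], start: float, end: float) -> List[Piece]:
    """Return pieces to concatenate in order; only the first and last may need cutting."""
    pieces = [Piece(s) for s in segments if s.end > start and s.start < end]
    if not pieces:
        return pieces
    first, last = pieces[0], pieces[-1]
    first.trim_start = max(0.0, start - first.segment.start)
    if last.segment.end > end:
        last.trim_end = end - last.segment.start
    return pieces
```

Only pieces with `needs_cut` would be re-encoded; the rest are copied as-is. As the commit notes, this only works when all segments share codecs, resolution and so on.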
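Commit 6815924097 coerces the `hours` argument to a list before sorting, and builds the hour list as the union across all variants. Below is a sketch of both ideas, using hypothetical `order_hours` and `all_hours` helpers. A `list_hours(variant)` callable stands in for the per-variant API call, and the `'reverse'` order value is an assumption alongside the `'forward'` seen in the commit. Hour names like `2019-01-05T12` sort chronologically as plain strings.

```python
from typing import Callable, Iterable, List


def order_hours(hours: Iterable[str], order: str = "forward") -> List[str]:
    """Return a new sorted list; the caller's list is never mutated.

    sorted() copies any finite iterable (generators included) into a fresh list,
    unlike list.sort(), which exists only on lists and sorts them in place.
    """
    return sorted(hours, reverse=(order == "reverse"))


def all_hours(variants: Iterable[str], list_hours: Callable[[str], Iterable[str]]) -> List[str]:
    """Union of the hours available for every variant, deduplicated and sorted."""
    hours = set()
    for variant in variants:
        hours.update(list_hours(variant))
    return sorted(hours)
```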

118 Commits (3a1e4b0aef8d029f2b94726a99d5f3f95718a4e4)