wubloader

Commit Graph

Author	SHA1	Message	Date
Mike Lang	ba36338db4	Add simpler wrapper for ffmpeg_cut_segments() for single-input case. Also change ffmpeg_cut_many() arg order so common cases can have a default value.	12 months ago
Mike Lang	e65145bcad	Replace ffmpeg_cut_stdin() with ffmpeg_cut_many(). This is a more featureful wrapper around ffmpeg with notable differences: it's used as a context manager, and so can manage its own cleanup; it takes care of input feeding; it can handle multiple inputs (via pipes), instead of one (via stdin). This drastically reduces the setup and cleanup code required for most basic usage, and the multi-input support will be used in followup changes.	12 months ago
Mike Lang	d571bbe81e	ffmpeg_cut_stdin: Remove cut_start and duration built-in args. Of 4 users of this function, all but one set them to None. We're about to replace that one usage with something else, so it makes more sense to not have them as options at all and just have the user add to the encode args manually.	12 months ago
ZeldaZach	8bbc72184c	Support hot reload of Zulip Schedule: move sheets API into common dir, since it is used in multiple places; live download from Google Sheets using Config; fall back on old schedule if new one can't be downloaded for some reason.	1 year ago
Mike Lang	264545eb9d	CachedIterator: Fix bug where state can change while taking the lock, resulting in a case where we grab the wrong result, or even try to get the next item after the iterator has already been discarded.	1 year ago
Mike Lang	1857a998c9	reduce overhead of gevent.idle() by only yielding once per 1000 segments	1 year ago
Mike Lang	8ede4622ca	CachedIterator: Re-serve any errors encountered while iterating, instead of the second caller to reach the error treating it as a successful end of iterator.	1 year ago
Mike Lang	2a1f7207a8	Allow a fudge factor when checking for gaps/overlaps between segments. Sometimes in the wild (particularly on youtube) segments may not be timed perfectly, so allow up to 10ms of gap or overlap to be counted as "equal" for purposes of finding the best segment.	2 years ago
Mike Lang	c65eb2eae3	Add a default timeout on google APIs	2 years ago
Mike Lang	cd4d08adc1	Yield after each segment when doing fast/smart cuts, to avoid blocking for long periods.	2 years ago
Mike Lang	e689626815	Add a small time range around the timestamp when extracting a frame. This should hopefully result in frames on the edge of timestamps being extracted from a combination of the neighboring segment and the naive one, so that we don't get errors extracting a frame.	2 years ago
Mike Lang	5a0704d3d7	Reject bustimes with negative minutes	2 years ago
Mike Lang	30f05b0656	thumbnails: Add a CLI for generating them directly	2 years ago
Mike Lang	80c9be0baf	cutter: Get archive cut working	2 years ago
Mike Lang	5e7904dab3	wip: archive cut	2 years ago
Mike Lang	3ea0532838	wip:	2 years ago
Mike Lang	c0e5f32459	Fix bad normalize function for fast_cut_range	2 years ago
Mike Lang	76c9208be5	Move chat_archiver atomic_write() to common for re-use	2 years ago
Mike Lang	c5c8b3997b	change how timestamps work again, so PCR and PTS are both set to start time	2 years ago
Mike Lang	58b4541306	Implement smart cuts	2 years ago
Mike Lang	fa1603e99a	fixts: Only use PCR to set offset, add 33ms to end time	2 years ago
Mike Lang	eaf3ed2e54	fixts first attempt	2 years ago
Mike Lang	c493869b9a	Have list_segment_files also list chat archives. Otherwise backfilling of chat doesn't work.	3 years ago
Mike Lang	a3e16a2686	thumbnails: Take crop/scaling info from a json file next to the image file	3 years ago
Mike Lang	45c46df8bb	Add thumbnail templating code	3 years ago
Mike Lang	08257386e2	Add restreamer endpoint for viewing chat messages	3 years ago
Mike Lang	1add3c5c22	Implement tombstoning to allow for segment deletion. Rarely, we find ourselves needing to explicitly delete some data, eg. something that shouldn't have been public and should be removed from all records. It would also be nice if we could "clean up" bad versions of the same segment, which occasionally come up when downloaders have issues. With our distributed segment database, this is actually rather difficult as deleting the data from any one server would cause it to be restored from the others. It was only possible by stopping all backfill, deleting the data on all servers, then starting backfill again. Here we introduce a more practical approach. An operator creates an empty flag file with the same name as the segment to be deleted, but with a `.tombstone` extension. eg. to delete a file `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.ts`, you would create a tombstone `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.tombstone`. These tombstone files do two important things: (1) They hide the segment from being listed, which means both that it can't be restreamed or put into a video, and that it can't be backfilled to other nodes. (2) The tombstone files themselves do get backfilled to other nodes, so you only need to mark them on one server. Once the tombstone has propagated to all nodes, the segment file can be deleted independently on each one. We chose not to have a tombstone automatically trigger a segment deletion for safety reasons.	3 years ago
Mike Lang	44d0c0269a	cache results of common.segments.best_segments_by_start. The restreamer spends most of its time iterating through segments (parsing them, determining the best one for each start time) to serve large time ranges. Since this only depends on the list of filenames read from disk, we can cache it for a given hour as long as that list is identical. This is a little trickier than it sounds because best_segments_by_start is an iterator and in most cases it won't be fully consumed. So we introduce a `CachedIterator` abstraction that will both remember the previously yielded values, and keep track of the live iterator so it can be resumed again if a previous invocation only partially consumed it. This also has the nice side effect of merging simultaneous operations: if two requests come in for the same hour at the same time, they'll share one iterator and both consume the results as they come in.	3 years ago
Mike Lang	9f9ef66a85	Add endpoint to get a given frame of video	3 years ago
Mike Lang	d1ba4bc4eb	Downgrade overlapping segments from warning to info. They were causing too much log noise.	4 years ago
Mike Lang	7649a4e840	Improve WSGIServer graceful shutdown handling. Previously both restreamer and thrimshim had some complex logic for dealing with graceful shutdown, in different ways, that was still prone to race conditions. We replace this with a common method that does it properly. Fixes #226	4 years ago
Mike Lang	aab8cf2f0f	Set up plumbing for multi-range videos and implement no-transition fast cut videos only. This is the simplest case as we can just cut each range like we already do, then concat the results. We still allow for the full design in the database and cutter, but error out if transitions is ever anything but hard cuts or if it's a full cut. We also update the restreamer to allow accepting ranges, however for usability we still allow the old "just one start and end" args. Note this changes the thrimshim API to give and take the new "video_ranges" and "video_transitions" columns.	4 years ago
Mike Lang	3de44d6731	Add ability to render waveforms in restreamer	4 years ago
Mike Lang	7599681b6d	yet another py3 map() issue "hey i know lets make everything return an iterable but not update anything else to accept them"	4 years ago
Mike Lang	62bd6539ea	Unpin gevent as that was a workaround for a py2 issue	4 years ago
Mike Lang	21856c68aa	Fix all instances of file.write() for py3. In python 3, file.write() may do a partial write and returns the number of characters written. In order to not lose data, we need to wrap every instance of file.write() with our new common.writeall() wrapper that loops until the data is actually written.	4 years ago
Mike Lang	a56f6859bb	more py3 fixes	4 years ago
Mike Lang	3e69000058	py3 fixes for common	4 years ago
Mike Lang	d03ae49eec	Remove defunct "smart cut" method. This was an alternate way of doing a cut that turned out to work exactly the same as a fast cut, just with a more complex implementation.	4 years ago
HubbeKing	6d790a1b36	Do a first naive pass for py3 compatibility: check that open() calls for reading and writing use binary modes; use alpine version with py3-pip package; use python3 in Dockerfile CMD; remove sys.setdefaultencoding() "hack"; simplify ensure_directory() in common.common package.	4 years ago
Mike Lang	f0546e2ee3	Pin gevent to 1.5a2 to avoid https://github.com/gevent/gevent/issues/1711	4 years ago
Mike Lang	9d8c47377f	segment parsing: Hand-roll microsecond parsing. float() is inaccurate and Decimal() is very slow (~3x the cpu usage) so instead we right-pad with 0s (eg. so 1.2345 -> 1.234500) then convert to int microsec directly.	5 years ago
Mike Lang	66669cd4e4	common: When parsing segment timestamps, use decimal instead of float. Floating point error leads to 1us differences in parsed times, which causes false positives in the overlapping segments check. By using a Decimal, we get the exact digits from the filepath.	5 years ago
Mike Lang	13a228070a	common.segments: Speed up segment parsing by rolling our own time parsing. strptime is very slow. In terms of pure get_best_segments() speed, this change more than doubles the throughput. In particular for segment_coverage, this halves the run time for each check.	5 years ago
Mike Lang	b029250c1c	Disable stacksampler by default. It causes problems due to the sheer number of unique metrics emitted, which makes the prometheus endpoint be very expensive / fail a lot. The data is not useful enough to justify the cost.	5 years ago
Mike Lang	1b12c05e0e	make smart cut work, only to discover it doesn't actually have any advantage over fast	5 years ago
Mike Lang	2dbd1132fe	common.googleapis: Fix a bug in retrying failed access token get. Seems that this was never fixed when the code was moved.	6 years ago
Mike Lang	7dcd844e16	add logging to help debug smart cut	6 years ago
Mike Lang	c294fa82b8	smart cut: Fix output format	6 years ago
Mike Lang	c6172ce37f	smart cut: More typos	6 years ago
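
Commit 9d8c47377f describes parsing fractional seconds by right-padding with zeros and converting straight to integer microseconds, rather than going through float() or Decimal(). A minimal sketch of that idea follows; the function name and the rejection of more than 6 fractional digits are assumptions for illustration, not the repository's actual code.

```python
def seconds_to_microseconds(text):
    """Convert a non-negative decimal seconds string to integer microseconds.

    Hypothetical helper: avoids float() rounding error by working on digits only.
    """
    whole, _, frac = text.partition(".")
    if len(frac) > 6:
        # Assumption: sub-microsecond precision is treated as an error here.
        raise ValueError("more than 6 fractional digits: {!r}".format(text))
    # Right-pad the fractional part with zeros, eg. "2345" becomes "234500".
    frac = frac.ljust(6, "0")
    return int(whole) * 1000000 + int(frac)
```

With this sketch, "1.2345" is padded to "1.234500" and read as 1234500 microseconds, so two filenames with the same digits always parse to the same integer.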

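Commit 21856c68aa mentions a common.writeall() wrapper that loops until all data is actually written, since a python 3 write() may be partial. The sketch below shows one way such a loop could look; the signature (a write callable plus the data) and the handling of a None return value are assumptions, and ChunkyWriter is a made-up stand-in used only to demonstrate partial writes.

```python
def writeall(write, data):
    """Call write() repeatedly until every byte of data has been accepted.

    Assumed signature: `write` is a callable such as a file object's write method.
    """
    while data:
        written = write(data)
        if written is None:
            # Assumption: a writer returning None has taken everything.
            break
        data = data[written:]


class ChunkyWriter:
    """Hypothetical writer that accepts at most `limit` bytes per call."""

    def __init__(self, limit):
        self.limit = limit
        self.chunks = []

    def write(self, data):
        chunk = data[:self.limit]
        self.chunks.append(chunk)
        return len(chunk)
```

Passing `ChunkyWriter(3).write` and 8 bytes of data results in three calls, and joining the recorded chunks gives back the original data.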

131 Commits (ba36338db418e39f317932e1d4d67ef8c0f4f4a6)
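
The tombstoning commit (1add3c5c22) describes flag files that share a segment's name but carry a `.tombstone` extension, and that hide the segment from listings. The naming rule and the hiding behaviour can be sketched as below; both function names are hypothetical, and the real listing code in the repository may differ.

```python
import os


def tombstone_path(segment_path):
    """Return the tombstone filename for a segment (hypothetical helper)."""
    base, _ext = os.path.splitext(segment_path)
    return base + ".tombstone"


def visible_segments(filenames):
    """Filter a directory listing, hiding tombstones and the segments they mark."""
    names = set(filenames)
    # Base names (without extension) of every tombstone in the listing.
    tombstoned = {
        os.path.splitext(name)[0]
        for name in names
        if name.endswith(".tombstone")
    }
    return sorted(
        name for name in names
        if not name.endswith(".tombstone")
        and os.path.splitext(name)[0] not in tombstoned
    )
```

Because only the final extension is swapped, the dots inside the timestamp and duration parts of a segment name are left untouched.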