wubloader

Commit Graph

Author	SHA1	Message	Date
Mike Lang	c50224415c	more backfiller chat fixes fixup: more backfiller fixes Enable backfilling of chat logs	2 years ago
Mike Lang	f8b3ace148	Backfill chat archives under the "chat" quality	2 years ago
Mike Lang	1add3c5c22	Implement tombstoning to allow for segment deletion Rarely, we find ourselves needing to explicitly delete some data, eg. something that shouldn't have been public and should be removed from all records. It would also be nice if we could "clean up" bad versions of the same segment, which occasionally come up when downloaders have issues. With our distributed segment database, this is actually rather difficult as deleting the data from any one server would cause it to be restored from the others. It was only possible by stopping all backfill, deleting the data on all servers, then starting backfill again. Here we introduce a more practical approach. An operator creates an empty flag file with the same name as the segment to be deleted, but with a `.tombstone` extension. eg. to delete a file `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.ts`, you would create a tombstone `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.tombstone`. These tombstone files do two important things: * They hide the segment from being listed, which both means: * It can't be restreamed or put into a video * It can't be backfilled to other nodes * The tombstone files themselves do get backfilled to other nodes, so you only need to mark them on one server. Once the tombstone has propagated to all nodes, the segment file can be deleted independently on each one. We chose not to have a tombstone automatically trigger a segment deletion for safety reasons.	2 years ago
Mike Lang	a47c29fff4	Link images to github repo by adding a LABEL When pushed, this tells github to associate the ghcr.io repo that was pushed to with the github repo specified (the owner needs to match). This does a few things. Most importantly, this automatically gives github actions credentials to push to these repositories when run in the context of the wubloader repo.	3 years ago
Mike Lang	62bd6539ea	Unpin gevent as that was a workaround for a py2 issue	3 years ago
Mike Lang	e63aa53019	Remove left-over usage of encode_strings More py3 breakage	3 years ago
Mike Lang	21856c68aa	Fix all instances of file.write() for py3 In python 3, file.write() may do a partial write and returns the number of characters written. In order to not lose data, we need to wrap every instance of file.write() with our new common.writeall() wrapper that loops until the data is actually written.	3 years ago
Mike Lang	a56f6859bb	more py3 fixes	3 years ago
Mike Lang	f2a8007bf7	Fix build dependency issues	3 years ago
Mike Lang	7d4eb3c8db	py3 fixes for backfiller	3 years ago
HubbeKing	6d790a1b36	Do a first naive pass for py3 compatibility Check that open() calls for reading and writing use binary modes Use alpine version with py3-pip package Use python3 in Dockerfile CMD Remove sys.setdefaultencoding() "hack" Simplify ensure_directory() in common.common package	3 years ago
Mike Lang	f0546e2ee3	Pin gevent to 1.5a2 to avoid https://github.com/gevent/gevent/issues/1711	3 years ago
Mike Lang	b53fcd65a0	Add dependencies required to install psycopg2 from source We can't install the binaries as they don't support musl	4 years ago
HubbeKing	86f7823348	Replace calls to gevent.signal() with gevent.signal_handler() gevent.signal() was removed in gevent 1.5a4, see http://www.gevent.org/api/gevent.signal.html Removed on Feb 5th, see https://github.com/gevent/gevent/pull/1530	4 years ago
Mike Lang	a53786dc2d	Add file and make as build dependencies gevent now requires these to build. I'm not sure when this changed.	4 years ago
Christopher Usher	986c9a3413	removed redundant option	5 years ago
Christopher Usher	9c77dd1f40	added the ability to generate a webpage with all coverage maps	5 years ago
Mike Lang	71333cf826	backfiller: Only run one manager, not one per channel Then treat backfilling each channel just like backfilling each quality. This is conceptually simpler (only one kind of thing, a (channel, quality)) and has better behaviour when a node is down (we only have one lot of error handling around it). It also means we aren't asking the database for the same info once per channel, and cuts down on logging noise.	5 years ago
Mike Lang	ff18c7df54	backfiller: Fix issue with tracking metrics after get_nodes() failure There isn't an easy, clean way to pass in the DB hostname there, and neither label is very valuable. Let's just drop it entirely.	5 years ago
Mike Lang	731ef9e2d0	Refactor dockerfiles for more shared layers By carefully ensuring most of our dockerfiles are identical in their first few layers, we only need to build those layers once instead of every time. In particular, we move installing gevent to before installing common, so that even when common changes gevent doesn't need to be reinstalled. This is important because gevent takes ages to install. Also fixes segment_coverage, which wasn't being installed.	5 years ago
Mike Lang	d63ae573b7	backfiller: Collect metrics on http calls	5 years ago
Christopher Usher	dc1c31fef4	fixes to the error handling as suggested by ekim	5 years ago
Christopher Usher	553b11bc84	typo in error handler	5 years ago
Christopher Usher	9dfffe0a62	improvements based on ekims suggestions plus delete_hours yields to the rest of the backfiller	5 years ago
Christopher Usher	8579fcaeea	check start is not None before checking whether hour is before start	5 years ago
Christopher Usher	f8835cd253	documentation for --delete-old; check that start is not none	5 years ago
Christopher Usher	a361249145	renamed delete_before to keep_hours	5 years ago
Christopher Usher	be562b8448	added warnings if keeping fewer hours than backfilling	5 years ago
Christopher Usher	9d81569d98	Added the ability to delete old hours	5 years ago
Mike Lang	7183b25ce9	Merge pull request #119 from ekimekim/mike/database-resilience Changes to improve behaviour if the DB is down	5 years ago
Christopher Usher	929308f3e7	started on the segment_coverage service	5 years ago
Mike Lang	0e437566aa	backfiller: Don't crash on DB errors We move all connection handling into get_nodes(). This means that problems connecting won't cause further errors and cause the application to completely crash. In turn, this means that the behaviour if the database goes down becomes "continue backfilling from the nodes we know about" instead of crashing.	5 years ago
Christopher Usher	ccb7f3c684	now use parse_segment_path to get hash from filename	5 years ago
Christopher Usher	36da1926d0	fixes for ekims suggestions	5 years ago
Christopher Usher	34785d1179	now checking the hashes	5 years ago
Christopher Usher	b0562495d2	reject mismatched hashes; more metrics	5 years ago
Christopher Usher	120a5a7de0	started on checking the hash	5 years ago
Chris Usher	557ddddc31	better logging for the backfiller	5 years ago
Christopher Usher	44f0e0defb	changed it back so only the name is checked	5 years ago
Chris Usher	aab46f9765	fixed localhost bug in backfiller	5 years ago
Christopher Usher	84270c02ec	logging fix	5 years ago
Christopher Usher	fdb5d20db7	fix to database logging	5 years ago
Christopher Usher	497845f2da	typos in comments	5 years ago
Christopher Usher	361e577474	fixes based on ekimekims suggestions	5 years ago
Christopher Usher	720684a388	refactoring to have consistent terminology	5 years ago
Christopher Usher	6d38250674	starting to refactor stream to channel and variant to quality	5 years ago
Mike Lang	f50276bd01	backfiller: Expose recent_cutoff as CLI arg and increase it to 120s default In testing, GDQ's stream delay went up over 1min, which caused backfillers to backfill segments at the same time they were downloaded. We increase the window for now, and also make it configurable.	5 years ago
Mike Lang	29040a166c	backfiller: Allow multiple concurrent segment downloads This will significantly increase throughput when downloading large ranges of segments. The max concurrency is exposed as a cli arg. We also slightly modify the logged info, so it reports segments downloaded, not just number of missing segments (which we might skip downloading for various reasons).	6 years ago
Christopher Usher	37bad7d5ed	Also reset database connection on error in the backfiller	6 years ago
Mike Lang	7179fcacec	Backfiller: ignore temp segments To make this work, we make type a proper segment field. We also tell get_best_segments to ignore temp segments, since they might go away before we can actually use them.	6 years ago
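
Note on commit 1add3c5c22: the tombstone naming rule and the "hidden from listing" behaviour described in that message can be illustrated with a minimal sketch. The helper names `tombstone_path` and `visible_segments` are hypothetical and chosen for illustration only; this is not the code from the commit, and it assumes segment files end in `.ts` as in the example path given in the message.

```python
import os


def tombstone_path(segment_path):
    """Return the flag-file path for a segment: same name, `.tombstone` extension.

    Hypothetical helper, not taken from the wubloader source.
    """
    base, _ext = os.path.splitext(segment_path)
    return base + ".tombstone"


def visible_segments(filenames):
    """Given the file names in one hour directory, return the `.ts` segments
    that do NOT have a matching tombstone. Tombstoned segments are hidden,
    so a caller building a video or a backfill list would never see them.

    Hypothetical helper; assumes segments use the `.ts` extension.
    """
    tombstoned = {
        os.path.splitext(name)[0]
        for name in filenames
        if name.endswith(".tombstone")
    }
    return sorted(
        name
        for name in filenames
        if name.endswith(".ts") and os.path.splitext(name)[0] not in tombstoned
    )
```

In this sketch the segment file itself is never removed, which matches the message's point that a tombstone does not trigger deletion: an operator still deletes the file on each node once the tombstone has propagated.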

111 Commits (2939089edd07b735f59234db13a16cddc491610e)
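
Note on commit 21856c68aa: the message describes wrapping `file.write()` in a `common.writeall()` helper that loops until all data is written. A minimal sketch of such a loop is shown here. The signature (a write callable plus the value to write) is an assumption made for illustration, and the sketch further assumes the callable always returns the count it wrote as an integer; the real wrapper may differ.

```python
def writeall(write, value):
    """Call write(value) repeatedly until all of value has been written.

    Sketch only: assumes `write` returns the number of bytes (or characters)
    it actually wrote, which may be fewer than len(value).
    """
    while value:
        written = write(value)
        # Drop the part that was written and retry with the remainder.
        value = value[written:]
```

A possible usage, under the same assumptions, would be `writeall(f.write, data)` in place of a bare `f.write(data)`.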