wubloader

Commit Graph

Author	SHA1	Message	Date
Mike Lang	9d8c47377f	segment parsing: Hand-roll microsecond parsing. float() is inaccurate and Decimal() is very slow (~3x the CPU usage), so instead we right-pad with 0s (e.g. 1.2345 -> 1.234500), then convert to int microseconds directly.	5 years ago
Mike Lang	66669cd4e4	common: When parsing segment timestamps, use Decimal instead of float. Floating-point error leads to 1us differences in parsed times, which causes false positives in the overlapping segments check. By using a Decimal, we get the exact digits from the filepath.	5 years ago
Mike Lang	13a228070a	common.segments: Speed up segment parsing by rolling our own time parsing. strptime is very slow. In terms of pure get_best_segments() speed, this change more than doubles the throughput. In particular for segment_coverage, this halves the run time for each check.	5 years ago
Mike Lang	1b12c05e0e	make smart cut work, only to discover it doesn't actually have any advantage over fast	5 years ago
Mike Lang	7dcd844e16	add logging to help debug smart cut	6 years ago
Mike Lang	c294fa82b8	smart cut: Fix output format	6 years ago
Mike Lang	c6172ce37f	smart cut: More typos	6 years ago
Mike Lang	82346a55ca	smart cut: Fix int in ffmpeg args	6 years ago
Mike Lang	21d5548980	Add new segment type "suspect". We've noticed that when nodes have connection problems, they get full segments with different hashes. Inspection of these segments shows that they all have identical data up to a point. Segments that fetched normally will then have the remainder of the data. Segments that had issues will have a slightly corrupted end. The data is still valid, and no errors are raised. It just doesn't have all the data. We noticed that these corrupted segments were all cut off exactly 60sec after their requests began. We believe this is a server-side timeout on the request that returns whatever data it has, then closes the container file cleanly before returning successfully. We detect segments that take > 59 seconds to receive, and label them as "suspect". Suspect segments are treated identically to partial segments, except they are always preferred over partials.	6 years ago
Mike Lang	bb05e37ae4	segments: Use longest segment in bytes if duration is the same. We occasionally see corrupted segments that are slightly shorter in size but report the same metadata as the full segments. Prefer the largest version as it's likely the least corrupt.	6 years ago
Mike Lang	b516917e62	Add new "smart" cut technique	6 years ago
Mike Lang	59ee5cf5c0	Only log at INFO about multiple versions of a segment. Since these tend to happen around stream endings, etc., we don't want them to be crazy noisy and cause us to disregard real problems. We can use the segment coverage to see in metrics if there are overlaps.	6 years ago
Mike Lang	249e32583b	get_best_segments: Don't error if the only segments that exist for time are temp	6 years ago
Mike Lang	a7f5d1c545	Fix issues with metrics gathering for cut functions: need to allow timed() to have multiple callers with the same name; the "type" label is reserved, so use "cut_type" instead.	6 years ago
Mike Lang	eb4fb5a9e1	restreamer: Add more options for fetching cuts. Split full cut into two types: an mpegts one and an mp4 one. Add "rough" cut, which is just a concat of the segments.	6 years ago
Mike Lang	b27e06d068	Fix typo in common/common/segments.py	6 years ago
Mike Lang	4d52b18b04	cutter,restreamer: Set stream=True for full cuts when appropriate. Also default to a new ffmpeg encoding setting for high-quality mpegts (i.e. still streamable) that is encoded very quickly.	6 years ago
Mike Lang	9afcc7b399	full cut: Optionally use seekable file OR directly stream. The caller can pick depending on the needs of the output format. This reverses most of `80d829b83b`, re-introducing streaming full cuts but keeping non-streaming as an option.	6 years ago
Mike Lang	4f900c5925	Collect metrics around cutting time	6 years ago
Mike Lang	e435abf72e	Merge pull request #114 from ekimekim/mike/fixes: Grab-bag of cutter fixes	6 years ago
Mike Lang	80d829b83b	full cut: ffmpeg requires a seekable output file. Most formats like mp4 require ffmpeg to make changes at the start of the file throughout writing. Unfortunately, this prevents us from streaming the upload as we cut it. Instead, we spool to a temporary file until ffmpeg exits, then upload that all at once.	6 years ago
Christopher Usher	f4cd3f546e	removed comments no longer needed	6 years ago
Christopher Usher	51e4520826	replaced warnings.warn with logger.warn	6 years ago
Christopher Usher	7d85eb7272	warn about and ignore files that don't parse as segments	6 years ago
Mike Lang	3a9543a4b5	Suppress less ffmpeg output when cutting. The "fatal" level was causing some useful errors to be suppressed.	6 years ago
Mike Lang	12decf015e	Fix multiple typos and mistakes with full cuts	6 years ago
Mike Lang	c970677a76	full cut: Fix a typo	6 years ago
Mike Lang	d3e1d6b4fc	Resurrect non-experimental cut, now dubbed "full" (vs "fast") cut. In a fast cut, we edit the first and last segments then concatenate them all. However, this leads to some tiny but perceptible artifacting around the border of the first and second (and second-last and last) segments. A full cut is much slower, but re-encodes the video into the desired format and is more reliable. We want both options to be available. With this commit, we only add the option; we don't use it in restreamer or cutter.	6 years ago
Christopher Usher	a6303c38ce	fixed parse_segment_path to allow just a filename to be parsed	6 years ago
Christopher Usher	b959853593	refactored to channel and quality	6 years ago
Mike Lang	7179fcacec	Backfiller: ignore temp segments. To make this work, we make type a proper segment field. We also tell get_best_segments to ignore temp segments, since they might go away before we can actually use them.	6 years ago
Mike Lang	cca4d52b7d	Don't error when encountering a temp-type segment. These can happen if a downloader or backfiller dies suddenly. We treat it similarly to partial but lacking any hash. At some point in the future we should probably have something to find any temp segments, hash them and rename them to partials.	6 years ago
Mike Lang	dfc64481a6	Port existing cutting code from restreamer into common. Note this moves over the 'experimental' cutter and deletes the original cutter that concatenates entire videos before cutting. We may eventually want to revive that method if the experimental cutter turns out to introduce too many issues. We move most of the code over verbatim, but adjust it such that it acts as a generic iterator that can be used in a variety of contexts. Some other changes made during the move include telling ffmpeg to be quieter (don't output version info and junk, only log if something goes wrong), and avoiding errors during cleanup.	6 years ago
Mike Lang	3d9ba77745	common: add allow_holes option to get_best_segments() to abort early if holes found. This is a performance optimization, allowing us to fail out early (potentially avoiding a LOT of work) if we know we're going to reject any result that contains holes. We add a new exception ContainsHoles that is raised in this condition.	6 years ago
Mike Lang	997c1242b2	get_best_segments: Let other things run. get_best_segments can sometimes take a very long time, and we don't want to stop other work from happening while it's ongoing. So we ask gevent to run other things until there's no other work to do, then we do one hour, then check back with gevent again. In combination with the performance improvements, this should mean we don't block other things from running for more than a few hundred ms at most.	7 years ago
Mike Lang	bf08aa29b8	parse_segment_path: Use datetime.strptime instead of dateutil.parser. strptime is much faster but can't handle as varied formats. But in this case we fully control the format, so there's no reason not to use it. Profiling suggests we spend about 80% of our time in get_best_segments just parsing dates, so this is a significant performance gain.	7 years ago
Mike Lang	5175b099af	common: Split segment-related stuff into its own module. We still import them into __init__.py so they're accessible externally just the same.	7 years ago
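
The hand-rolled microsecond parsing described in 9d8c47377f can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: the name `parse_microseconds` is hypothetical, and the handling of a missing fraction, of more than six fractional digits, and the restriction to non-negative input are assumptions of this sketch.

```python
def parse_microseconds(text):
    """Convert a non-negative decimal seconds string to integer microseconds.

    Hypothetical helper: works on the digits directly, so it avoids both
    float() rounding error and the overhead of Decimal().
    """
    # Split "1.2345" into "1" and "2345"; a string with no "." yields an
    # empty fractional part.
    whole, _, frac = text.partition(".")
    # Right-pad the fractional digits with zeros to six places
    # (1.2345 -> 1.234500). Assumption: digits beyond microsecond
    # precision are simply dropped.
    frac = frac.ljust(6, "0")[:6]
    return int(whole or "0") * 1_000_000 + int(frac)


if __name__ == "__main__":
    print(parse_microseconds("1.2345"))
```

Because every step is integer or string arithmetic, two timestamps with the same digits always parse to the same value, which is the property the overlapping segments check depends on.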

37 Commits (74a38bfaeb349cde7d4b79cf9ee7c68871256ada)
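
The segment preference rules mentioned across 21d5548980, bb05e37ae4, 7179fcacec and 249e32583b can be pictured with a small sketch. Everything here is hypothetical: the `Segment` tuple, the `TYPE_RANK` table and `pick_best` are illustrative names, and ranking "full" above "suspect" and comparing duration before size are assumptions drawn from the commit messages rather than from the code itself.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a parsed segment.
Segment = namedtuple("Segment", ["type", "duration", "size"])

# Assumed ranking: full beats suspect, and suspect always beats partial.
TYPE_RANK = {"full": 2, "suspect": 1, "partial": 0}


def pick_best(candidates):
    """Pick the preferred version among segments covering the same time.

    Returns None if only temp segments exist, rather than raising.
    """
    # Temp segments may disappear before they can be used, so skip them.
    usable = [s for s in candidates if s.type != "temp"]
    if not usable:
        return None
    # Prefer the better type, then the longer duration, then the larger
    # size in bytes (a larger file is likely the least corrupt).
    return max(usable, key=lambda s: (TYPE_RANK[s.type], s.duration, s.size))
```

For example, given two "full" versions with equal duration, the sketch picks the one with more bytes, and given only a "suspect" and a "partial" version, it picks the "suspect" one.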