float() is inaccurate and Decimal() is very slow (~3x the cpu usage)
so instead we right-pad with 0s (eg. so 1.2345 -> 1.234500) then convert to int microsec directly.
Floating point error leads to 1us differences in parsed times,
which causes false positives in the overlapping segments check.
By using a Decimal, we get the exact digits from the filepath.
strptime is very slow. In terms of pure get_best_segments() speed, this change
more than doubles the throughput.
In particular for segment_coverage, this halves the run time for each check.
It causes problems due to the sheer number of unique metrics emitted, which makes
the prometheus endpoint be very expensive / fail a lot.
The data is not useful enough to justify the cost.
We've noticed that when nodes have connection problems, they get full segments
with different hashes. Inspection of these segments shows that
they all have identical data up to a point.
Segments that fetched normally will then have the remainder of the data.
Segments that had issues will have a slightly corrupted end.
The data is still valid, and no errors are raised. It just doesn't have all the data.
We noticed that these corrupted segments all were cut off exactly 60sec after their requests
began. We believe this is a server-side timeout on the request that returns whatever data
it has, then closes the container file cleanly before returning successfully.
We detect segments that take > 59 seconds to recieve, and label them as "suspect".
Suspect segments are treated identically to partial segments, except they are always preferred
over partials.
We occasionally see corrupted segments that are slightly shorter in size
but report the same metadata as the full segments. Prefer the largest version
as it's likely the least corrupt.
Despite our best efforts, this was causing chunked responses to be fully
buffered into memory as a side effect.
This is really bad because responses can be VERY large.
Since these tend to happen around stream endings, etc,
we don't want them to be crazy noisy and cause us to disregard real problems.
We can use the segment coverage to see in metrics if there are overlaps.
This gives an easy way to do so across all services without adding new options.
Reasons to do so might be to avoid overheads or because your prometheus metrics grow too large.
To be clear, this is an awful hack.
It means that any implicit str/unicode coersion will use the utf-8 encoding,
which is basically always what you want.
However, it is possible that some badly-written libraries might be relying
on the default encoding being ascii, and will do weird things as a result.
Finally, it's especially hacky to be doing this as part of importing a library.
Normally you're meant to do this as part of a sitecustomize.py in your python system directory,
and the function is deleted before passing control to normal code (this is why we need
to reload() to get it back).
The caller can pick depending on the needs of the output format.
This reverses most of 80d829b83b,
re-introducing streaming full cuts but keeping non-streaming as an option.
Most formats like mp4 require ffmpeg to make changes at the start of the file
throughout writing.
Unfortunately, this prevents us from streaming the upload as we cut it.
Instead, we spool to a temporary file until ffmpeg exits,
then upload that all at once.
In a fast cut, we edit the first and last segments then concatenate them all.
However, this leads to some tiny but perciptible artifacting around the border
of the first and second (and second-last and last) segments.
A full cut is much slower, but re-encodes the video into the desired format
and is more reliable.
We want both options to be available.
With this commit, we only add the option, we don't use it in restreamer or cutter.