90 Commits (fe4299e926b02c891a58f75be74dc15c7e09582f)

Author SHA1 Message Date
Mike Lang 9d8c47377f segment parsing: Hand-roll microsecond parsing
float() is inaccurate and Decimal() is very slow (~3x the cpu usage)
so instead we right-pad with 0s (eg. so 1.2345 -> 1.234500) then convert to int microsec directly.
4 years ago
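
A minimal sketch of the right-padding approach described in the commit above. The function name `parse_seconds_to_micros` and the restriction to non-negative input with at most six fractional digits are assumptions made for illustration; this is not the repository's actual code.

```python
def parse_seconds_to_micros(text):
    """Convert a decimal seconds string such as "1.2345" to integer microseconds.

    Hypothetical helper: it avoids float() and Decimal() by working on the
    digits directly. Assumes non-negative input.
    """
    if text.startswith("-"):
        raise ValueError("negative values are not handled in this sketch")
    # Split "1.2345" into whole="1" and frac="2345"; frac is "" if there is no "."
    whole, _, frac = text.partition(".")
    if len(frac) > 6:
        raise ValueError("more than microsecond precision: %r" % text)
    # Right-pad with zeros so "2345" becomes "234500" (exactly six digits)
    frac = frac.ljust(6, "0")
    return int(whole or "0") * 1000000 + int(frac)


print(parse_seconds_to_micros("1.2345"))
```

Because the fractional digits are never converted to a binary float, the result matches the digits in the input string exactly.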
Mike Lang 66669cd4e4 common: When parsing segment timestamps, use decimal instead of float
Floating point error leads to 1us differences in parsed times,
which causes false positives in the overlapping segments check.

By using a Decimal, we get the exact digits from the filepath.
4 years ago
Mike Lang 13a228070a common.segments: Speed up segment parsing by rolling our own time parsing
strptime is very slow. In terms of pure get_best_segments() speed, this change
more than doubles the throughput.

In particular for segment_coverage, this halves the run time for each check.
4 years ago
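
The idea of replacing strptime with fixed-position slicing might look roughly like the sketch below. The timestamp layout `YYYY-MM-DDTHH:MM:SS` and the name `parse_timestamp` are assumptions for illustration; the layout of the real segment paths is not shown in this log.

```python
import datetime


def parse_timestamp(text):
    """Parse "YYYY-MM-DDTHH:MM:SS" by slicing fixed positions.

    Hypothetical helper: the format is assumed, not taken from the repository.
    """
    # Check the length and separators so malformed input fails loudly
    if len(text) != 19 or text[4] != "-" or text[7] != "-" \
            or text[10] != "T" or text[13] != ":" or text[16] != ":":
        raise ValueError("unexpected timestamp format: %r" % text)
    return datetime.datetime(
        int(text[0:4]), int(text[5:7]), int(text[8:10]),
        int(text[11:13]), int(text[14:16]), int(text[17:19]),
    )


print(parse_timestamp("2019-11-10T03:04:05"))
```

Slicing skips the format-string interpretation that strptime performs on every call, which is where a speedup of this kind would come from.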
Mike Lang b029250c1c Disable stacksampler by default
It causes problems due to the sheer number of unique metrics emitted, which makes
the prometheus endpoint be very expensive / fail a lot.

The data is not useful enough to justify the cost.
4 years ago
Mike Lang 1b12c05e0e make smart cut work, only to discover it doesn't actually have any advantage over fast 4 years ago
Mike Lang 2dbd1132fe common.googleapis: Fix a bug in retrying failed access token get
Seems that this was never fixed when the code was moved.
5 years ago
Mike Lang 7dcd844e16 add logging to help debug smart cut 5 years ago
Mike Lang c294fa82b8 smart cut: Fix output format 5 years ago
Mike Lang c6172ce37f smart cut: More typos 5 years ago
Mike Lang 82346a55ca smart cut: Fix int in ffmpeg args 5 years ago
Mike Lang b39e844c1e restreamer: Fix missing import of smart cut 5 years ago
Mike Lang 21d5548980 Add new segment type "suspect"
We've noticed that when nodes have connection problems, they get full segments
with different hashes. Inspection of these segments shows that
they all have identical data up to a point.

Segments that fetched normally will then have the remainder of the data.
Segments that had issues will have a slightly corrupted end.
The data is still valid, and no errors are raised. It just doesn't have all the data.

We noticed that these corrupted segments all were cut off exactly 60sec after their requests
began. We believe this is a server-side timeout on the request that returns whatever data
it has, then closes the container file cleanly before returning successfully.

We detect segments that take > 59 seconds to receive, and label them as "suspect".

Suspect segments are treated identically to partial segments, except they are always preferred
over partials.
5 years ago
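
The labelling rule from the commit above can be sketched as follows. Only the 59 second threshold and the type names "full", "partial" and "suspect" come from the commit message; the function name, the `completed` flag and the way elapsed time is passed in are assumptions.

```python
# Threshold taken from the commit message: downloads taking longer than
# this are assumed to have hit the 60 second server-side timeout.
SUSPECT_THRESHOLD_SECONDS = 59.0


def classify_download(elapsed_seconds, completed):
    """Return the segment type label for a finished download attempt.

    Hypothetical helper: `completed` means the request ended without error.
    """
    if not completed:
        return "partial"
    if elapsed_seconds > SUSPECT_THRESHOLD_SECONDS:
        # The request returned cleanly, but probably with truncated data
        return "suspect"
    return "full"


print(classify_download(2.5, True), classify_download(60.1, True))
```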
Mike Lang bb05e37ae4 segments: Use longest segment in bytes if duration is the same
We occasionally see corrupted segments that are slightly shorter in size
but report the same metadata as the full segments. Prefer the largest version
as it's likely the least corrupt.
5 years ago
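
One way to express this tie-break is a compound sort key, as in the sketch below. The `Candidate` tuple and `pick_best` are hypothetical names, and treating a longer duration as the primary criterion is an assumption; the commit only states that size in bytes decides between candidates of equal duration.

```python
import collections

# Hypothetical record for one on-disk version of a segment
Candidate = collections.namedtuple("Candidate", ["path", "duration", "size"])


def pick_best(candidates):
    """Pick the candidate with the longest duration, breaking ties by byte size."""
    return max(candidates, key=lambda c: (c.duration, c.size))


versions = [
    Candidate("a.ts", 2.0, 1000),
    Candidate("b.ts", 2.0, 1200),  # same duration, more bytes
    Candidate("c.ts", 1.5, 5000),  # larger file but shorter duration
]
print(pick_best(versions).path)
```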
Mike Lang b516917e62 Add new "smart" cut technique 5 years ago
Mike Lang eba5fc498a Remove flask response size tracking
Despite our best efforts, this was causing chunked responses to be fully
buffered into memory as a side effect.

This is really bad because responses can be VERY large.
5 years ago
Mike Lang 2efe1d6218 Fix a bad logging line when handling errors 5 years ago
Mike Lang 59ee5cf5c0 Only log at INFO about multiple versions of a segment
Since these tend to happen around stream endings, etc,
we don't want them to be crazy noisy and cause us to disregard real problems.

We can use the segment coverage to see in metrics if there are overlaps.
5 years ago
Mike Lang 249e32583b get_best_segments: Don't error if the only segments that exist for time are temp 5 years ago
Mike Lang 6b602592f5 Allow disabling of stacksampling with an env var
This gives an easy way to do so across all services without adding new options.

Reasons to do so might be to avoid overheads or because your prometheus metrics grow too large.
5 years ago
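
A minimal sketch of an environment-variable switch of this kind. The variable name `EXAMPLE_DISABLE_STACKSAMPLER` is invented for illustration; the real variable name is not given in this log.

```python
import os


def stacksampling_enabled(environ=None):
    """Return False if the (hypothetical) disable variable is set to a non-empty value."""
    if environ is None:
        environ = os.environ
    # Any non-empty value disables sampling; unset or empty leaves it on
    return environ.get("EXAMPLE_DISABLE_STACKSAMPLER", "") == ""


print(stacksampling_enabled({"EXAMPLE_DISABLE_STACKSAMPLER": "1"}))
```

Reading the setting from the environment means every service that imports the shared code honours it without needing its own command-line option.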
Mike Lang 4d3aa94a71 Automatically set default encoding to utf-8 when common is imported
To be clear, this is an awful hack.

It means that any implicit str/unicode coercion will use the utf-8 encoding,
which is basically always what you want.

However, it is possible that some badly-written libraries might be relying
on the default encoding being ascii, and will do weird things as a result.

Finally, it's especially hacky to be doing this as part of importing a library.
Normally you're meant to do this as part of a sitecustomize.py in your python system directory,
and the function is deleted before passing control to normal code (this is why we need
to reload() to get it back).
5 years ago
Mike Lang 4d5157cdb5 Fix a mistake with allowing reuse of name in @timed() 5 years ago
Mike Lang 426b1328be Fix mistakes in common.requests 5 years ago
Mike Lang a7f5d1c545 Fix issues with metrics gathering for cut functions
* Need to allow timed() to have multiple callers with same name
* "type" label is reserved, use "cut_type" instead
5 years ago
Mike Lang eb4fb5a9e1 restreamer: Add more options for fetching cuts
Split full cut into two types - an mpegts one and an mp4 one.
Add "rough" cut which is just a concat of the segments.
5 years ago
Mike Lang b27e06d068 Fix typo in common/common/segments.py 5 years ago
Mike Lang 4d52b18b04 cutter,restreamer: Set stream=True for full cuts when appropriate
And also default to a new ffmpeg encoding setting for high-quality mpegts
(ie. still streamable) that is encoded very quickly.
5 years ago
Mike Lang 9afcc7b399 full cut: Optionally use seekable file OR directly stream
The caller can pick depending on the needs of the output format.

This reverses most of 80d829b83b,
re-introducing streaming full cuts but keeping non-streaming as an option.
5 years ago
Mike Lang 4f900c5925 Collect metrics around cutting time 5 years ago
Mike Lang 52e6c4ad41 sheetsync, cutter: Collect metrics on http calls
In particular, to google apis.
5 years ago
Mike Lang a2edb38a85 Add an InstrumentedSession wrapper that automatically gathers metrics on http calls 5 years ago
Mike Lang fc791e03d4 DBManager: Don't test connection on start
This gives the individual services more freedom in how to handle
a failing connection.
5 years ago
Mike Lang e435abf72e Merge pull request #114 from ekimekim/mike/fixes
Grab-bag of cutter fixes
5 years ago
Mike Lang 80d829b83b full cut: ffmpeg requires a seekable output file
Most formats like mp4 require ffmpeg to make changes at the start of the file
throughout writing.

Unfortunately, this prevents us from streaming the upload as we cut it.

Instead, we spool to a temporary file until ffmpeg exits,
then upload that all at once.
5 years ago
Christopher Usher f4cd3f546e removed comments no longer needed 5 years ago
Christopher Usher 51e4520826 replaced warnings.warn with logger.warn 5 years ago
Christopher Usher 7d85eb7272 warn about and ignore files that don't parse as segments 5 years ago
Mike Lang 3a9543a4b5 Suppress less ffmpeg output when cutting
The "fatal" level was causing some useful errors to be suppressed.
5 years ago
Mike Lang 12decf015e Fix multiple typos and mistakes with full cuts 5 years ago
Mike Lang c970677a76 full cut: Fix a typo 5 years ago
Mike Lang d3e1d6b4fc Resurrect non-experimental cut, now dubbed "full" (vs "fast") cut
In a fast cut, we edit the first and last segments then concatenate them all.
However, this leads to some tiny but perceptible artifacting around the border
of the first and second (and second-last and last) segments.

A full cut is much slower, but re-encodes the video into the desired format
and is more reliable.

We want both options to be available.

With this commit, we only add the option, we don't use it in restreamer or cutter.
5 years ago
Christopher Usher 928f9733d2 horrible bug with negative times fixed 5 years ago
Christopher Usher a6303c38ce fixed parse_segment_path to allow just a filename to be parsed 5 years ago
Christopher Usher 632c5fae2f added a default timeout to database connections 5 years ago
Christopher Usher ff5c1f8ecd fixes based on ekim's suggestions 5 years ago
Christopher Usher f75f3e61e8 Removed schema from common/database.py 5 years ago
Christopher Usher 86477fae13 fixes for ekim's comments 5 years ago
Christopher Usher 23e3cfce20 Added editor, edit_time and upload_time to thrimshim and cutter updates of the database
5 years ago
Christopher Usher 75cafdabb7 database changes to keep track of editors and edit times 5 years ago
Christopher Usher 67100c4126 comments 5 years ago
Christopher Usher 76bc629720 moved flask monitoring to its own module 5 years ago