Commit Graph

118 Commits (734a7371f36db43c7e12867c9fa50da405bfa6c2)

Author SHA1 Message Date
Mike Lang 80c9be0baf cutter: Get archive cut working 11 months ago
Mike Lang 5e7904dab3 wip: archive cut 11 months ago
Mike Lang 3ea0532838 wip: 11 months ago
Mike Lang c0e5f32459 Fix bad normalize function for fast_cut_range 11 months ago
Mike Lang 76c9208be5 Move chat_archiver atomic_write() to common for re-use 12 months ago
Mike Lang c5c8b3997b change how timestamps work again, so PCR and PTS are *both* set to start time 1 year ago
Mike Lang 58b4541306 Implement smart cuts 1 year ago
Mike Lang fa1603e99a fixts: Only use PCR to set offset, add 33ms to end time 1 year ago
Mike Lang eaf3ed2e54 fixts first attempt 1 year ago
Mike Lang c493869b9a Have list_segment_files also list chat archives
Otherwise backfilling of chat doesn't work
2 years ago
Mike Lang a3e16a2686 thumbnails: Take crop/scaling info from a json file next to the image file 2 years ago
Mike Lang 45c46df8bb Add thumbnail templating code 2 years ago
Mike Lang 08257386e2 Add restreamer endpoint for viewing chat messages 2 years ago
Mike Lang 1add3c5c22 Implement tombstoning to allow for segment deletion
Rarely, we find ourselves needing to explicitly delete some data, eg. something that shouldn't
have been public and should be removed from all records.

It would also be nice if we could "clean up" bad versions of the same segment,
which occasionally come up when downloaders have issues.

With our distributed segment database, this is actually rather difficult as deleting the data
from any one server would cause it to be restored from the others. It was only possible
by stopping all backfill, deleting the data on all servers, then starting backfill again.

Here we introduce a more practical approach. An operator creates an empty flag file
with the same name as the segment to be deleted, but with a `.tombstone` extension.
eg. to delete a file `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.ts`,
you would create a tombstone `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.tombstone`.

These tombstone files do two important things:
* They hide the segment from being listed, which both means:
  * It can't be restreamed or put into a video
  * It can't be backfilled to other nodes
* The tombstone files themselves do get backfilled to other nodes, so you only need to mark them on one server.

Once the tombstone has propagated to all nodes, the segment file can be deleted independently on each one.

We chose not to have a tombstone automatically trigger a segment deletion for safety reasons.
2 years ago
Mike Lang 44d0c0269a cache results of common.segments.best_segments_by_start
The restreamer spends most of its time iterating through segments (parsing them, determining the best one for each start time)
to serve large time ranges. Since this only depends on the list of filenames read from disk,
we can cache it for a given hour as long as that list is identical.

This is a little trickier than it sounds because best_segments_by_start is an iterator
and in most cases it won't be fully consumed. So we introduce a `CachedIterator` abstraction
that will both remember the previously yielded values, and keep track of the live iterator
so it can be resumed again if a previous invocation only partially consumed it.

This also has the nice side effect of merging simultaneous operations - if two requests come in
for the same hour at the same time, they'll share one iterator and both consume the results
as they come in.
2 years ago
Mike Lang 9f9ef66a85 Add endpoint to get a given frame of video 3 years ago
Mike Lang d1ba4bc4eb Downgrade overlapping segments from warning to info
They were causing too much log noise
3 years ago
Mike Lang 7649a4e840 Improve WSGIServer graceful shutdown handling
Previously both restreamer and thrimshim had some complex logic for dealing with
graceful shutdown, in different ways, that was still prone to race conditions.

We replace this with a common method that does it properly.

Fixes #226
3 years ago
Mike Lang aab8cf2f0f Set up plumbing for multi-range videos and implement no-transition fast cut videos only
This is the simplest case as we can just cut each range like we already do,
then concat the results.

We still allow for the full design in the database and cutter, but error out if transitions
is ever anything but hard cuts or if it's a full cut.

We also update the restreamer to allow accepting ranges, however for usability we still allow
the old "just one start and end" args.

Note this changes the thrimshim API to give and take the new "video_ranges" and "video_transitions" columns.
3 years ago
Mike Lang 3de44d6731 Add ability to render waveforms in restreamer 3 years ago
Mike Lang 7599681b6d yet another py3 map() issue
"hey i know lets make everything return an iterable but not update anything else to accept them"
3 years ago
Mike Lang 62bd6539ea Unpin gevent as that was a workaround for a py2 issue 3 years ago
Mike Lang 21856c68aa Fix all instances of file.write() for py3
In python 3, file.write() may do a partial write and returns the number of characters written.
In order to not lose data, we need to wrap every instance of file.write() with our new
common.writeall() wrapper that loops until the data is actually written.
3 years ago
Mike Lang a56f6859bb more py3 fixes 3 years ago
Mike Lang 3e69000058 py3 fixes for common 3 years ago
Mike Lang d03ae49eec Remove defunct "smart cut" method
This was an alternate way of doing a cut that turned out to work exactly the same as a fast cut,
just with a more complex implementation.
3 years ago
HubbeKing 6d790a1b36 Do a first naive pass for py3 compatibility
Check that open() calls for reading and writing use binary modes
Use alpine version with py3-pip package
Use python3 in Dockerfile CMD
Remove sys.setdefaultencoding() "hack"
Simplify ensure_directory() in common.common package
3 years ago
Mike Lang f0546e2ee3 Pin gevent to 1.5a2 to avoid https://github.com/gevent/gevent/issues/1711 3 years ago
Mike Lang 9d8c47377f segment parsing: Hand-roll microsecond parsing
float() is inaccurate and Decimal() is very slow (~3x the cpu usage)
so instead we right-pad with 0s (eg. so 1.2345 -> 1.234500) then convert to int microsec directly.
4 years ago
Mike Lang 66669cd4e4 common: When parsing segment timestamps, use decimal instead of float
Floating point error leads to 1us differences in parsed times,
which causes false positives in the overlapping segments check.

By using a Decimal, we get the exact digits from the filepath.
4 years ago
Mike Lang 13a228070a common.segments: Speed up segment parsing by rolling our own time parsing
strptime is very slow. In terms of pure get_best_segments() speed, this change
more than doubles the throughput.

In particular for segment_coverage, this halves the run time for each check.
4 years ago
Mike Lang b029250c1c Disable stacksampler by default
It causes problems due to the sheer number of unique metrics emitted, which makes
the prometheus endpoint be very expensive / fail a lot.

The data is not useful enough to justify the cost.
4 years ago
Mike Lang 1b12c05e0e make smart cut work, only to discover it doesn't actually have any advantage over fast 4 years ago
Mike Lang 2dbd1132fe common.googleapis: Fix a bug in retrying failed access token get
Seems that this was never fixed when the code was moved.
5 years ago
Mike Lang 7dcd844e16 add logging to help debug smart cut 5 years ago
Mike Lang c294fa82b8 smart cut: Fix output format 5 years ago
Mike Lang c6172ce37f smart cut: More typos 5 years ago
Mike Lang 82346a55ca smart cut: Fix int in ffmpeg args 5 years ago
Mike Lang b39e844c1e restreamer: Fix missing import of smart cut 5 years ago
Mike Lang 21d5548980 Add new segment type "suspect"
We've noticed that when nodes have connection problems, they get full segments
with different hashes. Inspection of these segments shows that
they all have identical data up to a point.

Segments that fetched normally will then have the remainder of the data.
Segments that had issues will have a slightly corrupted end.
The data is still valid, and no errors are raised. It just doesn't have all the data.

We noticed that these corrupted segments all were cut off exactly 60sec after their requests
began. We believe this is a server-side timeout on the request that returns whatever data
it has, then closes the container file cleanly before returning successfully.

We detect segments that take > 59 seconds to recieve, and label them as "suspect".

Suspect segments are treated identically to partial segments, except they are always preferred
over partials.
5 years ago
Mike Lang bb05e37ae4 segments: Use longest segment in bytes if duration is the same
We occasionally see corrupted segments that are slightly shorter in size
but report the same metadata as the full segments. Prefer the largest version
as it's likely the least corrupt.
5 years ago
Mike Lang b516917e62 Add new "smart" cut technique 5 years ago
Mike Lang eba5fc498a Remove flask response size tracking
Despite our best efforts, this was causing chunked responses to be fully
buffered into memory as a side effect.

This is really bad because responses can be VERY large.
5 years ago
Mike Lang 2efe1d6218 Fix a bad logging line when handling errors 5 years ago
Mike Lang 59ee5cf5c0 Only log at INFO about multiple versions of a segment
Since these tend to happen around stream endings, etc,
we don't want them to be crazy noisy and cause us to disregard real problems.

We can use the segment coverage to see in metrics if there are overlaps.
5 years ago
Mike Lang 249e32583b get_best_segments: Don't error if the only segments that exist for time are temp 5 years ago
Mike Lang 6b602592f5 Allow disabling of stacksampling with an env var
This gives an easy way to do so across all services without adding new options.

Reasons to do so might be to avoid overheads or because your prometheus metrics grow too large.
5 years ago
Mike Lang 4d3aa94a71 Automatically set default encoding to utf-8 when common is imported
To be clear, this is an awful hack.

It means that any implicit str/unicode coersion will use the utf-8 encoding,
which is basically always what you want.

However, it is possible that some badly-written libraries might be relying
on the default encoding being ascii, and will do weird things as a result.

Finally, it's especially hacky to be doing this as part of importing a library.
Normally you're meant to do this as part of a sitecustomize.py in your python system directory,
and the function is deleted before passing control to normal code (this is why we need
to reload() to get it back).
5 years ago
Mike Lang 4d5157cdb5 Fix a mistake with allowing reuse of name in @timed() 5 years ago
Mike Lang 426b1328be Fix mistakes in common.requests 5 years ago