Commit Graph

144 Commits (87d4520e610fd5883e953923fde244a276c860de)

Author SHA1 Message Date
Mike Lang 9dfb00f4ab chat_archiver: Logic for checking and downloading media links 2 months ago
Mike Lang 2855ec759d download_media: Add pdf to default allowed content types
We want to capture linked PDFs in addition to videos and images
2 months ago
Mike Lang b46c577014 download_media: Add function for checking if a URL has been downloaded before 2 months ago
Mike Lang 352c9e9081 download_media: Get data from potentially malicious URLs and store in the filesystem
This is suitable for taking arbitrary URLs from chat, etc. and trying to fetch them.
It downloads them to a filepath that contains a hash of the URL and content.
2 months ago
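As a rough illustration of the download_media commit above, a filepath that contains a hash of both the URL and the content might be built as follows. The function name `media_path`, the directory layout, and the choice of SHA-256 are assumptions made for this sketch; the commit message does not specify them.

```python
import hashlib
import os


def media_path(base_dir, url, content, ext="bin"):
    """Hypothetical layout: <base_dir>/<hash of URL>/<hash of content>.<ext>

    Hashing the URL avoids putting untrusted characters into the path,
    and hashing the content keeps distinct responses for the same URL apart.
    """
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()
    content_hash = hashlib.sha256(content).hexdigest()
    return os.path.join(base_dir, url_hash, f"{content_hash}.{ext}")
```

Under this assumed layout, checking whether a URL has been downloaded before reduces to checking whether its URL-hash directory exists and is non-empty.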
Mike Lang 23ad78d592 Record in database when end time is "--"
We need this so that reverse sync reproduces these values correctly.

To handle this in the database, we have a composite type (dashed: boolean, value: timestamp).
Value is always valid and is equivalent to the old timestamp column,
but must be equal to start_time if dashed is true.

The only place we directly reference this column outside sheetsync is thrimshim, where we
always consider the value only.
3 months ago
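The (dashed, value) pair described in the commit above can be modelled in Python roughly like this. The names `EndTime`, `from_sheet` and `to_sheet` are hypothetical and only illustrate the invariant that the value equals the start time whenever dashed is true; the real type lives in the database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class EndTime:
    dashed: bool     # True if the sheet cell contained "--"
    value: datetime  # always a valid timestamp


def from_sheet(start_time, cell):
    """Assumes cell is either the string "--" or an already-parsed datetime."""
    if cell == "--":
        # Invariant from the commit message: value must equal start_time.
        return EndTime(dashed=True, value=start_time)
    return EndTime(dashed=False, value=cell)


def to_sheet(end_time):
    """Reverse sync: reproduce the original cell contents."""
    return "--" if end_time.dashed else end_time.value
```

Code that only cares about the timestamp can read `value` and ignore `dashed`.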
Mike Lang e50adbf2da Fix a bug where transitions past the first are not timed correctly
The video offset is timed relative to the full video up until that point, not the previous range.
3 months ago
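The timing fix in the commit above can be sketched as follows, assuming each transition overlaps the end of the video so far with the start of the next range for its full duration (which is how ffmpeg's xfade filter behaves). The function name and argument shapes are illustrative, not taken from the codebase.

```python
def transition_offsets(range_durations, transition_durations):
    """Return the offset at which each transition starts.

    Each offset is measured from the start of the whole video assembled so far,
    not from the start of the previous range.
    """
    if len(transition_durations) != len(range_durations) - 1:
        raise ValueError("need exactly one transition between each pair of ranges")
    offsets = []
    elapsed = range_durations[0]  # length of the video assembled so far
    for next_duration, transition in zip(range_durations[1:], transition_durations):
        offset = elapsed - transition
        offsets.append(offset)
        # The next range starts playing at `offset`, overlapping the transition.
        elapsed = offset + next_duration
    return offsets
```

For ranges of 10, 20 and 30 seconds with transitions of 1 and 2 seconds, the second offset is 27 seconds. Timing it relative to the previous range alone would give 18 seconds, which is the kind of error the commit describes.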
Mike Lang d4de1f94be Add descriptions to xfade transitions 3 months ago
Mike Lang d5f73c226c GoogleAPIClient: Improve error reporting 3 months ago
Mike Lang 6b0a025812 Add transition support to fast cuts 4 months ago
Mike Lang c8724a1e63 rewrite fast cuts to support transitions being allowed later
In theory there should be no change in actual output for no-transition cuts,
even though we're handling the logic in a very different way.

This doesn't actually allow transitions, but sets up most of what is needed.
4 months ago
Mike Lang 066d10f94a Full cut: Support video transitions
We support all preset transitions in the xfade filter,
as well as a handful of "custom" ones we define.

We only support an audio cross-fade. We may want to support J and L audio cuts (switch audio
before/after the transition) later.
4 months ago
Mike Lang 5fbdaf8422 full cuts: Support multiple ranges
This allows full cuts to support multiple ranges in the same way fast cuts do,
by using multiple inputs to ffmpeg and concat filters joining them.

This will be easy to add transitions to later as this is "just" replacing a concat filter
with an xfade + afade filter.
4 months ago
Mike Lang cc789caa7e Move ffmpeg_cut_segment to new ffmpeg_cut_many() system 4 months ago
Mike Lang ba36338db4 Add simpler wrapper for ffmpeg_cut_segments() for single-input case
Also change ffmpeg_cut_many() arg order so common cases can have a default value.
4 months ago
Mike Lang e65145bcad Replace ffmpeg_cut_stdin() with ffmpeg_cut_many()
This is a more featureful wrapper around ffmpeg with notable differences:
- It's used as a context manager, and so can manage its own cleanup
- It takes care of input feeding
- It can handle multiple inputs (via pipes), instead of one (via stdin)

This drastically reduces the setup and cleanup code required for most basic usage,
and the multi-input support will be used in followup changes.
4 months ago
Mike Lang d571bbe81e ffmpeg_cut_stdin: Remove cut_start and duration built-in args
Of 4 users of this function, all but one set them to None.
We're about to replace that one usage with something else, so it makes more sense
to not have them as options at all and just have the user add to the encode args manually.
4 months ago
ZeldaZach 8bbc72184c Support hot reload of Zulip Schedule
- Move sheets API into common dir, since multi use
- Live download from Google Sheets using Config
- Falls back on old schedule if new one can't be downloaded for some reason
4 months ago
Mike Lang 264545eb9d CachedIterator: Fix bug where state can change while taking the lock
Resulting in a case where we grab the wrong result, or even try to get the next item
after the iterator has already been discarded.
7 months ago
Mike Lang 1857a998c9 reduce overhead of gevent.idle() by only yielding once per 1000 segments 7 months ago
Mike Lang 8ede4622ca CachedIterator: Re-serve any errors encountered while iterating
Previously, the second consumer to reach the error would treat it as a successful end of the iterator.
7 months ago
Mike Lang 2a1f7207a8 Allow a fudge factor when checking for gaps/overlaps between segments
Sometimes in the wild (particularly on YouTube) segments may not be timed perfectly, so allow up to 10ms of gap or overlap
to be counted as "equal" for purposes of finding the best segment.
1 year ago
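A minimal sketch of the tolerance check from the commit above. Only the 10ms figure comes from the commit message; the function name `classify_boundary` and its return values are invented for illustration.

```python
from datetime import timedelta

# Allowed timing error between one segment's end and the next one's start.
FUDGE = timedelta(milliseconds=10)


def classify_boundary(prev_end, next_start, fudge=FUDGE):
    """Compare the end of one segment with the start of the next.

    Differences within the fudge factor are treated as if the times were equal.
    """
    delta = next_start - prev_end
    if abs(delta) <= fudge:
        return "contiguous"
    return "gap" if delta > timedelta(0) else "overlap"
```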
Mike Lang c65eb2eae3 Add a default timeout on google APIs 1 year ago
Mike Lang cd4d08adc1 Yield after each segment when doing fast/smart cuts
To avoid blocking for long periods
1 year ago
Mike Lang e689626815 Add a small time range around the timestamp when extracting a frame
This should hopefully result in frames on the edge of timestamps being extracted
from a combination of the neighboring segment and the naive one,
so that we don't get errors extracting a frame.
1 year ago
Mike Lang 5a0704d3d7 Reject bustimes with negative minutes 1 year ago
Mike Lang 30f05b0656 thumbnails: Add a CLI for generating them directly 1 year ago
Mike Lang 80c9be0baf cutter: Get archive cut working 1 year ago
Mike Lang 5e7904dab3 wip: archive cut 1 year ago
Mike Lang 3ea0532838 wip: 1 year ago
Mike Lang c0e5f32459 Fix bad normalize function for fast_cut_range 1 year ago
Mike Lang 76c9208be5 Move chat_archiver atomic_write() to common for re-use 1 year ago
Mike Lang c5c8b3997b change how timestamps work again, so PCR and PTS are *both* set to start time 1 year ago
Mike Lang 58b4541306 Implement smart cuts 1 year ago
Mike Lang fa1603e99a fixts: Only use PCR to set offset, add 33ms to end time 1 year ago
Mike Lang eaf3ed2e54 fixts first attempt 1 year ago
Mike Lang c493869b9a Have list_segment_files also list chat archives
Otherwise backfilling of chat doesn't work
2 years ago
Mike Lang a3e16a2686 thumbnails: Take crop/scaling info from a json file next to the image file 2 years ago
Mike Lang 45c46df8bb Add thumbnail templating code 2 years ago
Mike Lang 08257386e2 Add restreamer endpoint for viewing chat messages 2 years ago
Mike Lang 1add3c5c22 Implement tombstoning to allow for segment deletion
Rarely, we find ourselves needing to explicitly delete some data, eg. something that shouldn't
have been public and should be removed from all records.

It would also be nice if we could "clean up" bad versions of the same segment,
which occasionally come up when downloaders have issues.

With our distributed segment database, this is actually rather difficult as deleting the data
from any one server would cause it to be restored from the others. It was only possible
by stopping all backfill, deleting the data on all servers, then starting backfill again.

Here we introduce a more practical approach. An operator creates an empty flag file
with the same name as the segment to be deleted, but with a `.tombstone` extension.
eg. to delete a file `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.ts`,
you would create a tombstone `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.tombstone`.

These tombstone files do two important things:
* They hide the segment from being listed, which means that:
  * It can't be restreamed or put into a video
  * It can't be backfilled to other nodes
* The tombstone files themselves do get backfilled to other nodes, so you only need to mark them on one server.

Once the tombstone has propagated to all nodes, the segment file can be deleted independently on each one.

We chose not to have a tombstone automatically trigger a segment deletion for safety reasons.
2 years ago
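The listing behaviour described in the tombstoning commit above could look something like this. This is a sketch under the assumption that a tombstone shares the segment's filename apart from the extension, as in the example paths; `visible_segments` is a hypothetical name, not the project's function.

```python
import os


def visible_segments(filenames):
    """Return the segment filenames that are not hidden by a tombstone.

    Tombstone files themselves are also left out of the result, since they
    are markers rather than segments.
    """
    names = list(filenames)
    tombstoned = {
        os.path.splitext(name)[0]
        for name in names
        if name.endswith(".tombstone")
    }
    return [
        name
        for name in names
        if not name.endswith(".tombstone")
        and os.path.splitext(name)[0] not in tombstoned
    ]
```

Note that this only hides the segment. In keeping with the commit message, nothing here deletes any file.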
Mike Lang 44d0c0269a cache results of common.segments.best_segments_by_start
The restreamer spends most of its time iterating through segments (parsing them, determining the best one for each start time)
to serve large time ranges. Since this only depends on the list of filenames read from disk,
we can cache it for a given hour as long as that list is identical.

This is a little trickier than it sounds because best_segments_by_start is an iterator
and in most cases it won't be fully consumed. So we introduce a `CachedIterator` abstraction
that will both remember the previously yielded values, and keep track of the live iterator
so it can be resumed again if a previous invocation only partially consumed it.

This also has the nice side effect of merging simultaneous operations - if two requests come in
for the same hour at the same time, they'll share one iterator and both consume the results
as they come in.
3 years ago
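The idea behind `CachedIterator` in the commit above can be sketched in a few lines. This is a simplified illustration rather than the project's implementation: it uses a plain `threading.Lock`, does not re-serve errors, and the class name `CachedIteratorSketch` is made up to avoid confusion with the real class.

```python
import threading


class CachedIteratorSketch:
    """Wrap an iterator so that it can be iterated many times.

    Values already yielded are remembered, and the underlying iterator is
    only advanced when a consumer needs a value nobody has fetched yet.
    """

    def __init__(self, iterator):
        self._iterator = iter(iterator)
        self._items = []
        self._done = False
        self._lock = threading.Lock()

    def __iter__(self):
        i = 0
        while True:
            with self._lock:
                if i >= len(self._items):
                    if self._done:
                        return
                    try:
                        # Resume the live iterator from where it was left.
                        self._items.append(next(self._iterator))
                    except StopIteration:
                        self._done = True
                        self._iterator = None
                        return
            yield self._items[i]
            i += 1
```

A consumer that stops early leaves the underlying iterator paused. A later consumer first replays the cached values and then continues from that point, so the expensive work is never repeated.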
Mike Lang 9f9ef66a85 Add endpoint to get a given frame of video 3 years ago
Mike Lang d1ba4bc4eb Downgrade overlapping segments from warning to info
They were causing too much log noise
3 years ago
Mike Lang 7649a4e840 Improve WSGIServer graceful shutdown handling
Previously both restreamer and thrimshim had some complex logic for dealing with
graceful shutdown, in different ways, that was still prone to race conditions.

We replace this with a common method that does it properly.

Fixes #226
3 years ago
Mike Lang aab8cf2f0f Set up plumbing for multi-range videos and implement no-transition fast cut videos only
This is the simplest case as we can just cut each range like we already do,
then concat the results.

We still allow for the full design in the database and cutter, but error out if transitions
is ever anything but hard cuts or if it's a full cut.

We also update the restreamer to allow accepting ranges, however for usability we still allow
the old "just one start and end" args.

Note this changes the thrimshim API to give and take the new "video_ranges" and "video_transitions" columns.
3 years ago
Mike Lang 3de44d6731 Add ability to render waveforms in restreamer 3 years ago
Mike Lang 7599681b6d yet another py3 map() issue
"hey i know lets make everything return an iterable but not update anything else to accept them"
3 years ago
Mike Lang 62bd6539ea Unpin gevent as that was a workaround for a py2 issue 3 years ago
Mike Lang 21856c68aa Fix all instances of file.write() for py3
In Python 3, file.write() may do a partial write, returning the number of characters written.
In order to not lose data, we need to wrap every instance of file.write() with our new
common.writeall() wrapper that loops until the data is actually written.
3 years ago
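A loop of the kind the commit above describes might look like this. The signature shown here (taking the write callable and the data) is an assumption for the sake of the example; the commit message only names `common.writeall()` and does not give its arguments.

```python
def writeall(write, data):
    """Call write() repeatedly until all of data has been written.

    `write` is any callable like file.write that returns the number of
    characters (or bytes) it actually accepted.
    """
    while data:
        written = write(data)
        if written is None:
            # Assumption for this sketch: a writer that returns None
            # (py2 style) is taken to have accepted the whole buffer.
            break
        data = data[written:]
```

It would be used as `writeall(f.write, data)` in place of a bare `f.write(data)`.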
Mike Lang a56f6859bb more py3 fixes 3 years ago