Note this moves over the 'experimental' cutter and deletes the original cutter
that concatenates entire videos before cutting.
We may eventually want to revive that method if the experimental cutter turns out
to introduce too many issues.
We move most of the code over verbatim, but adjust it such that it acts
as a generic iterator that can be used in a variety of contexts.
Some other changes made during the move include telling ffmpeg to be quieter
(don't output version info and junk, only log if something goes wrong),
and avoiding errors during cleanup.
This is a performance optimization, allowing us to fail out early (potentially avoiding a LOT
of work) if we know we're going to reject any result that contains holes.
We add a new exception ContainsHoles that is raised in this condition.
The cutter has two jobs:
* To cut videos, taking them through states EDITED -> TRANSCODING
* To monitor TRANSCODING videos for when they're complete
We run these as separate greenlets with their own DB connections,
and if either dies we gracefully shut down the other.
This should help prevent changing state to EDITED with any of these fields unset,
which would blow up the cutter.
We also fix up upload_location, which was set up as a sheet input (NOT NULL DEFAULT ''),
and add a similar constraint saying any DONE columns must have non-NULL video link.
All our usage was of a single query anyway, so autocommit is easier to handle.
You can still opt into a longer transaction using the transaction() helper.
This code manages the database connections, setting their isolation level correctly
and ensuring the idempotent schema is applied before they're used.
Applying the schema on startup means we don't need to deal with the database's state,
setting it up before running, running migrations etc. However, it does put constraints on
the changes we can safely make.
Our use of seralizable isolation means that all transactions can be treated as fully
independent - the server must behave as though they'd been run seperately in some valid order.
This will give us the least surprising results when multiple connections try to modify the same
data, though we'll need to deal with occasional transaction commit failures due to conficts.
* Had to rename `end` as `end` is a reserved word in postgres SQL.
`event_end` is more consistent with `video_end` anyway. Updated `start` to match.
* Added ability to specify channel and stream quality in the editor, which may prove useful
if we have issues with a particular stream quality, or if content needs to be captured from
other channels.
Changed allow_holes and uploader_whitelist to be edit inputs - there's no need for them to come from the sheet; and we'll have an admin dashboard for modifying them if needed.
This solves the problem of rows which don't need a full cut video,
but we'd like to link to an image or a short gif or clip of it.
It is a sheet input that is only used in the output sheet, so it doesn't affect the wubloader itself.
* Checks for the SCTE35-OUT/SCTE35-IN marks in the HLS stream that indicate an ad start/end
* Ignores those segments completely
* Doesn't mark the StreamWorker as up until it sees the first non-ad segment
Some other operational notes:
* The main risk this adds is that re-connecting / refreshing master playlist takes longer.
If all downloaders are doing this at the same time (ie. because the stream only just came up,
or during a deployment rollout), all downloaders might be waiting for ads to finish and
you'll miss segments.
* We should run more downloaders to compensate. This also increases the chance at least one of
them won't get any ads, so we get everything right from stream-up.
* The other mitigation we can do is have geographically diverse downloaders. This decreases the risk
that they all get served an ad, and at least at time of writing it seems that no in-stream ads
are served outside of these regions:
> US, Canada, Germany, France, Sweden, Belgium, Poland, Norway, Finland, Denmark, Netherlands, Italy, Spain, Switzerland, Austria, Portugal, UK, Australia, New Zealand
Split UPLOADED into TRANSCODING and DONE, to represent the time after upload
that youtube is transcoding the video and it's not viewable.
Any cutter can poll for the state of a transcoding video and mark it as done.
Add some extra sheet input columns.
I fully expect the exact list of sheet inputs, edit inputs and outputs to change.
The important thing I wanted to codify here was the state machine and the behaviour of the cutters.
In resource contention scenarios, all calls can start failing due to
not being able to read the response in a timely manner.
This means SegmentGetters never stop retrying, leading to further contention
and a feedback loop.
We attempt to put at least some cap on this scenario by giving up
if an amount of time has elapsed to the point that we know our URL couldn't be valid anymore.
Since we don't actually know how long segment URLs are valid, we are very conservative about
this time, for now setting it to 20min.