In python 3, file.write() may do a partial write and returns the number of characters written.
In order to not lose data, we need to wrap every instance of file.write() with our new
common.writeall() wrapper that loops until the data is actually written.
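A minimal sketch of what such a wrapper could look like. The signature (passing the bound write function rather than the file object) and the handling of `None` and `0` return values are assumptions for illustration, not necessarily what common.writeall() does:

```python
def writeall(write, data):
    """Call write() repeatedly until all of data has been written.

    `write` is any function that takes a str/bytes value and returns how much
    of it was written, such as a file object's bound write method.
    """
    while data:
        n = write(data)
        if n is None:
            # Assumption: a write function that returns no count wrote everything.
            break
        if n == 0:
            # Writing nothing would make us loop forever, so fail loudly instead.
            raise RuntimeError("write() made no progress with {} left to write".format(len(data)))
        # Drop what was written and go around again for the remainder.
        data = data[n:]
```

Usage would then be `writeall(f.write, data)` in place of `f.write(data)`.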
Check that open() calls for reading and writing use binary modes
Use alpine version with py3-pip package
Use python3 in Dockerfile CMD
Remove sys.setdefaultencoding() "hack"
Simplify ensure_directory() in common.common package
For the upcoming playlist manager to be able to use the DB `tags` column to know
what tags a video has, every tag it cares about must be present in that column.
Previously, this was a problem because the day and category tags only get added at the cutter
and so wouldn't be listed.
This moves them so they are added when parsing the row in sheetsync.
It also adds the poster moment tag if poster moment is checked.
Note that fully static tags that go on all videos are still only added in cutter,
but the playlist manager doesn't need to care about those (since by definition
they will match every video).
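As a rough illustration of the idea, the tag list built while parsing a row might look like the following. The function name, the row field names and the exact tag strings are all hypothetical:

```python
def row_tags(row):
    """Build the full tag list for a parsed sheet row.

    All keys used here ("tags", "category", "day", "poster_moment") and the
    tag strings are assumptions made for this sketch.
    """
    tags = list(row.get("tags", []))
    if row.get("category"):
        tags.append(row["category"])
    if row.get("day") is not None:
        tags.append("Day {}".format(row["day"]))
    if row.get("poster_moment"):
        tags.append("Poster Moment")
    # Fully static tags are deliberately not added here; cutter still adds those.
    return tags
```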
By carefully ensuring most of our dockerfiles are identical in their first few layers,
we only need to build those layers once instead of every time.
In particular, we move installing gevent to before installing common,
so that even when common changes gevent doesn't need to be reinstalled.
This is important because gevent takes ages to install.
Also fixes segment_coverage, which wasn't being installed.
This differs from the existing reset row by only succeeding if the upload is not
in finalizing.
We also make some changes to cutter to handle this situation gracefully.
This is a nicer error than crashing in the depths of some error handler
(which is what will happen if the DB goes unavailable while they're running),
and it's a far more common case (eg. the DB is misconfigured) than having it fail
halfway through.
Neither of these services can do anything meaningful without the DB,
so crashing without it is acceptable behaviour.
This accomplishes two things:
1. It allows thrimshim to properly validate length restrictions (not implemented yet)
2. It means that the database has a record of the values actually written for each of these rows,
instead of that information depending on how the cutter was configured at the time.
Instead of handling each error condition separately,
we raise an UploadError which includes whether it's retryable.
The advantage of this is that upload backends can also raise an UploadError
to indicate two conditions they currently cannot:
* That an error is unretryable
* That an error is retryable, even if the row was already in finalizing
Under this scheme, errors while cutting become unretryable UploadErrors,
and unhandled exceptions in uploading become retryable UploadErrors only if
the row is not yet finalizing.
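A minimal sketch of this scheme. The UploadError constructor arguments and the `run_job` helper with its three callables are assumptions for illustration, not the cutter's actual code:

```python
class UploadError(Exception):
    """An upload failure that says whether the job may be retried."""
    def __init__(self, error, retryable=False):
        super().__init__(error)
        self.error = error
        self.retryable = retryable


def run_job(cut, upload, is_finalizing):
    """Cut then upload, mapping every failure to an UploadError.

    cut() returns the video, upload(video) uploads it, and is_finalizing()
    reports whether the row has already entered finalizing. All three are
    hypothetical stand-ins.
    """
    try:
        video = cut()
    except Exception as e:
        # Errors while cutting are never retryable.
        raise UploadError("Error while cutting: {}".format(e), retryable=False) from e
    try:
        return upload(video)
    except UploadError:
        # The backend already decided whether this is retryable.
        raise
    except Exception as e:
        # Unhandled errors are only safe to retry if we haven't started finalizing.
        raise UploadError(
            "Unhandled error in upload: {}".format(e),
            retryable=not is_finalizing(),
        ) from e
```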
In a fast cut, we edit the first and last segments then concatenate them all.
However, this leads to some tiny but perceptible artifacting around the border
of the first and second (and second-last and last) segments.
A full cut is much slower, but re-encodes the video into the desired format
and is more reliable.
We want both options to be available.
This commit only adds the option; it is not yet used in restreamer or cutter.
Only support iterable of string, not file-like or string.
This is a minor usability loss but we only call this from one place anyway
and it's always an iterable of string.
This prevents videos being stuck in EDITED with no visible problem when
they contain holes, but is likely to produce false positives sometimes.
This is fine though, as it's just a human-readable warning and
it will be cleared as soon as any node accepts the row to be cut.
This deals with the problem where multiple youtube locations that refer
to the same actual account (but with different settings) will all try to check
for when videos are done transcoding, when only one is needed.
Cutter now takes a 'config' arg which is a JSON blob with details
on each upload location. This is a bit nasty if you're trying to run it manually
but was the easiest way to transfer the config data from docker-compose.jsonnet
to the actual application.
This lays the groundwork for being able to cut to many upload locations.
Right now, only a single location can be configured, and only youtube is supported.
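For illustration, parsing such a blob could look roughly like this. The blob layout (a mapping of location name to settings with a "type" key) and the function name are assumptions, not the real config format:

```python
import json

def parse_upload_locations(config_json):
    """Parse the 'config' arg into a dict of location name -> settings.

    Assumed layout: {"<location name>": {"type": "youtube", ...}}.
    """
    config = json.loads(config_json)
    if len(config) != 1:
        raise ValueError("Exactly one upload location can be configured for now")
    for name, settings in config.items():
        backend_type = settings.get("type", "youtube")
        if backend_type != "youtube":
            raise ValueError("Unsupported backend type {!r} for location {!r}".format(backend_type, name))
    return config
```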
After certain kinds of DB error (eg. lost conn), we need to make a new conn
to have things work again. To be safe, we just do it after every error where it might
be a problem.
Since we never got a new conn after failure, we would just keep erroring with
"connection already closed" errors.
This isn't applicable to the main cutter loops since a DB failure there will restart the process.
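The pattern is roughly the following. The class and method names are hypothetical, and `connect` stands in for whatever creates a new DB connection:

```python
class ReconnectingDB:
    """Holds a DB connection and replaces it after any failed operation."""

    def __init__(self, connect):
        # connect is a zero-argument function returning a new connection (assumed).
        self._connect = connect
        self.conn = connect()

    def run(self, operation):
        """Call operation(conn). On any error, get a fresh conn, then re-raise."""
        try:
            return operation(self.conn)
        except Exception:
            # We can't easily tell which errors leave the conn unusable,
            # so to be safe we replace it after every error.
            self.conn = self._connect()
            raise
```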
Each method is fairly complicated, but is self-contained and can be examined independently.
cut_jobs in particular contains several extra helpers and directs control flow
via some iterators. This is unfortunately necessary due to the requests interface.
This commit only lays out the main loop, showing the high-level flow
and defining shared utilities. This is for clarity.
The actual methods that do the work will be implemented separately.
It runs on an interval, fetching all videos in TRANSCODING from the DB,
checking them against youtube, and then updating any that are done.
It should be noted that youtube somewhat lies about what being "done" means,
but this is a better approximation than nothing.
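One pass of that loop can be sketched as below. The three callables and their shapes are hypothetical stand-ins for the DB queries and the youtube API call:

```python
def check_transcoding(fetch_transcoding, fetch_done, mark_done):
    """Run one check and return the row ids that were marked done.

    fetch_transcoding() returns {row_id: video_id} for rows in TRANSCODING,
    fetch_done(video_ids) returns the video ids youtube reports as done,
    mark_done(row_id) updates that row in the DB. All are assumed interfaces.
    """
    rows = fetch_transcoding()
    if not rows:
        return []
    done_video_ids = set(fetch_done(list(rows.values())))
    finished = [row_id for row_id, video_id in rows.items() if video_id in done_video_ids]
    for row_id in finished:
        mark_done(row_id)
    return finished


def run(interval, stop, fetch_transcoding, fetch_done, mark_done):
    """Repeat the check every `interval` seconds until `stop` is set.

    `stop` is anything with is_set() and wait(timeout), eg. threading.Event.
    """
    while not stop.is_set():
        check_transcoding(fetch_transcoding, fetch_done, mark_done)
        stop.wait(interval)
```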
Provides basic youtube api calls, and gets passed into both transcode checker and cutter.
The official youtube client library is many orders of magnitude larger and more complicated,
and can't actually do what we want (stream an upload of unknown size).
The cutter has two jobs:
* To cut videos, taking them through states EDITED -> TRANSCODING
* To monitor TRANSCODING videos for when they're complete
We run these as separate greenlets with their own DB connections,
and if either dies we gracefully shut down the other.