I had to go to some effort to get nice labelling,
which also meant none of the existing libs for this were any good,
but this works well enough.
Exposes the metrics on /metrics.
The calculations were backwards, so instead of cutting a video by, say, 2 seconds,
it would cut by -2 seconds, which was clamped to 0. So it would never actually cut,
it would always use the closest segment.
Also, once we were actually cutting, we hit an issue where ffmpeg would finish and close
its input early, because we'd reached the end of the cut video, but not all input had been written yet.
This resulted in an EPIPE error (write to closed pipe) in the input feeder. We now ignore that.
This cutter works by only cutting the first and last segments to size,
then concatting them with the other segments, so we only ever process a few seconds
of video instead of the entire video duration.
However, to make this work, care must be taken that the cut segments use the same codecs
as the other segments.
The reason it's experimental is that we are not yet confident in its ability
to cut accurately and without sync issues. We have seen some minor issues when trying to play
back the raw output files, but youtube's re-encoding has consistently smoothed out those issues
and they seem to be highly player-specific. Vigorous testing is needed.
Also note that both methods right now (cat then cut, and cut then cat) only work if all the segments
are cattable, that is they all use the same codecs, have the same resolution, etc.
If a stream were to change its encoding settings, and we were cutting over that change,
both approaches would not work. We should add checks for that scenario (which can only happen
over a stream drop), and if so fallback to a slow method using ffmpeg's concat filter,
which will work even for disparate codecs, though reconciling mismatched resolutions or frame rates
may require further work.
I ran `pyflakes` on the repo and found these bugs:
```
./common/common.py:289: undefined name 'random'
./downloader/downloader/main.py:7: 'random' imported but unused
./backfiller/backfiller/main.py:150: undefined name 'variant'
./backfiller/backfiller/main.py:158: undefined name 'timedelta'
./backfiller/backfiller/main.py:171: undefined name 'sort'
./backfiller/backfiller/main.py:173: undefined name 'sort'
```
(ok, the "imported but unused" one isn't a bug, but the rest are)
This fixes those, as well as a further issue I saw with sorting of hours.
Iterables are not sortable. As an obvious example, what if your iterable was infinite?
As a result, any attempt to sort an iterable that is not already a friendly type like a list
or tuple will result in an error. We avoid this by coercing to list, fully realising the iterable
and putting it into a form that python will let us sort. It also avoids the nasty side-effect
of mutating the list that gets passed into us, which the caller may not expect. Consider this example:
```
>>> my_hours = ["one", "two", "three"]
>>> print my_hours
["one", "two", "three"]
>>> backfill_node(base_dir, node, stream, variants, hours=my_hours, order='forward')
>>> print my_hours
["one", "three", "two"]
```
Also, one of the linter errors was non-trivial to fix - we were trying to get a list of hours
(which is an api call for a particular variant), but at a time when we weren't dealing with a single
variant. My solution was to get a list of hours for ALL variants, and take the union.
For ease-of-use, we use a jsonnet file to generate the yaml.
Jsonnet is a language for generating JSON documents.
In this case it's useful to us because it lets us have comments,
references to settings defined at the top, and some basic logic
like converting qualities from a list of strings to a comma-seperated string.
To avoid requiring jsonnet to be installed, we use the official jsonnet docker image
in the generate script.
This is mainly just for testing until we get the database and proper cutter up,
but it might prove useful to have in the long run too.
This code will probably end up being totally rewritten,
as it uses the most naive form of cutting and reencoding,
and it has a whole bunch of http-serving specifics intertwined with the cutting logic.
Previously, downloader would put files under BASE_DIR/VARIANT/HOUR/FILE.ts
now, it will put files under BASE_DIR/STREAM/VARIANT/HOUR/FILE.ts
This brings downloader in line with restreamer's concept of base_dir
This prevents clients from picking a variant that they then can't play any content for.
In general we expect the same content to be available on all variants being captured,
but if the set of captured variants changes we still want to handle that gracefully.