I ran `pyflakes` on the repo and found these bugs:
```
./common/common.py:289: undefined name 'random'
./downloader/downloader/main.py:7: 'random' imported but unused
./backfiller/backfiller/main.py:150: undefined name 'variant'
./backfiller/backfiller/main.py:158: undefined name 'timedelta'
./backfiller/backfiller/main.py:171: undefined name 'sort'
./backfiller/backfiller/main.py:173: undefined name 'sort'
```
(ok, the "imported but unused" one isn't a bug, but the rest are)
This fixes those, as well as a further issue I saw with sorting of hours.
Arbitrary iterables can't be sorted in place: only lists have a `.sort()` method. As an obvious example,
what if your iterable was infinite? As a result, any attempt to sort in place something that is not
already a list (even a tuple or a set) will result in an error. We avoid this by coercing to list, fully
realising the iterable and putting it into a form that Python will let us sort. Because `list()` always
returns a new list, this also avoids the nasty side-effect of mutating the list that gets passed into us,
which the caller may not expect. Consider what an in-place sort would do to the caller's list:
```
>>> my_hours = ["one", "two", "three"]
>>> print(my_hours)
['one', 'two', 'three']
>>> backfill_node(base_dir, node, stream, variants, hours=my_hours, order='forward')
>>> print(my_hours)
['one', 'three', 'two']
```
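A minimal sketch of the fix described above. `sort_hours` is a hypothetical helper (the backfiller may inline this logic), and the `'reverse'` ordering is an assumption added alongside the `'forward'` order used in the example:

```python
def sort_hours(hours, order="forward"):
    """Return a new sorted list of hours without touching the caller's object.

    list() realises any finite iterable (generator, tuple, set, ...) into a
    fresh list, so the in-place sort below only ever mutates our private copy.
    """
    hours = list(hours)
    if order == "forward":
        hours.sort()
    elif order == "reverse":  # hypothetical second ordering, for illustration
        hours.sort(reverse=True)
    else:
        raise ValueError("unknown order: {!r}".format(order))
    return hours
```

With this, `sort_hours(my_hours)` returns `['one', 'three', 'two']` while `my_hours` keeps its original order, and a generator works just as well as a list.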
Also, one of the linter errors was non-trivial to fix - we were trying to get a list of hours
(which requires an API call for a particular variant), but at a time when we weren't dealing with a single
variant. My solution was to get a list of hours for ALL variants, and take the union.
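Taking the union can be sketched as follows. `list_hours` is a hypothetical stand-in for the per-variant API call, and the sketch assumes hour strings sort chronologically (e.g. `2019-01-01T00`):

```python
def hours_for_all_variants(list_hours, variants):
    """Return the sorted union of the hours available for any of the given variants.

    list_hours is a hypothetical callable: variant name -> iterable of hour strings.
    """
    hours = set()
    for variant in variants:
        # One lookup per variant; an hour present in any variant is kept.
        hours.update(list_hours(variant))
    return sorted(hours)
```

Using a set means an hour captured by several variants only appears once, and an hour captured by just one variant is still backfilled.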
For ease of use, we use a jsonnet file to generate the yaml.
Jsonnet is a language for generating JSON documents.
In this case it's useful to us because it lets us have comments,
references to settings defined at the top, and some basic logic
like converting qualities from a list of strings to a comma-separated string.
To avoid requiring jsonnet to be installed, we use the official jsonnet docker image
in the generate script.
This is mainly just for testing until we get the database and proper cutter up,
but it might prove useful to have in the long run too.
This code will probably end up being totally rewritten,
as it uses the most naive form of cutting and re-encoding,
and it has a whole bunch of http-serving specifics intertwined with the cutting logic.
Previously, downloader would put files under `BASE_DIR/VARIANT/HOUR/FILE.ts`;
now, it will put files under `BASE_DIR/STREAM/VARIANT/HOUR/FILE.ts`.
This brings downloader in line with restreamer's concept of `base_dir`.
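The new layout can be illustrated with a small hypothetical helper; `segment_path` and its argument names are illustrative, not the downloader's actual API:

```python
import os


def segment_path(base_dir, stream, variant, hour, filename):
    """Build BASE_DIR/STREAM/VARIANT/HOUR/FILE.ts.

    The old layout had no STREAM component, i.e. BASE_DIR/VARIANT/HOUR/FILE.ts.
    """
    return os.path.join(base_dir, stream, variant, hour, filename)
```

For example, `segment_path("/segments", "desertbus", "source", "2019-01-01T00", "a.ts")` gives `/segments/desertbus/source/2019-01-01T00/a.ts` on a POSIX system (the stream and hour values here are made up).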
This prevents clients from picking a variant that they then can't play any content for.
In general we expect the same content to be available on all variants being captured,
but if the set of captured variants changes we still want to handle that gracefully.
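One way to achieve this, as a sketch: only advertise variants that have at least one hour directory on disk. This assumes the `BASE_DIR/STREAM/VARIANT/HOUR/FILE.ts` layout, and `playable_variants` is a hypothetical name rather than restreamer's actual function:

```python
import os


def playable_variants(base_dir, stream):
    """Return variants under base_dir/stream that contain at least one hour directory."""
    stream_dir = os.path.join(base_dir, stream)
    if not os.path.isdir(stream_dir):
        # Nothing captured for this stream at all.
        return []
    variants = []
    for variant in sorted(os.listdir(stream_dir)):
        variant_dir = os.path.join(stream_dir, variant)
        if not os.path.isdir(variant_dir):
            continue
        # A variant directory with no hours in it has nothing a client could play.
        has_hours = any(
            os.path.isdir(os.path.join(variant_dir, hour))
            for hour in os.listdir(variant_dir)
        )
        if has_hours:
            variants.append(variant)
    return variants
```

Checking the disk on each request, rather than using a fixed list, is what lets this cope when the set of captured variants changes.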
This is needed by both the restreamer and the cutter, hence its inclusion in common.
The algorithm is pretty simple: for each start time it takes the 'best' segment, preferring full segments
over partial ones, and then longer partial segments over shorter ones. All the other complexity is mainly
just around detecting and reporting holes, and being inclusive of start/end points.
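The selection and hole detection can be sketched like this. It is a simplified illustration, not the code in common: the `Segment` record, the use of plain numbers (seconds) for times, and the `best_segments_between` name are all assumptions:

```python
from collections import namedtuple

# Hypothetical segment record: start time and duration in seconds,
# plus whether the segment was only partially downloaded.
Segment = namedtuple("Segment", ["start", "duration", "is_partial"])


def pick_best(candidates):
    """Full segments beat partial ones; among partials, the longest wins."""
    return max(candidates, key=lambda s: (not s.is_partial, s.duration))


def best_segments_between(segments, start, end):
    """Return (chosen, holes) for segments touching the inclusive range [start, end].

    chosen is the best segment per start time, ordered by start.
    holes is a list of (from, to) gaps not covered by any chosen segment.
    """
    by_start = {}
    for seg in segments:
        # Inclusive: keep any segment that touches the range, even at an endpoint.
        if seg.start + seg.duration >= start and seg.start <= end:
            by_start.setdefault(seg.start, []).append(seg)
    chosen = [pick_best(by_start[t]) for t in sorted(by_start)]

    holes = []
    covered_until = start
    for seg in chosen:
        if seg.start > covered_until:
            holes.append((covered_until, seg.start))
        covered_until = max(covered_until, seg.start + seg.duration)
    if covered_until < end:
        holes.append((covered_until, end))
    return chosen, holes
```

For example, with full segments at 0 and 6 (2s each) and three candidates at 2 (a full one plus two partials), asking for `[0, 8]` picks the full segment at 2 and reports a single hole `(4, 6)`.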