When an image is pushed, this tells GitHub to associate the ghcr.io repository it was
pushed to with the specified GitHub repository (the owners need to match).
This does a few things.
Most importantly, it automatically grants GitHub Actions credentials to push to these
repositories when run in the context of the wubloader repo.
In Python 3, file.write() may do a partial write, returning the number of characters
actually written. To avoid losing data, we need to wrap every call to file.write() in our
new common.writeall() wrapper, which loops until all of the data has been written.
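A rough sketch of such a wrapper (the real common.writeall() may differ in its details):

    def writeall(write, data):
        # write() may only consume a prefix of data, returning how much
        # it wrote; loop with the remainder until nothing is left.
        while data:
            written = write(data)
            if written is None:
                # Some file-like objects return None, meaning all of
                # data was written.
                return
            data = data[written:]

Call sites then become writeall(file.write, data) instead of file.write(data).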
Check that open() calls for reading and writing use binary modes
Use an Alpine version with the py3-pip package
Use python3 in the Dockerfile CMD
Remove the sys.setdefaultencoding() "hack"
Simplify ensure_directory() in the common.common package (see the sketch below)
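The simplification presumably leans on Python 3's os.makedirs(exist_ok=True); a sketch,
assuming ensure_directory() creates the directory containing a given path:

    import os

    def ensure_directory(path):
        # Create the directory containing path, and any parents, if they
        # don't already exist. exist_ok=True makes this safe against
        # races with other processes creating the same directory.
        os.makedirs(os.path.dirname(path), exist_ok=True)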
Then treat backfilling each channel just like backfilling each quality.
This is conceptually simpler (there is only one kind of unit of work, a (channel, quality) pair)
and behaves better when a node is down (there is only one piece of error handling around it).
It also means we aren't asking the database for the same info once per channel,
and it cuts down on logging noise.
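Conceptually, the per-node backfill loop then looks something like this (names here are
illustrative, not the actual wubloader API):

    import itertools

    def backfill_node(backfill, node, channels, qualities):
        # Each (channel, quality) pair is one unit of work, so a node
        # that is down fails in exactly one place instead of once per
        # channel and once per quality.
        for channel, quality in itertools.product(channels, qualities):
            backfill(node, channel, quality)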
By carefully ensuring most of our Dockerfiles are identical in their first few layers,
Docker's layer cache lets us build those layers once and share them, instead of rebuilding
them for every image. In particular, we move installing gevent to before installing common,
so that gevent doesn't need to be reinstalled even when common changes.
This matters because gevent takes ages to install.
Also fixes segment_coverage, which wasn't being installed.
We move all connection handling into get_nodes().
This means that connection problems won't cascade into further errors
and crash the application entirely.
In turn, if the database goes down, the behaviour becomes
"continue backfilling from the nodes we already know about" instead of crashing.
In testing, GDQ's stream delay rose to over a minute, which caused backfillers to backfill
segments at the same time they were being downloaded. We increase the window for now,
and also make it configurable.
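For example, the cutoff could be wired up as an option along these lines (the flag name
and default are illustrative only):

    import argparse

    parser = argparse.ArgumentParser()
    # Segments newer than this are left to the downloader; a larger
    # window tolerates more stream delay without backfilling segments
    # that are still being downloaded.
    parser.add_argument(
        "--start-delay", type=int, default=120,
        help="Ignore segments newer than this many seconds when backfilling",
    )
    args = parser.parse_args()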
This will significantly increase throughput when downloading
large ranges of segments.
The max concurrency is exposed as a CLI arg.
We also slightly modify the logged info so that it reports the number of segments actually
downloaded, not just the number of missing segments (some of which we may skip
downloading for various reasons).
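Since the stack is gevent-based, the concurrency plausibly looks like a gevent pool
(a sketch; get_remote_segment is a stand-in for whatever fetches one segment):

    import logging
    import gevent.pool

    def backfill_segments(segments, get_remote_segment, max_concurrency):
        # Download up to max_concurrency segments at once.
        pool = gevent.pool.Pool(max_concurrency)
        greenlets = [pool.spawn(get_remote_segment, segment) for segment in segments]
        pool.join()  # wait for all spawned downloads to finish
        # Report segments actually downloaded, not just how many were missing.
        downloaded = sum(1 for g in greenlets if g.successful())
        logging.info("Downloaded %d of %d segments", downloaded, len(segments))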
To make this work, we make type a proper segment field.
We also tell get_best_segments to ignore temp segments, since they might go away
before we can actually use them.
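Illustratively, the parsed segment then carries its type alongside the other fields, and
anything choosing segments can skip temp ones (the field names other than type are
assumptions):

    from collections import namedtuple

    # "type" distinguishes e.g. full, partial and temp segments.
    SegmentInfo = namedtuple("SegmentInfo", [
        "channel", "quality", "start", "duration", "type", "path",
    ])

    def is_usable(segment):
        # Temp segments may be deleted at any moment, so never plan to
        # cut from them.
        return segment.type != "temp"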