
127 Commits (master)

Author SHA1 Message Date
Christopher Usher 34785d1179 now checking the hashes 5 years ago
Christopher Usher b0562495d2 reject mismatched hashes; more metrics 5 years ago
Christopher Usher 120a5a7de0 started on checking the hash 5 years ago
Chris Usher 557ddddc31 better logging for the backfiller 5 years ago
Christopher Usher 44f0e0defb changed it back so only the name is checked 5 years ago
Chris Usher aab46f9765 fixed localhost bug in backfiller 5 years ago
Christopher Usher 84270c02ec logging fix 5 years ago
Christopher Usher fdb5d20db7 fix to database logging 5 years ago
Christopher Usher 497845f2da typos in comments 5 years ago
Christopher Usher 361e577474 fixes based on ekimekim's suggestions 5 years ago
Christopher Usher 720684a388 refactoring to have consistent terminology 5 years ago
Christopher Usher 6d38250674 starting to refactor stream to channel and variant to quality 5 years ago
Mike Lang f50276bd01 backfiller: Expose recent_cutoff as CLI arg and increase it to 120s default
In testing, GDQ's stream delay went up over 1min, which caused backfillers to backfill
segments at the same time they were downloaded. We increase the window for now,
and also make it configurable.
5 years ago
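A minimal sketch of the idea behind `recent_cutoff`, assuming it is a window in seconds during which a segment is considered too new to backfill. The helper name `is_backfillable` and the use of naive UTC datetimes are assumptions for illustration, not taken from the backfiller code.

```python
from datetime import datetime, timedelta


def is_backfillable(segment_start, now, recent_cutoff=120.0):
    """Return True if the segment started at least `recent_cutoff` seconds before `now`.

    Hypothetical helper: both datetimes are assumed to be naive UTC values.
    """
    return now - segment_start >= timedelta(seconds=recent_cutoff)


now = datetime(2019, 1, 1, 12, 0, 0)
fresh = now - timedelta(seconds=90)   # inside the window: leave it to the downloader
stale = now - timedelta(seconds=300)  # outside the window: safe to backfill
print(is_backfillable(fresh, now), is_backfillable(stale, now))
```

With a 60s window the 90-second-old segment would already count as backfillable, which is the overlap with the downloader that the larger default is meant to avoid.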
Mike Lang 29040a166c backfiller: Allow multiple concurrent segment downloads
This will significantly increase throughput when downloading
large ranges of segments.

The max concurrency is exposed as a CLI arg.

We also slightly modify the logged info, so it reports segments downloaded,
not just number of missing segments (which we might skip downloading for various reasons).
5 years ago
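As a rough illustration of bounded concurrent downloads, the sketch below uses the standard library's `ThreadPoolExecutor` as a stand-in (the services in this repository are gevent based, so the real code most likely differs). `download_all`, `fetch` and `concurrency` are hypothetical names. It returns the number of segments actually downloaded rather than the number that were missing.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(segments, fetch, concurrency=5):
    """Call fetch(segment) for every segment with at most `concurrency` running at once.

    Returns how many fetches succeeded. `fetch` is a caller-supplied callable
    (hypothetical here) that raises an exception on failure.
    """
    def attempt(segment):
        try:
            fetch(segment)
            return True
        except Exception:
            # A failed segment is counted as not downloaded and does not stop the rest.
            return False

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(attempt, segments))
```

Counting successes separately from the input length keeps the logged figure honest when some segments are skipped or fail.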
Christopher Usher 37bad7d5ed Also reset database connection on error in the backfiller 5 years ago
Mike Lang 7179fcacec Backfiller: ignore temp segments
To make this work, we make type a proper segment field.

We also tell get_best_segments to ignore temp segments, since they might go away
before we can actually use them.
5 years ago
Christopher Usher dd246e1343 ekimekim's suggestions 5 years ago
Christopher Usher 9b28765ff2 Bug fixes to get the database connection working 5 years ago
Christopher Usher 4b9fbcb7d2 backfiller database code 5 years ago
Mike Lang f8d10dacdf Audit and fix all usage of dateutil
We wrap direct dateutil calls to handle two distinct cases:

* `common.dateutil.parse()`: We want to handle arbitrary timestamps including tz info,
then convert them to UTC.

This is used in HLS parsing, and for command line input for backfiller

* `common.dateutil.parse_utc_only()`: We want to only handle UTC timestamps,
but datetime.strptime isn't flexible enough (eg. can't handle missing fractional component).

This is used for restreamer request params.
5 years ago
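The two cases can be sketched with the standard library alone. This is not the `common.dateutil` implementation: the function names, the ISO 8601 input format, and the choice to return naive UTC datetimes are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_to_utc(text):
    """Parse an ISO 8601 timestamp that may carry a UTC offset; return naive UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is not None:
        # Shift to UTC, then drop the tzinfo so every result is comparable.
        parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
    return parsed


def parse_utc_only(text):
    """Parse a timestamp already in UTC, with or without fractional seconds."""
    # A single strptime format cannot make the fractional part optional,
    # so each accepted shape is tried in turn.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not a UTC-only timestamp: {!r}".format(text))
```

In this sketch `parse_utc_only` rejects any input that carries an offset, which suits request parameters, while `parse_to_utc` accepts offsets and normalises them.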
Christopher Usher 072e51f287 Renaming a variable that should have been part of the last commit 6 years ago
Christopher Usher 61107346c8 Fixed backing off on exceptions and some more documentation 6 years ago
Christopher Usher 728adb7c1d improvements suggested by ekim 6 years ago
Christopher Usher 530b9f7d5e more improvements based on ekim's comments 6 years ago
Chris Usher 332e03de80 started in on ekim's comments 6 years ago
Christopher Usher 2857d3fb9f comments and some whitespace handling 6 years ago
Christopher Usher 4e6dbe1c74 Added localhost option to backfill to avoid backfilling the local machine 6 years ago
Christopher Usher ade0ad3d18 rewrite of get_nodes to allow getting list of files from a file 6 years ago
Christopher Usher 23fea7b154 bug fixing after testing 6 years ago
Christopher Usher 149974ce54 added multiple streams by largely copying and pasting the code from the downloader 6 years ago
Christopher Usher e4364b75b1 options to change where the node list is coming from 6 years ago
Christopher Usher baae0f1ac1 bug fix in arg list 6 years ago
Christopher Usher 65143a8ca2 more flexibility for start time 6 years ago
Christopher Usher a8cb1ff370 fixed start not propagating to list_hours plus some refactoring 6 years ago
Christopher Usher 57bb74632f I should test these changes soon 6 years ago
Christopher Usher 64bc76c48b error handling I guess 6 years ago
Christopher Usher 09368d92e1 fixes and improvements suggested by ekimekim
* simplified the backfiller local - now just a full backfill every couple of minutes
6 years ago
Christopher Usher 4eac6189ce backfiller working in parallel 6 years ago
Christopher Usher f4385ad4e3 hopefully didn't break anything with this refactor 6 years ago
Christopher Usher 1f53fa8d29 Bug fixes and logging improvements to the backfiller 6 years ago
Christopher Usher c9f6ee95c5 clean up for new gevent based backfiller. 6 years ago
Christopher Usher 7d9a5b4626 added workers and a worker manager 6 years ago
Christopher Usher be8d40d1ba Move the code for calculating hours outside the code that backfills 6 years ago
Chris Usher ed58b6e44d reintroduced a start time for the backfiller; more logging 6 years ago
Mike Lang b75b9a9b00 Add stacksampler to all services 6 years ago
Mike Lang 901cda4814 Enable backdoor in all services, and add telnet to containers 6 years ago
Mike Lang 9af7795f34 Add gevent.backdoor as an optional arg to all services
Backdoor allows the operator to telnet into the given port, and get a python shell
running inside the process, from which you can debug, modify state (eg. set the log level),
or whatever. This is extremely useful for debugging weird states that you encounter randomly
but can't easily reproduce, without restarting the process and needing to wait until it happens again.
6 years ago
Mike Lang 90ccc6d827 backfiller: Track number of successful backfills
Other stats can come later, but this one is important as it tells us if
a downloader hasn't been doing its job.
6 years ago
Mike Lang c59892e148 backfiller: Add ability to set nodes as CLI arg 6 years ago
Mike Lang b4b315b6bc Expose prometheus metrics for backfiller and downloader 6 years ago
Mike Lang b0ded641c3 Add a logging handler which counts logs for prometheus stats
This isn't as good as having a full centralised logging system, but should
suffice to know if anything funny is happening.
6 years ago
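A minimal sketch of a log-counting handler. To stay self-contained it keeps counts in a `collections.Counter` instead of a prometheus metric, so it shows the shape of the idea rather than the handler this commit added. The class name and the logger name are hypothetical.

```python
import logging
from collections import Counter


class CountingHandler(logging.Handler):
    """Count emitted log records by (logger name, level name)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        # logging.Handler.handle() holds the handler lock while calling emit().
        self.counts[(record.name, record.levelname)] += 1


handler = CountingHandler()
log = logging.getLogger("backfiller.sketch")  # hypothetical logger name
log.propagate = False  # keep the example quiet on stderr
log.addHandler(handler)
log.warning("segment %s missing", "example")
log.error("download failed")
print(dict(handler.counts))
```

A sudden rise in the WARNING or ERROR counts is then visible in stats without reading the logs themselves.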
Christopher Usher 3fcd374449 Moved encode_strings to common 6 years ago
Christopher Usher 93dd216f89 Fixes and suggestions from ekimekim 6 years ago
Christopher Usher db1b4e6539 Updated logging to match the other components 6 years ago
Christopher Usher bae039977b trying to get the backfiller to actually start 6 years ago
Christopher Usher 1fcd9b5b36 Adding in stuff to hopefully get this to run 6 years ago
Christopher Usher 013ad65c68 added a Dockerfile for the backfiller 6 years ago
Christopher Usher 48d11045d4 Change to backfiller.main to backfill the last 3 hours on start up before doing a full backfill 6 years ago
Christopher Usher 176633bf7d More messing around with backfill_node to allow finer grained control of order segments are fetched 6 years ago
Christopher Usher 3a7624b107 added a setup file for the backfiller 6 years ago
Christopher Usher ba499fe835 added more logging to backfiller 6 years ago
Mike Lang 6815924097 Fix some bugs and linter errors introduced by backfiller
I ran `pyflakes` on the repo and found these bugs:

```
./common/common.py:289: undefined name 'random'
./downloader/downloader/main.py:7: 'random' imported but unused
./backfiller/backfiller/main.py:150: undefined name 'variant'
./backfiller/backfiller/main.py:158: undefined name 'timedelta'
./backfiller/backfiller/main.py:171: undefined name 'sort'
./backfiller/backfiller/main.py:173: undefined name 'sort'
```
(ok, the "imported but unused" one isn't a bug, but the rest are)

This fixes those, as well as a further issue I saw with sorting of hours.

Iterables are not sortable. As an obvious example, what if your iterable was infinite?
As a result, any attempt to sort an iterable that is not already a friendly type like a list
or tuple will result in an error. We avoid this by coercing to list, fully realising the iterable
and putting it into a form that python will let us sort. It also avoids the nasty side-effect
of mutating the list that gets passed into us, which the caller may not expect. Consider this example:

```
>>> my_hours = ["one", "two", "three"]
>>> print my_hours
["one", "two", "three"]
>>> backfill_node(base_dir, node, stream, variants, hours=my_hours, order='forward')
>>> print my_hours
["one", "three", "two"]
```

Also, one of the linter errors was non-trivial to fix - we were trying to get a list of hours
(which is an api call for a particular variant), but at a time when we weren't dealing with a single
variant. My solution was to get a list of hours for ALL variants, and take the union.
6 years ago
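The two fixes described in this message can be sketched as follows, in Python 3 rather than the Python 2 of the session above. `order_hours`, `union_of_hours` and the `'reverse'` order are hypothetical names used for illustration. Only `order='forward'` appears in the original example.

```python
def order_hours(hours, order=None):
    """Return a new list of hours in the requested order without touching the input."""
    hours = list(hours)  # realises any finite iterable and copies a list argument
    if order == 'forward':
        hours.sort()
    elif order == 'reverse':  # assumed counterpart to 'forward'
        hours.sort(reverse=True)
    return hours


def union_of_hours(list_hours, variants):
    """Collect the hours reported for every variant into one set.

    `list_hours` is a caller-supplied callable taking a single variant
    (a stand-in for the per-variant API call).
    """
    hours = set()
    for variant in variants:
        hours.update(list_hours(variant))
    return hours


my_hours = ["one", "two", "three"]
ordered = order_hours(my_hours, order='forward')
print(my_hours, ordered)
```

Because the sort happens on a copy, `my_hours` keeps its original order, and a generator can be passed in as well as a list.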
Christopher Usher b42202434f Minor Fixes as suggested by ekimekim 6 years ago
Christopher Usher 0b524a72cb docstrings and a few minor feature additions to the backfiller 6 years ago
Christopher Usher a59f6e1569 ignore temporary files 6 years ago
Christopher Usher 3b0342b872 added options to limit range of hours backfilled and to randomise hours backfilled 6 years ago
Christopher Usher fec0975d18 fixed white space and the like 6 years ago
Christopher Usher afd948576d Forgot to try to remove temporary file 6 years ago
Christopher Usher 3cdfaad664 moved rename, ensure_directory and jitter to common
Move a few useful functions in downloader used in the backfiller to common
6 years ago
Christopher Usher 7d26997b1f modifications to the backfiller in response to ekimekim's comments 6 years ago
Christopher Usher ba52bf7f5d hopefully more robust 6 years ago
Christopher Usher 50bcb84c0c Moving things around to make the backfiller a bit more like a proper package 6 years ago
Christopher Usher 494725fe34 Getting close to something I can show ekimekim 6 years ago
Christopher Usher 5615c1bdb0 Chipping away at backfiller
I'm going to have to learn to write better commit messages
6 years ago
Christopher Usher 2fb17fff59 much closer to being functional 6 years ago
Christopher Usher 05fed36ac8 a few extra ideas 6 years ago
Christopher Usher 0e7ba25b76 start of a rough prototype of the backfiller 6 years ago