121 Commits (d376103dfae8021afc16ab3862a32bc1ff46f620)

Author SHA1 Message Date
Mike Lang 3606fadaa8 Pin gevent version to work around build issues
Seeing the following error on latest versions of gevent:

 Traceback (most recent call last):
   File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
     return _run_code(code, main_globals, None,
   File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
     exec(code, run_globals)
   File "/usr/lib/python3.9/site-packages/zulip_bots/schedulebot.py", line 2, in <module>
     import gevent.monkey
   File "/usr/lib/python3.9/site-packages/gevent/__init__.py", line 72, in <module>
     from gevent._hub_local import get_hub
   File "/usr/lib/python3.9/site-packages/gevent/_hub_local.py", line 150, in <module>
     import_c_accel(globals(), 'gevent.__hub_local')
   File "/usr/lib/python3.9/site-packages/gevent/_util.py", line 148, in import_c_accel
     mod = importlib.import_module(cname)
   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
     return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'gevent._gevent_c_hub_local'
1 year ago
Mike Lang 78c053000e Upgrade pip in order to make wheels work 1 year ago
Mike Lang 78d0f227e8 backfiller: Include channel name being backfilled in logs 1 year ago
Mike Lang 4e51c3d6b7 backfiller: Update node list from database more often
5min is a long time when we want to pull a node from circulation,
and it's a very lightweight operation. So let's say 30 seconds.
1 year ago
Mike Lang 044dfb8084 Pin argh to avoid stupid breaking changes 1 year ago
Mike Lang f324ef23cf backfiller: Fix critical issues with backfilling extras 1 year ago
Mike Lang 9f523c65cd backfiller: Allow backfilling of non-segment directories
These are referred to as "extras" and all files in all subdirectories are backfilled
if not present.
1 year ago
Mike Lang 30d5ccc483 Fix all old references to github.com/ekimekim/wubloader 1 year ago
Mike Lang 78ee38a4b9 backfiller: Don't consider 404s for chat batches to be an error 2 years ago
Mike Lang c493869b9a Have list_segment_files also list chat archives
Otherwise backfilling of chat doesn't work
2 years ago
Mike Lang c50224415c more backfiller chat fixes
fixup: more backfiller fixes

Enable backfilling of chat logs
2 years ago
Mike Lang f8b3ace148 Backfill chat archives under the "chat" quality 2 years ago
Mike Lang 1add3c5c22 Implement tombstoning to allow for segment deletion
Rarely, we find ourselves needing to explicitly delete some data, eg. something that shouldn't
have been public and should be removed from all records.

It would also be nice if we could "clean up" bad versions of the same segment,
which occasionally come up when downloaders have issues.

With our distributed segment database, this is actually rather difficult as deleting the data
from any one server would cause it to be restored from the others. It was only possible
by stopping all backfill, deleting the data on all servers, then starting backfill again.

Here we introduce a more practical approach. An operator creates an empty flag file
with the same name as the segment to be deleted, but with a `.tombstone` extension.
For example, to delete the file `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.ts`,
you would create a tombstone `/segments/desertbus/source/2019-11-13T02/45:51.608000-2.0-full-7IS92rssMzoSBQDIevHStbTNy-URRV3Vw-jzZ6pwOZM.tombstone`.

These tombstone files do two important things:
* They hide the segment from being listed, which means that:
  * It can't be restreamed or put into a video
  * It can't be backfilled to other nodes
* The tombstone files themselves do get backfilled to other nodes, so you only need to mark them on one server.

Once the tombstone has propagated to all nodes, the segment file can be deleted independently on each one.

We chose not to have a tombstone automatically trigger a segment deletion for safety reasons.
2 years ago
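
The listing behaviour described in the tombstoning commit above could be sketched roughly as follows. This is a minimal illustration and not the wubloader implementation: the names `tombstone_name` and `list_hour` are hypothetical, and it assumes a tombstone matches a segment when the two file names are identical apart from the final extension.

```python
import os

TOMBSTONE_EXT = ".tombstone"


def tombstone_name(segment_name):
    """Return the tombstone file name for a segment file name (hypothetical helper)."""
    base, _ext = os.path.splitext(segment_name)
    return base + TOMBSTONE_EXT


def list_hour(hour_dir):
    """List one hour directory, returning (segments, tombstones).

    Segments that have a matching tombstone are left out of the first list,
    so they can be neither served nor backfilled. The tombstones are returned
    separately so that they can still be copied to other nodes.
    """
    names = sorted(os.listdir(hour_dir))
    tombstones = [n for n in names if n.endswith(TOMBSTONE_EXT)]
    hidden = set(tombstones)
    segments = [
        n for n in names
        if not n.endswith(TOMBSTONE_EXT) and tombstone_name(n) not in hidden
    ]
    return segments, tombstones
```

Note that nothing here deletes the segment file. In keeping with the commit message, removal stays a separate, manual step on each node.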
Mike Lang a47c29fff4 Link images to github repo by adding a LABEL
When pushed, this tells github to associate the ghcr.io repo that was pushed to
with the github repo specified (the owner needs to match).

This does a few things.
Most importantly, this automatically gives github actions credentials to push to these
repositories when run in the context of the wubloader repo.
3 years ago
Mike Lang 62bd6539ea Unpin gevent as that was a workaround for a py2 issue 3 years ago
Mike Lang e63aa53019 Remove left-over usage of encode_strings
More py3 breakage
3 years ago
Mike Lang 21856c68aa Fix all instances of file.write() for py3
In python 3, file.write() may do a partial write and returns the number of characters written.
In order to not lose data, we need to wrap every instance of file.write() with our new
common.writeall() wrapper that loops until the data is actually written.
3 years ago
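
A wrapper of the kind described in the commit above might look like the sketch below. It is written from the commit message alone, so the signature (a write callable plus the value to write) and the error handling are assumptions and may differ from the real common.writeall().

```python
def writeall(write, value):
    """Call write() repeatedly until all of value has been written.

    `write` is any callable with file.write() semantics: it returns the
    number of characters (or bytes) it actually wrote, or None.
    """
    while value:
        n = write(value)
        if n is None:
            # Assumption: a write function that reports nothing wrote everything.
            break
        if n == 0:
            # Fail loudly instead of looping forever on a stuck writer.
            raise IOError("write() made no progress with {} left to write".format(len(value)))
        # Drop what was written and retry with the remainder.
        value = value[n:]
```

A call site would then change from `f.write(data)` to `writeall(f.write, data)`.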
Mike Lang a56f6859bb more py3 fixes 3 years ago
Mike Lang f2a8007bf7 Fix build dependency issues 3 years ago
Mike Lang 7d4eb3c8db py3 fixes for backfiller 3 years ago
HubbeKing 6d790a1b36 Do a first naive pass for py3 compatibility
Check that open() calls for reading and writing use binary modes
Use alpine version with py3-pip package
Use python3 in Dockerfile CMD
Remove sys.setdefaultencoding() "hack"
Simplify ensure_directory() in common.common package
3 years ago
Mike Lang f0546e2ee3 Pin gevent to 1.5a2 to avoid https://github.com/gevent/gevent/issues/1711 3 years ago
Mike Lang b53fcd65a0 Add dependencies required to install psycopg2 from source
We can't install the binaries as they don't support musl
4 years ago
HubbeKing 86f7823348 Replace calls to gevent.signal() with gevent.signal_handler()
gevent.signal() was removed in gevent 1.5a4, see http://www.gevent.org/api/gevent.signal.html
Removed on Feb 5th, see https://github.com/gevent/gevent/pull/1530
4 years ago
Mike Lang a53786dc2d Add file and make as build dependencies
gevent now requires these to build. I'm not sure when this changed.
4 years ago
Christopher Usher 986c9a3413 removed redundant option 5 years ago
Christopher Usher 9c77dd1f40 added the ability to generate a webpage with all coverage maps 5 years ago
Mike Lang 71333cf826 backfiller: Only run one manager, not one per channel
Then treat backfilling each channel just like backfilling each quality.

This is conceptually simpler (only one kind of thing, a (channel, quality))
and has better behaviour when a node is down (we only have one lot of error handling around it).

It also means we aren't asking the database for the same info once per channel,
and cuts down on logging noise.
5 years ago
Mike Lang ff18c7df54 backfiller: Fix issue with tracking metrics after get_nodes() failure
There isn't an easy, clean way to pass in the DB hostname there,
and neither label is very valuable. Let's just drop it entirely.
5 years ago
Mike Lang 731ef9e2d0 Refactor dockerfiles for more shared layers
By carefully ensuring most of our dockerfiles are identical in their first few layers,
we only need to build those layers once instead of every time.

In particular, we move installing gevent to before installing common,
so that even when common changes gevent doesn't need to be reinstalled.

This is important because gevent takes ages to install.

Also fixes segment_coverage, which wasn't being installed.
5 years ago
Mike Lang d63ae573b7 backfiller: Collect metrics on http calls 5 years ago
Christopher Usher dc1c31fef4 fixes to the error handling as suggested by ekim 5 years ago
Christopher Usher 553b11bc84 typo in error handler 5 years ago
Christopher Usher 9dfffe0a62 improvements based on ekim's suggestions plus delete_hours yields to the
rest of the backfiller
5 years ago
Christopher Usher 8579fcaeea check start is not None before checking whether hour is before start 5 years ago
Christopher Usher f8835cd253 documentation for --delete-old; check that start is not none 5 years ago
Christopher Usher a361249145 renamed delete_before to keep_hours 5 years ago
Christopher Usher be562b8448 added warnings if keeping fewer hours than backfilling 5 years ago
Christopher Usher 9d81569d98 Added the ability to delete old hours 5 years ago
Mike Lang 7183b25ce9 Merge pull request #119 from ekimekim/mike/database-resilience
Changes to improve behaviour if the DB is down
5 years ago
Christopher Usher 929308f3e7 started on the segment_coverage service 5 years ago
Mike Lang 0e437566aa backfiller: Don't crash on DB errors
We move all connection handling into get_nodes().
This means that problems connecting won't cause further errors
and cause the application to completely crash.

In turn, this means that the behaviour if the database goes down becomes
"continue backfilling from the nodes we know about" instead of crashing.
5 years ago
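
The fallback described in the commit above ("continue backfilling from the nodes we know about") can be illustrated with the sketch below. The `NodeSource` class and the `fetch_nodes` callable are hypothetical stand-ins. The commit says only that connection handling moved into get_nodes(), so the rest is an assumption.

```python
import logging


class NodeSource:
    """Remembers the last successfully fetched node list (hypothetical class)."""

    def __init__(self, fetch_nodes):
        # fetch_nodes is assumed to connect to the database and return node
        # URLs, raising an exception if the connection or the query fails.
        self.fetch_nodes = fetch_nodes
        self.known_nodes = []

    def get_nodes(self):
        try:
            self.known_nodes = list(self.fetch_nodes())
        except Exception:
            # Log the failure and fall through to the nodes we already know about.
            logging.exception("Failed to fetch node list, using last known nodes")
        return self.known_nodes
```

Because the connection is only attempted inside get_nodes(), a database outage shows up as a logged error on each refresh rather than as a crash.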
Christopher Usher ccb7f3c684 now use parse_segment_path to get hash from filename 5 years ago
Christopher Usher 36da1926d0 fixes for ekim's suggestions 5 years ago
Christopher Usher 34785d1179 now checking the hashes 5 years ago
Christopher Usher b0562495d2 reject mismatched hashes; more metrics 5 years ago
Christopher Usher 120a5a7de0 started on checking the hash 5 years ago
Chris Usher 557ddddc31 better logging for the backfiller 5 years ago
Christopher Usher 44f0e0defb changed it back so only the name is checked 5 years ago
Chris Usher aab46f9765 fixed localhost bug in backfiller 5 years ago