34 Commits (bff3fa78006c4a4ba7534e26361ff5788ddf9637)

Author SHA1 Message Date
Mike Lang 7b590cf574 chat-archiver: Some cleanups to the URL matching regex
With thanks to Me-Me for review
3 months ago
Mike Lang 9dfb00f4ab chat_archiver: Logic for checking and downloading media links 3 months ago
Mike Lang 07055e3605 chat-archiver: extract the ensure_emotes greenlet management to a class 3 months ago
Mike Lang 3606fadaa8 Pin gevent version to work around build issues
Seeing the following error on latest versions of gevent:

 Traceback (most recent call last):
   File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
     return _run_code(code, main_globals, None,
   File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
     exec(code, run_globals)
   File "/usr/lib/python3.9/site-packages/zulip_bots/schedulebot.py", line 2, in <module>
     import gevent.monkey
   File "/usr/lib/python3.9/site-packages/gevent/__init__.py", line 72, in <module>
     from gevent._hub_local import get_hub
   File "/usr/lib/python3.9/site-packages/gevent/_hub_local.py", line 150, in <module>
     import_c_accel(globals(), 'gevent.__hub_local')
   File "/usr/lib/python3.9/site-packages/gevent/_util.py", line 148, in import_c_accel
     mod = importlib.import_module(cname)
   File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
     return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'gevent._gevent_c_hub_local'
1 year ago
Mike Lang 78c053000e Upgrade pip in order to make wheels work 1 year ago
Mike Lang 30c1877b36 Fix bugs with chat_archiver
* Order of arguments matters - option can't be between other positional args and *args
* Messed up the count structure
1 year ago
Mike Lang 260293d40d chat_archiver: Allow passing multiple channels on CLI 1 year ago
Mike Lang 91910c0972 chat_archiver: Fix misconfiguration where MAX_SERVER_LAG < MAX_DELAY
This leads to delayed JOIN/PARTs not being put in their proper batch
as it's already been closed. In fact, since each message is re-opening a batch from
more than MAX_SERVER_LAG seconds ago, each message becomes one batch.
1 year ago
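
The failure mode described in 91910c0972 can be illustrated with a deliberately simplified model. The sketch below assumes a batch is closed once it is more than MAX_SERVER_LAG seconds old and that a delayed JOIN/PART is attributed to a time up to MAX_DELAY seconds in the past. The function names and the numeric values are hypothetical and are not taken from the chat_archiver source.

```python
def batch_is_closed(batch_time, now, max_server_lag):
    """Simplified model: a batch is closed once it is older than max_server_lag."""
    return now - batch_time > max_server_lag


def delayed_message_hits_closed_batch(max_delay, max_server_lag, now=1000.0):
    """A message delayed by max_delay belongs to the batch at (now - max_delay).

    Returns True if that batch has already been closed in this model.
    """
    return batch_is_closed(now - max_delay, now, max_server_lag)


def check_config(max_delay, max_server_lag):
    """Hypothetical startup check that rejects the misconfiguration."""
    if max_server_lag < max_delay:
        raise ValueError(
            "MAX_SERVER_LAG (%s) must be at least MAX_DELAY (%s)"
            % (max_server_lag, max_delay)
        )
```

Under this model, any configuration with MAX_SERVER_LAG < MAX_DELAY lets a maximally delayed message land in a batch that is already closed, which is the situation the commit fixes.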
Mike Lang 200d2df9ba chat_archiver: Add code support for archiving multiple channels at once
* Join every channel on connect
* Move the "wait for initial ROOMSTATE" logic into the main loop and make it per-channel
* Make batch keys (channel, time) instead of just time

For now the CLI doesn't actually allow you to run chat_archiver in this mode,
it always calls the implementation with a 1-element list of channels.
1 year ago
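
As a rough illustration of the "(channel, time) instead of just time" change in 200d2df9ba, the sketch below groups messages into batches keyed by both channel and batch start time. BATCH_INTERVAL, batch_key() and group_messages() are hypothetical names chosen for this example, and the message tuples are a stand-in for whatever structure chat_archiver actually uses.

```python
from collections import defaultdict

BATCH_INTERVAL = 10  # seconds per batch; hypothetical value for illustration


def batch_key(channel, timestamp):
    """Round the timestamp down to the start of its batch and pair it with the channel."""
    batch_time = int(timestamp // BATCH_INTERVAL) * BATCH_INTERVAL
    return (channel, batch_time)


def group_messages(messages):
    """Group (channel, timestamp, text) tuples into batches keyed by (channel, time)."""
    batches = defaultdict(list)
    for channel, timestamp, text in messages:
        batches[batch_key(channel, timestamp)].append(text)
    return dict(batches)
```

With a time-only key, messages from two channels in the same interval would share one batch. Including the channel in the key keeps them apart.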
Mike Lang b050b71036 chat_archiver: Improve logging and monitoring by using unique client name more 1 year ago
Mike Lang 044dfb8084 Pin argh to avoid stupid breaking changes 1 year ago
Mike Lang 76c9208be5 Move chat_archiver atomic_write() to common for re-use 1 year ago
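
The commit above only names atomic_write(); its implementation is not shown here. A common way to write a file atomically, sketched below as an assumption rather than as the project's actual code, is to write to a temporary file in the same directory and then rename it over the target.

```python
import os
import tempfile


def atomic_write(path, content):
    """Write bytes to path so readers never observe a partially written file.

    Illustrative sketch only; the signature is an assumption.
    """
    directory = os.path.dirname(path) or "."
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        # os.replace atomically overwrites the destination if it already exists.
        os.replace(temp_path, path)
    except BaseException:
        # Remove the temp file if anything went wrong before the rename.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```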
Mike Lang 30d5ccc483 Fix all old references to github.com/ekimekim/wubloader 1 year ago
Mike Lang ad4827237f Fix bug in checking if message has tags 2 years ago
Mike Lang 681da9a76e Fix a bug where we try to fetch emote "" for messages with no emotes 2 years ago
Mike Lang 3b6ce86c46 chat archiver: Add cli tool for downloading emotes 2 years ago
Mike Lang e74d655ce5 chat_archiver: Download each seen emote
so we have a permanent record, in case they're deleted or changed later
2 years ago
Mike Lang 8e314eea94 Collect metrics for chat_archiver on port 8008 2 years ago
Mike Lang 08257386e2 Add restreamer endpoint for viewing chat messages 2 years ago
Mike Lang 9320251de7 Some extra documentation on chat_archiver 2 years ago
Mike Lang d8a9b5ddf0 chat_archiver: Always sort json object keys to ensure canonical output 2 years ago
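
Sorting object keys matters because two archivers that saw the same message should write byte-identical lines. A minimal sketch using the standard library's json.dumps with sort_keys=True follows. The helper name and the example message fields are hypothetical.

```python
import json


def canonical_json(obj):
    """Serialize obj with sorted keys so equal objects give identical strings."""
    return json.dumps(obj, sort_keys=True)


# The same data built in two different key orders.
first = {"command": "PRIVMSG", "time": 1.5}
second = {"time": 1.5, "command": "PRIVMSG"}
```

Both dicts serialize to the same string, so deduplicating or comparing archived lines as text becomes reliable.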
Mike Lang 651658e507 JOINs and PARTs have been observed with up to 30sec difference
The maximum delay turns out to be completely undocumented, so let's assume 45s.
Anything > 60s might cause problems because matching messages could end up more than 1 batch apart.
2 years ago
Mike Lang 05a989f67d chat-archiver: fixes 2 years ago
Mike Lang c1c1c11bce chat_archiver: Add prometheus metrics 2 years ago
Mike Lang c25d79e7c2 chat-archiver: Merge all files every minute 2 years ago
Mike Lang 4cfc362f76 chat-archiver: pass in node name
instead of using container hostname
2 years ago
Mike Lang a48beab576 chat-archiver: update girc for py3 support and fixes 2 years ago
Mike Lang 315c9c8297 Integrate chat archiver as a proper component 2 years ago
Mike Lang f8b3ace148 Backfill chat archives under the "chat" quality 2 years ago
Mike Lang 05ddd39504 chat_archiver: Split files into directories by hour
matching how we handle video files
2 years ago
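
A sketch of what a per-hour directory layout could look like is given below. The base/channel/chat/hour structure and the "%Y-%m-%dT%H" hour format are assumptions made for this example, not confirmed by the commit message. hour_dir() is a hypothetical helper.

```python
import os
from datetime import datetime, timezone


def hour_dir(base_dir, channel, timestamp):
    """Return the directory for the UTC hour that contains the given unix timestamp."""
    hour = datetime.fromtimestamp(timestamp, timezone.utc).strftime("%Y-%m-%dT%H")
    return os.path.join(base_dir, channel, "chat", hour)
```

Using UTC avoids directories shifting or colliding around daylight saving transitions.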
Mike Lang 1d626738bd chat_archiver: Start a new client on RECONNECT 2 years ago
Mike Lang 96cc212bf0 chat_archiver: fixes, implement merge_all 2 years ago
Mike Lang d32cbbb7e1 chat-archiver: File merging and other fixes 2 years ago
Mike Lang 0756539b85 chat-archiver: Early work and basic archival 2 years ago