This helps differentiate the multiple syncs we now have and will have:
- syncing events from streamlog
- reverse syncing events to sheets
- syncing playlists
This is a mode where all data flows one-way from the database to the sheet.
It is intended to be used to populate an empty sheet from database events,
possibly sourced from somewhere else.
To make this work, a few changes were required:
* Track which ids we've seen so we know what events were not matched with a row
* Allow `row` to be None in sync_rows
* When it is, call the middleware to create a new row with a new id
* In sheets, this is implemented by tracking the last empty rows we saw, and claiming them as needed.
NOTE ON CONFLICTS
In master, we moved sheets.py to common as it only contained a generic client.
Now sheets.py also contains specific sheetsync stuff.
Our resolution:
- Keep the generic version in common
- Keep the old version verbatim (including the now-redundant generic client) in sheetsync
We will move the sheetsync implementation to the generic client after the rebase is complete.
- Move sheets API into common dir, since multi use
- Live download from Google Sheets using Config
- Falls back on old schedule if new one can't be downloaded for some reason
Seeing the following error on latest versions of gevent:
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/lib/python3.9/site-packages/zulip_bots/schedulebot.py", line 2, in <module>
import gevent.monkey
File "/usr/lib/python3.9/site-packages/gevent/__init__.py", line 72, in <module>
from gevent._hub_local import get_hub
File "/usr/lib/python3.9/site-packages/gevent/_hub_local.py", line 150, in <module>
import_c_accel(globals(), 'gevent.__hub_local')
File "/usr/lib/python3.9/site-packages/gevent/_util.py", line 148, in import_c_accel
mod = importlib.import_module(cname)
File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'gevent._gevent_c_hub_local'
The sheetsync loads playlist ids and tags into a new table `playlists`.
playlist manager reads this table and merges it with the playlists given on the command line.
When pushed, this tells github to associate the ghcr.io repo that was pushed to
with the github repo specified (the owner needs to match).
This does a few things.
Most importantly, this automatically gives github actions credentials to push to these
repositories when run in the context of the wubloader repo.
This is the simplest case as we can just cut each range like we already do,
then concat the results.
We still allow for the full design in the database and cutter, but error out if transitions
is ever anything but hard cuts or if it's a full cut.
We also update the restreamer to allow accepting ranges, however for usability we still allow
the old "just one start and end" args.
Note this changes the thrimshim API to give and take the new "video_ranges" and "video_transitions" columns.
Check that open() calls for reading and writing use binary modes
Use alpine version with py3-pip package
Use python3 in Dockerfile CMD
Remove sys.setdefaultencoding() "hack"
Simplify ensure_directory() in common.common package
In order for the upcoming playlist manager to be able to use the DB `tags` column to know
what tags a video has, all the tags it needs need to be present.
Previously, this was a problem because the day and category tags only get added at the cutter
and so wouldn't be listed.
This moves them so they are added when parsing the row in sheetsync.
It also adds the poster moment tag if poster moment is checked.
Note that fully static tags that go on all videos are still only added in cutter,
but the playlist manager doesn't need to care about those (since by definition
they will match every video).
New tags column shunts all columns after it right by 1.
Note we parse tags by splitting on commas then discarding whitespace.
If this would create an empty string tag, it is ignored.
Example: "foo, bar baz,a,,bc " -> ["foo", "bar baz", "a", "bc"]
This lets us view a number of useful graphs in dashboards, eg. rows by state,
errored rows, rows by day, rows by category, meltdowns per day, fraction of
events that are poster moments by category.
Sheetsync was the natural place to do this since it was already periodically scanning
the entire events table.
Instead of always waiting 5 seconds between runs,
wait until 5 seconds after the previous run started.
This ensures we actually run every 5sec and not every 5sec + how long it takes to run
Only check the other sheets every 4th time (20sec instead of 5sec).
This elminiates a huge source of unnessecary reads, which prevents us from going over
our API limit.
By carefully ensuring most of our dockerfiles are identical in their first few layers,
we only need to build those layers once instead of every time.
In particular, we move installing gevent to before installing common,
so that even when common changes gevent doesn't need to be reinstalled.
This is important because gevent takes ages to install.
Also fixes segment_coverage, which wasn't being installed.
This is a nicer error than crashing in the depths of some error handler
(which is what will happen if the DB goes unavailable while they're running),
and it's a far more common case (eg. the DB is misconfigured) than having it fail
halfway through.
Neither of these services can do anything meaningful without the DB,
so crashing without it is acceptable behaviour.
This tells us which sheet a row came from
(so we don't need to scan every sheet to find it if we're trying to do
lookups in that direction).
It is also needed in order to tag the videos with the Day number.
Exposes a way to read all rows, and write a single cell.
We need to read all columns of each row so we know what would be modified
so we only do updates to single cells that aren't already the correct value.
This keeps us from impacting the sheet load too much with constantly changing values,
which I think might be a thing even if the values are the same.