wubloader

Commit Graph

Author	SHA1	Message	Date
Mike Lang	2db20d202b	sheetsync: Update streamlog middleware for section -> tab rename	3 months ago
Mike Lang	9fb356bf45	sheetsync: Better error handling for running out of space when creating rows	3 months ago
Mike Lang	29ff11457e	sheetsync: Namespace all logs and metrics behind a sheetsync "name" This helps differentiate the multiple syncs we now have and will have: - syncing events from streamlog - reverse syncing events to sheets - syncing playlists	3 months ago
Mike Lang	87b504a00a	sheetsync: Rename "row" and "event" to "sheet_row" and "db_row" First step to combining event and playlist sync into one codepath. No actual behaviour changes intended.	3 months ago
Mike Lang	20ee79cdb1	Get sheets working again	3 months ago
Mike Lang	f89ab6fa43	Don't make sheet name an input column, go back to special casing it on row create This means it won't update if put in the wrong place, but avoids issues with reverse sync trying to write it out when it's not an actual column	3 months ago
Mike Lang	430938dc49	error is always a string, it just might be empty	3 months ago
Mike Lang	f8d3eb7f00	wip:	3 months ago
Mike Lang	ee4a68af50	clear up confusion with empty string vs None	3 months ago
Mike Lang	3e873ca5f6	wip: fixes	3 months ago
Mike Lang	eebfa5885b	sheetsync: pass in event id instead of event name	3 months ago
Mike Lang	cf41f572f5	Fix streamlog formatting	3 months ago
Mike Lang	986a1db964	sheetsync: Change how options are specified to allow multiple backends / syncs	3 months ago
Mike Lang	74869de89d	Implement reverse sync mode This is a mode where all data flows one-way from the database to the sheet. It is intended to be used to populate an empty sheet from database events, possibly sourced from somewhere else. To make this work, a few changes were required: * Track which ids we've seen so we know what events were not matched with a row * Allow `row` to be None in sync_rows * When it is, call the middleware to create a new row with a new id * In sheets, this is implemented by tracking the last empty rows we saw, and claiming them as needed.	3 months ago
Mike Lang	85de9757f7	sheetsync: Remove pick_worksheets() from middleware api Instead, get_rows() makes that decision internally if needed.	3 months ago
Mike Lang	17463d70fe	sheetsync: Remove worksheet from middleware apis since it's now baked into the row dict	3 months ago
Mike Lang	eec58f2651	sheetsync: Always have sheet name as part of row dict	3 months ago
Mike Lang	fa9a4b70bb	bugfix	3 months ago
Mike Lang	ca3f92c0b6	sheetsync: Use streamlog section instead of deriving day from start time	3 months ago
Mike Lang	071cd29f4d	sheetsync: Implement Streamlog middleware	3 months ago
Mike Lang	d064522d60	sheetsync: Move edit url management into Sheets middleware As streamlog doesn't require it.	3 months ago
Mike Lang	be111ccb2a	Change database primary key from UUID to TEXT We still store uuids, but in text form. This allows us to store non-UUID ids for systems that have other ids.	3 months ago
Mike Lang	72f7c59a77	Sheetsync: Split into the main loop logic + sheets-specific middleware NOTE ON CONFLICTS In master, we moved sheets.py to common as it only contained a generic client. Now sheets.py also contains specific sheetsync stuff. Our resolution: - Keep the generic version in common - Keep the old version verbatim (including the now-redundant generic client) in sheetsync We will move the sheetsync implementation to the generic client after the rebase is complete.	3 months ago
Mike Lang	0e5bf1a0fe	sheetsync: Split playlist runloop from normal sheets	3 months ago
Mike Lang	a16259e892	sheetsync: Move id allocation out of sync_row()	3 months ago
Mike Lang	256e0f7ba1	sheetsync: Move row_index variable into row dict	3 months ago
Mike Lang	c5c9075f9e	Basic streamlog api	3 months ago
Mike Lang	c2d2f4b85c	Revert "sheetsync: Support archive sheet" This reverts commit `b93597c274`.	3 months ago
Mike Lang	4c87ad6735	Revert "sheetsync: unmapped columns aren't a problem." This reverts commit `5256577d00`.	3 months ago
ZeldaZach	8bbc72184c	Support hot reload of Zulip Schedule - Move sheets API into common dir, since multi use - Live download from Google Sheets using Config - Falls back on old schedule if new one can't be downloaded for some reason	4 months ago
Mike Lang	3606fadaa8	Pin gevent version to work around build issues Seeing the following error on latest versions of gevent: Traceback (most recent call last): File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/lib/python3.9/runpy.py", line 87, in _run_code exec(code, run_globals) File "/usr/lib/python3.9/site-packages/zulip_bots/schedulebot.py", line 2, in <module> import gevent.monkey File "/usr/lib/python3.9/site-packages/gevent/__init__.py", line 72, in <module> from gevent._hub_local import get_hub File "/usr/lib/python3.9/site-packages/gevent/_hub_local.py", line 150, in <module> import_c_accel(globals(), 'gevent.__hub_local') File "/usr/lib/python3.9/site-packages/gevent/_util.py", line 148, in import_c_accel mod = importlib.import_module(cname) File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) ModuleNotFoundError: No module named 'gevent._gevent_c_hub_local'	1 year ago
Mike Lang	78c053000e	Upgrade pip in order to make wheels work	1 year ago
Mike Lang	5256577d00	sheetsync: unmapped columns aren't a problem.	1 year ago
Mike Lang	b93597c274	sheetsync: Support archive sheet	1 year ago
Mike Lang	044dfb8084	Pin argh to avoid stupid breaking changes	1 year ago
Mike Lang	30d5ccc483	Fix all old references to github.com/ekimekim/wubloader	1 year ago
Mike Lang	1596feef1f	sheetsync: Treat end time "--" as same as start time This is a common idiom, which we previously treated like a blank end time (no end time set yet) but it makes more sense to treat as "same as start".	1 year ago
Mike Lang	92ea0fbb77	sheetsync: even more hard-coded columns in database fetch	2 years ago
Mike Lang	29e6b9ead3	lists aren't sets	2 years ago
Mike Lang	546572a697	sheetsync: Don't pull the entire row from the database, only the columns you need. This matters because the thumbnail columns are very large and we're transferring GB of data every time.	2 years ago
Mike Lang	db843c8f63	sheetsync: Report sync duration	2 years ago
Mike Lang	7dfb7b2544	sheetsync: Fix a bug where only show-in-description playlists were detected Because a blank 5th column would make sheetsync ignore the row.	2 years ago
Mike Lang	dd8385ccd8	sheetsync: Special case "<all>" in playlist tags to mean []. This avoids an empty string meaning [], which is dangerous since it's easy to write accidentally.	2 years ago
Mike Lang	e7d1212085	fix typo	2 years ago
Mike Lang	32c72d6eb7	sheetsync: correct parsing for updated playlists	2 years ago
Mike Lang	34a33fdeb6	partially implement playlist links in video descriptions We make them conceptually "part of the footer" so they're updated only when the video is otherwise updated (which would generally mean MODIFIED).	2 years ago
Mike Lang	36017aaccd	sheetsync: Show unlisted videos in DONE state as UNLISTED instead We don't actually want to represent them as a different state in the backend, but showing them differently on the sheet is helpful to humans.	2 years ago
Mike Lang	467edf3d19	Read dynamic playlist manager config from sheet The sheetsync loads playlist ids and tags into a new table `playlists`. playlist manager reads this table and merges it with the playlists given on the command line.	3 years ago
Mike Lang	a47c29fff4	Link images to github repo by adding a LABEL When pushed, this tells github to associate the ghcr.io repo that was pushed to with the github repo specified (the owner needs to match). This does a few things. Most importantly, this automatically gives github actions credentials to push to these repositories when run in the context of the wubloader repo.	3 years ago
Mike Lang	aab8cf2f0f	Set up plumbing for multi-range videos and implement no-transition fast cut videos only This is the simplest case as we can just cut each range like we already do, then concat the results. We still allow for the full design in the database and cutter, but error out if transitions is ever anything but hard cuts or if it's a full cut. We also update the restreamer to allow accepting ranges, however for usability we still allow the old "just one start and end" args. Note this changes the thrimshim API to give and take the new "video_ranges" and "video_transitions" columns.	3 years ago
Mike Lang	62bd6539ea	Unpin gevent as that was a workaround for a py2 issue	3 years ago
Mike Lang	f2a8007bf7	Fix build dependency issues	3 years ago
Mike Lang	8f24c2eae1	py3 fixes for sheetsync	3 years ago
HubbeKing	6d790a1b36	Do a first naive pass for py3 compatibility Check that open() calls for reading and writing use binary modes Use alpine version with py3-pip package Use python3 in Dockerfile CMD Remove sys.setdefaultencoding() "hack" Simplify ensure_directory() in common.common package	3 years ago
Mike Lang	f0546e2ee3	Pin gevent to 1.5a2 to avoid https://github.com/gevent/gevent/issues/1711	3 years ago
Mike Lang	b53fcd65a0	Add dependencies required to install psycopg2 from source We can't install the binaries as they don't support musl	4 years ago
HubbeKing	86f7823348	Replace calls to gevent.signal() with gevent.signal_handler() gevent.signal() was removed in gevent 1.5a4, see http://www.gevent.org/api/gevent.signal.html Removed on Feb 5th, see https://github.com/gevent/gevent/pull/1530	4 years ago
Mike Lang	a53786dc2d	Add file and make as build dependencies gevent now requires these to build. I'm not sure when this changed.	4 years ago
Mike Lang	b9cd76b1a2	Add non-static implicit tags in sheetsync In order for the upcoming playlist manager to be able to use the DB `tags` column to know what tags a video has, all the tags it needs need to be present. Previously, this was a problem because the day and category tags only get added at the cutter and so wouldn't be listed. This moves them so they are added when parsing the row in sheetsync. It also adds the poster moment tag if poster moment is checked. Note that fully static tags that go on all videos are still only added in cutter, but the playlist manager doesn't need to care about those (since by definition they will match every video).	4 years ago
Mike Lang	29571fb60b	Add tags column to sheetsync New tags column shunts all columns after it right by 1. Note we parse tags by splitting on commas then discarding whitespace. If this would create an empty string tag, it is ignored. Example: "foo, bar baz,a,,bc " -> ["foo", "bar baz", "a", "bc"]	4 years ago
Mike Lang	b85296a81e	sheetsync: Move column indexes to match updated sheet New tags column shunts all columns after it right by 1. We will later want to parse that, but for now we ignore it.	4 years ago
Mike Lang	59d0fa3e40	sheetsync: Don't mis-parse blank as bad time	5 years ago
Mike Lang	ab157afe20	sheetsync: Clear event counts before each update Otherwise, no count of 0 ever gets set, and things are left showing values when they shouldn't.	5 years ago
Mike Lang	89a9e5554c	sheetsync: Record counts of rows in the DB, segmented by various columns This lets us view a number of useful graphs in dashboards, eg. rows by state, errored rows, rows by day, rows by category, meltdowns per day, fraction of events that are poster moments by category. Sheetsync was the natural place to do this since it was already periodically scanning the entire events table.	5 years ago
Mike Lang	1c0f3a627b	sheetsync: Log what worksheets got synced it's kinda important	5 years ago
Mike Lang	8b25f8be95	sheetsync: Inject an error into the error column if we fail to parse an input column	5 years ago
Mike Lang	8dc7b80de9	sheetsync: Improve timing of main loop Instead of always waiting 5 seconds between runs, wait until 5 seconds after the previous run started. This ensures we actually run every 5sec and not every 5sec + how long it takes to run	5 years ago
Mike Lang	cda8078f64	sheetsync: Only check the most recently changed two sheets most times Only check the other sheets every 4th time (20sec instead of 5sec). This eliminates a huge source of unnecessary reads, which prevents us from going over our API limit.	5 years ago
Mike Lang	4f6f4cad8b	sheetsync: Fix typos with metrics	5 years ago
Mike Lang	731ef9e2d0	Refactor dockerfiles for more shared layers By carefully ensuring most of our dockerfiles are identical in their first few layers, we only need to build those layers once instead of every time. In particular, we move installing gevent to before installing common, so that even when common changes gevent doesn't need to be reinstalled. This is important because gevent takes ages to install. Also fixes segment_coverage, which wasn't being installed.	5 years ago
Mike Lang	c740090c53	sheetsync: Add more metrics	5 years ago
Mike Lang	52e6c4ad41	sheetsync, cutter: Collect metrics on http calls In particular, to google apis.	5 years ago
Mike Lang	17af1c4e89	cutter, sheetsync: Wait for DB to connect on startup This is a nicer error than crashing in the depths of some error handler (which is what will happen if the DB goes unavailable while they're running), and it's a far more common case (eg. the DB is misconfigured) than having it fail halfway through. Neither of these services can do anything meaningful without the DB, so crashing without it is acceptable behaviour.	5 years ago
Mike Lang	48593e2b06	database, sheetsync: Add worksheet name column 'sheet_name' This tells us which sheet a row came from (so we don't need to scan every sheet to find it if we're trying to do lookups in that direction). It is also needed in order to tag the videos with the Day number.	5 years ago
Christopher Usher	f9ce41ef32	metrics for the sheetsync	5 years ago
Christopher Usher	a2e47b98f5	fixes -- in dates and the lack of the preshow	5 years ago
Christopher Usher	027c2900e2	fixes in response to ekim's comments	5 years ago
Christopher Usher	1dbe585837	retry database connection if it fails	5 years ago
Mike Lang	f7b591e78b	sheetsync: Log more information on HTTPError The api gives additional detail that we want to know when debugging.	5 years ago
Mike Lang	fe68e98804	sheetsync: Fix a failure mode where we never recover from a DB conn failure Since we never got a new conn after failure, we would just keep erroring with "connection already closed" errors.	6 years ago
Mike Lang	018e920808	sheet-sync: Some fixes	6 years ago
Mike Lang	f354130434	sheetsync: Only allocate ids when first needed This prevents rate limiting issues when immediately allocating all 999 ids for an empty sheet.	6 years ago
Mike Lang	11fc67f071	sheetsync: Review feedback * Expand on some comments * Fix conflicting port number * Write help text for all args	6 years ago
Mike Lang	9762f308a0	Implement main part of sheet sync	6 years ago
Mike Lang	5a44bfdf51	Google sheets api wrapper Exposes a way to read all rows, and write a single cell. We need to read all columns of each row so we know what would be modified so we only do updates to single cells that aren't already the correct value. This keeps us from impacting the sheet load too much with constantly changing values, which I think might be a thing even if the values are the same.	6 years ago
Mike Lang	2b4d2cce90	sheet sync: Basic skeleton	6 years ago
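
The tag-parsing rule described in commit 29571fb60b (split on commas, strip whitespace, ignore any tag that would be an empty string) can be sketched as follows. This is a minimal illustration of that rule only, not the code from the repository; the function name `parse_tags` is hypothetical.

```python
def parse_tags(cell):
    """Split a sheet cell on commas, strip whitespace, and drop empty tags.

    parse_tags is a hypothetical name used only for this illustration;
    the actual sheetsync implementation may differ.
    """
    tags = []
    for part in cell.split(","):
        tag = part.strip()
        if tag:  # skip pieces that are empty after stripping, e.g. from "a,,bc"
            tags.append(tag)
    return tags


# The example input from the commit message.
print(parse_tags("foo, bar baz,a,,bc "))
```

Stripping before the emptiness check means a cell containing only spaces and commas produces no tags at all.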


136 Commits (1320472d05167fce5140ab5e139b391b8a1bc3df)