49 Commits (3ddbbfd31dff343080036f0df49cda1847ea392a)

Author SHA1 Message Date
Mike Lang 92ea0fbb77 sheetsync: even more hard-coded columns in database fetch 2 years ago
Mike Lang 29e6b9ead3 lists aren't sets 2 years ago
Mike Lang 546572a697 sheetsync: Don't pull the entire row from the database
Pull only the columns you need.

This matters because the thumbnail columns are very large and
we're transferring GB of data every time.
2 years ago
Mike Lang db843c8f63 sheetsync: Report sync duration 2 years ago
Mike Lang 7dfb7b2544 sheetsync: Fix a bug where only show-in-description playlists were detected
Because a blank 5th column would make sheetsync ignore the row.
2 years ago
Mike Lang dd8385ccd8 sheetsync: Special case "<all>" in playlist tags to mean []
This avoids empty string meaning [], which is dangerous since it's easy to write accidentally.
2 years ago
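The "<all>" special case can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function name `parse_playlist_tags` is hypothetical, and returning None for a blank cell (so the caller can skip the row) is an assumption about how an empty value is handled.

```python
def parse_playlist_tags(value):
    """Parse a playlist's tags cell.

    Returns [] only for the explicit "<all>" marker.
    Returns None for a blank cell (assumption: the caller skips such rows),
    so an accidentally empty cell can never mean "match every video".
    """
    value = value.strip()
    if value == "<all>":
        return []
    tags = [part.strip() for part in value.split(",")]
    tags = [tag for tag in tags if tag]
    # No usable tags: report "nothing given" rather than the match-all [].
    return tags or None
```

With this shape, an empty tag list has to be requested deliberately by typing "<all>".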
Mike Lang e7d1212085 fix typo 2 years ago
Mike Lang 32c72d6eb7 sheetsync: correct parsing for updated playlists 2 years ago
Mike Lang 34a33fdeb6 partially implement playlist links in video descriptions
We make them conceptually "part of the footer" so they're updated only when the video
is otherwise updated (which would generally mean MODIFIED).
2 years ago
Mike Lang 36017aaccd sheetsync: Show unlisted videos in DONE state as UNLISTED instead
We don't actually want to represent them as a different state in the backend, but showing
them differently on the sheet is helpful to humans.
2 years ago
Mike Lang 467edf3d19 Read dynamic playlist manager config from sheet
The sheetsync loads playlist ids and tags into a new table `playlists`.
playlist manager reads this table and merges it with the playlists given on the command line.
3 years ago
Mike Lang a47c29fff4 Link images to github repo by adding a LABEL
When pushed, this tells github to associate the ghcr.io repo that was pushed to
with the github repo specified (the owner needs to match).

This does a few things.
Most importantly, this automatically gives github actions credentials to push to these
repositories when run in the context of the wubloader repo.
3 years ago
Mike Lang aab8cf2f0f Set up plumbing for multi-range videos and implement no-transition fast cut videos only
This is the simplest case as we can just cut each range like we already do,
then concat the results.

We still allow for the full design in the database and cutter, but error out if transitions
is ever anything but hard cuts or if it's a full cut.

We also update the restreamer to allow accepting ranges, however for usability we still allow
the old "just one start and end" args.

Note this changes the thrimshim API to give and take the new "video_ranges" and "video_transitions" columns.
3 years ago
Mike Lang 62bd6539ea Unpin gevent as that was a workaround for a py2 issue 3 years ago
Mike Lang f2a8007bf7 Fix build dependency issues 3 years ago
Mike Lang 8f24c2eae1 py3 fixes for sheetsync 3 years ago
HubbeKing 6d790a1b36 Do a first naive pass for py3 compatibility
Check that open() calls for reading and writing use binary modes
Use alpine version with py3-pip package
Use python3 in Dockerfile CMD
Remove sys.setdefaultencoding() "hack"
Simplify ensure_directory() in common.common package
3 years ago
Mike Lang f0546e2ee3 Pin gevent to 1.5a2 to avoid https://github.com/gevent/gevent/issues/1711 3 years ago
Mike Lang b53fcd65a0 Add dependencies required to install psycopg2 from source
We can't install the binaries as they don't support musl
4 years ago
HubbeKing 86f7823348 Replace calls to gevent.signal() with gevent.signal_handler()
gevent.signal() was removed in gevent 1.5a4, see http://www.gevent.org/api/gevent.signal.html
Removed on Feb 5th, see https://github.com/gevent/gevent/pull/1530
4 years ago
Mike Lang a53786dc2d Add file and make as build dependencies
gevent now requires these to build. I'm not sure when this changed.
4 years ago
Mike Lang b9cd76b1a2 Add non-static implicit tags in sheetsync
In order for the upcoming playlist manager to be able to use the DB `tags` column to know
what tags a video has, all the tags it needs need to be present.

Previously, this was a problem because the day and category tags only get added at the cutter
and so wouldn't be listed.

This moves them so they are added when parsing the row in sheetsync.
It also adds the poster moment tag if poster moment is checked.

Note that fully static tags that go on all videos are still only added in cutter,
but the playlist manager doesn't need to care about those (since by definition
they will match every video).
4 years ago
Mike Lang 29571fb60b Add tags column to sheetsync
New tags column shunts all columns after it right by 1.

Note we parse tags by splitting on commas then discarding whitespace.
If this would create an empty string tag, it is ignored.
Example: "foo, bar baz,a,,bc " -> ["foo", "bar baz", "a", "bc"]
4 years ago
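The tag parsing rule described in the commit above can be sketched as below. The function name `parse_tags` is hypothetical and the actual sheetsync code may differ; only the rule itself (split on commas, strip surrounding whitespace, ignore empty results) comes from the commit message.

```python
def parse_tags(value):
    """Split a comma-separated cell into tags, ignoring blank entries."""
    # Strip whitespace around each piece; whitespace inside a tag is kept.
    stripped = (part.strip() for part in value.split(","))
    # Drop pieces that are empty after stripping (e.g. from ",,").
    return [tag for tag in stripped if tag]
```

Applied to the example from the commit message, `parse_tags("foo, bar baz,a,,bc ")` gives `["foo", "bar baz", "a", "bc"]`.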
Mike Lang b85296a81e sheetsync: Move column indexes to match updated sheet
New tags column shunts all columns after it right by 1.
We will later want to parse that, but for now we ignore it.
4 years ago
Mike Lang 59d0fa3e40 sheetsync: Don't mis-parse blank as bad time 5 years ago
Mike Lang ab157afe20 sheetsync: Clear event counts before each update
Otherwise, no count of 0 ever gets set, and things are left showing
values when they shouldn't.
5 years ago
Mike Lang 89a9e5554c sheetsync: Record counts of rows in the DB, segmented by various columns
This lets us view a number of useful graphs in dashboards, eg. rows by state,
errored rows, rows by day, rows by category, meltdowns per day, fraction of
events that are poster moments by category.

Sheetsync was the natural place to do this since it was already periodically scanning
the entire events table.
5 years ago
Mike Lang 1c0f3a627b sheetsync: Log what worksheets got synced
it's kinda important
5 years ago
Mike Lang 8b25f8be95 sheetsync: Inject an error into the error column if we fail to parse an input column 5 years ago
Mike Lang 8dc7b80de9 sheetsync: Improve timing of main loop
Instead of always waiting 5 seconds between runs,
wait until 5 seconds after the previous run started.

This ensures we actually run every 5sec and not every 5sec + how long it takes to run
5 years ago
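A minimal sketch of the loop timing described in the commit above, assuming a hypothetical helper `run_periodically`. The `iterations`, `clock` and `sleep` parameters are not from the source; they exist only so the sketch can be exercised without real waiting. The real service uses gevent and may be structured differently.

```python
import time


def run_periodically(task, interval=5.0, iterations=None,
                     clock=time.monotonic, sleep=time.sleep):
    """Call task() so that runs start `interval` seconds apart.

    The wait is measured from when the previous run started, not from
    when it finished, so the time a run takes does not add to the period.
    """
    count = 0
    while iterations is None or count < iterations:
        started = clock()
        task()
        count += 1
        # Sleep only for whatever is left of the interval.
        remaining = interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)
```

If a run takes longer than the interval, the next run starts immediately rather than trying to catch up on missed runs.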
Mike Lang cda8078f64 sheetsync: Only check the most recently changed two sheets most times
Only check the other sheets every 4th time (20sec instead of 5sec).

This eliminates a huge source of unnecessary reads, which prevents us from going over
our API limit.
5 years ago
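The polling policy from the commit above could look roughly like this. The function `sheets_to_check` and its `last_changed` mapping (worksheet name to the time a change was last seen) are hypothetical; how sheetsync actually tracks which sheets changed most recently is not stated in the commit message.

```python
def sheets_to_check(worksheets, last_changed, run_number,
                    active_count=2, full_every=4):
    """Pick which worksheets to read on this run.

    Every `full_every`-th run reads all worksheets. Other runs read only
    the `active_count` worksheets that changed most recently.
    """
    if run_number % full_every == 0:
        return list(worksheets)
    by_recency = sorted(
        worksheets,
        key=lambda name: last_changed.get(name, 0),
        reverse=True,
    )
    return by_recency[:active_count]
```

With a 5 second loop, the two active sheets are read every 5 seconds and the rest every 20 seconds, matching the figures in the commit message.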
Mike Lang 4f6f4cad8b sheetsync: Fix typos with metrics 5 years ago
Mike Lang 731ef9e2d0 Refactor dockerfiles for more shared layers
By carefully ensuring most of our dockerfiles are identical in their first few layers,
we only need to build those layers once instead of every time.

In particular, we move installing gevent to before installing common,
so that even when common changes gevent doesn't need to be reinstalled.

This is important because gevent takes ages to install.

Also fixes segment_coverage, which wasn't being installed.
5 years ago
Mike Lang c740090c53 sheetsync: Add more metrics 5 years ago
Mike Lang 52e6c4ad41 sheetsync, cutter: Collect metrics on http calls
In particular, to google apis.
5 years ago
Mike Lang 17af1c4e89 cutter, sheetsync: Wait for DB to connect on startup
This is a nicer error than crashing in the depths of some error handler
(which is what will happen if the DB goes unavailable while they're running),
and it's a far more common case (eg. the DB is misconfigured) than having it fail
halfway through.

Neither of these services can do anything meaningful without the DB,
so crashing without it is acceptable behaviour.
5 years ago
Mike Lang 48593e2b06 database, sheetsync: Add worksheet name column 'sheet_name'
This tells us which sheet a row came from
(so we don't need to scan every sheet to find it if we're trying to do
lookups in that direction).

It is also needed in order to tag the videos with the Day number.
5 years ago
Christopher Usher f9ce41ef32 metrics for the sheetsync 5 years ago
Christopher Usher a2e47b98f5 fixes -- in dates and the lack of the preshow 5 years ago
Christopher Usher 027c2900e2 fixes in response to ekim's comments 5 years ago
Christopher Usher 1dbe585837 retry database connection if it fails 5 years ago
Mike Lang f7b591e78b sheetsync: Log more information on HTTPError
The api gives additional detail that we want to know when debugging.
5 years ago
Mike Lang fe68e98804 sheetsync: Fix a failure mode where we never recover from a DB conn failure
Since we never got a new conn after failure, we would just keep erroring with
"connection already closed" errors.
6 years ago
Mike Lang 018e920808 sheet-sync: Some fixes 6 years ago
Mike Lang f354130434 sheetsync: Only allocate ids when first needed
This prevents rate limiting issues when immediately allocating all 999 ids
for an empty sheet.
6 years ago
Mike Lang 11fc67f071 sheetsync: Review feedback
* Expand on some comments
* Fix conflicting port number
* Write help text for all args
6 years ago
Mike Lang 9762f308a0 Implement main part of sheet sync 6 years ago
Mike Lang 5a44bfdf51 Google sheets api wrapper
Exposes a way to read all rows, and write a single cell.

We need to read all columns of each row so we know what would be modified
so we only do updates to single cells that aren't already the correct value.

This keeps us from impacting the sheet load too much with constantly changing values,
which I think might be a thing even if the values are the same.
6 years ago
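The "only write cells that differ" idea from the commit above can be sketched as a pure comparison step. The name `cells_to_write` and the representation (a row as a list of strings, desired values keyed by zero-based column index) are assumptions for illustration; the wrapper's real interface is not shown in this log.

```python
def cells_to_write(current_row, desired):
    """Return (column, value) pairs for cells that need updating.

    current_row: list of cell values as read from the sheet.
    desired: dict mapping zero-based column index to the wanted value.
    Cells beyond the end of current_row are treated as empty strings.
    """
    updates = []
    for column, value in sorted(desired.items()):
        existing = current_row[column] if column < len(current_row) else ""
        if existing != value:
            updates.append((column, value))
    return updates
```

Each returned pair would then become one single-cell write, so rows that already hold the correct values generate no writes at all.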
Mike Lang 2b4d2cce90 sheet sync: Basic skeleton 6 years ago