Info on the database schema and interactions with services
### The event table
One *event* is analogous to one entry in the main VST spreadsheet.
It acts as both the canonical record of the sheet,
and as state for the event as it is processed by the cutting system.
In general, columns are either:
* Taken directly from the sheet as input (*sheet inputs*)
* Set by the editor as input (*edit inputs*)
* Set by the cutter in the process of cutting to record *state* and *outputs*.
#### The state machine
The most important column is `state`. This is an enum of several possible values
which encode the overall position this event is at along the process of editing and cutting a video.
The possible states are:
* `UNEDITED`: An event which is not ready to be processed. This is the initial state of all events,
but it can also result from an event which has been edited, but then processing was aborted due
to an error or cancelled by an operator. In these cases the `error` column will be populated
with an error message.
* `EDITED`: An event for which edit inputs have been submitted and cutting is ready to proceed.
Only `UNEDITED` rows will become `EDITED` - rows which have already been edited but not yet
uploaded should instead be cancelled so they return to `UNEDITED` before attempting to re-edit them.
* `CLAIMED`: An event which is in the process of being cut and uploaded. `CLAIMED` events will
have a populated `uploader` column which indicates the cutter which is doing the cutting.
* `FINALIZING`: An event whose upload has finished, but we are currently in the process of
finalizing the upload to make it official. If a cutter dies and leaves an event in this state,
it is indeterminate whether the upload actually occurred - in this unlikely scenario, an operator
should manually inspect things and decide on further action.
* `TRANSCODING`: An event which has been successfully uploaded, but is not yet ready for public consumption.
The upload is no longer cancellable. If further re-edits need to be applied,
an operator should manually delete or unlist the video then set the state back to `UNEDITED`.
In YouTube terms, this covers the period after upload while transcoding is happening and the video
is not yet able to be played (or only at reduced resolution).
* `DONE`: An event whose video is ready for public consumption. As with `TRANSCODING`, if changes need
to be made, an operator should manually delete or unlist the video then set the state back
to `UNEDITED`, or modify the video if possible (see `MODIFIED`).
* `MODIFIED`: An event that was previously successfully uploaded, which has had some of its edit inputs
modified. Cutters will see this state and attempt to edit the video to match the new edit inputs,
though the possible edits depend on the upload backend. This only includes edits to metadata fields
like title, and should not require re-cutting the video. Once updated, the cutter returns the video to `DONE`.
The following transitions are possible:
* `UNEDITED -> EDITED`: When a video is edited and edit inputs are submitted.
* `EDITED -> CLAIMED`: When a video is claimed by a cutter and it begins cutting it.
* `EDITED -> UNEDITED`: When an operator cancels an edited video before any cutter claims it.
* `CLAIMED -> EDITED`: When the cutting process is interrupted, eg. because the cutter crashed,
or a recoverable error occurred, but there is nothing wrong with the event
and it can be immediately retried.
* `CLAIMED -> UNEDITED`: When an operator cancels a claimed video before cutting is complete.
* `CLAIMED -> UNEDITED`: When the cutting process failed with an unknown error,
and operator intervention is required. `error` will be populated.
* `CLAIMED -> FINALIZING`: When the cutting process is finished, immediately before the cutter
finalizes the upload.
* `FINALIZING -> EDITED`: When the finalization failed due to a recoverable reason,
we are certain the upload didn't actually go through, and the cut can be immediately retried.
* `FINALIZING -> UNEDITED`: When the finalization failed with an unknown error,
we are certain the upload didn't actually go through, and operator intervention is required.
* `FINALIZING -> TRANSCODING`: When the cutter has successfully finalized the upload,
but the upload location requires further processing before the video is done.
* `FINALIZING -> DONE`: When the cutter has successfully finalized the upload,
and the upload location requires no further processing.
* `TRANSCODING -> DONE`: When any cutter detects that the upload location is finished
transcoding the video, and it is ready for public consumption.
* `DONE -> MODIFIED`: When an operator modifies an uploaded video.
* `MODIFIED -> DONE`: When a cutter successfully updates a modified video, or when
an operator cancels the modification (leaving the video in an indeterminate state,
which the operator is responsible for verifying).
This is summarised in the graph below:
```
                                                          retry                                                                            ┌──────────┐
                                    ┌───────────────────────────────────────────────┐                                                      │ MODIFIED │
                                    │                                               │                                                      └──────────┘
                     cancel         │                                               │                                                          ∧  │
             ┌──────────────────────┼───────────────────┐                           │                                                   modify │  │ updated
             ∨                      ∨                   │                           │                                                          │  ∨
        ┌──────────┐   edit    ┌────────┐  claim   ┌─────────┐  pre-finalize   ┌────────────┐  post-finalize   ┌─────────────┐  when ready   ┌──────┐
        │          │ ────────> │        │ ───────> │         │ ──────────────> │            │ ───────────────> │ TRANSCODING │ ────────────> │ DONE │
        │          │           │        │          │         │                 │            │                  └─────────────┘               └──────┘
        │          │  cancel   │        │  retry   │         │                 │            │  post-finalize                                   ∧
        │ UNEDITED │ <──────── │ EDITED │ <─────── │ CLAIMED │                 │ FINALIZING │ ─────────────────────────────────────────────────┘
        │          │           │        │          │         │                 │            │
        │          │           │        │          │         │                 │            │
┌─────> │          │           │        │  ┌────── │         │                 │            │
│       └──────────┘           └────────┘  │       └─────────┘                 └────────────┘
│         ∧             error              │                                         │
│ error   └────────────────────────────────┘                                         │
│                                                                                    │
│                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────┘
```
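The transition list above can also be written down as a lookup table. The following is a minimal sketch in Python, not the cutter's actual code: the names `State`, `TRANSITIONS` and `can_transition` are hypothetical. It covers only the listed transitions, so the manual operator resets from `TRANSCODING` or `DONE` back to `UNEDITED` described in the state list are deliberately left out.

```python
from enum import Enum


class State(Enum):
    UNEDITED = "UNEDITED"
    EDITED = "EDITED"
    CLAIMED = "CLAIMED"
    FINALIZING = "FINALIZING"
    TRANSCODING = "TRANSCODING"
    DONE = "DONE"
    MODIFIED = "MODIFIED"


# Allowed transitions, copied from the list above.
# Manual operator resets to UNEDITED are not part of that list and are omitted.
TRANSITIONS = {
    State.UNEDITED: {State.EDITED},
    State.EDITED: {State.CLAIMED, State.UNEDITED},
    State.CLAIMED: {State.EDITED, State.UNEDITED, State.FINALIZING},
    State.FINALIZING: {State.EDITED, State.UNEDITED, State.TRANSCODING, State.DONE},
    State.TRANSCODING: {State.DONE},
    State.DONE: {State.MODIFIED},
    State.MODIFIED: {State.DONE},
}


def can_transition(old, new):
    """Return True if the listed transitions permit moving from old to new."""
    return new in TRANSITIONS[old]
```

A caller could use `can_transition(State.EDITED, State.CLAIMED)` as a sanity check before issuing the corresponding `UPDATE`.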
#### Thumbnails
The state around thumbnails is a little complicated.
The `thumbnail_mode` is set by the editor and has the following options:
* `NONE`: Video should not have a thumbnail uploaded.
This will not delete an existing thumbnail if present.
* `BARE`: Video thumbnail is a still frame taken from the stream at `thumbnail_time`.
`thumbnail_time` must not be NULL. `thumbnail_template` must be NULL.
* `TEMPLATE`: Video thumbnail takes a still frame from the stream at `thumbnail_time` and
combines it with a template image with name `thumbnail_template`. Both these columns must
not be NULL.
* `CUSTOM`: Video thumbnail is a custom image stored in `thumbnail_image`, which must not be NULL.
In the cases of `BARE` and `TEMPLATE`, `thumbnail_image` is used to store the generated image.
This generation happens when the video is uploaded.
However, if the `thumbnail_image` column is later set to NULL and state set to `MODIFIED`,
the image will be re-generated before the video is modified.
Unused columns for the current mode are allowed to be non-NULL; this allows for changing
the mode then changing it back, without losing the old saved settings.
All the above columns are modifiable, within the constraints outlined above.
The mode column's default is currently `TEMPLATE`, but this is just a UX choice.
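As an illustration of the per-mode requirements, here is a small Python sketch that reports which required thumbnail columns are still NULL. The names `REQUIRED_COLUMNS` and `missing_thumbnail_columns`, and the use of a plain dict for a row, are assumptions made for this example. It follows the rule that unused columns may be non-NULL, so it only checks the "must not be NULL" requirements.

```python
# Columns that must be non-NULL for each thumbnail_mode, per the list above.
# Unused columns are allowed to hold old values, so nothing is required to be NULL.
REQUIRED_COLUMNS = {
    "NONE": [],
    "BARE": ["thumbnail_time"],
    "TEMPLATE": ["thumbnail_time", "thumbnail_template"],
    "CUSTOM": ["thumbnail_image"],
}


def missing_thumbnail_columns(row):
    """Return the required thumbnail columns that are NULL (None) in row.

    row is a dict mapping column names to values (a hypothetical
    representation of one event row).
    """
    mode = row["thumbnail_mode"]
    return [col for col in REQUIRED_COLUMNS[mode] if row.get(col) is None]
```

An empty result means the row satisfies the requirements for its current mode.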
Finally, the `thumbnail_last_written` column holds a SHA256 hash of the image data most recently
uploaded. This allows us to detect if it has changed when modifying a video.
We could query the current thumbnail from YouTube's API, but this may be re-encoded or scaled
and not have exactly the same content.
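The change detection described above amounts to comparing SHA256 digests. A minimal Python sketch follows, assuming the image is available as bytes; the function name `thumbnail_needs_upload` is hypothetical.

```python
import hashlib


def thumbnail_needs_upload(image, last_written):
    """Return (needs_upload, new_hash) for the given image bytes.

    last_written is the stored thumbnail_last_written value (binary SHA256),
    or None if no thumbnail has been uploaded yet.
    """
    new_hash = hashlib.sha256(image).digest()  # binary form, 32 bytes
    return new_hash != last_written, new_hash
```

After a successful upload, the returned hash would be the value to store back into `thumbnail_last_written`.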
#### Full schema
The details below assume postgres, but nothing is significantly different in any SQL DB,
except the use of arrays which would need to be split out into another table, but even that is
a straightforward change.
Note that most of the sheet input string types are `NOT NULL DEFAULT ''`, as when taking sheet inputs,
there is no meaningful distinction between "unset" and "set to empty string".
However, for other sheet inputs, a NULL is used to indicate unset / an unparsable value.
Edit input values are initially NULL, but must not be NULL once the state is no longer `UNEDITED`.
columns | type | role | description
-------------------------- | ---------------------------------- | :---------: | -----------
`id` | `TEXT PRIMARY KEY` | sheet input | Generated and attached to rows in the sheet to uniquely identify them even in the face of added, deleted or moved rows.
`sheet_name` | `TEXT NOT NULL` | sheet input | The name of the worksheet that the row is on. This is used to tag videos, and can be used to narrow down the range to look for an id in for more efficient lookup (though we never do that right now).
`event_start`, `event_end` | `TIMESTAMP` | sheet input | Start and end time of the event. Parsed from the sheet into timestamps or NULL. Used to set the editor time span, and displayed on the public sheet. The start time also determines what "day" the event lies on, for video tagging and other purposes.
`category` | `TEXT NOT NULL DEFAULT ''` | sheet input | The kind of event. By convention selected from a small list of categories, but stored as an arbitrary string because there's little to no benefit to using an enum here; it just makes our job harder when adding a new category. Used to tag videos, and for display on the public sheet.
`description` | `TEXT NOT NULL DEFAULT ''` | sheet input | Event description. Provides the default title and description for editors, and displayed on the public sheet.
`submitter_winner` | `TEXT NOT NULL DEFAULT ''` | sheet input | A column detailing challenge submitter, auction winner, or other "associated person" data. This shouldn't be relied on in any processing but should be displayed on the public sheet.
`poster_moment` | `BOOLEAN NOT NULL DEFAULT FALSE` | sheet input | Whether or not the event was featured on the poster. Used for building the postermap and also displayed on the public sheet.
`image_links` | `TEXT[] NOT NULL` | sheet input | Any additional gif or image links associated with the event. Displayed on the public sheet.
`notes` | `TEXT NOT NULL DEFAULT ''` | sheet input | Private notes on this event, used eg. to leave messages or special instructions. Displayed to the editor during editing, but otherwise unused.
`tags` | `TEXT[] NOT NULL` | sheet input | Custom tags to annotate this event's video with. Provides the default tags that the editor can then adjust.
`allow_holes` | `BOOLEAN NOT NULL DEFAULT FALSE` | edit input | If false, any missing segments encountered while cutting will cause the cut to fail. Setting this to true should be done by an operator to indicate that holes are expected in this range. It is also the operator's responsibility to ensure that all allowed cutters have all segments that they can get, since there is no guarantee that only the cutter with the least missing segments will get the cut job.
`uploader_whitelist` | `TEXT[]` | edit input | List of uploaders which are allowed to cut this entry, or NULL to indicate no restriction. This is useful if you are allowing holes and the amount of missing data differs between nodes (this shouldn't happen - this would mean replication is also failing), or if an operator is investigating a problem with a specific node.
`upload_location` | `TEXT` | edit input | The upload location to upload the cut video to. This is used by the cutter, and must match one of the cutter's configured upload locations. If it does not, the cutter will not claim the event.
`public` | `BOOLEAN NOT NULL DEFAULT TRUE` | edit input | Whether the uploaded video should be public or not, if the upload location supports that distinction. For example, on YouTube, non-public videos are "unlisted". It also controls whether the video will be added to playlists: only public videos are added.
`video_ranges` | `{start TIMESTAMP, end TIMESTAMP}[]` | edit input | A non-zero number of start and end times, describing the ranges of video to cut. They will be cut back-to-back in the given order, with the transitions between as per `video_transitions`. If already set, used as the default range settings when editing.
`video_transitions` | `{type TEXT, duration DOUBLE PRECISION}[]`| edit input | Defines how to transition between each range defined in `video_ranges`, and must be exactly the length of `video_ranges` minus 1. Each index in `video_transitions` defines the transition between the range with the same index in `video_ranges` and the next one. Transitions either specify a transition type as understood by `ffmpeg`'s `xfade` filter and a duration (amount of overlap), or can be NULL to indicate a hard cut.
`video_title` | `TEXT` | edit input | The title of the video. If already set, used as the default title when editing instead of `description`.
`video_description` | `TEXT` | edit input | The description field of the video. If already set, used as the default description when editing instead of `description`.
`video_tags` | `TEXT[]` | edit input | Custom tags to annotate the video with. If already set, used as the default when editing instead of `tags`.
`video_channel` | `TEXT` | edit input | The twitch channel to cut the video from. If already set, used as the default channel selection when editing, instead of a pre-configured editor default. While this will almost always be the default value, it's a useful thing to be able to change should the need arise.
`video_quality` | `TEXT NOT NULL DEFAULT 'source'` | edit input | The stream quality to cut the video from. Used as the default quality selection when editing. While this will almost always be the default value, it's a useful thing to be able to change should the need arise.
`thumbnail_mode` | `ENUM NOT NULL DEFAULT 'TEMPLATE'` | edit input | The thumbnail mode. See "Thumbnails" above.
`thumbnail_time` | `TIMESTAMP` | edit input | The video time to grab a frame from for the thumbnail in BARE and TEMPLATE modes.
`thumbnail_template` | `TEXT` | edit input | The template name to use for the thumbnail in TEMPLATE mode.
`thumbnail_image` | `BYTEA` | edit input | In CUSTOM mode, the thumbnail image. In BARE and TEMPLATE modes, the generated thumbnail image, or NULL to indicate it should be generated when next needed.
`state` | `ENUM NOT NULL DEFAULT 'UNEDITED'` | state | See "The state machine" above.
`uploader` | `TEXT` | state | The name of the cutter node performing the cut and upload. Set when transitioning from `EDITED` to `CLAIMED` and cleared on a retryable error. Left uncleared on non-retryable errors to provide information to the operator. Cleared on a re-edit if set.
`error` | `TEXT` | state | A human-readable error message, set if a non-retryable error occurs. Its presence indicates operator intervention is required. Cleared on a re-edit if set.
`video_id` | `TEXT` | state | An id that can be used to refer to the video to check if transcoding is complete. Often the video_link can be generated from this, but not necessarily.
`video_link` | `TEXT` | output | A link to the uploaded video. Only set when state is `TRANSCODING` or `DONE`.
`editor` | `TEXT` | state | Email address of the last editor; corresponds to an entry in the `editors` table. Only set when state is not `UNEDITED`.
`edit_time` | `TIMESTAMP` | state | Time of the last edit. Only set when state is not `UNEDITED`.
`upload_time` | `TIMESTAMP` | state | Time when video state is set to `DONE`. Only set when state is `DONE`.
`last_modified` | `TIMESTAMP` | state | Time when video state was last set to `MODIFIED`, or NULL if it has never been. Only used for diagnostics.
`thumbnail_last_written` | `BYTEA` | state | The SHA256 hash, in binary form, of the most recently uploaded thumbnail image.
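
Two of the constraints in the table lend themselves to a short sketch: the length relationship between `video_ranges` and `video_transitions`, and the conditions under which a cutter may claim an event (`state`, `upload_location`, `uploader_whitelist`). The Python below is illustrative only. The function names, the dict representation of a row and the example location and node names are assumptions, not the cutter's actual code.

```python
def check_video_cut_inputs(video_ranges, video_transitions):
    """Return a list of problems; an empty list means the inputs are consistent.

    video_ranges: list of (start, end) pairs.
    video_transitions: list whose items are None (hard cut) or (type, duration).
    """
    problems = []
    if not video_ranges:
        problems.append("video_ranges must contain at least one range")
    elif len(video_transitions) != len(video_ranges) - 1:
        problems.append(
            "video_transitions must have exactly len(video_ranges) - 1 entries"
        )
    return problems


def may_claim(event, cutter_name, upload_locations):
    """Return True if the named cutter may claim this event.

    event is a dict of column values (hypothetical representation);
    upload_locations is the collection of locations the cutter is configured for.
    """
    if event["state"] != "EDITED":
        return False
    if event["upload_location"] not in upload_locations:
        return False
    whitelist = event["uploader_whitelist"]
    # NULL (None) means no restriction on which cutter may claim.
    return whitelist is None or cutter_name in whitelist
```

For example, an `EDITED` event with `uploader_whitelist` set to `["node2"]` would be skipped by a cutter named `node1` even if the upload location matches.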