68 Commits (9e6cd71026a8262cf7df0fde8b9496d5ae66f945)

Author SHA1 Message Date
Mike Lang a7f5d1c545 Fix issues with metrics gathering for cut functions
* Need to allow timed() to have multiple callers with same name
* "type" label is reserved, use "cut_type" instead
5 years ago
Mike Lang eb4fb5a9e1 restreamer: Add more options for fetching cuts
Split full cut into two types - an mpegts one and an mp4 one.
Add "rough" cut which is just a concat of the segments.
5 years ago
Mike Lang b27e06d068 Fix typo in common/common/segments.py 5 years ago
Mike Lang 4d52b18b04 cutter,restreamer: Set stream=True for full cuts when appropriate
And also default to a new ffmpeg encoding setting for high-quality mpegts
(ie. still streamable) that is encoded very quickly.
5 years ago
Mike Lang 9afcc7b399 full cut: Optionally use seekable file OR directly stream
The caller can pick depending on the needs of the output format.

This reverses most of 80d829b83b,
re-introducing streaming full cuts but keeping non-streaming as an option.
5 years ago
Mike Lang 4f900c5925 Collect metrics around cutting time 5 years ago
Mike Lang 52e6c4ad41 sheetsync, cutter: Collect metrics on http calls
In particular, to google apis.
5 years ago
Mike Lang a2edb38a85 Add an InstrumentedSession wrapper that automatically gathers metrics on http calls 5 years ago
Mike Lang fc791e03d4 DBManager: Don't test connection on start
This gives the individual services more freedom in how to handle
a failing connection.
5 years ago
Mike Lang e435abf72e Merge pull request #114 from ekimekim/mike/fixes
Grab-bag of cutter fixes
5 years ago
Mike Lang 80d829b83b full cut: ffmpeg requires a seekable output file
Most formats like mp4 require ffmpeg to make changes at the start of the file
throughout writing.

Unfortunately, this prevents us from streaming the upload as we cut it.

Instead, we spool to a temporary file until ffmpeg exits,
then upload that all at once.
5 years ago
Christopher Usher f4cd3f546e removed comments no longer needed 5 years ago
Christopher Usher 51e4520826 replaced warnings.warn with logger.warn 5 years ago
Christopher Usher 7d85eb7272 warn about and ignore files that don't parse as segments 5 years ago
Mike Lang 3a9543a4b5 Suppress less ffmpeg output when cutting
The "fatal" level was causing some useful errors to be suppressed.
5 years ago
Mike Lang 12decf015e Fix multiple typos and mistakes with full cuts 5 years ago
Mike Lang c970677a76 full cut: Fix a typo 5 years ago
Mike Lang d3e1d6b4fc Resurrect non-experimental cut, now dubbed "full" (vs "fast") cut
In a fast cut, we edit the first and last segments then concatenate them all.
However, this leads to some tiny but perceptible artifacting around the border
of the first and second (and second-last and last) segments.

A full cut is much slower, but re-encodes the video into the desired format
and is more reliable.

We want both options to be available.

With this commit, we only add the option, we don't use it in restreamer or cutter.
5 years ago
Christopher Usher 928f9733d2 horrible bug with negative times fixed 5 years ago
Christopher Usher a6303c38ce fixed parse_segment_path to allow just a filename to be parsed 5 years ago
Christopher Usher 632c5fae2f added a default timeout to database connections 5 years ago
Christopher Usher ff5c1f8ecd fixes based on ekim's suggestions 5 years ago
Christopher Usher f75f3e61e8 Removed schema from common/database.py 5 years ago
Christopher Usher 86477fae13 fixes for ekim's comments 5 years ago
Christopher Usher 23e3cfce20 Added editor, edit_time and upload_time to thrimshim and cutter updates of the
database
5 years ago
Christopher Usher 75cafdabb7 database changes to keep track of editors and edit times 5 years ago
Christopher Usher 67100c4126 comments 5 years ago
Christopher Usher 76bc629720 moved flask monitoring to its own module 5 years ago
Christopher Usher 6c633df3ee move restreamer.stats to common.stats 5 years ago
Christopher Usher b959853593 refactored to channel and quality 5 years ago
Mike Lang 7179fcacec Backfiller: ignore temp segments
To make this work, we make type a proper segment field.

We also tell get_best_segments to ignore temp segments, since they might go away
before we can actually use them.
6 years ago
Mike Lang 499e486b0b Merge pull request #54 from ekimekim/mike/sheet-sync/initial
sheet sync
6 years ago
Christopher Usher 4b9fbcb7d2 backfiller database code 6 years ago
Mike Lang 9762f308a0 Implement main part of sheet sync 6 years ago
Mike Lang 3647d091f8 Move common google api auth functionality into common
So we can reuse it for google sheets
6 years ago
Mike Lang 3ccace2a73 database: Update constraints to allow null edit inputs in state DONE
This allows manual uploads to work without needing to fill all the edit fields
with junk.

We also set a constraint on uploader asserting that any videos from claimed onwards have a known uploader.
Again, an exception is made for DONE to allow manual uploads.
6 years ago
Mike Lang cca4d52b7d Don't error when encountering a temp-type segment
These can happen if a downloader or backfiller dies suddenly.
We treat it similarly to partial but lacking any hash.

At some point in the future we should probably have something
to find any temp segments, hash them and rename them to partials.
6 years ago
Mike Lang f8d10dacdf Audit and fix all usage of dateutil
We wrap direct dateutil calls to handle two distinct cases:

* `common.dateutil.parse()`: We want to handle arbitrary timestamps including tz info,
then convert them to UTC.

This is used in HLS parsing, and for command line input for backfiller

* `common.dateutil.parse_utc_only()`: We want to only handle UTC timestamps,
but datetime.strptime isn't flexible enough (eg. can't handle missing fractional component).

This is used for restreamer request params.
6 years ago
Mike Lang dfc64481a6 Port existing cutting code from restreamer into common
Note this moves over the 'experimental' cutter and deletes the original cutter
that concatenates entire videos before cutting.
We may eventually want to revive that method if the experimental cutter turns out
to introduce too many issues.

We move most of the code over verbatim, but adjust it such that it acts
as a generic iterator that can be used in a variety of contexts.

Some other changes made during the move include telling ffmpeg to be quieter
(don't output version info and junk, only log if something goes wrong),
and avoiding errors during cleanup.
6 years ago
Mike Lang 3d9ba77745 common: add allow_holes option to get_best_segments() to abort early if holes found
This is a performance optimization, allowing us to fail out early (potentially avoiding a LOT
of work) if we know we're going to reject any result that contains holes.

We add a new exception ContainsHoles that is raised in this condition.
6 years ago
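
The early-abort behaviour described in commit 3d9ba77745 might look roughly like the sketch below. Only the `allow_holes` option and the `ContainsHoles` exception come from the commit message. The `collect_segments` function, its `(start, duration)` tuple input and its use of `None` to mark a hole are assumptions made for illustration, not the project's actual `get_best_segments()` implementation.

```python
from datetime import datetime, timedelta


class ContainsHoles(Exception):
    """Raised when a gap is found and the caller passed allow_holes=False."""


def collect_segments(segments, allow_holes=True):
    """Walk (start, duration) tuples sorted by start time.

    Hypothetical helper: returns the segments with None inserted wherever
    there is a gap, or raises ContainsHoles at the first gap if
    allow_holes is False, so no further work is done on a doomed result.
    """
    result = []
    prev_end = None
    for start, duration in segments:
        if prev_end is not None and start > prev_end:
            if not allow_holes:
                # Fail out early instead of scanning the remaining segments.
                raise ContainsHoles(f"gap between {prev_end} and {start}")
            result.append(None)
        result.append((start, duration))
        end = start + duration
        prev_end = end if prev_end is None else max(prev_end, end)
    return result
```

A caller that would reject any result containing holes passes `allow_holes=False` and handles `ContainsHoles`, rather than inspecting the full result afterwards.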
Mike Lang e383613954 database: Add constraints on edit inputs that they must be non-NULL if state != UNEDITED
This should help prevent changing state to EDITED with any of these fields unset,
which would blow up the cutter.

We also fix up upload_location, which was set up as a sheet input (NOT NULL DEFAULT ''),
and add a similar constraint saying any DONE columns must have non-NULL video link.
6 years ago
Mike Lang 292188ad7c database: Remove retry_on_conflict helper and default to autocommit
All our usage was of a single query anyway, so autocommit is easier to handle.
You can still opt into a longer transaction using the transaction() helper.
6 years ago
Mike Lang 73640ed4ab database: Add column video_id for storing upload-location-specific metadata for identifying video
ie. for youtube, the video id.
6 years ago
Mike Lang dc2eb6ed74 Add some common database code
This code manages the database connections, setting their isolation level correctly
and ensuring the idempotent schema is applied before they're used.

Applying the schema on startup means we don't need to deal with the database's state,
setting it up before running, running migrations etc. However, it does put constraints on
the changes we can safely make.

Our use of serializable isolation means that all transactions can be treated as fully
independent - the server must behave as though they'd been run separately in some valid order.
This will give us the least surprising results when multiple connections try to modify the same
data, though we'll need to deal with occasional transaction commit failures due to conflicts.
6 years ago
Mike Lang 997c1242b2 get_best_segments: Let other things run
get_best_segments can sometimes take a very long time,
we don't want to stop other work from happening while it's ongoing.
So we ask gevent to run other things until there's no other work to do,
then we do one hour, then check back with gevent again.

In combination with the performance improvements, this should mean we don't block
other things from running for more than a few hundred ms at most.
6 years ago
Mike Lang bf08aa29b8 parse_segment_path: Use datetime.strptime instead of dateutil.parser
strptime is much faster but can't handle as varied formats.
But in this case we fully control the format, so there's no reason not to use it.

Profiling suggests we spend about 80% of our time in get_best_segments just parsing dates,
so this is a significant performance gain.
6 years ago
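
As a rough illustration of the trade-off in commit bf08aa29b8, the sketch below parses a timestamp with `datetime.strptime` against one fixed format. The format string and the `parse_fixed_timestamp` name are assumptions chosen for the example. The actual segment path layout is not shown in this log.

```python
from datetime import datetime

# Hypothetical layout, for illustration only. strptime accepts exactly
# this shape, which is acceptable when we generate the strings ourselves.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_fixed_timestamp(text):
    """Parse a timestamp in the one format we control.

    Raises ValueError for anything else, where a general-purpose parser
    such as dateutil.parser would try to guess the layout instead.
    """
    return datetime.strptime(text, TIMESTAMP_FORMAT)
```

The strictness is the point: there is no format detection to pay for, so input in any other layout is rejected rather than guessed at.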
Mike Lang bcdb268ce8 Also need to replace locks on the counter float values to prevent deadlocks
See comment for full details
6 years ago
Mike Lang 10cca18922 Fix a deadlock due to signal interactions with prometheus client
The prometheus client uses a threading.Lock() to prevent shared access to
certain metric state. This lock is taken as part of doing collection, as well
as during metric.labels().

We hit a deadlock where our stack sampler signal arrived during a collection,
when the lock was held. This meant that flamegraph.labels() blocked forever,
and the lock was never released, hanging all metrics collection.

Our solution is a hack, which is to reach into the internals of our metric object
and replace its lock with a dummy one. This is reasonably safe, but only as long as
the prometheus_client internal structure doesn't change significantly.
6 years ago
Mike Lang b9c2921242 common.stats: Add a stacksampler that records sampled stacks to prometheus
This can then be used to generate flamegraphs
6 years ago
Mike Lang 5175b099af common: Split segment-related stuff into its own module
We still import them into __init__.py so they're accessible externally just the same
6 years ago