# wubloader/common/common/__init__.py


"""A place for common utilities between wubloader components"""


import datetime
import errno
import os
import random

from .segments import get_best_segments, cut_segments, parse_segment_path, SegmentInfo
from .stats import timed, PromLogCountsHandler, install_stacksampler


def dt_to_bustime(start, dt):
	"""Convert a datetime to bus time. Bus time is seconds since the given start point."""
	return (dt - start).total_seconds()


def bustime_to_dt(start, bustime):
	"""Convert from bus time to a datetime"""
	return start + datetime.timedelta(seconds=bustime)


def parse_bustime(bustime):
	"""Convert a human-readable bustime string [-]HH:MM[:SS[.fff]]
	to float seconds since bustime 00:00. Inverse of format_bustime();
	see it for details."""
	bustime = bustime.strip()
	if bustime.startswith('-'):
		# parse without the -, then negate it
		return -parse_bustime(bustime[1:])

	parts = bustime.split(':')
	if len(parts) == 2:
		hours, mins = parts
		secs = 0
	elif len(parts) == 3:
		hours, mins, secs = parts
	else:
		raise ValueError("Invalid bustime: must be HH:MM[:SS]")
	hours = int(hours)
	mins = int(mins)
	secs = float(secs)
	return 3600 * hours + 60 * mins + secs


def format_bustime(bustime, round="millisecond"):
	"""Convert bustime to a human-readable string (-)HH:MM:SS.fff, with the
	ending cut off depending on the value of round:
		"millisecond": (default) Round to the nearest millisecond.
		"second": Round down to the current second.
		"minute": Round down to the current minute.
	Examples:
		00:00:00.000
		01:23:00
		110:50
		159:59:59.999
		-10:30:01.100
	Negative times are formatted as time-until-start, preceded by a minus
	sign, e.g. "-01:20:00" indicates the run begins in 80 minutes.
	"""
	sign = ''
	if bustime < 0:
		sign = '-'
		bustime = -bustime
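	if round == "millisecond":
		# Round the total to the nearest millisecond before splitting it up, so
		# that e.g. 59.9996s carries into the next minute ("00:01:00.000")
		# instead of being printed as "00:00:60.000".
		bustime = int(bustime * 1000 + 0.5) / 1000.
	total_mins, secs = divmod(bustime, 60)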
	hours, mins = divmod(total_mins, 60)
	parts = [
		"{:02d}".format(int(hours)),
		"{:02d}".format(int(mins)),
	]
	if round == "minute":
		pass
	elif round == "second":
		parts.append("{:02d}".format(int(secs)))
	elif round == "millisecond":
		parts.append("{:06.3f}".format(secs))
	else:
		raise ValueError("Bad rounding value: {!r}".format(round))
	return sign + ":".join(parts)


def rename(old, new):
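	"""Atomic rename that also succeeds if the target already exists. Since we name
	everything by hash, an existing target already holds the same content, so in
	that case we just delete the source file. (On POSIX, os.rename() normally
	replaces an existing file silently; the EEXIST branch matters on platforms
	such as Windows.)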
	"""
	try:
		os.rename(old, new)
	except OSError as e:
		if e.errno != errno.EEXIST:
			raise
		os.remove(old)
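

# Illustrative sketch, not part of the original module: a hypothetical
# _write_by_hash_sketch() showing the write-then-rename pattern that the
# rename() docstring relies on. Data is written to a temporary file in the
# destination directory (keeping the final rename on one filesystem, where it
# is atomic), then moved to a name derived from its SHA-256 digest. The
# digest-based naming is an assumption for illustration. It calls os.rename()
# so it stands alone; rename() above could be used instead to also tolerate
# EEXIST.
def _write_by_hash_sketch(dir_path, data):
	"""Write bytes `data` into dir_path under its SHA-256 hex digest and
	return the final path."""
	import hashlib
	import tempfile
	final_path = os.path.join(dir_path, hashlib.sha256(data).hexdigest())
	fd, temp_path = tempfile.mkstemp(dir=dir_path)
	try:
		with os.fdopen(fd, 'wb') as f:
			f.write(data)
		# Identical data always maps to the same final name, so replacing (or
		# keeping) an existing target loses nothing.
		os.rename(temp_path, final_path)
	except Exception:
		# Don't leave a stray temporary file behind on failure.
		if os.path.exists(temp_path):
			os.remove(temp_path)
		raise
	return final_path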


def ensure_directory(path):
	"""Create directory that contains path, as well as any parent directories,
	if they don't already exist."""
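	dir_path = os.path.dirname(path)
	# An empty dir_path means path is a bare filename in the current directory;
	# without this check the recursion below would never terminate.
	if not dir_path or os.path.exists(dir_path):
		return
	ensure_directory(dir_path)
	try:
		os.mkdir(dir_path)
	except OSError as e:
		# Ignore EEXIST. This is needed to avoid a race if two getters run at once.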
		if e.errno != errno.EEXIST:
			raise
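

# Illustrative sketch, not part of the original module: a hypothetical
# _ensure_directory_makedirs_sketch() doing the same job as ensure_directory()
# with the standard library's os.makedirs(), which creates missing parent
# directories itself instead of recursing by hand. EEXIST is ignored for the
# same reason as above: another process may create the directory concurrently.
# (On Python 3 alone, os.makedirs(dir_path, exist_ok=True) covers most of this.)
def _ensure_directory_makedirs_sketch(path):
	"""Create the directory containing path, and any missing parents."""
	dir_path = os.path.dirname(path)
	if not dir_path:
		# Bare filename in the current directory: nothing to create.
		return
	try:
		os.makedirs(dir_path)
	except OSError as e:
		if e.errno != errno.EEXIST:
			raise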


def jitter(interval):
	"""Apply some 'jitter' to an interval. This is a random +/- 10% change in order to
	smooth out patterns and prevent everything from retrying at the same time.
	"""
	return interval * (0.9 + 0.2 * random.random())
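

# Illustrative sketch, not part of the original module: a hypothetical
# _retry_with_jitter_sketch() showing the kind of retry loop jitter() is meant
# for. Each wait is drawn from +/- 10% around the base interval (the same
# formula as jitter()), so workers that fail at the same moment drift apart
# instead of retrying in lockstep. The attempts/interval defaults and the
# injectable sleep function are assumptions for illustration.
def _retry_with_jitter_sketch(func, attempts=3, interval=1.0, sleep=None):
	"""Call func() until it succeeds, up to `attempts` times, waiting a
	jittered interval between failures. Re-raises the last error."""
	if sleep is None:
		import time
		sleep = time.sleep
	for attempt in range(attempts):
		try:
			return func()
		except Exception:
			if attempt == attempts - 1:
				raise
			sleep(interval * (0.9 + 0.2 * random.random()))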


def encode_strings(o):
	"""Recursively encode unicode strings (including dict keys) as UTF-8 bytes, for JSON output."""
	if isinstance(o, list):
		return [encode_strings(x) for x in o]
	if isinstance(o, dict):
		return {k.encode('utf-8'): encode_strings(v) for k, v in o.items()}
	if isinstance(o, unicode):
		return o.encode('utf-8')
	return o