strptime is much faster but can't handle as varied formats.
But in this case we fully control the format, so there's no reason not to use it.
Profiling suggests we spend about 80% of our time in get_best_segments just parsing dates,
so this is a signifigant performance gain.
The prometheus client uses a threading.Lock() to prevent shared access to
certain metric state. This lock is taken as part of doing collection, as well
as during metric.labels().
We hit a deadlock where our stack sampler signal arrived during a collection,
when the lock was held. This meant that flamegraph.labels() blocked forever,
and the lock was never released, hanging all metrics collection.
Our solution is a hack, which is to reach into the internals of our metric object
and replace its lock with a dummy one. This is reasonably safe, but only as long as
the prometheus_client internal structure doesn't change signfigiantly.
The function is quite customizable and therefore quite complex, but it allows us to
easily annotate a function to be timed with labels based on input and output,
as well as normalize results based on amount of work done to get a better
picture of the actual amount of time taken per unit of work.
This will help us monitor for performance issues.